Systems and methods for controlling switched-mode power supplies. One system includes a converter including a switch and an inductor, and a processor to control operation of the converter. The processor is configured to determine whether a predicted value of current flowing through the inductor is greater than zero. The processor is further configured to determine that the converter is operating in continuous conduction mode (CCM) when the predicted value of the current is greater than zero and to control the switch using a first duty cycle when the converter is operating in CCM. The processor is further configured to determine that the converter is operating in discontinuous conduction mode (DCM) when the predicted value of the current is less than zero and to control the switch using a second duty cycle when the converter is operating in DCM.
H02M 3/156 - Conversion of DC power input into DC power output without intermediate conversion into AC by static converters using discharge tubes with control electrode or semiconductor devices with control electrode using devices of a triode or transistor type requiring continuous application of a control signal using semiconductor devices only with automatic control of output voltage or current, e.g. switching regulators
G06F 1/26 - Power supply means, e.g. regulation thereof
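A minimal sketch of the mode decision described in this abstract, assuming the predicted inductor current and the two duty cycles are computed elsewhere (the abstract does not specify how); the function name and the behavior at exactly zero current are assumptions:

```python
def select_duty_cycle(i_pred: float, duty_ccm: float, duty_dcm: float) -> float:
    """Pick a duty cycle from the predicted inductor current."""
    if i_pred > 0.0:
        return duty_ccm   # continuous conduction mode (CCM)
    return duty_dcm       # discontinuous conduction mode (DCM); zero case assumed DCM
```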
Systems and methods for calibrating display levels created using dual modulation digital micromirror devices. In one example, a multi-modulation display system includes a light source, a first modulator including a first plurality of mirrors to modulate light from the light source, and a second modulator including a second plurality of mirrors to modulate light from the first modulator. A first image is captured while the first modulator is off and the second modulator is on, creating a first display level. A second image is captured while the first modulator is on and the second modulator is on, also creating the same first display level. A difference between the first image and the second image is determined, and control signals for controlling the second modulator may be adjusted based on the difference.
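A sketch of how the two captures might be compared, assuming both images arrive as NumPy arrays registered to the same pixels; reducing the difference to its mean is an illustrative choice, since the abstract only says control signals "may be adjusted based on the difference":

```python
import numpy as np

def second_modulator_correction(first_img, second_img):
    # Both captures target the same display level, so any difference is
    # attributed to the second modulator's drive signals.
    diff = second_img.astype(np.float64) - first_img.astype(np.float64)
    return diff.mean()
```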
The present disclosure relates to a method and system (1) for suppressing wind noise. The method comprises obtaining an input audio signal (100, 100′) comprising a plurality of consecutive audio signal segments (101, 102, 103, 101′, 102′, 103′) and suppressing wind noise in the input audio signal with a wind noise suppressor module (20) to generate a wind noise reduced audio signal. The method further comprises using a neural network (10) trained to predict a set of gains for reducing noise in the input audio signal (100, 100′) given samples of the input audio signal (100, 100′), wherein a noise reduced audio signal is formed by applying said set of gains to the input audio signal (100, 100′) and mixing the wind noise reduced audio signal and the noise reduced audio signal with a mixer (30) to obtain an output audio signal with suppressed wind noise.
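A sketch of the suppress-and-mix structure, with `wind_suppressor` and `predict_gains` as stand-ins for the suppressor module (20) and the trained network (10); the fixed mixing weight is an assumption, as the abstract does not say how the mixer (30) weights the two branches:

```python
import numpy as np

def suppress_wind_noise(x, wind_suppressor, predict_gains, alpha=0.5):
    wind_reduced = wind_suppressor(x)         # branch 1: wind noise suppressor
    noise_reduced = predict_gains(x) * x      # branch 2: NN gains applied to input
    return alpha * wind_reduced + (1.0 - alpha) * noise_reduced
```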
The present disclosure relates to a method and system for processing audio for source separation. The method comprises obtaining an input audio signal (A) comprising at least two channels and processing the input audio signal (A) with a spatial cue based separation module (10) to obtain an intermediate audio signal (B). The spatial cue based separation module (10) is configured to determine a mixing parameter of the at least two channels of the input audio signal (A) and modify the channels, based on the mixing parameter, to obtain the intermediate audio signal (B). The method further comprises processing the intermediate audio signal (B) with a source cue based separation module (20) to generate an output audio signal (C), wherein the source cue based separation module (20) is configured to implement a neural network trained to predict a noise reduced output audio signal (C) given the intermediate audio signal (B).
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
5.
Scalable systems for controlling color management comprising varying levels of metadata
Several embodiments of scalable image processing systems and methods are disclosed herein whereby color management processing of source image data to be displayed on a target display is changed according to varying levels of metadata.
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
G09G 5/02 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the way in which colour is displayed
H04N 9/69 - Circuits for processing colour signals for controlling the amplitude of colour signals, e.g. automatic chroma control circuits for modifying the colour signals by gamma correction
H04N 21/235 - Processing of additional data, e.g. scrambling of additional data or processing content descriptors
H04N 21/4402 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
09 - Scientific and electric apparatus and instruments
41 - Education, entertainment, sporting and cultural services
42 - Scientific, technological and industrial services, research and design
Goods & Services
Advertising services; Advertising services, namely, promoting the goods and services of others; Advertising services, namely, server guided ad insertion into broadcast and multicast networks; Online marketing and advertising services, namely, providing personalized advertising, content recommendations and product offerings; Arranging and conducting online auctions; Arranging and conducting auctions, namely, live auctions
Communication services, namely, digital and electronic transmission of voice, data, sound, music, graphics, images, audio, video, information, and messages; Video on demand (VOD) transmission services; Digital media streaming services; Transmission of network and satellite television programming; Teleconferencing and video conferencing services; Communications services, namely, interactive streaming and broadcasting services; Data streaming; Video, audio and video streaming in the fields of entertainment, films, sports, gaming and music via global and local computer networks; Streaming of audio and video via the Internet featuring music, movies, news, and sports; Streaming audio and video materials about music, movies, television shows, music videos, and news accessible on websites via global and local computer networks; Broadcasting of film and television features via the Internet and social media platforms; Broadcasting of television news programs; Providing streaming audio and video such as music, movies, television shows, music videos, news and sports webcasts via a website and social media platforms; Streaming services via the internet and social media platforms of podcasts; Providing access to a platform for real-time multimedia communications via a website on the internet and social media platforms; Broadcasting and streaming of audio and video recordings of live events via a website and social media platforms
Downloadable and recorded software for the collection, managing, editing, organizing, modifying, transmission, sharing, and storage of data and information; Downloadable and recorded software for processing images, graphics, audio, video, and text; Digital media streaming devices; Apparatus for recording or transmission of images, sound or data; Wireless data capture and communications apparatus for transmission of data, images and sound; Interactive data transmission apparatus; Downloadable and recorded video display software
Visual effect reproduction for videos, DVDs, television and for internet websites, and 3D sound recording and projection; Providing a website that provides information, audio, and video in the field of sports; Providing an Internet website portal featuring entertainment news and information specifically in the fields of music, sports and gambling; Entertainment services, namely, providing online casino-style games and games of chance; Entertainment services, namely, providing online slot machine-style games; Entertainment services, namely, online casino-style gaming; Gaming services in the nature of casino gambling; Entertainment services, namely, providing games of chance via the Internet; Betting, gambling, igaming in the nature of online sports betting and wagering services; Providing online betting, gambling, igaming in the nature of online sports betting and wagering services via the internet; Providing a website featuring online betting, gambling, igaming in the nature of online sports betting and wagering services; Entertainment services in the nature of sports betting, and igaming in the nature of online sports betting via the internet
Computer services, namely, providing an interactive web site featuring technology that allows users to access, consolidate and manage accounts and connections to application programming interfaces; Software as a service (SAAS) services featuring software for developing, building, and operating applications that are used to collect, publish, manage, and transform video, sound, text, visual information, and graphic works; Software as a service (SAAS) services, namely, hosting software for use by others for use for developing, building, and operating applications that are used to collect, publish, manage, and transform video, sound, text, visual information, and graphic works; Platform as a service (PAAS) featuring computer software platforms for developing, building, and operating applications that are used to collect, publish, manage, and transform video, sound, text, visual information, and graphic works; Providing temporary use of non-downloadable software for hosting and conducting live auctions for the sale of products and services via a global computer network
7.
METHOD AND DEVICE FOR DERIVING INTER-VIEW MOTION MERGING CANDIDATE
The present invention provides a method and a device for deriving an inter-view motion merging candidate. A method for deriving an inter-view motion merging candidate, according to an embodiment of the present invention, can comprise the steps of: determining, on the basis of encoding information of an inter-view reference block derived by means of a disparity vector of a current block, whether or not inter-view motion merging of the current block is possible; and, if inter-view motion merging of the current block is not possible, generating an inter-view motion merging candidate of the current block by using encoding information of an adjacent block that is spatially adjacent to the inter-view reference block.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/139 - Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
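A sketch of the fallback logic, with blocks reduced to dictionaries; the real availability test involves the reference block's encoding information (e.g., whether it is inter-coded), which is simplified here to whether motion information exists:

```python
def derive_interview_merge_candidate(ref_block, adjacent_block):
    # ref_block: inter-view reference block located via the current
    # block's disparity vector; adjacent_block: its spatial neighbor.
    if ref_block.get("motion") is not None:
        return ref_block["motion"]        # inter-view merging is possible
    return adjacent_block.get("motion")   # fall back to the spatial neighbor
```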
Several embodiments of scalable image processing systems and methods are disclosed herein whereby color management processing of source image data to be displayed on a target display is changed according to varying levels of metadata.
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
G09G 5/02 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the way in which colour is displayed
H04N 9/69 - Circuits for processing colour signals for controlling the amplitude of colour signals, e.g. automatic chroma control circuits for modifying the colour signals by gamma correction
H04N 21/235 - Processing of additional data, e.g. scrambling of additional data or processing content descriptors
H04N 21/4402 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
The present disclosure relates to a method and audio processing arrangement for extracting a target mid (and optionally a target side) audio signal from a stereo audio signal. The method comprises obtaining (S1) a plurality of consecutive time segments of the stereo audio signal and obtaining (S2), for each of a plurality of frequency bands of each time segment of the stereo audio signal, at least one of a target panning parameter (Θ) and a target phase difference parameter (Φ). The method further comprises extracting (S3), for each time segment and each frequency band, a partial mid signal representation (211, 212) based on at least one of the target panning parameter (Θ) and the target phase difference parameter (Φ) of each frequency band and forming (S4) the target mid audio signal (M) by combining the partial mid signal representations (211, 212) for each frequency band and time segment.
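One plausible form of the per-band extraction (S3), assuming complex STFT coefficients per band and segment; undoing the target phase difference and then projecting onto the target panning direction is an assumed realization, since the abstract does not give the formula:

```python
import numpy as np

def partial_mid(left, right, theta, phi):
    # left, right: complex STFT coefficients of one band of one segment;
    # theta: target panning parameter; phi: target phase difference.
    right_aligned = right * np.exp(-1j * phi)   # undo target phase difference
    return np.cos(theta) * left + np.sin(theta) * right_aligned
```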
A method and system for separating a target audio source from a multi-channel audio input including N audio signals, N≥3. The N audio signals are combined into at least two unique signal pairs, and pairwise source separation is performed on each signal pair to generate at least two processed signal pairs, each processed signal pair including source separated versions of the audio signals in the signal pair. The at least two processed signal pairs are combined to form the target audio source having N target audio signals corresponding to the N audio signals.
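A sketch of the pair-and-recombine structure for NumPy channel arrays; the ring pairing scheme and the plain averaging used to recombine are assumptions, and `separate_pair` stands in for any two-channel source separator:

```python
def separate_target(channels, separate_pair):
    n = len(channels)                              # N >= 3 per the abstract
    pairs = [(k, (k + 1) % n) for k in range(n)]   # each channel in two pairs
    sums, counts = [None] * n, [0] * n
    for i, j in pairs:
        out_i, out_j = separate_pair(channels[i], channels[j])
        for idx, out in ((i, out_i), (j, out_j)):
            sums[idx] = out if sums[idx] is None else sums[idx] + out
            counts[idx] += 1
    return [s / c for s, c in zip(sums, counts)]   # N target audio signals
```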
Approaches for generating metadata for content to be composited and rendered are described. These approaches can be used with the development and distribution of one or more web pages or other graphical user interfaces. For example, a web page developer can collect content to be composited together into a web page and invoke a set of APIs to generate the metadata for the content of the web page that will be composited; a metadata generation system receives the calls through the API and generates the metadata. The web page can then be distributed with the generated metadata which can be used to create the display of the web page with content that is perceptually modified based on the metadata about the individual elements on the web page and their spatial proximity.
There are two representations for Higher Order Ambisonics, denoted HOA: the spatial domain and the coefficient domain. The invention generates, from a coefficient domain representation, a mixed spatial/coefficient domain representation, wherein the number of said HOA signals can be variable. An aspect of the invention further relates to methods and apparatus for decoding multiplexed and perceptually encoded HOA signals, including transforming a vector of PCM encoded spatial domain signals of the HOA representation to a corresponding vector of coefficient domain signals by multiplying the vector of PCM encoded spatial domain signals with a transform matrix, and de-normalizing the vector of PCM encoded and normalized coefficient domain signals. The methods may include combining a vector of coefficient domain signals and the vector of de-normalized coefficient domain signals to determine a combined vector of HOA coefficient domain signals that can have a variable number of HOA coefficients.
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
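The transform-and-de-normalize step as matrix arithmetic, assuming one frame of spatial-domain samples per call; the names are stand-ins, and the per-coefficient gains are whatever the de-normalization specifies:

```python
import numpy as np

def spatial_to_coefficient_domain(w_spatial, transform, denorm_gains):
    # w_spatial: vector of PCM encoded spatial domain signals;
    # transform: the spatial-to-coefficient transform matrix.
    c_normalized = transform @ w_spatial
    return c_normalized * denorm_gains   # de-normalized coefficient domain signals
```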
13.
SCALABLE SYSTEMS FOR CONTROLLING COLOR MANAGEMENT COMPRISING VARYING LEVELS OF METADATA
Several embodiments of scalable image processing systems and methods are disclosed herein whereby color management processing of source image data to be displayed on a target display is changed according to varying levels of metadata.
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
G09G 5/02 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the way in which colour is displayed
H04N 9/69 - Circuits for processing colour signals for controlling the amplitude of colour signals, e.g. automatic chroma control circuits for modifying the colour signals by gamma correction
H04N 21/235 - Processing of additional data, e.g. scrambling of additional data or processing content descriptors
H04N 21/4402 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
14.
SCALABLE SYSTEMS FOR CONTROLLING COLOR MANAGEMENT COMPRISING VARYING LEVELS OF METADATA
Several embodiments of scalable image processing systems and methods are disclosed herein whereby color management processing of source image data to be displayed on a target display is changed according to varying levels of metadata.
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
G09G 5/02 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the way in which colour is displayed
H04N 9/69 - Circuits for processing colour signals for controlling the amplitude of colour signals, e.g. automatic chroma control circuits for modifying the colour signals by gamma correction
H04N 21/235 - Processing of additional data, e.g. scrambling of additional data or processing content descriptors
H04N 21/4402 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
Techniques for generating audio streams for immersive audio content are provided. In some embodiments, the techniques involve obtaining a first set of audio channel recordings from a first audio capture device and a second set of audio channel recordings from a second audio capture device. The techniques may involve performing beamforming using the first set of audio channel recordings to generate a spatially-processed set of audio channel recordings. The techniques may involve performing spatial matching and timbre matching on the spatially-processed set of audio channel recordings using information associated with the second set of audio channel recordings to generate a matched set of audio channel recordings. The techniques may involve combining the matched set of audio channel recordings with the second set of audio channel recordings to generate a perceptually-matched audio stream.
In one example, a method of generating binaural audio includes transforming input audio signals representing an audio scene into corresponding decorrelated audio objects and applying a head-related transfer function (HRTF) filter to each of the decorrelated audio objects to generate respective left and right audio components. The method also includes summing the left audio components and summing the right audio components to generate left and right output binaural signals, respectively. In some examples, the decorrelation of audio objects is performed using an all-pass filter serially connected with a bank of comb filters whose delay parameters are selected based on different respective prime numbers. In some examples, different HRTFs are applied to low- and high-frequency components. In some examples, the HRTFs are normalized with respect to an HRTF corresponding to a reference head-rotation angle.
H04S 5/00 - Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
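A sketch of the filter-and-sum stage only, assuming equal-length object signals and a precomputed (left, right) HRTF impulse-response pair per object; the decorrelation stage described above is taken as already applied:

```python
import numpy as np

def binauralize(decorrelated_objects, hrtf_pairs):
    left = right = 0.0
    for sig, (h_left, h_right) in zip(decorrelated_objects, hrtf_pairs):
        left = left + np.convolve(sig, h_left)     # per-object left component
        right = right + np.convolve(sig, h_right)  # per-object right component
    return left, right                             # output binaural signals
```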
Optical filters for projection assemblies. One optical filter includes a transmissive portion configured to transmit modulated light toward a downstream optical element and a reflective portion. The reflective portion is configured to receive unmodulated light from a modulator at a first angle, and reflect the unmodulated light toward a light dump at a second angle, wherein an angle difference between the first angle and the second angle is between 90° and 180°. The optical filter is disposed at a Fourier plane of the modulated light.
G02B 26/08 - Optical devices or arrangements for the control of light using movable or deformable optical elements for controlling the direction of light
The present invention is directed to systems, methods and apparatus for processing media content for reproduction by a first apparatus. The method includes obtaining pose information indicative of a position and/or orientation of a user. The pose information is transmitted to a second apparatus that provides the media content. The media content is rendered based on the pose information to obtain rendered media content. The rendered media content is transmitted to the first apparatus for reproduction. The present invention may include a first apparatus for reproducing media content and a second apparatus storing the media content. The first apparatus is configured to obtain the pose information and transmit it to the second apparatus; and the second apparatus is adapted to: render the media content based on the pose information to obtain rendered media content; and transmit the rendered media content to the first apparatus for reproduction.
A63F 13/428 - Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle involving motion or position input signals, e.g. signals representing the rotation of an input controller or a player's arm motions sensed by accelerometers or gyroscopes
A63F 13/213 - Input arrangements for video game devices characterised by their sensors, purposes or types comprising photodetecting means, e.g. cameras, photodiodes or infrared cells
G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
H04L 67/131 - Protocols for games, networked simulations or virtual reality
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
19.
SIGNAL RESHAPING AND CODING FOR HDR AND WIDE COLOR GAMUT SIGNALS
In a method to improve the coding efficiency of high-dynamic range (HDR) images, a decoder parses sequence parameter set (SPS) data from an input coded bitstream to detect that an HDR extension syntax structure is present in the parsed SPS data. It extracts from the HDR extension syntax structure post-processing information that includes one or more of a color space enabled flag, a color enhancement enabled flag, an adaptive reshaping enabled flag, a dynamic range conversion flag, a color correction enabled flag, or an SDR viewable flag. It decodes the input bitstream to generate a preliminary output decoded signal, and generates a second output signal based on the preliminary output signal and the post-processing information.
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
G06T 5/90 - Dynamic range modification of images or parts thereof
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
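A sketch of pulling the listed flags out of the HDR extension, where `read_bit` stands in for a one-bit bitstream read; the flag names and their order are assumptions from the abstract, not the actual syntax table:

```python
def parse_hdr_extension(read_bit):
    flag_names = [
        "color_space_enabled_flag",
        "color_enhancement_enabled_flag",
        "adaptive_reshaping_enabled_flag",
        "dynamic_range_conversion_flag",
        "color_correction_enabled_flag",
        "sdr_viewable_flag",
    ]
    return {name: bool(read_bit()) for name in flag_names}
```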
20.
CODED SPEECH ENHANCEMENT BASED ON DEEP GENERATIVE MODEL
A system for generating enhanced speech data using robust audio features is disclosed. In some embodiments, a system is programmed to use a self-supervised deep learning model to generate a set of feature vectors from given audio data that contains contaminated speech and is coded. The system is further programmed to use a generative deep learning model to create improved audio data corresponding to clean speech from the set of feature vectors.
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
Some disclosed methods involve: receiving multi-channel audio data including unlabeled multi-channel audio data; extracting audio feature data from the unlabeled multi-channel audio data; applying a spatial masking process to a portion of the audio feature data; applying a contextual encoding process to the masked audio feature data, to produce predicted spatial embeddings in a latent space; obtaining reference spatial embeddings in the latent space; determining a loss function gradient based, at least in part, on a variance between the predicted spatial embeddings and the reference spatial embeddings; and updating the contextual encoding process according to the loss function gradient until one or more convergence metrics are attained.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
22.
Neutral color preservation for single-layer backward compatible codec
Novel methods and systems for processing a single-layer backward compatible codec with multiple-channel multiple regression coefficients either provided in or pointed to in metadata such that the coefficients have been biased to prevent a shift in neutral colors. Pseudo neutral color patches are used along with a saturation weighting factor to bias the coefficients.
G06T 7/90 - Determination of colour characteristics
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
On the basis of a bitstream (P), an n-channel audio signal (X) is reconstructed by deriving an m-channel core signal (Y) and multichannel coding parameters (α) from the bitstream, where 1≤m<n.
G10L 19/24 - Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
E21B 21/00 - Methods or apparatus for flushing boreholes, e.g. by use of exhaust air from motor
E21B 33/138 - Plastering the borehole wall; Injecting into the formation
E21B 41/00 - Equipment or details not covered by groups
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
Volume leveling of an audio signal using a volume leveling control signal. The method comprises determining a noise reliability ratio w(n) as a ratio of noise-like frames over all frames in a current time segment, determining a PGC noise confidence score X_PGN(n) indicating a likelihood that professionally generated content, PGC, noise is present in the time segment, and determining, for the time segment, whether the noise reliability ratio is above a predetermined threshold. When the noise reliability ratio is above the predetermined threshold, the volume leveling control signal is updated based on the PGC noise confidence score, and when the noise reliability ratio is below the predetermined threshold, the volume leveling control signal is left unchanged. Volume leveling is improved by preventing boosting of e.g. phone-recorded environmental noise in UGC, while keeping original behavior for other types of content.
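The threshold gate in code, assuming scalar per-segment values; the exponential pull toward the confidence score is an assumed update rule, since the abstract only says the control signal is "updated based on" it:

```python
def update_leveling_control(control, w_noise, x_pgn, threshold=0.5, rate=0.1):
    # w_noise: noise reliability ratio w(n); x_pgn: PGC noise confidence
    # score X_PGN(n). Below the threshold the control signal is unchanged.
    if w_noise > threshold:
        control = control + rate * (x_pgn - control)
    return control
```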
Systems and methods for dynamic video timing in display devices. A display control system includes an active matrix, a column driver, a row driver, and a controller. The active matrix includes a plurality of pixels forming a plurality of rows and a plurality of columns. The column driver is configured to control the plurality of columns of pixels. The row driver is configured to control the plurality of rows of pixels. The controller is configured to receive a plurality of scanlines forming a video frame, receive an indication of a scanline order, reorder the plurality of scanlines according to the scanline order, and control the row driver according to the reordered plurality of scanlines. The scanline order is indicative of an order at which the plurality of rows of pixels are controlled.
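The reorder step is a permutation of received scanlines; this sketch assumes `order[k]` names the received scanline that should drive the k-th row:

```python
def reorder_scanlines(scanlines, order):
    # scanlines: rows as received for one video frame;
    # order: indication of the scanline order, one index per output row.
    return [scanlines[idx] for idx in order]
```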
A dual-modulation laser projection system includes (a) a polarizing beamsplitter for splitting laser light into first and second polarized beams having mutually orthogonal polarizations, (b) a phase spatial light modulator (SLM) for beam steering the second polarized beam, (c) a mechanical amplitude SLM for amplitude modulating a combination of the first polarized beam and the second polarized beam as beam steered by the phase SLM, and (d) a filter for removing, from the amplitude modulated combination of the first and second polarized beams, one or more of a plurality of diffraction orders introduced by the mechanical amplitude SLM, to generate filtered, modulated output light.
Systems and methods for generating downsampled texture maps. One example method includes training a neural network by providing a three-dimensional model including an original texture map to the neural network, downsampling the original texture map, and iteratively differentiably rendering the original and downsampled texture maps and using differences in renderings of the original and downsampled texture maps as feedback for training the neural network. After reaching a training completion condition, the trained neural network provides a downsampled texture map that has a lower resolution than the original texture map.
Systems and methods are described for controlling adaptive filtering components. A far end signal may be filtered by each of a foreground filter and a background filter, where both filters are adaptive echo cancellation filters that output echo estimations. Control logic may halt adaptation by the background filter based on a deviation signal. To determine the deviation signal, cross-correlation coefficients for the echo estimations produced by both filters may be determined for each frequency bin of the far end signal. The determined cross-correlation coefficients may be summed across a plurality of the frequency bins. A hysteresis function may then be applied to the sum associated with the filter whose echo estimation is used to generate the filtered result. The deviation signal may be activated in response to the hysteresis function outputting a high value, which the control logic uses to turn off adaptation by the background filter.
H04M 9/08 - Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
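A sketch of the control path, with the per-bin cross-correlation coefficients assumed already computed; the two thresholds realize the hysteresis and are illustrative values:

```python
class BackgroundFilterControl:
    def __init__(self, high=0.9, low=0.7):
        self.high, self.low = high, low
        self.deviation = False          # True halts background adaptation

    def step(self, xcorr_by_bin):
        # Combine cross-correlation coefficients across frequency bins.
        s = sum(xcorr_by_bin) / max(len(xcorr_by_bin), 1)
        if s >= self.high:
            self.deviation = True       # hysteresis output goes high
        elif s <= self.low:
            self.deviation = False      # resume background adaptation
        return self.deviation
```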
29.
IMAGE ENCODING AND DECODING APPARATUS, AND IMAGE ENCODING AND DECODING METHOD USING CONTOUR MODE BASED INTRA PREDICTION
According to the present invention, an adaptive scheme is applied to an image encoding apparatus that includes an inter-predictor, an intra-predictor, a transformer, a quantizer, an inverse quantizer, and an inverse transformer, wherein input images are classified into two or more different categories, and two or more modules from among the inter-predictor, the intra-predictor, the transformer, the quantizer, and the inverse quantizer are implemented to perform respective operations in different schemes according to the category to which an input image belongs. Thus, the invention has the advantage of efficiently encoding an image without the loss of important information as compared to a conventional image encoding apparatus which adopts a packaged scheme.
H04N 19/109 - Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
H04N 19/11 - Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/12 - Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
H04N 19/132 - Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/14 - Coding unit complexity, e.g. amount of activity or edge presence estimation
H04N 19/159 - Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
H04N 19/17 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
H04N 19/82 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
30.
METHODS, APPARATUS AND SYSTEMS FOR POSITION-BASED GAIN ADJUSTMENT OF OBJECT-BASED AUDIO
The positions of a plurality of speakers at a media consumption site are determined. Audio information in an object-based format is received. A gain adjustment value for a sound content portion in the object-based format may be determined based on the position of the sound content portion and the positions of the plurality of speakers. Audio information in a ring-based channel format is received. A gain adjustment value for each ring-based channel in a set of ring-based channels may be determined based on the ring to which the ring-based channel belongs and the positions of the speakers at the media consumption site.
Methods and apparatus for representing multimedia signals using neural field networks. According to an example embodiment, a method of jointly representing an audio signal and a video signal using a neural field includes applying positional encoding to a time stamp corresponding to a video frame of the video signal and further corresponding to an associated segment of the audio signal to generate a respective higher-dimensional embedding. The method further includes reconstructing the video frame of the video signal and generating a corresponding set of audio samples representing the associated segment of the audio signal by applying the higher-dimensional embedding to a neural-field network trained to represent a sequence of video frames of the video signal and an associated plurality of segments of the audio signal.
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/68 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using error resilience involving the insertion of resynchronisation markers into the bitstream
H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
H04N 21/8547 - Content authoring involving timestamps for synchronizing content
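The positional-encoding step, with geometrically spaced frequencies as a common neural-field choice; the abstract does not specify the encoding, so the band count and spacing here are assumptions:

```python
import numpy as np

def encode_timestamp(t, num_bands=8):
    # Lift a scalar time stamp to a 2*num_bands-dimensional embedding that
    # is fed to the neural-field network for the frame and audio segment.
    angles = np.pi * (2.0 ** np.arange(num_bands)) * t
    return np.concatenate([np.sin(angles), np.cos(angles)])
```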
Methods, systems, and media for determining sound field rotations are provided. In some embodiments, a method for determining sound field rotations involves determining an activity situation of a user. The method may involve determining a user head orientation using at least one sensor of the one or more sensors. The method may involve determining a direction of interest based on the activity situation and the user head orientation. The method may involve determining a rotation of a sound field used to present audio objects via headphones based on the direction of interest.
A multi-view input image covering multiple sampled views is received. A multi-view layered image stack is generated from the multi-view input image. A target view of a viewer to an image space depicted by the multi-view input image is determined based on user pose data. The target view is used to select user pose selected sampled views from among the multiple sampled views. Layered images for the user pose selected sampled views, along with alpha maps and beta scale maps for the user pose selected sampled views are encoded into a video signal to cause a recipient device of the video signal to generate a display image for rendering on the image display.
Diffuse or spatially large audio objects may be identified for special processing. A decorrelation process may be performed on audio signals corresponding to the large audio objects to produce decorrelated large audio object audio signals. These decorrelated large audio object audio signals may be associated with object locations, which may be stationary or time-varying locations. For example, the decorrelated large audio object audio signals may be rendered to virtual or actual speaker locations. The output of such a rendering process may be input to a scene simplification process. The decorrelation, associating and/or scene simplification processes may be performed prior to a process of encoding the audio data.
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
G10L 19/20 - Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
36.
EFFICIENT ORIENTATION TRACKING WITH FUTURE ORIENTATION PREDICTION
The present disclosure relates to a method and system for predicting a future orientation of an orientation tracker (100). The method comprises obtaining a sequence of angular velocity samples, each angular velocity sample indicating an angular velocity at a point in time, and obtaining a sequence of angular acceleration samples, each angular acceleration sample indicating an acceleration or deceleration of the angular velocity at each point in time. The method further comprises determining (S5a), for each point in time where the angular velocity is accelerating, a predicted orientation of the orientation tracker (100) based on a first order prediction of an accumulated rotation of the orientation tracker (100), and determining (S5c), for each point in time where the angular velocity is decelerating, a predicted orientation of the orientation tracker (100) based on a second order prediction of the accumulated rotation of the orientation tracker (100).
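The two prediction branches as a worked formula, assuming scalar angles about a single axis for illustration (the tracker itself handles 3D orientations); theta is the current orientation, omega the angular velocity, alpha the angular acceleration, and dt the prediction horizon:

```python
def predict_orientation(theta, omega, alpha, dt):
    if alpha >= 0.0:
        # accelerating: first order prediction of the accumulated rotation
        return theta + omega * dt
    # decelerating: second order prediction of the accumulated rotation
    return theta + omega * dt + 0.5 * alpha * dt * dt
```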
Systems and methods for generating color correction matrices for converting raw red-green-blue (RGB) signals to a standard color space. One example system includes an image sensor and an electronic processor. The image sensor is configured to capture a scene and generate a raw image, the raw image including a raw RGB signal. The electronic processor is configured to receive the raw RGB signal, determine white balance coefficient ratio values of the raw image, provide the white balance coefficient ratio values to a neural network, and receive color correction matrix values from the neural network. The color correction matrix values are based on the white balance coefficient ratio values. The electronic processor is configured to generate a color correction matrix using the color correction matrix values and apply the color correction matrix to the raw RGB signal to generate a corrected RGB signal.
H04N 23/88 - Camera processing pipelinesComponents thereof for processing colour signals for colour balance, e.g. white-balance circuits or colour temperature control
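The inference-side flow in brief, with `predict_ccm` as a stand-in for the trained network; forming the ratios against the green coefficient is an assumption about what "white balance coefficient ratio values" means:

```python
import numpy as np

def correct_rgb(raw_rgb, wb_coeffs, predict_ccm):
    r, g, b = wb_coeffs
    ccm_values = predict_ccm((r / g, b / g))   # nine matrix entries from the network
    ccm = np.asarray(ccm_values).reshape(3, 3)
    return raw_rgb @ ccm.T                     # raw_rgb: (..., 3) array
```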
38.
METHODS AND DEVICES FOR RENDERING AN AMBISONICS AUDIO SIGNAL
The present document describes a method (400) for rendering an ambisonics signal using a loudspeaker arrangement comprising S loudspeakers. The method (400) comprises converting (401) a set of N ambisonics channel signals (111) into a set of unfiltered pre-rendered signals (211), with N>1 and S>1. Furthermore, the method (400) comprises performing (402) near field compensation, referred to as NFC, filtering of M unfiltered pre-rendered signals (211) of the set of unfiltered pre-rendered signals (211) to provide a set of S filtered loudspeaker channel signals (114) for rendering using the corresponding S loudspeakers.
Systems and methods for an entropy coding system are described. The entropy coding systems include an encoding apparatus and a decoding apparatus. The encoding apparatus is configured to receive an original input stream comprising a plurality of symbols having a known entropy characteristic according to a probability distribution of each of the symbols appearing in the original input stream, determine an input and respective state for each symbol read from the original input stream, append the determined input to the encoded output stream, and provide the encoded output stream to the decoding apparatus. The decoding apparatus is configured to receive the encoded output stream, process the encoded output stream, and for each read input: determine an output symbol and a respective output state, persist the respective output state to the encoded output stream, and append the determined output symbol to the results output stream.
A system for detecting speech from reverberant signals is disclosed. The system is programmed to receive spectral temporal amplitude data in the modulation frequency domain. The system is programmed to then enhance the spectral temporal amplitude data by reducing reverberation and other noise as well as smoothing based on certain properties of the spectral temporal spectrogram associated with the spectral temporal amplitude data. Next, the system is programmed to compute various features related to the presence of speech based on the enhanced spectral temporal amplitude data and other data in the modulation frequency domain or in the (acoustic) frequency domain. The system is programmed to then determine an extent of speech present in the audio data corresponding to the received spectral temporal amplitude data based on the various features. The system can be programmed to transmit the extent of speech present to an output device.
A system for managing user-generated content (UGC) and professionally generated content (PGC) is disclosed. The system is programmed to receive digital audio data having two channels from a social media platform. The system is programmed to extract spatial features that capture differences in the two channels from the digital audio data. The system is programmed to also extract temporal features, spectral features, and background features from the digital audio data. The system is programmed to then use the extracted features to determine whether to process the digital audio data as UGC or PGC before playback.
Tensor-Product B-splines (TPB) have been shown to improve video quality when used to represent reshaping functions to map reshaped standard dynamic range content into high dynamic range (HDR) content; however, TPB prediction is computationally intensive and may not be supported by legacy devices. Methods and systems for backwards-compatible signaling of TPB-related metadata and a fast TPB prediction method are presented to overcome both of these limitations. Computation overhead for a TPB-based 3D look-up table is reduced by using temporary two-dimensional arrays. A remapping of the most significant bits of a legacy bit-depth parameter allows for backwards compatibility.
G06T 5/90 - Dynamic range modification of images or parts thereof
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
H04N 19/46 - Embedding additional information in the video signal during the compression process
A sequence of base layer images for a base reference image display and a set of one or more sequences of beta scale maps corresponding to one or more non-base reference image displays are generated. A subset of one or more sequences of beta scale maps is determined in the set of one or more sequences of beta scale maps based at least in part on display capabilities of a target image display. The sequence of base layer images, along with the subset of one or more sequences of beta scale maps, is encoded into a video signal to cause a recipient device of the video signal to generate a sequence of display images from the sequence of base layer images and the subset of one or more sequences of beta scale maps for rendering on the image display.
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
G06T 3/04 - Context-preserving transformations, e.g. by using an importance map
G06T 3/4007 - Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
G06T 5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
G06T 5/92 - Dynamic range modification of images or parts thereof based on global image properties
H04N 19/46 - Embedding additional information in the video signal during the compression process
44.
GENERATING CHANNEL AND OBJECT-BASED AUDIO FROM CHANNEL-BASED AUDIO
A method of audio processing includes generating a detection score based on the partial loudnesses of a reference audio signal, extracted audio objects, extracted bed channels, a rendered audio signal and a channel-based audio signal. The detection score is indicative of an audio artifact in one or more of the audio objects and the bed channels. The extracted audio objects and extracted bed channels may be modified, in accordance with the detection score, to reduce the audio artifact.
Methods for generating an object based audio program, renderable in a personalizable manner, and including a bed of speaker channels renderable in the absence of selection of other program content (e.g., to provide a default full range audio experience). Other embodiments include steps of delivering, decoding, and/or rendering such a program. Rendering of content of the bed, or of a selected mix of other content of the program, may provide an immersive experience. The program may include multiple object channels (e.g., object channels indicative of user-selectable and user-configurable objects), the bed of speaker channels, and other speaker channels. Another aspect is an audio processing unit (e.g., encoder or decoder) configured to perform, or which includes a buffer memory which stores at least one frame (or other segment) of an object based audio program (or bitstream thereof) generated in accordance with, any embodiment of the method.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
A metadata-aided film-grain removal method and corresponding apparatus. An example embodiment enables a video decoder to substantially fully remove the film grain from a digital video signal that has undergone lossy video compression and then video decompression. Different embodiments may rely only on spatial-domain grain-removal processing, only on temporal-domain grain-removal processing, or on a combination of spatial-domain and temporal-domain grain-removal processing. Both spatial-domain and temporal-domain grain-removal processing may use metadata provided by the corresponding video encoder, the metadata including one or more parameters corresponding to the digital film grain injected into the host video at the encoder. Different film-grain-injection formats can be accommodated by the video decoder using signal preprocessing directed at supplying, to the film-grain removal module of the video decoder, an input compatible with the film-grain removal method implemented therein.
H04N 19/86 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 19/80 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
Audio content coded for a reference speaker configuration is downmixed to downmix audio content coded for a specific speaker configuration. One or more gain adjustments are performed on individual portions of the downmix audio content coded for the specific speaker configuration. Loudness measurements are then performed on the individual portions of the downmix audio content. An audio signal that comprises the audio content coded for the reference speaker configuration and downmix loudness metadata is generated. The downmix loudness metadata is created based at least in part on the loudness measurements on the individual portions of the downmix audio content.
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
A method may include generating a hybrid image associated with a first interpretation corresponding to a first value of a media parameter and a second interpretation corresponding to a second value of the media parameter. The hybrid image may include a first visibility ratio between the first interpretation and the second interpretation. The method may include refining the hybrid image to create a refined hybrid image that includes a second visibility ratio different than the first visibility ratio. The method may include displaying the refined hybrid image, and receiving a user input related to a first perception of the refined hybrid image by a user. The method may include determining, based at least in part on the user input, an optimized value of the media parameter, and providing output media to a playback device for display to the user according to the optimized value of the media parameter.
There is provided encoding and decoding methods for representing spatial audio that is a combination of directional sound and diffuse sound. An exemplary encoding method includes inter alia creating a single- or multi-channel downmix audio signal by downmixing input audio signals from a plurality of microphones in an audio capture unit capturing the spatial audio; determining first metadata parameters associated with the downmix audio signal, wherein the first metadata parameters are indicative of one or more of: a relative time delay value, a gain value, and a phase value associated with each input audio signal; and combining the created downmix audio signal and the first metadata parameters into a representation of the spatial audio.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
H04R 1/32 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
Image data is received for rendering an image on an image display to a viewer (402). The image data specifies a pixel value of the image for a pixel of the image display to render. The pixel value for the pixel includes multiple component pixel values corresponding to multiple color components of a color space. A color gamut locational value of the pixel value is computed based on two or more component pixel values in the multiple component pixel values of the pixel value specified for the pixel (404). The color gamut locational value is used to determine whether bandwidth broadening is to be applied to image rendering light produced by the pixel of the image display to render the pixel value (406). The image rendering light is directed to the viewer.
G09G 3/20 - Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes for presentation of an assembly of a number of characters, e.g. a page, by composing the assembly by combination of individual elements arranged in a matrix
G09G 3/36 - Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes for presentation of an assembly of a number of characters, e.g. a page, by composing the assembly by combination of individual elements arranged in a matrix by control of light from an independent source using liquid crystals
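One plausible "color gamut locational value" and decision, using a saturation-like measure over the component pixel values; both the measure and the threshold are assumptions, since the abstract leaves the computation open:

```python
def bandwidth_broadening_needed(rgb, threshold=0.8):
    mx, mn = max(rgb), min(rgb)
    locational_value = 0.0 if mx <= 0 else (mx - mn) / mx
    return locational_value >= threshold   # broaden rendering-light bandwidth
```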
51.
GENERATING HDR IMAGE FROM CORRESPONDING CAMERA RAW AND SDR IMAGES
Guided filtering is applied, with a camera raw image as a guidance image, to a first image to generate an intermediate image. A dynamic range mapping is performed on the intermediate image to generate a second image of a different dynamic range. The second image is used to generate specific local reshaping function index values for selecting specific local reshaping functions. The specific local reshaping functions are applied to the second image to generate a locally reshaped image.
Novel methods and systems for locally adapting (modifying) an image to compensate for ambient conditions are realized. An adjusted cone response is determined based on a minimum target cone response and a delta cone response. A target luminance is then calculated from a local adaptation pooling and the adjusted cone response. Then the image is modified by the target luminance to produce an adapted image.
Optimization methods are described for filtering using neural networks in image and video processing. Given two consecutive (and neighboring) neural network (NN) blocks, each with its own skip connection and two or more convolutional neural network blocks between the skip start and skip end of the skip connection, a sliding window method moves the skip start and skip end positions into new positions, thus allowing fusion of identical neighboring blocks and reducing computational cost. An iterative method to efficiently quantize a trained NN from a floating-point implementation to a fixed-point implementation is also presented. Additional methods to reduce operational complexity and improve training are also presented.
H04N 19/82 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
54.
Alias cancelling during audio coding mode transitions
An apparatus for processing an audio signal and a method thereof are disclosed. The present invention includes receiving, by an audio processing apparatus, an audio signal including first data of a first block encoded with a rectangular coding scheme and second data of a second block encoded with a non-rectangular coding scheme; receiving a compensation signal corresponding to the second block; estimating a prediction of an aliasing part using the first data; and obtaining a reconstructed signal for the second block based on the second data, the compensation signal, and the prediction of the aliasing part.
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocodersCoding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
G10L 19/005 - Correction of errors induced by the transmission channel, if related to the coding algorithm
G10L 19/022 - Blocking, i.e. grouping of samples in timeChoice of analysis windowsOverlap factoring
G10L 19/04 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocodersCoding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
G10L 21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
G10L 25/45 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of analysis window
Input texture images are received. Depth maps are determined for the input texture images. Each depth map in the depth maps includes depth values of pixels represented in a corresponding input texture image in the input texture images. Depth difference maps are generated from the depth maps. The depth difference maps include at least one depth difference map that is generated from two successive depth maps in the depth maps. A video signal encoded with a compressed version of the input texture images and the depth difference maps is outputted. The video signal causes a recipient device of the video signal to generate display images from the compressed version of the input texture images and the depth difference maps for rendering on an image display.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/119 - Adaptive subdivision aspects e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
H04N 19/139 - Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/177 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
H04N 19/52 - Processing of motion vectors by encoding by predictive encoding
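A minimal sketch of how the depth difference maps could be formed and inverted, assuming simple frame-to-frame pixelwise differences; the actual bitstream layout and prediction structure are not specified here.

```python
import numpy as np

def depth_difference_maps(depth_maps):
    """Form per-frame depth difference maps from successive depth maps.

    depth_maps: list of (H, W) float arrays, one per input texture image.
    The first map is kept as a base; each later entry stores only the
    pixelwise change from the previous map, which is typically sparse and
    compresses well (illustrative scheme only).
    """
    diffs = [depth_maps[0]]
    for prev, cur in zip(depth_maps[:-1], depth_maps[1:]):
        diffs.append(cur - prev)
    return diffs

def reconstruct_depth_maps(diffs):
    # Recipient side: integrate the differences back into depth maps
    # before generating display images.
    out = [diffs[0]]
    for d in diffs[1:]:
        out.append(out[-1] + d)
    return out
```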
A communication client device operated by a first user in a communication session receives a viewing direction tracking data portion indicating a view direction of a second user in the communication session. It is determined that the view direction of the second user is towards a third user at a first time point in the communication session. The view direction of the second user is used to modify a pre-adapted visual depiction of the second user into an adapted visual depiction of the second user. The adapted visual depiction of the second user is rendered, to the first user, on an image display operating with the communication client device.
Methods and systems for the super resolution of high dynamic range (HDR) video are described. Given a sequence of video frames, a current frame and two or more neighboring frames are processed by a neural-network (NN) feature extraction module, followed by an NN upscaling module and an NN reconstruction module. In parallel, the current frame is upscaled using traditional up-sampling to generate an intermediate up-sampled frame. The output of the reconstruction module is added to the intermediate up-sampled frame to generate an output frame. Additional traditional up-sampling may be performed on the output frame to match the desired up-scaling factor, beyond the up-scaling factor for which the neural network was trained.
G06T 3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
G06T 3/4046 - Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
G06T 3/4053 - Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
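A sketch of the two-path structure under stated assumptions: `nn_pipeline` is a hypothetical callable wrapping the feature-extraction, upscaling, and reconstruction networks, and bicubic `zoom` stands in for the traditional up-sampler.

```python
import numpy as np
from scipy.ndimage import zoom  # traditional up-sampler stand-in

def super_resolve_frame(frames, t, nn_pipeline, nn_scale=2, target_scale=4):
    """Hybrid NN + traditional super-resolution of frame t.

    frames: list of (H, W, C) arrays. nn_pipeline(current, neighbors) is
    assumed to return a residual detail image at nn_scale resolution.
    """
    current = frames[t]
    neighbors = [frames[max(t - 1, 0)], frames[min(t + 1, len(frames) - 1)]]

    # NN path: residual detail predicted from the current + neighboring frames.
    residual = nn_pipeline(current, neighbors)            # (H*s, W*s, C)

    # Parallel traditional path: plain bicubic-style up-sampling.
    upsampled = zoom(current, (nn_scale, nn_scale, 1), order=3)

    out = upsampled + residual
    # Extra traditional up-scaling beyond the factor the NN was trained for.
    extra = target_scale // nn_scale
    if extra > 1:
        out = zoom(out, (extra, extra, 1), order=3)
    return out
```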
A method, an apparatus, and logic to post-process raw gains determined by input processing to generate post-processed gains, comprising using one or both of delta gain smoothing and decision-directed gain smoothing. The delta gain smoothing comprises applying a smoothing filter to the raw gain with a smoothing factor that depends on the gain delta: the absolute value of the difference between the raw gain for the current frame and the post-processed gain for a previous frame. The decision-directed gain smoothing comprises converting the raw gain to a signal-to-noise ratio, applying a smoothing filter with a smoothing factor to the signal-to-noise ratio to calculate a smoothed signal-to-noise ratio, and converting the smoothed signal-to-noise ratio to determine the second smoothed gain, with the smoothing factor possibly dependent on the gain delta.
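Both smoothing variants can be sketched as follows; the delta-to-smoothing-factor mapping and the Wiener-style gain/SNR conversion are illustrative assumptions.

```python
import numpy as np

def postprocess_gains(raw_gains, mode="decision_directed", eps=1e-6):
    """Sketch of the two gain post-processing variants described above.

    raw_gains: per-frame raw suppression gains in (0, 1). The abstract only
    requires the smoothing factor to depend on the gain delta; the concrete
    mapping below is an arbitrary example.
    """
    out = np.empty_like(raw_gains)
    prev = raw_gains[0]
    for i, g in enumerate(raw_gains):
        delta = abs(g - prev)                             # the "gain delta"
        alpha = np.clip(1.0 - 2.0 * delta, 0.2, 0.95)     # heavier smoothing for small deltas
        if mode == "delta":
            # Variant 1: first-order smoothing directly on the gain.
            out[i] = alpha * prev + (1.0 - alpha) * g
        else:
            # Variant 2: decision-directed smoothing in the SNR domain,
            # assuming the Wiener rule g = snr / (1 + snr).
            snr = g / max(1.0 - g, eps)
            prev_snr = prev / max(1.0 - prev, eps)
            snr_s = alpha * prev_snr + (1.0 - alpha) * snr
            out[i] = snr_s / (1.0 + snr_s)
        prev = out[i]
    return out
```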
The present disclosure relates to reverberation generation for headphone virtualization. A method of generating one or more components of a binaural room impulse response (BRIR) for headphone virtualization is described. In the method, directionally-controlled reflections are generated, wherein directionally-controlled reflections impart a desired perceptual cue to an audio input signal corresponding to a sound source location. Then at least the generated reflections are combined to obtain the one or more components of the BRIR. Corresponding system and computer program products are described as well.
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
G10K 15/08 - Arrangements for producing a reverberation or echo sound
H04S 5/00 - Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
H04S 7/00 - Indicating arrangementsControl arrangements, e.g. balance control
60.
CREATING AUDIO-VISUAL OBJECTS THROUGH MULTIMODAL ANALYSIS
Devices, methods, and user interfaces for processing signals related to multimedia content. An example method includes capturing, with an electronic device, media data including video data and contemporaneously captured audio data associated with the video data. The method includes analyzing, with an electronic processor, the video data to identify one or more visual objects and to assign an object class to each of the one or more visual objects and analyzing, with the electronic processor, the audio data to identify one or more audio objects and to assign an object class to each of the one or more audio objects. The method includes generating, with the electronic processor, one or more audio-visual objects by associating each of the one or more audio objects with a respective visual object of the one or more visual objects.
An input image to be coded into a video signal and a target image are received. The input image and the target image depict the same visual content. One or more beta scaling method indicators and one or more sets of one or more beta scale parameters are generated. The one or more beta scaling method indicators indicate one or more beta scaling methods that use the one or more sets of beta scale parameters to perform beta scaling operations on the input image to generate a reconstructed image that approximates the target image. The input image, along with the one or more beta scaling method indicators and the one or more sets of beta scale parameters, is encoded into the video signal, allowing a recipient device of the video signal to generate the reconstructed image.
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/34 - Scalability techniques involving progressive bit-plane based encoding of the enhancement layer, e.g. fine granular scalability [FGS]
H04N 19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
Encoding/decoding an audio signal having one or more audio components, wherein each audio component is associated with a spatial location. A first audio signal presentation (z) of the audio components, a first set of transform parameters (w(f)), and signal level data (β²) are encoded and transmitted to the decoder. The decoder uses the first set of transform parameters (w(f)) to form a reconstructed simulation input signal intended for an acoustic environment simulation, and applies a signal level modification (α) to the reconstructed simulation input signal. The signal level modification is based on the signal level data (β²) and data (p²) related to the acoustic environment simulation. The attenuated reconstructed simulation input signal is then processed in an acoustic environment simulator. With this process, the decoder does not need to determine the signal level of the simulation input signal, thereby reducing processing load.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocodersCoding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocodersCoding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
63.
SPATIAL CODING OF HIGHER ORDER AMBISONICS FOR A LOW LATENCY IMMERSIVE AUDIO CODEC
Described herein is a method of encoding Higher Order Ambisonics, HOA, audio, the method including: receiving an input HOA audio signal having more than four Ambisonics channels; encoding the HOA audio signal using a SPAR coding framework and a core audio encoder; and providing the encoded HOA audio signal to a downstream device, the encoded HOA audio signal including core encoded SPAR downmix channels and encoded SPAR metadata. Further described are a method of decoding Higher Order Ambisonics, HOA, audio, respective apparatuses and computer program products.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocodersCoding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 19/025 - Detection of transients or attacks for time/frequency resolution switching
G10L 19/032 - Quantisation or dequantisation of spectral components
G10L 19/06 - Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
64.
LOW LATENCY AUDIO FILTERBANK WITH IMPROVED FREQUENCY RESOLUTION
A filterbank, suitable for modifying audio signals with dynamic gains in each band, is constructed so that the perceived latency is small, while a larger group delay is applied at low frequencies to enable higher frequency resolution in the lower frequency bands. The higher group delay at low frequencies is achieved by inserting an all-pass filter into the reconstructed filter response.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
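The group-delay behaviour can be demonstrated with a first-order all-pass section; the coefficient value is arbitrary, and a production filterbank would insert the all-pass into the reconstructed filter response rather than run it standalone.

```python
import numpy as np
from scipy.signal import lfilter, group_delay

def allpass(x, a=0.6):
    """First-order all-pass H(z) = (z**-1 - a) / (1 - a*z**-1).

    For 0 < a < 1 the group delay is (1+a)/(1-a) samples at DC but only
    (1-a)/(1+a) samples at Nyquist: exactly the "more delay, hence finer
    frequency resolution, at low frequencies" behaviour exploited above.
    """
    return lfilter([-a, 1.0], [1.0, -a], x)

# Inspect the frequency-dependent delay of the example section (a = 0.6
# gives 4.0 samples of delay at DC and 0.25 samples near Nyquist).
w, gd = group_delay(([-0.6, 1.0], [1.0, -0.6]))
print(f"group delay: {gd[0]:.2f} samples at DC, {gd[-1]:.2f} near Nyquist")
```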
65.
NEURAL NETWORKS FOR DYNAMIC RANGE CONVERSION AND DISPLAY MANAGEMENT OF IMAGES
Methods and systems for dynamic range conversion and display mapping of standard dynamic range (SDR) images onto high dynamic range (HDR) displays are described. Given an SDR input image, a processor generates an intensity (luminance) image and optionally a base layer image and a detail layer image. A first neural network uses the intensity image to predict statistics of the SDR image in a higher dynamic range. These predicted statistics together with the original image statistics of the input image are used to derive an optimal tone-mapping curve to map the input SDR image onto an HDR display. Optionally, a second neural network, using the intensity image and the detail layer image, can generate a residual detail layer image in a higher dynamic range to enhance the tone-mapping of the base layer image into the higher dynamic range.
G06T 5/92 - Dynamic range modification of images or parts thereof based on global image properties
G06V 10/60 - Extraction of image or video features relating to illumination properties, e.g. using a reflectance or lighting model
G06V 10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video featuresCoarse-fine approaches, e.g. multi-scale approachesImage or video pattern matchingProximity measures in feature spaces using context analysisSelection of dictionaries
66.
DOWNLOAD CONTROL IN MULTI-SERVER COMMUNICATION SYSTEM
Apparatuses and methods for data traffic management in multi-source content delivery are described. The apparatus includes a downloader and a controller. The downloader is coupled to servers via communication links. The controller is configured to determine initial download requests for the servers based on predetermined information about a quality of the links. The controller is also configured to send the initial download requests to the servers via the downloader. The controller is further configured to update the information about the quality of the communication links after the downloader receives data associated with a data file from the servers via the communication links. The controller is also configured to determine subsequent download requests for the servers based on the updated information about the quality of the communication links. The controller is further configured to send the subsequent download requests to the servers via the downloader.
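The controller loop lends itself to a small sketch: a hypothetical proportional-allocation policy for the download requests and an exponentially weighted update of the per-link quality estimate.

```python
def plan_requests(total_bytes, throughput_est):
    """Split the next batch of byte-range requests across servers in
    proportion to their estimated link quality (hypothetical policy).

    throughput_est: dict mapping server id -> estimated bytes/s.
    """
    total = max(sum(throughput_est.values()), 1e-9)
    return {s: int(total_bytes * t / total) for s, t in throughput_est.items()}

def update_estimate(prev_est, bytes_received, elapsed_s, alpha=0.3):
    # Exponentially weighted update of a link-quality estimate after each
    # completed download, as in the controller loop described above.
    measured = bytes_received / max(elapsed_s, 1e-6)
    return (1.0 - alpha) * prev_est + alpha * measured
```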
The present invention relates to methods and apparatus for encoding an HOA signal representation (c(t)) of a sound field having an order of N and a number O=(N+1)² of coefficient sequences to a mezzanine HOA signal representation (w_MEZZ(t)). The present invention further relates to methods and apparatus for decoding a reconstructed HOA signal representation from the mezzanine HOA signal representation.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
H04S 3/02 - Systems employing more than two channels, e.g. quadraphonic of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
In a method to improve backwards compatibility when decoding high-dynamic range images coded in a wide color gamut (WCG) space which may not be compatible with legacy color spaces, hue and/or saturation values of images in an image database are computed for both a legacy color space (say, YCbCr-gamma) and a preferred WCG color space (say, IPT-PQ). Based on a cost function, a reshaped color space is computed so that the distance between the hue values in the legacy color space and rotated hue values in the preferred color space is minimized. HDR images are coded in the reshaped color space. Legacy devices can still decode standard dynamic range images assuming they are coded in the legacy color space, while updated devices can use color reshaping information to decode HDR images in the preferred color space at full dynamic range.
H04N 19/87 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving scene cut or scene change detection in combination with video compression
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
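The reshaping search can be pictured as a one-parameter sketch that scans hue rotations of the WCG space and keeps the one minimizing a circular hue distance to the legacy space; the squared-distance cost and the grid search are illustrative assumptions.

```python
import numpy as np

def best_hue_rotation(hue_legacy, hue_wcg,
                      candidates=np.linspace(-np.pi, np.pi, 361)):
    """Pick the hue rotation of the WCG space that best matches legacy hues.

    hue_legacy, hue_wcg: per-pixel hue angles (radians) of the same image
    database computed in the legacy and preferred WCG color spaces. The
    abstract only calls for a cost function over hue distances; mean
    squared circular distance is an assumed instance.
    """
    def circ_dist(a, b):
        # Wrap the angular difference to (-pi, pi].
        return np.angle(np.exp(1j * (a - b)))

    costs = [np.mean(circ_dist(hue_legacy, hue_wcg + r) ** 2)
             for r in candidates]
    return candidates[int(np.argmin(costs))]
```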
The present disclosure relates to a method for processing audio content to suppress a target audio signal and an apparatus implementing the method. The method comprises obtaining an input audio signal, and determining, for a specific segment associated with a segment specific source activity metric indicating that the target audio source is inactive, a processing application coefficient by smoothing the segment specific source activity metric across at least two segments. The method further comprises extracting a difference audio signal based on a difference between a side audio signal and the input audio signal and extracting, from the difference audio signal, a source suppressed difference audio signal. The method further comprises forming a modified difference audio signal as a weighted sum of the difference audio signal and the source suppressed difference audio signal based on the processing application coefficient.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
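A compact sketch of the final blending step; the recursive smoothing rule and the linear weighting are assumptions, as the abstract only requires smoothing across at least two segments and a weighted sum driven by the processing application coefficient.

```python
def modify_difference(diff, diff_suppressed, activity, prev_coef,
                      smoothing=0.9):
    """Blend the raw and source-suppressed difference signals.

    activity: segment-specific source activity metric in [0, 1]
    (0 = target source inactive). Both the first-order smoothing and the
    linear blend are illustrative choices.
    """
    # Processing application coefficient: activity smoothed across segments.
    coef = smoothing * prev_coef + (1.0 - smoothing) * activity

    # Weighted sum: lean on the suppressed signal when the source is inactive.
    modified = coef * diff + (1.0 - coef) * diff_suppressed
    return modified, coef
```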
70.
AUDIO ENCODER AND DECODER WITH DYNAMIC RANGE COMPRESSION METADATA
An audio processing unit (APU) is disclosed. The APU includes a buffer memory configured to store at least one frame of an encoded audio bitstream, where the encoded audio bitstream includes audio data and a metadata container. The metadata container includes a header and one or more metadata payloads after the header. The one or more metadata payloads include dynamic range compression (DRC) metadata, and the DRC metadata is or includes profile metadata indicative of whether the DRC metadata includes dynamic range compression (DRC) control values for use in performing dynamic range compression in accordance with at least one compression profile on audio content indicated by at least one block of the audio data.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
G10L 19/22 - Mode decision, i.e. based on audio signal content versus external parameters
71.
METHOD FOR ENCODING AND DECODING IMAGE USING ADAPTIVE DEBLOCKING FILTERING, AND APPARATUS THEREFOR
Disclosed is an encoding/decoding method and apparatus related to adaptive deblocking filtering. There is provided an image decoding method performing adaptive filtering in inter-prediction, the method including: reconstructing, from a bitstream, an image signal including a reference block on which block matching is performed in inter-prediction of a current block to be encoded; obtaining, from the bitstream, a flag indicating whether the reference block exists within a current picture where the current block is positioned; reconstructing the current block by using the reference block; adaptively applying an in-loop filter for the reconstructed current block based on the obtained flag; and storing the current block to which the in-loop filter is or is not applied in a decoded picture buffer (DPB).
H04N 19/82 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
H04N 19/107 - Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/137 - Motion inside a coding unit, e.g. average field, frame or block difference
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/184 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
H04N 19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
H04N 19/51 - Motion estimation or motion compensation
H04N 19/577 - Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
H04N 19/58 - Motion compensation with long-term prediction, i.e. the reference frame for a current frame not being the temporally closest one
H04N 19/593 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
H04N 19/60 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
The disclosure herein generally relates to capturing, acoustic pre-processing, encoding, decoding, and rendering of directional audio of an audio scene. In particular, it relates to a device adapted to modify a directional property of a captured directional audio in response to spatial data of a microphone system capturing the directional audio. The disclosure further relates to a rendering device configured to modify a directional property of a received directional audio in response to received spatial data.
An apparatus may include an interface system and a first local control system. The first local control system may be configured to: receive first sensor data from a first preview environment while a content stream is being presented in the first preview environment; generate, based at least in part on the first sensor data, first user engagement data corresponding to one or more people in the first preview environment, the first user engagement data indicating estimated engagement with presented content of the content stream; output, via the interface system, either the first user engagement data, the first sensor data, or both, to a data aggregation device; and determine, based at least in part on user preference data, whether to provide at least some of the first user engagement data, at least some of the first sensor data, or both, to one or more machine learning (ML) models.
Several embodiments of scalable image processing systems and methods are disclosed herein whereby color management processing of source image data to be displayed on a target display is changed according to varying levels of metadata.
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
G09G 5/02 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the way in which colour is displayed
H04N 9/69 - Circuits for processing colour signals for controlling the amplitude of colour signals, e.g. automatic chroma control circuits for modifying the colour signals by gamma correction
H04N 21/235 - Processing of additional data, e.g. scrambling of additional data or processing content descriptors
H04N 21/4402 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
75.
DETECTION AND ENHANCEMENT OF SPEECH IN BINAURAL RECORDINGS
Disclosed herein are methods, systems, and computer program products for segmenting a binaural recording of speech into parts containing self-speech and parts containing external speech, and for processing each category with different settings, to obtain an enhanced overall presentation. The segmentation is based on a combination of: i) feature-based frame-by-frame classification, and ii) detecting dissimilarity by statistical methods. The segmentation information is then used by a speech enhancement chain, where independent settings are used to process the self-speech and external-speech parts.
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
G10L 25/78 - Detection of presence or absence of voice signals
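A rough sketch of the two-cue segmentation, under heavy assumptions: `classifier` is a hypothetical frame-wise self-speech model, the left/right dissimilarity is treated as a symmetry cue, and averaging the two scores is an arbitrary fusion rule.

```python
import numpy as np

def segment_self_speech(features, classifier, dissimilarity, thresh=0.5):
    """Combine frame-wise classification with a statistical dissimilarity cue.

    features: (T, D) per-frame features of the binaural recording.
    classifier: hypothetical callable returning P(self-speech) per frame.
    dissimilarity: (T,) positive left/right dissimilarity scores; self-speech
    in a binaural recording tends to be symmetric, external speech less so.
    """
    p_self = classifier(features)                       # (T,) in [0, 1]
    p_sym = 1.0 - dissimilarity / max(dissimilarity.max(), 1e-9)
    score = 0.5 * (p_self + p_sym)                      # assumed fusion rule
    return score > thresh                               # True = self-speech

def enhance(frames, is_self, enhance_self, enhance_ext):
    # Independent settings for the two categories (hypothetical hooks).
    return [enhance_self(f) if s else enhance_ext(f)
            for f, s in zip(frames, is_self)]
```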
76.
METHOD AND DEVICE FOR DECODING A HIGHER-ORDER AMBISONICS (HOA) REPRESENTATION OF AN AUDIO SOUNDFIELD
The invention discloses rendering sound field signals, such as Higher-Order Ambisonics (HOA), for arbitrary loudspeaker setups, where the rendering results in highly improved localization properties and is energy preserving. This is obtained by rendering an audio sound field representation for arbitrary spatial loudspeaker setups and/or by a decoder that decodes based on a decode matrix (D). The decode matrix (D) is based on smoothing and scaling of a first decode matrix D̂ with smoothing coefficients. The first decode matrix D̂ is based on a mix matrix G and a mode matrix Ψ̃, where the mix matrix G was determined based on L speakers and positions of a spherical modelling grid related to an HOA order N, and the mode matrix Ψ̃ was determined based on the spherical modelling grid and the HOA order N.
Overlapped block disparity estimation and compensation is described. Compensating for images with overlapped block disparity compensation (OBDC) involves determining if OBDC is enabled in a video bit stream, and determining if OBDC is enabled for one or more macroblocks that neighbor a first macroblock within the video bit stream. The neighboring macroblocks may be transform coded. If OBDC is enabled in the video bit stream and for the one or more neighboring macroblocks, predictions may be made for a region of the first macroblock that has an edge adjacent with the neighboring macroblocks. OBDC can be causally applied. Disparity compensation parameters or modes may be shared amongst views or layers. A variety of predictions may be used with causally-applied OBDC.
H04N 19/139 - Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
H04N 19/103 - Selection of coding mode or of prediction mode
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/152 - Data rate or code amount at the encoder output by measuring the fullness of the transmission buffer
H04N 19/159 - Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
H04N 19/184 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/573 - Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
H04N 19/577 - Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
H04N 19/583 - Motion compensation with overlapping blocks
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
78.
TECHNIQUES FOR CODED MULTI-SOURCE MEDIA DELIVERY TO LEGACY DEVICES VIA EDGE PROXY
A method is disclosed for delivering multi-source media content to a legacy client device via an edge proxy. The method includes determining that one or more criteria for a multi-source relay mode are met based on a determination that a client device requesting media content lacks support for decoding multi-source media data. In response thereto, the method may include instantiating, at a first server in communication with the client device, a multi-source media decoder associated with the client device. The method may also include (i) receiving, at the first server and concurrently from a plurality of multi-source media sources, multi-source media data corresponding to the media content; (ii) decoding, using the multi-source media decoder, the multi-source media data into uncoded media content data corresponding to the media content; and (iii) delivering at least a portion of the uncoded media content data from the first server to the client device.
Many portable playback devices cannot decode and playback encoded audio content having wide bandwidth and wide dynamic range with consistent loudness and intelligibility unless the encoded audio content has been prepared specially for these devices. This problem can be overcome by including with the encoded content some metadata that specifies a suitable dynamic range compression profile by either absolute values or differential values relative to another known compression profile. A playback device may also adaptively apply gain and limiting to the playback audio. Implementations in encoders, in transcoders and in decoders are disclosed.
G10L 19/22 - Mode decision, i.e. based on audio signal content versus external parameters
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocodersCoding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
A method for compressing an HOA signal being an input HOA representation with input time frames (C(k)) of HOA coefficient sequences comprises spatial HOA encoding of the input time frames and subsequent perceptual encoding and source encoding. Each input time frame is decomposed (802) into a frame of predominant sound signals (X_PS(k−1)) and a frame of an ambient HOA component (C̃_AMB(k−1)). The ambient HOA component (C̃_AMB(k−1)) comprises, in a layered mode, first HOA coefficient sequences of the input HOA representation (c_n(k−1)) in lower positions and second HOA coefficient sequences (c_AMB,n(k−1)) in remaining higher positions. The second HOA coefficient sequences are part of an HOA representation of a residual between the input HOA representation and the HOA representation of the predominant sound signals.
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 19/24 - Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
H04S 7/00 - Indicating arrangementsControl arrangements, e.g. balance control
81.
Generating Binaural Audio in Response to Multi-Channel Audio Using at Least One Feedback Delay Network
In some embodiments, virtualization methods generate a binaural signal in response to channels of a multi-channel audio signal by applying a binaural room impulse response (BRIR) to each channel, including by using at least one feedback delay network (FDN) to apply a common late reverberation to a downmix of the channels. In some embodiments, input signal channels are processed in a first processing path to apply to each channel a direct response and early reflection portion of a single-channel BRIR for the channel, and the downmix of the channels is processed in a second processing path including at least one FDN which applies the common late reverberation. Typically, the common late reverberation emulates collective macro attributes of late reverberation portions of at least some of the single-channel BRIRs. Other aspects are headphone virtualizers configured to perform any embodiment of the method.
H04S 7/00 - Indicating arrangementsControl arrangements, e.g. balance control
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
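The second processing path can be sketched with a textbook four-line feedback delay network; the delay lengths, feedback gain, and Hadamard feedback matrix are standard classroom choices, not necessarily those of the patented virtualizer.

```python
import numpy as np

def fdn_late_reverb(downmix, delays=(149, 211, 263, 293), g=0.8):
    """Tiny four-line feedback delay network applying a common late
    reverberation to the channel downmix.

    delays: mutually prime delay-line lengths in samples; g < 1 keeps the
    loop stable with the orthonormal feedback matrix below.
    """
    # 4x4 Hadamard matrix scaled to be orthonormal.
    H = np.array([[1,  1,  1,  1],
                  [1, -1,  1, -1],
                  [1,  1, -1, -1],
                  [1, -1, -1,  1]], dtype=float) / 2.0
    lines = [np.zeros(d) for d in delays]   # circular delay buffers
    idx = [0, 0, 0, 0]
    out = np.zeros_like(downmix)
    for n, x in enumerate(downmix):
        taps = np.array([lines[i][idx[i]] for i in range(4)])  # oldest samples
        out[n] = taps.sum()
        fb = g * (H @ taps)                 # mixed, attenuated feedback
        for i in range(4):
            lines[i][idx[i]] = x + fb[i]    # write input + feedback
            idx[i] = (idx[i] + 1) % delays[i]
    return out
```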
82.
A METHOD FOR IDENTIFYING AUDIO MATCH CUT CANDIDATES AND PERFORMING AUDIO MATCH CUTTING
An aspect of the present disclosure relates to a method for creating an audio match cut between a first audio clip and a second audio clip, each audio clip comprising a plurality of audio content samples. The method comprises obtaining a transition point for crossfading from the first audio clip into the second audio clip and determining a plurality of similarity metrics. Each similarity metric indicates a similarity between the audio content of a sample of the first audio clip and the audio content of a sample of the second audio clip, wherein the plurality of similarity metrics are determined in a transition context window. The method further comprises determining a variance of the plurality of similarity metrics, determining a crossfading length based on the variance, and generating a match cut audio clip by crossfading between the first and second audio clips at the transition point.
G11B 27/10 - IndexingAddressingTiming or synchronisingMeasuring tape travel
G11B 27/28 - IndexingAddressingTiming or synchronisingMeasuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
G10H 1/00 - Details of electrophonic musical instruments
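The variance-driven crossfade can be sketched as below; the cosine similarity, the context-window spacing, and the variance-to-length mapping are illustrative choices, and the snippet assumes both clips have enough samples around the transition points.

```python
import numpy as np

def crossfade_length(clip_a, clip_b, t_a, t_b,
                     win=2048, hop=256, min_len=256, max_len=8192):
    """Derive a crossfade length at a candidate transition point.

    Windows around the transition are compared with cosine similarity
    inside a transition context window; low variance of the similarities
    means steady material, so a long crossfade is safe.
    """
    sims = []
    for off in range(-4 * hop, 4 * hop + 1, hop):       # context window
        a = clip_a[t_a + off : t_a + off + win]
        b = clip_b[t_b + off : t_b + off + win]
        sims.append(np.dot(a, b) /
                    (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    v = np.var(sims)
    frac = np.clip(1.0 - v / 0.1, 0.0, 1.0)             # assumed mapping
    return int(min_len + frac * (max_len - min_len))

def match_cut(clip_a, clip_b, t_a, t_b):
    n = crossfade_length(clip_a, clip_b, t_a, t_b)
    fade = np.linspace(0.0, 1.0, n)
    x = clip_a[t_a : t_a + n] * (1.0 - fade) + clip_b[t_b : t_b + n] * fade
    return np.concatenate([clip_a[:t_a], x, clip_b[t_b + n :]])
```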
Novel methods and systems are described for providing interactive motion blur on an image by motion inputs from movements of the mobile device displaying the image. The device can process the motion blur by modules providing motion blur parameter estimation, blur application, and image composition based on metadata and a baseline image from the encoder. A pre-loaded filter bank can provide blur kernels for blur application.
Disclosed are methods and systems which convert a multi-microphone input signal to a multichannel output signal making use of a time- and frequency-varying matrix. For each time and frequency tile, the matrix is derived as a function of a dominant direction of arrival and a steering strength parameter. Likewise, the dominant direction and steering strength parameter are derived from characteristics of the multi-microphone signals, where those characteristics include values representative of the inter-channel amplitude and group-delay differences.
H04R 1/40 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
Disclosed herein are techniques for generating sounds. In some examples, a method may involve receiving a user input comprising at least an input sound. The method may further involve extracting acoustic features of the input sound. The method may further involve determining a latent space for a target sound effect based at least in part on the user input. The method may further involve generating the target sound effect based on the input sound by providing the acoustic features of the input sound and the latent space for the target sound effect to a trained decoder network.
G10L 21/007 - Changing voice quality, e.g. pitch or formants characterised by the process used
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
G10H 5/00 - Instruments in which the tones are generated by means of electronic generators
Embodiments are disclosed for channel-based audio (CBA) (e.g., 22.2-ch audio) to object-based audio (OBA) conversion. The conversion includes converting CBA metadata to object audio metadata (OAMD) and reordering the CBA channels based on channel shuffle information derived in accordance with channel ordering constraints of the OAMD. The OBA with reordered channels is rendered in a playback device using the OAMD or in a source device, such as a set-top box or audio/video recorder. In an embodiment, the CBA metadata includes signaling that indicates a specific OAMD representation to be used in the conversion of the metadata. In an embodiment, pre-computed OAMD is transmitted in a native audio bitstream (e.g., AAC) for transmission (e.g., over HDMI) or for rendering in a source device. In an embodiment, pre-computed OAMD is transmitted in a transport layer bitstream (e.g., ISO BMFF, MPEG4 audio bitstream) to a playback device or source device.
Embodiments are directed to a companding method and system for reducing coding noise in an audio codec. A compression process reduces the original dynamic range of an initial audio signal by dividing the initial audio signal into a plurality of segments using a defined window shape, calculating a wideband gain in the frequency domain using a non-energy based average of frequency domain samples of the initial audio signal, and applying individual gain values to amplify segments of relatively low intensity and attenuate segments of relatively high intensity. The compressed audio signal is then expanded back to substantially the original dynamic range by an expansion process that applies inverse gain values to amplify segments of relatively high intensity and attenuate segments of relatively low intensity. A QMF filterbank is used to analyze the initial audio signal to obtain a frequency domain representation.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocodersCoding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 19/032 - Quantisation or dequantisation of spectral components
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
G10L 25/45 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of analysis window
H03G 7/00 - Volume compression or expansion in amplifiers
H04B 1/66 - Details of transmission systems, not covered by a single one of groups Details of transmission systems not characterised by the medium used for transmission for reducing bandwidth of signalsDetails of transmission systems, not covered by a single one of groups Details of transmission systems not characterised by the medium used for transmission for improving efficiency of transmission
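The compression-side gain computation might look like the following sketch, where an FFT stands in for the QMF analysis and the mean-magnitude level with exponent γ = 1/3 is an assumed instance of a "non-energy based average".

```python
import numpy as np

def compression_gains(segments, gamma=1.0 / 3.0, eps=1e-9):
    """Per-segment wideband compression gains.

    segments: iterable of windowed time-domain segments. A real codec
    derives the frequency-domain samples from a QMF analysis filterbank;
    np.fft.rfft is only a stand-in, and gamma = 1/3 is an assumed exponent.
    """
    gains = []
    for seg in segments:
        # Mean of magnitudes (not squared magnitudes): a non-energy average.
        level = np.mean(np.abs(np.fft.rfft(seg)))
        # level < 1 -> gain > 1 (amplify quiet segments);
        # level > 1 -> gain < 1 (attenuate loud segments).
        gains.append(max(level, eps) ** (gamma - 1.0))
    # The expander applies the inverse gains to restore substantially the
    # original dynamic range.
    return np.array(gains)
```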
Example embodiments disclosed herein relate to audio signal loudness control. A method for controlling the loudness of an audio signal is disclosed. The method includes, responsive to determining the presence of a noise signal, deriving a target partial loudness adjustment based, at least in part, on at least one of a first factor related to the noise signal and a second factor related to the audio signal. The method further includes determining a target partial loudness of the audio signal based, at least in part, on the target partial loudness adjustment. Corresponding system, apparatus, and computer program product are also disclosed.
Techniques for adaptive processing of media data based on separate data specifying a state of the media data are provided. A device in a media processing chain may determine whether a type of media processing has already been performed on an input version of media data. If so, the device may adapt its processing of the media data to disable performing the type of media processing. If not, the device performs the type of media processing. The device may create a state of the media data specifying the type of media processing. The device may communicate the state of the media data and an output version of the media data to a recipient device in the media processing chain, for the purpose of supporting the recipient device's adaptive processing of the media data.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
90.
METHODS, APPARATUS, AND SYSTEMS FOR CONVERSION BETWEEN AUDIO SCENE REPRESENTATIONS
Some disclosed examples involve converting input audio data in an input audio format to output audio data in an output audio format. The output audio format may have a higher resolution than the input audio format. The converting may involve applying a biased decoding matrix to the input audio data. The biased decoding matrix may be biased according to an estimated energy distribution of the input audio data and may include a combination of constant matrices and a variable matrix. The variable matrix may be a covariance matrix corresponding to the input audio data. The biased decoding matrix may vary over time as a function of the covariance matrix.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
H04S 5/00 - Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
Methods and apparatus for image enhancement using implicit tensor-product B-spline (TPB) modeling. According to an example embodiment, a method for image enhancement includes partitioning an input image into a plurality of first patches; applying TPB modeling to each of the plurality of first patches to generate a respective plurality of implicit TPB models; generating a plurality of second patches using the respective pluralities of implicit TPB models, each of the second patches representing a respective one of the first patches; and combining the plurality of second patches to form an output image representing the input image, the output image having a higher resolution or a better noise metric than the input image.
G06T 3/4053 - Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
An apparatus and method of pre-conditioning audio for machine perception. Machine perception differs from human perception, and different processing parameters are used for machine perception applications (e.g., speech to text processing) as compared to those used for human perception applications (e.g., voice communications). These different parameters may result in pre-conditioned audio that is worsened for human perception yet improved for machine perception.
Volume leveler controller and controlling method are disclosed. In one embodiment, a volume leveler controller includes an audio content classifier for identifying the content type of an audio signal in real time, and an adjusting unit for adjusting a volume leveler in a continuous manner based on the content type as identified. The adjusting unit may be configured to positively correlate the dynamic gain of the volume leveler with informative content types of the audio signal, and negatively correlate the dynamic gain of the volume leveler with interfering content types of the audio signal.
H03G 7/00 - Volume compression or expansion in amplifiers
G10L 21/0364 - Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
H03G 3/30 - Automatic control in amplifiers having semiconductor devices
H03G 3/32 - Automatic control in amplifiers having semiconductor devices the control being dependent upon ambient noise level or sound level
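One way to realize the continuous, content-dependent adjustment is sketched below; the label set and the blending rule are hypothetical, chosen only to exhibit the positive/negative correlation described above.

```python
import numpy as np

def leveler_gain(base_gain, content_probs):
    """Scale the leveler's dynamic gain by the identified content type.

    content_probs: dict of classifier confidences in [0, 1], e.g.
    {'dialog': 0.9, 'music': 0.3, 'noise': 0.1} (hypothetical labels).
    Informative types pull the applied gain toward the full leveler gain;
    interfering types push it back toward unity.
    """
    informative = max(content_probs.get('dialog', 0.0),
                      content_probs.get('music', 0.0))
    interfering = max(content_probs.get('noise', 0.0),
                      content_probs.get('wind', 0.0))
    weight = np.clip(informative * (1.0 - interfering), 0.0, 1.0)
    # Continuous adjustment: interpolate between no leveling (gain 1.0)
    # and the full dynamic gain.
    return 1.0 + weight * (base_gain - 1.0)
```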
A quantization parameter signaling mechanism for both SDR and HDR content in video coding is described using two approaches. The first approach is to send the user-defined QpC table directly in high level syntax. This leads to more flexible and efficient QP control for future codec development and video content coding. The second approach is to signal luma and chroma QPs independently. This approach eliminates the need for QpC tables and removes the dependency of chroma quantization parameter on luma QP.
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
H04N 19/46 - Embedding additional information in the video signal during the compression process
95.
FILM GRAIN PARAMETERS ADAPTATION BASED ON VIEWING ENVIRONMENT
Methods, systems, and bitstream syntax are described for metadata signaling and film-grain parameter adaptation based on a viewing environment which may differ from a reference environment. Example adaptation models are provided for viewing parameters that include: ambient room illumination, viewing distance, and pixels per inch in a target display. Example systems include a single reference viewing environment model and a multi-reference viewing environment model supporting adaptation of film-grain model parameters via adaptation functions or interpolation.
H04N 19/80 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
Methods and apparatus for compressing and decompressing MPI videos. According to an example embodiment, a method of compressing an MPI video includes splitting the MPI video into first and second frame sequences including texture and alpha frames of multiplane images, respectively. The method further includes applying sets of preprocessing operations to convert the first frame sequence into a third frame sequence and to convert the second frame sequence into a fourth frame sequence. Example preprocessing operations include, but are not limited to, applying a fill process, thresholding RGB channels based on the corresponding alpha channel, blurring images, computing pixelwise difference values of frames, and computing pixelwise product values of frames. The method also includes applying video compression to the second frame sequence and to the fourth frame sequence.
H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
H04N 19/119 - Adaptive subdivision aspects e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
H04N 19/463 - Embedding additional information in the video signal during the compression process by compressing encoding parameters before transmission
Methods are described to communicate source color volume information in a coded bitstream using SEI messaging. Such data include at least the minimum, maximum, and average luminance values in the source data plus optional data that may include the color volume x and y chromaticity coordinates for the input color primaries (e.g., red, green, and blue) of the source data, and the color x and y chromaticity coordinates for the color primaries corresponding to the minimum, average, and maximum luminance values in the source data. Messaging data signaling an active region in each picture may also be included.
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/14 - Coding unit complexity, e.g. amount of activity or edge presence estimation
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
H04N 19/20 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
Methods are described to communicate source color volume information in a coded bitstream using SEI messaging. Such data include at least the minimum, maximum, and average luminance values in the source data plus optional data that may include the color volume x and y chromaticity coordinates for the input color primaries (e.g., red, green, and blue) of the source data, and the color x and y chromaticity coordinates for the color primaries corresponding to the minimum, average, and maximum luminance values in the source data. Messaging data signaling an active region in each picture may also be included.
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/14 - Coding unit complexity, e.g. amount of activity or edge presence estimation
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
H04N 19/20 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
99.
ENCODING AND DECODING MULTIPLE-INTENT IMAGES AND VIDEO USING METADATA
Systems and methods for encoding and decoding multiple-intent images and video using metadata. When encoding an image as a multiple-intent image, at least one appearance adjustment may be made to the image. Metadata characterizing the at least one appearance adjustment may be included in, or transmitted along with, the encoded multiple-intent image. When decoding a multiple-intent image, a system may obtain a selection of a desired rendering intent and, based on that selection, either render the multiple-intent image with the applied appearance adjustments or use the metadata to invert the appearance adjustments and recover the image as it was prior to the appearance adjustments.
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
100.
IMPROVING NOISE COMPENSATION IN MASK-BASED SPEECH ENHANCEMENT
Methods and apparatus for improving noise compensation in mask-based speech enhancement are described. A method of processing an audio signal, which includes one or more speech segments, includes obtaining a mask for mask-based speech enhancement of the audio signal and obtaining a magnitude of the audio signal. An estimate of residual noise is determined in the audio signal after mask-based speech enhancement, based on the mask and the magnitude of the audio signal. A modified mask is determined based on the estimate of the residual noise. Further described are corresponding programs and computer-readable storage media.
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
G10L 25/78 - Detection of presence or absence of voice signals
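To close, a heavily hedged sketch of the mask modification: the noise estimate (1 − mask)·|X|, the residual model, and the Wiener-style re-weighting are assumptions, since the abstract only requires the residual estimate to be derived from the mask and the magnitude.

```python
import numpy as np

def refine_mask(mask, mag, floor=1e-6):
    """Compensate a speech-enhancement mask for residual noise.

    mask, mag: (F, T) mask and magnitude spectrogram of the noisy input.
    Everything beyond "residual estimate from mask and magnitude" is an
    illustrative modeling choice.
    """
    speech = mask * mag                       # enhanced magnitude
    noise = (1.0 - mask) * mag                # crude noise estimate
    residual = mask * noise                   # noise leaking through the mask
    # Shrink the mask where residual noise rivals the enhanced speech.
    modified = mask * speech / np.maximum(speech + residual, floor)
    return np.clip(modified, 0.0, 1.0)
```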