The disclosure relates to a method and device for processing object-based audio and channel-based audio. The method comprises receiving a first frame of audio of a first format; receiving a second frame of audio of a second format different from the first format, the second frame for playback subsequent to the first frame; decoding the first frame of audio into a decoded first frame; decoding the second frame of audio into a decoded second frame; and generating a plurality of output frames of a third format by performing rendering based on the decoded first frame and the decoded second frame. The first format may be an object-based audio format and the second format a channel-based audio format, or vice versa.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
2.
DETECTION AND ENHANCEMENT OF SPEECH IN BINAURAL RECORDINGS
Disclosed herein are methods, systems, and computer program products for segmenting a binaural recording of speech into parts containing self-speech and parts containing external speech, and processing each category with different settings, to obtain an enhanced overall presentation. The segmentation is based on a combination of: i) feature-based frame-by-frame classification, and ii) detecting dissimilarity by statistical methods. The segmentation information is then used by a speech enhancement chain, where independent settings are used to process the self- and external speech parts.
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00 specially adapted for particular use for comparison or discrimination
G10L 25/78 - Detection of presence or absence of voice signals
Overlapped block disparity estimation and compensation is described. Compensating for images with overlapped block disparity compensation (OBDC) involves determining if OBDC is enabled in a video bit stream, and determining if OBDC is enabled for one or more macroblocks that neighbor a first macroblock within the video bit stream. The neighboring macroblocks may be transform coded. If OBDC is enabled in the video bit stream and for the one or more neighboring macroblocks, predictions may be made for a region of the first macroblock that has an edge adjacent with the neighboring macroblocks. OBDC can be causally applied. Disparity compensation parameters or modes may be shared amongst views or layers. A variety of predictions may be used with causally-applied OBDC.
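To make the edge-region blending concrete, below is a minimal Python sketch of overlapped-block compensation along one block boundary; the 50%-to-0% ramp, the overlap depth, and the fetch helper are illustrative assumptions, not the patented scheme.

```python
import numpy as np

def fetch(reference, y, x, dy, dx, h, w):
    """Fetch an h-by-w prediction block displaced by disparity (dy, dx)."""
    return reference[y + dy : y + dy + h, x + dx : x + dx + w]

def obdc_predict(reference, y, x, size, own_dv, top_dv, obdc_enabled):
    """Predict a block; if OBDC is enabled and the top neighbour supplied a
    disparity vector, blend the rows nearest that boundary with the
    prediction obtained using the neighbour's vector."""
    pred = fetch(reference, y, x, *own_dv, size, size).astype(np.float64)
    if obdc_enabled and top_dv is not None:
        overlap = size // 4                           # rows adjacent to the neighbour
        neigh = fetch(reference, y, x, *top_dv, overlap, size)
        w = np.linspace(0.5, 0.0, overlap)[:, None]   # ramp toward own prediction
        pred[:overlap] = (1 - w) * pred[:overlap] + w * neigh
    return pred

ref = np.arange(64 * 64, dtype=np.float64).reshape(64, 64)
print(obdc_predict(ref, 16, 16, 8, own_dv=(2, 1), top_dv=(0, 3), obdc_enabled=True).shape)
```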
H04N 19/139 - Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
H04N 19/103 - Selection of coding mode or of prediction mode
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/152 - Data rate or code amount at the encoder output by measuring the fullness of the transmission buffer
H04N 19/159 - Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
H04N 19/184 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/573 - Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
H04N 19/577 - Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
H04N 19/583 - Motion compensation with overlapping blocks
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
4.
METHOD AND DEVICE FOR DECODING A HIGHER-ORDER AMBISONICS (HOA) REPRESENTATION OF AN AUDIO SOUNDFIELD
The invention relates to rendering sound field signals, such as Higher-Order Ambisonics (HOA), for arbitrary loudspeaker setups, where the rendering results in highly improved localization properties and is energy preserving. This is obtained by rendering an audio sound field representation for arbitrary spatial loudspeaker setups and/or by a decoder that decodes based on a decode matrix (D). The decode matrix (D) is based on smoothing and scaling of a first decode matrix D̂ with smoothing coefficients. The first decode matrix D̂ is based on a mix matrix G and a mode matrix Ψ̃, where the mix matrix G was determined based on L speakers and positions of a spherical modelling grid related to a HOA order N, and the mode matrix Ψ̃ was determined based on the spherical modelling grid and the HOA order N.
A method is disclosed for delivering multi-source media content to a legacy client device via an edge proxy. The method includes determining that one or more criteria for a multi-source relay mode are met based on a determination that a client device requesting media content lacks support for decoding multi-source media data. In response thereto, the method may include instantiating, at a first server in communication with the client device, a multi-source media decoder associated with the client device. The method may also include (i) receiving, at the first server and concurrently from a plurality of multi-source media sources, multi-source media data corresponding to the media content; (ii) decoding, using the multi-source media decoder, the multi-source media data into uncoded media content data corresponding to the media content; and (iii) delivering at least a portion of the uncoded media content data from the first server to the client device.
Many portable playback devices cannot decode and playback encoded audio content having wide bandwidth and wide dynamic range with consistent loudness and intelligibility unless the encoded audio content has been prepared specially for these devices. This problem can be overcome by including with the encoded content some metadata that specifies a suitable dynamic range compression profile by either absolute values or differential values relative to another known compression profile. A playback device may also adaptively apply gain and limiting to the playback audio. Implementations in encoders, in transcoders and in decoders are disclosed.
G10L 19/22 - Mode decision, i.e. based on audio signal content versus external parameters
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
In some embodiments, virtualization methods generate a binaural signal in response to channels of a multi-channel audio signal by applying a binaural room impulse response (BRIR) to each channel, including by using at least one feedback delay network (FDN) to apply a common late reverberation to a downmix of the channels. In some embodiments, input signal channels are processed in a first processing path to apply to each channel a direct response and early reflection portion of a single-channel BRIR for the channel, and the downmix of the channels is processed in a second processing path including at least one FDN which applies the common late reverberation. Typically, the common late reverberation emulates collective macro attributes of late reverberation portions of at least some of the single-channel BRIRs. Other aspects are headphone virtualizers configured to perform any embodiment of the method.
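The common-late-reverberation path can be pictured with a toy feedback delay network. The sketch below runs a mono downmix through four delay lines mixed by an orthonormal Hadamard matrix; the delay lengths and decay are arbitrary assumptions, and a real virtualizer would add the direct/early-reflection path from the first processing path on top.

```python
import numpy as np

def fdn_reverb(downmix, delays=(149, 211, 263, 293), decay=0.6):
    """Run a mono downmix through a 4-line feedback delay network.
    An orthonormal Hadamard matrix mixes the delay lines; decay < 1
    keeps the loop stable. Delay lengths are arbitrary co-prime picks."""
    H = np.array([[1, 1, 1, 1],
                  [1, -1, 1, -1],
                  [1, 1, -1, -1],
                  [1, -1, -1, 1]], dtype=np.float64) / 2.0
    bufs = [np.zeros(d) for d in delays]
    idx = [0] * len(delays)
    out = np.zeros(len(downmix))
    for n, x in enumerate(downmix):
        taps = np.array([buf[i] for buf, i in zip(bufs, idx)])
        fb = decay * (H @ taps)                  # mixed, attenuated feedback
        out[n] = taps.sum() / len(delays)
        for k, buf in enumerate(bufs):
            buf[idx[k]] = x + fb[k]
            idx[k] = (idx[k] + 1) % delays[k]
    return out

impulse = np.r_[1.0, np.zeros(999)]
print(np.abs(fdn_reverb(impulse)).max() > 0)     # True: a decaying reverb tail
```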
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
8.
METHODS, APPARATUS AND SYSTEMS FOR DECOMPRESSING A HIGHER ORDER AMBISONICS (HOA) SIGNAL
A method for compressing a HOA signal being an input HOA representation with input time frames (C(k)) of HOA coefficient sequences comprises spatial HOA encoding of the input time frames and subsequent perceptual encoding and source encoding. Each input time frame is decomposed (802) into a frame of predominant sound signals (X_PS(k−1)) and a frame of an ambient HOA component (C̃_AMB(k−1)). The ambient HOA component (C̃_AMB(k−1)) comprises, in a layered mode, first HOA coefficient sequences of the input HOA representation (c_n(k−1)) in lower positions and second HOA coefficient sequences (c_AMB,n(k−1)) in remaining higher positions. The second HOA coefficient sequences are part of an HOA representation of a residual between the input HOA representation and the HOA representation of the predominant sound signals.
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 19/24 - Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
9.
A METHOD FOR IDENTIFYING AUDIO MATCH CUT CANDIDATES AND PERFORMING AUDIO MATCH CUTTING
An aspect of the present disclosure relates to a method for creating an audio match cut between a first audio clip and a second audio clip, each audio clip comprising a plurality of audio content samples. The method comprises obtaining a transition point for crossfading from the first audio clip into the second audio clip and determining a plurality of similarity metrics, each indicating a similarity between the audio content of a sample of the first audio clip and the audio content of a sample of the second audio clip, wherein the plurality of similarity metrics are determined in a transition context window. The method further comprises determining a variance of the plurality of similarity metrics, determining a crossfading length based on the variance, and generating a match cut audio clip by crossfading between the first and second audio clips at the transition point.
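As a rough, non-authoritative illustration of this flow, the sketch below computes block-wise cosine similarities inside a context window around the transition, maps their variance to a crossfade length, and applies a linear crossfade. The similarity metric, the variance-to-length heuristic, and the requirement that clips extend at least one window beyond the transition are all assumptions.

```python
import numpy as np

def crossfade_length(clip_a, clip_b, transition, window=4096, block=512,
                     min_len=256, max_len=8192):
    """Block-wise cosine similarities inside a context window around the
    transition; their variance steers the crossfade length (heuristic)."""
    sims = []
    for off in range(-window, window, block):
        a = clip_a[transition + off : transition + off + block]
        b = clip_b[transition + off : transition + off + block]
        if len(a) == block and len(b) == block:
            denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
            sims.append(float(np.dot(a, b) / denom))
    var = np.var(sims)
    # Assumed mapping: dissimilar, high-variance material gets a longer,
    # smoother fade; near-identical material can cut quickly.
    return int(np.clip(min_len + 10 * var * (max_len - min_len), min_len, max_len))

def match_cut(clip_a, clip_b, transition):
    n = crossfade_length(clip_a, clip_b, transition)
    fade = np.linspace(0.0, 1.0, n)
    cross = (1 - fade) * clip_a[transition : transition + n] \
            + fade * clip_b[transition : transition + n]
    return np.concatenate([clip_a[:transition], cross, clip_b[transition + n :]])

a = np.sin(np.linspace(0, 200, 48000))
b = np.cos(np.linspace(0, 180, 48000))
print(len(match_cut(a, b, transition=24000)))    # 48000: length is preserved
```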
G11B 27/10 - Indexing; Addressing; Timing or synchronising; Measuring tape travel
G11B 27/28 - Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
G10H 1/00 - Details of electrophonic musical instruments
10.
METHOD FOR ESTIMATING BANDWIDTH BETWEEN A VIDEO SERVER AND A VIDEO CLIENT
A method is disclosed for estimating bandwidth between a video server and a video client for playing back a video stream that includes video packets. The method involves the following steps performed by the video client: receiving the video packets from the video server, wherein the video packets are transmitted by the video server at predetermined time intervals; calculating a difference in a size of the video packets and a difference in a time of reception of the video packets; and estimating the bandwidth based on the calculated difference in size and the calculated difference in the time of reception. The present disclosure further relates to a media player configured to perform the disclosed method.
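One consistent reading of this size/arrival-dispersion idea, sketched in Python under the assumption that the client knows the server's fixed send interval: a packet that is dS bytes larger than its predecessor arrives roughly 8·dS/bw seconds later than the nominal cadence, so each adjacent packet pair yields one bandwidth sample. This is a sketch of the idea, not the claimed algorithm.

```python
def estimate_bandwidth(recv, send_interval):
    """recv: (reception_time_s, size_bytes) pairs for packets sent at a fixed
    interval. A packet dS bytes larger than its predecessor arrives about
    8*dS/bw seconds after the nominal cadence, so each adjacent pair gives
    a bandwidth sample; taking the median tames jitter outliers."""
    estimates = []
    for (t0, s0), (t1, s1) in zip(recv, recv[1:]):
        extra_delay = (t1 - t0) - send_interval
        d_size = s1 - s0
        if d_size > 0 and extra_delay > 0:
            estimates.append(8 * d_size / extra_delay)
    estimates.sort()
    return estimates[len(estimates) // 2] if estimates else None

# A 3000-byte packet arrives 2 ms later than the 4 ms send cadence predicts:
print(estimate_bandwidth([(0.000, 1500), (0.006, 3000)], send_interval=0.004))  # 6e6 b/s
```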
H04N 21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed or the storage space available from the internal hard disk
H04N 21/462 - Content or additional data management e.g. creating a master electronic program guide from data received from the Internet and a Head-end or controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities
H04N 21/845 - Structuring of content, e.g. decomposing content into time segments
Novel methods and systems are described for providing interactive motion blur on an image using motion inputs from movements of the mobile device displaying the image. The device can process the motion blur with modules providing motion blur parameter estimation, blur application, and image composition based on metadata and a baseline image from the encoder. A pre-loaded filter bank can provide blur kernels for blur application.
Disclosed are methods and systems which convert a multi-microphone input signal to a multichannel output signal making use of a time- and frequency-varying matrix. For each time and frequency tile, the matrix is derived as a function of a dominant direction of arrival and a steering strength parameter. Likewise, the dominant direction and steering strength parameter are derived from characteristics of the multi-microphone signals, where those characteristics include values representative of the inter-channel amplitude and group-delay differences.
H04R 1/40 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
Disclosed herein are techniques for generating sounds. In some examples, a method may involve receiving a user input comprising at least an input sound. The method may further involve extracting acoustic features of the input sound. The method may further involve determining a latent space for a target sound effect based at least in part on the user input. The method may further involve generating the target sound effect based on the input sound by providing the acoustic features of the input sound and the latent space for the target sound effect to a trained decoder network.
G10L 21/007 - Changing voice quality, e.g. pitch or formants characterised by the process used
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00 characterised by the analysis technique, using neural networks
G10H 5/00 - Instruments in which the tones are generated by means of electronic generators
Embodiments are disclosed for channel-based audio (CBA) (e.g., 22.2-ch audio) to object-based audio (OBA) conversion. The conversion includes converting CBA metadata to object audio metadata (OAMD) and reordering the CBA channels based on channel shuffle information derived in accordance with channel ordering constraints of the OAMD. The OBA with reordered channels is rendered in a playback device using the OAMD or in a source device, such as a set-top box or audio/video recorder. In an embodiment, the CBA metadata includes signaling that indicates a specific OAMD representation to be used in the conversion of the metadata. In an embodiment, pre-computed OAMD is transmitted in a native audio bitstream (e.g., AAC) for transmission (e.g., over HDMI) or for rendering in a source device. In an embodiment, pre-computed OAMD is transmitted in a transport layer bitstream (e.g., ISO BMFF, MPEG4 audio bitstream) to a playback device or source device.
Embodiments are directed to a companding method and system for reducing coding noise in an audio codec. A compression process reduces an original dynamic range of an initial audio signal by dividing the initial audio signal into a plurality of segments using a defined window shape, calculating a wideband gain in the frequency domain using a non-energy-based average of frequency domain samples of the initial audio signal, and applying individual gain values to amplify segments of relatively low intensity and attenuate segments of relatively high intensity. The compressed audio signal is then expanded back to substantially the original dynamic range by an expansion process that applies inverse gain values to amplify segments of relatively high intensity and attenuate segments of relatively low intensity. A QMF filterbank is used to analyze the initial audio signal to obtain a frequency domain representation.
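A toy companding sketch under loud assumptions: a plain FFT stands in for the QMF filterbank of the abstract, and the exponent of the mean-magnitude (non-energy-based) gain rule is invented for illustration.

```python
import numpy as np

def wideband_gain(spectrum, exponent=-0.5, eps=1e-9):
    """Gain from a non-energy-based (mean-magnitude) average of the
    frequency-domain samples; a negative exponent boosts quiet segments
    and attenuates loud ones. The exponent value is an assumption."""
    return (np.mean(np.abs(spectrum)) + eps) ** exponent

def compand(signal, seg_len=256):
    """Compress segment by segment and return the gains so that an
    expander can apply their inverses downstream."""
    window = np.hanning(seg_len)
    out = signal.astype(np.float64).copy()
    gains = []
    for start in range(0, len(signal) - seg_len + 1, seg_len):
        spectrum = np.fft.rfft(signal[start : start + seg_len] * window)
        g = wideband_gain(spectrum)
        gains.append(g)
        out[start : start + seg_len] *= g
    return out, gains

compressed, gains = compand(np.random.randn(1024))
print(len(gains))    # 4 segment gains; the expander divides by each in turn
```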
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 19/032 - Quantisation or dequantisation of spectral components
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00 characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
G10L 25/45 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00 characterised by the type of analysis window
H03G 7/00 - Volume compression or expansion in amplifiers
H04B 1/66 - Details of transmission systems, not covered by a single one of groups H04B 3/00-H04B 13/00, not characterised by the medium used for transmission, for reducing bandwidth of signals or for improving efficiency of transmission
16.
ADAPTIVE PROCESSING WITH MULTIPLE MEDIA PROCESSING NODES
Techniques for adaptive processing of media data based on separate data specifying a state of the media data are provided. A device in a media processing chain may determine whether a type of media processing has already been performed on an input version of media data. If so, the device may adapt its processing of the media data to disable performing the type of media processing. If not, the device performs the type of media processing. The device may create a state of the media data specifying the type of media processing. The device may communicate the state of the media data and an output version of the media data to a recipient device in the media processing chain, for the purpose of supporting the recipient device's adaptive processing of the media data.
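The adaptive skip-or-process logic reduces to a small state check at each node. A dict-based Python sketch; the metadata encoding is an assumption, not the specified state format.

```python
def process_media(media, state, process_type, processor):
    """Skip a processing step when the state metadata says it was already
    applied upstream; otherwise apply it and record it for the next node."""
    applied = set(state.get("applied", []))
    if process_type in applied:
        return media, state                  # adapt: disable duplicate processing
    media = processor(media)
    applied.add(process_type)
    return media, {**state, "applied": sorted(applied)}

audio, state = [0.1, 0.2], {"applied": ["loudness_norm"]}
audio, state = process_media(audio, state, "loudness_norm", lambda m: m)        # skipped
audio, state = process_media(audio, state, "drc", lambda m: [x * 0.5 for x in m])
print(state)     # {'applied': ['drc', 'loudness_norm']}
```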
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
Example embodiments disclosed herein relate to audio signal loudness control. A method for controlling loudness of an audio signal is disclosed. The method includes responsive to determining presence of a noise signal, deriving a target partial loudness adjustment based, at least in part, on at least one of a first factor related to the noise signal and a second factor related to the audio signal. The method further includes determining a target partial loudness of the audio signal based, at least in part, on the target partial loudness adjustment. Corresponding system, apparatus and computer program product are also disclosed.
Some disclosed examples involve converting input audio data in an input audio format to output audio data in an output audio format. The output audio format may have a higher resolution than the input audio format. The converting may involve applying a biased decoding matrix to the input audio data. The biased decoding matrix may be biased according to an estimated energy distribution of the input audio data and may include a combination of constant matrices and a variable matrix. The variable matrix may be a covariance matrix corresponding to the input audio data. The biased decoding matrix may vary over time as a function of the covariance matrix.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
H04S 5/00 - Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
Methods and apparatus for image enhancement using implicit tensor-product B-spline (TPB) modeling. According to an example embodiment, a method for image enhancement includes partitioning an input image into a plurality of first patches; applying TPB modeling to each of the plurality of first patches to generate a respective plurality of implicit TPB models; generating a plurality of second patches using the respective pluralities of implicit TPB models, each of the second patches representing a respective one of the first patches; and combining the plurality of second patches to form an output image representing the input image, the output image having a higher resolution or a better noise metric than the input image.
G06T 3/4053 - Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
An apparatus and method of pre-conditioning audio for machine perception. Machine perception differs from human perception, and different processing parameters are used for machine perception applications (e.g., speech to text processing) as compared to those used for human perception applications (e.g., voice communications). These different parameters may result in pre-conditioned audio that is worsened for human perception yet improved for machine perception.
Volume leveler controller and controlling method are disclosed. In one embodiment, a volume leveler controller includes an audio content classifier for identifying the content type of an audio signal in real time, and an adjusting unit for adjusting a volume leveler in a continuous manner based on the content type as identified. The adjusting unit may be configured to positively correlate the dynamic gain of the volume leveler with informative content types of the audio signal, and negatively correlate the dynamic gain of the volume leveler with interfering content types of the audio signal.
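A toy version of the positive/negative correlation rule; the content-type lists and the clipped linear combination of classifier confidences are illustrative assumptions, not the claimed adjustment.

```python
def leveler_gain_scale(confidences,
                       informative=("dialog", "music"),
                       interfering=("noise", "background")):
    """Scale the leveler's dynamic gain up with informative content types
    and down with interfering ones. `confidences` maps type -> [0, 1]."""
    pos = sum(confidences.get(t, 0.0) for t in informative)
    neg = sum(confidences.get(t, 0.0) for t in interfering)
    return max(0.0, min(1.0, pos - neg))

print(leveler_gain_scale({"dialog": 0.9, "noise": 0.2}))    # 0.7: near-full leveling
```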
H03G 7/00 - Volume compression or expansion in amplifiers
G10L 21/0364 - Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00 characterised by the analysis technique, using neural networks
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00 specially adapted for particular use for comparison or discrimination
H03G 3/30 - Automatic control in amplifiers having semiconductor devices
H03G 3/32 - Automatic control in amplifiers having semiconductor devices the control being dependent upon ambient noise level or sound level
A quantization parameter signaling mechanism for both SDR and HDR content in video coding is described using two approaches. The first approach is to send the user-defined QpC table directly in high-level syntax. This leads to more flexible and efficient QP control for future codec development and video content coding. The second approach is to signal luma and chroma QPs independently. This approach eliminates the need for QpC tables and removes the dependency of the chroma quantization parameter on the luma QP.
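Once parsed from high-level syntax, the first approach amounts to a decoder-side table lookup. A sketch with a hypothetical, made-up table; the values are not from any standard.

```python
# Hypothetical user-defined luma-to-chroma QP mapping, as if parsed from
# high-level syntax; the values below are invented for illustration.
USER_QPC_TABLE = {qp: (qp if qp < 30 else qp - (qp - 29) // 2) for qp in range(64)}

def chroma_qp(luma_qp, table=USER_QPC_TABLE):
    """Map a luma QP to a chroma QP via the signaled table."""
    return table[min(max(luma_qp, 0), 63)]

print(chroma_qp(37))    # 33 under this made-up table
```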
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
H04N 19/46 - Embedding additional information in the video signal during the compression process
A system utilizing a high throughput coding mode for CABAC in HEVC is described. The system may include an electronic device configured to obtain a block of data to be encoded using an arithmetic based encoder; to generate a sequence of syntax elements using the obtained block; to compare an Absolute-3 value of the sequence or a parameter associated with the Absolute-3 value to a preset value; and to convert the Absolute-3 value to a codeword using a first code or a second code that is different than the first code, according to a result of the comparison.
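A sketch of the codeword switch with illustrative Rice/Exp-Golomb parameters and threshold; HEVC's actual high-throughput binarization differs in detail, so treat this as a shape of the idea only.

```python
def rice(value, k=1):
    """Golomb-Rice codeword: unary quotient, then k remainder bits."""
    q, r = value >> k, value & ((1 << k) - 1)
    return "1" * q + "0" + format(r, "0%db" % k)

def exp_golomb(value, k=2):
    """k-th order Exp-Golomb codeword for a non-negative integer."""
    v = value + (1 << k)
    return "0" * (v.bit_length() - 1 - k) + format(v, "b")

def code_abs3(abs3, preset=8):
    """Switch sketched from the abstract: small Absolute-3 values take the
    Rice code, large ones the Exp-Golomb code (assumed parameters)."""
    return rice(abs3) if abs3 < preset else exp_golomb(abs3 - preset)

print(code_abs3(3), code_abs3(20))    # '101' '0010000'
```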
H04N 19/60 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
H03M 7/40 - Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
24.
METHODS, DEVICES AND SYSTEMS FOR PARALLEL VIDEO ENCODING AND DECODING
A method for decoding a video bitstream is disclosed. The method comprises: entropy decoding a first portion of a video bitstream, wherein the first portion of the video bitstream is associated with a video frame, thereby producing a first portion of decoded data; entropy decoding a second portion of the video bitstream, wherein the second portion of the video bitstream is associated with the video frame, thereby producing a second portion of decoded data, and wherein entropy decoding the second portion of the video bitstream is independent of entropy decoding the first portion of the video bitstream; and reconstructing a first portion of the video frame associated with the video bitstream using the first portion of decoded data and the second portion of decoded data.
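Because the two portions decode independently, the entropy stage parallelizes as a plain map. A toy Python sketch with a stand-in "decoder" (the byte transform is a placeholder for real CABAC/CAVLC work):

```python
from concurrent.futures import ThreadPoolExecutor

def entropy_decode(portion):
    """Stand-in for a real entropy decoder; the point is that each portion
    carries its own context, so portions decode independently."""
    return [b ^ 0x5A for b in portion]    # toy byte transform

def decode_frame(portions):
    with ThreadPoolExecutor() as pool:
        decoded = list(pool.map(entropy_decode, portions))
    # Reconstruction of the frame then uses all decoded portions together.
    return [sym for part in decoded for sym in part]

print(decode_frame([bytes([1, 2]), bytes([3, 4])]))
```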
H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
H04N 19/119 - Adaptive subdivision aspects e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/15 - Data rate or code amount at the encoder output by monitoring actual compressed data size at the memory before deciding storage at the transmission buffer
H04N 19/159 - Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
H04N 19/17 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 19/174 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
H04N 19/184 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
H04N 19/192 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive
H04N 19/40 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
H04N 19/43 - Hardware specially adapted for motion estimation or compensation
H04N 19/436 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/80 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
25.
METHODS, APPARATUS AND SYSTEMS FOR 6DOF AUDIO RENDERING AND DATA REPRESENTATIONS AND BITSTREAM STRUCTURES FOR 6DOF AUDIO RENDERING
The present disclosure relates to methods, apparatus and systems for encoding an audio signal into a bitstream, in particular at an encoder, comprising: encoding or including audio signal data associated with 3DoF audio rendering into one or more first bitstream parts of the bitstream, and encoding or including metadata associated with 6DoF audio rendering into one or more second bitstream parts of the bitstream. The present disclosure further relates to methods, apparatus and systems for decoding an audio signal and audio rendering based on the bitstream.
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
Methods, systems, and bitstream syntax are described for metadata signaling and film-grain parameter adaptation based on a viewing environment which may differ from a reference environment. Example adaptation models are provided for viewing parameters that include: ambient room illumination, viewing distance, and pixels per inch in a target display. Example systems include a single reference viewing environment model and a multi-reference viewing environment model supporting adaptation of film-grain model parameters via adaptation functions or interpolation.
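A sketch of the multi-reference idea for a single viewing parameter: interpolate a film-grain scaling factor between two reference environments keyed by ambient illumination. The reference points and the log-domain interpolation are illustrative assumptions, not the described adaptation models.

```python
import math

def adapt_grain_strength(ambient_lux, refs=((5.0, 1.0), (500.0, 0.35))):
    """Interpolate a film-grain scaling factor between two assumed reference
    viewing environments: (lux, strength) pairs, interpolated in log10(lux)."""
    (lux0, s0), (lux1, s1) = refs
    t = (math.log10(ambient_lux) - math.log10(lux0)) / (math.log10(lux1) - math.log10(lux0))
    t = min(max(t, 0.0), 1.0)               # clamp outside the reference span
    return s0 + t * (s1 - s0)

print(adapt_grain_strength(50.0))           # halfway between dim and bright references
```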
H04N 19/80 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
Methods and apparatus for compressing and decompressing MPI videos. According to an example embodiment, a method of compressing an MPI video includes splitting the MPI video into first and second frame sequences including texture and alpha frames of multiplane images, respectively. The method further includes applying sets of preprocessing operations to convert the first frame sequence into a third frame sequence and to convert the second frame sequence into a fourth frame sequence. Example preprocessing operations include, but are not limited to, applying a fill process, thresholding RGB channels based on the corresponding alpha channel, blurring images, computing pixelwise difference values of frames, and computing pixelwise product values of frames. The method also includes applying video compression to the second frame sequence and to the fourth frame sequence.
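Two of the named preprocessing operations as toy numpy code; the threshold value and array layout are assumptions.

```python
import numpy as np

def preprocess_texture(texture, alpha, threshold=0.02):
    """Zero RGB where the alpha channel is negligible, then take pixelwise
    frame differences so the video coder sees sparser residuals."""
    gated = np.where(alpha[..., None] > threshold, texture, 0.0)
    diffs = np.diff(gated, axis=0)           # frame t minus frame t-1
    return gated, diffs

tex = np.random.rand(4, 8, 8, 3)             # frames, height, width, RGB
alp = np.random.rand(4, 8, 8)
gated, diffs = preprocess_texture(tex, alp)
print(gated.shape, diffs.shape)              # (4, 8, 8, 3) (3, 8, 8, 3)
```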
H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
H04N 19/119 - Adaptive subdivision aspects e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
H04N 19/463 - Embedding additional information in the video signal during the compression process by compressing encoding parameters before transmission
Methods are described to communicate source color volume information in a coded bitstream using SEI messaging. Such data include at least the minimum, maximum, and average luminance values in the source data plus optional data that may include the color volume x and y chromaticity coordinates for the input color primaries (e.g., red, green, and blue) of the source data, and the color x and y chromaticity coordinates for the color primaries corresponding to the minimum, average, and maximum luminance values in the source data. Messaging data signaling an active region in each picture may also be included.
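The mandatory luminance fields reduce to simple statistics over the source frames. A sketch assuming linear RGB input with Rec. 709 luma weights (an assumption about the source primaries, not part of the described messaging):

```python
import numpy as np

def source_color_volume(rgb_frames):
    """Compute the minimum, maximum, and average luminance fields of the
    SEI payload described above from linear RGB frames."""
    y = rgb_frames @ np.array([0.2126, 0.7152, 0.0722])   # Rec. 709 luma
    return {"min_lum": float(y.min()),
            "max_lum": float(y.max()),
            "avg_lum": float(y.mean())}

print(source_color_volume(np.random.rand(2, 4, 4, 3)))
```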
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/14 - Coding unit complexity, e.g. amount of activity or edge presence estimation
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
H04N 19/20 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
Methods are described to communicate source color volume information in a coded bitstream using SEI messaging. Such data include at least the minimum, maximum, and average luminance values in the source data plus optional data that may include the color volume x and y chromaticity coordinates for the input color primaries (e.g., red, green, and blue) of the source data, and the color x and y chromaticity coordinates for the color primaries corresponding to the minimum, average, and maximum luminance values in the source data. Messaging data signaling an active region in each picture may also be included.
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/14 - Coding unit complexity, e.g. amount of activity or edge presence estimation
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
H04N 19/20 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
A method of processing a representation of a multichannel audio signal is provided. The representation includes a first channel and metadata relating to a second channel. The metadata includes, for each of a plurality of first bands of a first filter bank, a respective prediction parameter. The method includes: applying a second filter bank with a plurality of second bands to the first channel to obtain, for each second band, a banded version of the first channel; for each second band, generating a respective time-domain filter based on the prediction parameters and first filters corresponding to the first bands; and for each second band, generating a prediction for the second channel based on a filtered version of the first channel, the filtered version being obtained by applying the respective time-domain filter in that second band to the banded version of the first channel. Also provided are corresponding apparatus, programs, and computer-readable storage media.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
31.
ENCODING AND DECODING MULTIPLE-INTENT IMAGES AND VIDEO USING METADATA
Systems and methods for encoding and decoding multiple-intent images and video using metadata. When encoding an image as a multiple-intent image, at least one appearance adjustment may be made to the image. Metadata characterizing the at least one appearance adjustment may be included in, or transmitted along with, the encoded multiple-intent image. When decoding a multiple-intent image, a system may obtain a selection of a desired rendering intent and, based on that selection, either render the multiple-intent image with the applied appearance adjustments or use the metadata to invert the appearance adjustments and recover the pre-adjustment image.
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
32.
IMPROVING NOISE COMPENSATION IN MASK-BASED SPEECH ENHANCEMENT
Methods and apparatus for improving noise compensation in mask-based speech enhancement are described. A method of processing an audio signal, which includes one or more speech segments, includes obtaining a mask for mask-based speech enhancement of the audio signal and obtaining a magnitude of the audio signal. An estimate of residual noise is determined in the audio signal after mask-based speech enhancement, based on the mask and the magnitude of the audio signal. A modified mask is determined based on the estimate of the residual noise. Further described are corresponding programs and computer-readable storage media.
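One plausible toy realization, not the claimed method: estimate the noise magnitude as (1 − mask)·|X|, take the residual after masking as mask·noise, and shrink the mask by the surviving speech fraction. With these particular choices the rule collapses to squaring the mask, a classical aggressive-masking heuristic.

```python
import numpy as np

def compensate_mask(mask, magnitude, floor=0.05):
    """Estimate the noise the mask lets through, then shrink the mask where
    that residual dominates (all estimates here are assumptions)."""
    noise_est = (1.0 - mask) * magnitude     # spectrum attributed to noise
    residual = mask * noise_est              # noise surviving the mask
    speech_est = mask * magnitude - residual
    ratio = speech_est / (mask * magnitude + 1e-9)
    return mask * np.clip(ratio, floor, 1.0)

print(compensate_mask(np.array([0.2, 0.8]), np.array([1.0, 1.0])))   # [0.04 0.64]
```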
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00 specially adapted for particular use for comparison or discrimination
G10L 25/78 - Detection of presence or absence of voice signals
The present disclosure provides methods, devices and computer program products for encoding and decoding a stereo audio signal based on an input signal. According to the disclosure, a hybrid approach of using both parametric stereo coding and a discrete representation of the stereo audio signal is used which may improve the quality of the encoded and decoded audio for certain bitrates.
G10L 19/06 - Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 25/06 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00 characterised by the type of extracted parameters, the extracted parameters being correlation coefficients
The present invention relates to transposing signals in time and/or frequency, and in particular to coding of audio signals. More particularly, the present invention relates to high frequency reconstruction (HFR) methods including a frequency domain harmonic transposer. A method and system for generating a transposed output signal from an input signal using a transposition factor T is described. The system comprises an analysis window of length L_a, which extracts a frame of the input signal, and an analysis transformation unit of order M, which transforms the samples into M complex coefficients. M is a function of the transposition factor T. The system further comprises a nonlinear processing unit altering the phase of the complex coefficients by using the transposition factor T, a synthesis transformation unit of order M transforming the altered coefficients into M altered samples, and a synthesis window of length L_s, generating a frame of the output signal.
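A compact phase-vocoder-style sketch of T-times transposition, not the patented transposer: multiply spectral phases by T, overlap-add with a T-times synthesis hop (a time stretch), then decimate by T so frequencies land T times higher at the original duration. The window lengths, hops, magnitude handling, and naive decimation are all assumptions.

```python
import numpy as np

def harmonic_transpose(signal, T=2, frame=1024, hop=128):
    """Phase multiplication by T plus a T-times synthesis hop stretches the
    signal in time; decimating by T then yields a T-times transposition."""
    win = np.hanning(frame)
    out = np.zeros(len(signal) * T + frame)
    for i, start in enumerate(range(0, len(signal) - frame, hop)):
        spec = np.fft.rfft(signal[start : start + frame] * win)
        spec = np.abs(spec) * np.exp(1j * T * np.angle(spec))   # nonlinear step
        out[i * hop * T : i * hop * T + frame] += np.fft.irfft(spec) * win
    return out[::T][: len(signal)]

x = np.sin(2 * np.pi * 440 / 48000 * np.arange(48000))
y = harmonic_transpose(x, T=2)       # dominant tone now near 880 Hz
print(len(y))
```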
G10L 19/022 - Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 19/24 - Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
G10L 21/038 - Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
The present invention relates to audio coding systems which make use of a harmonic transposition method for high frequency reconstruction (HFR). A system and a method for generating a high frequency component of a signal from a low frequency component of the signal is described. The system comprises an analysis filter bank providing a plurality of analysis subband signals of the low frequency component of the signal. It also comprises a non-linear processing unit to generate a synthesis subband signal with a synthesis frequency by modifying the phase of a first and a second of the plurality of analysis subband signals and by combining the phase-modified analysis subband signals. Finally, it comprises a synthesis filter bank for generating the high frequency component of the signal from the synthesis subband signal.
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
Methods, systems, and bitstream syntax are described for a file container that supports the storage and transmission of multi-plane images. Examples are provided for coding texture and opacity information using HEVC or VVC coding and the HEIF container. Examples of carrying coded MPI images according to V3C and an example HEIF-based player are also presented.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
37.
DYNAMIC RANGE TEST CHART FOR DEVICES WITH A REFLECTIVE LENS
Test charts for testing the dynamic range of mobile phone cameras. One example dynamic range test chart includes test patches, each patch having a brightness value, and alignment features arranged in the corners of the test chart. The test patches include an outer plurality of test patches annularly arranged on a first side of the test chart, and an inner plurality of test patches annularly arranged on a second side of the test chart. A brightness value of a darkest test patch included in the outer plurality of test patches is the same as a brightness value of a brightest test patch included in the inner plurality of test patches. The test patches are positioned such that light from test patches that is reflected off a flat cover element of a camera-under-test lands on portions of the test chart outside the test patches.
Methods and systems are described for optimizing coding for a multiview video stream. In one embodiment, an encoder receives a video source with multiple views. The encoder computes disparity statistics for a plurality of stereo pairs of the multiple views of the video source, wherein the number of multiple views is greater than or equal to three. The encoder further encodes the video stream using the disparity statistics, wherein the encoded video stream includes the plurality of stereo pairs. The encoder transmits the encoded video stream.
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
Described herein is a method of processing audio data for playback, the method including: receiving, by a decoder, a bitstream including encoded audio data and metadata, wherein the metadata includes one or more dynamic range control (DRC) sets and, for each DRC set, an indication of whether the DRC set is configured for providing a dynamic loudness compensation effect; parsing the metadata, by the decoder, to identify DRC sets that are configured for providing the dynamic loudness compensation effect; decoding, by the decoder, the encoded audio data to obtain decoded audio data; selecting, by the decoder, one of the identified DRC sets configured for providing the dynamic loudness compensation effect; extracting from the bitstream, by the decoder, one or more DRC gains corresponding to the selected DRC set; applying to the decoded audio data, by the decoder, the one or more DRC gains corresponding to the selected DRC set to obtain dynamic loudness compensated audio data; and outputting the dynamic loudness compensated audio data for playback. Moreover, a respective decoder and computer program products are described.
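The decode flow is essentially metadata filtering followed by gain application. A sketch with dict-shaped metadata, an assumed stand-in for the real bitstream syntax:

```python
def select_and_apply_drc(samples, drc_sets, gains_by_id):
    """Pick a DRC set flagged for dynamic loudness compensation and apply
    its gain sequence block by block."""
    candidates = [d for d in drc_sets if d.get("loudness_compensation")]
    if not candidates:
        return list(samples)
    gains = gains_by_id[candidates[0]["id"]]
    block = max(1, len(samples) // len(gains))
    out = list(samples)
    for i, g in enumerate(gains):
        for j in range(i * block, min((i + 1) * block, len(out))):
            out[j] *= g
    return out

print(select_and_apply_drc([1.0] * 8,
                           [{"id": 0, "loudness_compensation": False},
                            {"id": 1, "loudness_compensation": True}],
                           {1: [0.5, 1.0]}))
```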
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
A method (1000) of generating a plurality of audio channels from audio containing height and non-height audio channels, for playing them back with an immersive loudspeaker system with at least one height loudspeaker (5) inside a listening environment, comprising: applying (1500) a virtual height filter (1300) to a portion of each height channel (1010) so that, when the height channel is played back by one of the loudspeakers, spectral components of the height channel directly emanating from said loudspeaker (1; 2; 3; 4) are attenuated and spectral components of the height channel reflected from the roof, or an area close to the roof, inside the listening environment are amplified, to generate a plurality of virtual-height-filtered audio signals which are added to the corresponding non-height audio channels for playback by corresponding loudspeakers; and playing back the remaining portions of each height audio channel with the at least one height loudspeaker (5).
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
H04S 5/00 - Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
42.
SOURCE SEPARATION AND REMIXING IN SIGNAL PROCESSING
The present disclosure relates to a method and audio processing system (1) for performing source separation. The method comprises obtaining (S1) an audio signal (S_in) including a mixture of speech content and noise content, and determining (S2a, S2b, S2c), from the audio signal, speech content (formula A), stationary noise content (formula C), and non-speech content (formula B). The stationary noise content (formula C) is a true subset of the non-speech content (formula B), and the method further comprises determining (S3), based on a difference between the stationary noise content (formula C) and the non-speech content (formula B), a non-stationary noise content (formula D), obtaining (S5) a set of weighting factors, and forming (S6) a processed audio signal based on a combination of the speech content (formula A), the stationary noise content (formula C), and the non-stationary noise content (formula D), each weighted with its respective weighting factor.
formula A = Ŝ1; formula B = N̂1; formula C = N̂2; formula D = N̂NS
The present disclosure relates to a method for designing a processor (20) and a computer implemented neural network. The method comprises obtaining input data and corresponding ground truth target data and providing the input data to a processor (20) for outputting a first prediction of target data given the input data. The method further comprises providing the latent variables output by a processor module (21:1, 21:2, …, 21:n−1) to a supervisor module (22:1, 22:2, 22:3, …, 22:n−1) which outputs a second prediction of target data based on the latent variables, and determining a first and second loss measure by comparing the predictions of target data with the ground truth target data. The method further comprises training the processor (20) and the supervisor module (22:1, 22:2, 22:3, …, 22:n−1) based on the first and second loss measures and adjusting the processor by at least one of removing, replacing and adding a processor module.
Methods, systems, and media for determining user movement direction are provided. In some embodiments, a method involves obtaining, using a control system, user acceleration data associated with a user. The method involves determining, using the control system, a movement period associated with a movement activity of the user using the user acceleration data, wherein the movement period indicates a duration between two sequential movements by the user. The method involves determining, using the control system, a movement direction corresponding to the movement activity from the user acceleration data, based on identifying a direction of acceleration, orthogonal to the movement direction, in which at least a portion of the user acceleration data is anti-periodic over a period of time corresponding to the movement period.
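The anti-periodicity cue can be tested as a normalized correlation at a half-period lag: along the axis orthogonal to travel, alternating steps roughly invert the acceleration, so the correlation of a(t) with a(t + period/2) approaches −1. A sketch with an assumed tolerance threshold:

```python
import numpy as np

def is_anti_periodic(accel, period, tol=0.3):
    """True if accel(t + period/2) is roughly -accel(t), measured as a
    normalized correlation at a half-period lag (threshold assumed)."""
    half = period // 2
    a, b = accel[:-half], accel[half:]
    r = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return r < -(1.0 - tol)

t = np.arange(0, 4, 0.01)
lateral = np.sin(2 * np.pi * 1.0 * t)            # one cycle per stride pair
print(is_anti_periodic(lateral, period=100))     # True: sin(t + T/2) = -sin(t)
```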
Embodiments are described for a method of rendering audio for playback through headphones comprising receiving digital audio content, receiving binaural rendering metadata generated by an authoring tool processing the received digital audio content, receiving playback metadata generated by a playback device, and combining the binaural rendering metadata and playback metadata to optimize playback of the digital audio content through the headphones.
Disclosed herein are techniques for generating audio data. In some embodiments, a method for generating audio data involves receiving one or more input images for which corresponding audio is to be generated. The method may further involve generating, using a pre-trained visual feature extractor, a set of visual features corresponding to the one or more input images. The method may further involve providing the visual features to a trained domain mapping model configured to generate audio features corresponding to the visual features. The method may further involve generating the audio corresponding to the one or more input images by providing the audio features to a pre-trained audio decoder model.
The present document relates to audio coding systems which make use of a harmonic transposition method for high frequency reconstruction (HFR), and to digital effect processors, e.g. so-called exciters, where generation of harmonic distortion adds brightness to the processed signal. In particular, a system configured to generate a high frequency component of a signal from a low frequency component of the signal is described. The system may comprise an analysis filter bank (501) configured to provide a set of analysis subband signals from the low frequency component of the signal; wherein the set of analysis subband signals comprises at least two analysis subband signals; wherein the analysis filter bank (501) has a frequency resolution of Δf. The system further comprises a nonlinear processing unit (502) configured to determine a set of synthesis subband signals from the set of analysis subband signals using a transposition order P; wherein the set of synthesis subband signals comprises a portion of the set of analysis subband signals phase shifted by an amount derived from the transposition order P; and a synthesis filter bank (504) configured to generate the high frequency component of the signal from the set of synthesis subband signals; wherein the synthesis filter bank (504) has a frequency resolution of FΔf; with F being a resolution factor, with F≥1; wherein the transposition order P is different from the resolution factor F.
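A toy illustration of the nonlinear step, under stated assumptions rather than as the disclosed system: each analysis subband sample keeps its magnitude, its phase is scaled by the transposition order P, and band k of the Δf-resolution analysis bank is routed to band round(k·P/F) of the F·Δf-resolution synthesis bank (nearest-index mapping; the disclosed mapping is richer):

```python
import numpy as np

def transpose_subbands(analysis, P=2, F=2.0):
    """analysis: complex array (num_bands, num_frames) of subband samples.
    Returns synthesis subband samples with phases scaled by P and bands
    remapped for a synthesis bank of resolution F * df."""
    num_bands, num_frames = analysis.shape
    out = np.zeros((num_bands, num_frames), dtype=complex)
    shifted = np.abs(analysis) * np.exp(1j * P * np.angle(analysis))
    for k in range(num_bands):
        n = int(round(k * P / F))   # center freq k*df lands in band n of width F*df
        if n < num_bands:
            out[n] += shifted[k]
    return out
```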
A method comprising: acquiring a set of voltage signals from a set of electrodes arranged in proximity to the ears of a user; determining, based on the set of voltage signals, an EOG gaze vector in ego-centric coordinates; determining a head pose of the user in display coordinates using a sensor device worn by the user; combining the EOG gaze vector and head pose to obtain a gaze vector in display coordinates; and determining a gaze point by calculating an intersection of the gaze vector and an imaging surface having a known position in display coordinates.
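The coordinate combination and intersection admit a compact geometric sketch, assuming the head pose is available as a rotation matrix plus position and the imaging surface as a point and normal in display coordinates (all names are illustrative):

```python
import numpy as np

def gaze_point(gaze_ego, R_head, head_pos, surf_point, surf_normal):
    """Rotate the ego-centric EOG gaze vector into display coordinates
    and intersect the resulting ray with the planar imaging surface.
    Returns the 3D gaze point, or None if the ray misses the surface."""
    head_pos = np.asarray(head_pos, float)
    g = np.asarray(R_head, float) @ np.asarray(gaze_ego, float)
    n = np.asarray(surf_normal, float)
    denom = g @ n
    if abs(denom) < 1e-9:
        return None                       # ray parallel to the surface
    t = ((np.asarray(surf_point, float) - head_pos) @ n) / denom
    return head_pos + t * g if t > 0 else None   # only points in front
```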
G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour
G06F 3/0487 - Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
49.
CONTROL OF SPEECH PRESERVATION IN SPEECH ENHANCEMENT
A method for performing denoising on audio signals is provided. In some implementations, the method involves determining an aggressiveness control parameter value that modulates a degree of speech preservation to be applied. In some implementations, the method involves obtaining a training set of training samples, a training sample having a noisy audio signal and a target denoising mask. In some implementations, the method involves training a machine learning model, wherein the trained machine learning model is usable to take, as an input, a noisy test audio signal and generate a corresponding denoised test audio signal, and wherein the aggressiveness control parameter value is used for: 1) generating a frequency domain representation of the noisy audio signals included in the training set; 2) modifying the target denoising masks; 3) determining an architecture of the machine learning model; or 4) determining a loss during training of the machine learning model.
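Use 2) in the list above (modifying the target denoising masks) admits a simple illustration; the power-law mapping below is an assumption, not the disclosed rule:

```python
import numpy as np

def modify_target_mask(mask, aggressiveness):
    """Reshape a per-bin target mask in [0, 1]: higher aggressiveness
    pushes noise-dominated bins toward 0 (more suppression), while
    aggressiveness = 0 leaves the mask, and hence speech, untouched."""
    mask = np.clip(np.asarray(mask, float), 0.0, 1.0)
    exponent = 1.0 + 3.0 * float(np.clip(aggressiveness, 0.0, 1.0))
    return mask ** exponent
```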
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
50.
PHOTO CODING OPERATIONS FOR DIFFERENT IMAGE DISPLAYS
A primary image of a first image format is encoded into an image file designated for the first image format. A non-primary image of a second image format is encoded into one or more attendant segments of the image file. The second image format is different from the first image format. A display image derived from a reconstructed image is caused to be rendered with a recipient device of the image file. The reconstructed image is generated from one of the primary image or the non-primary image.
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
Described herein is a method of performing content-aware audio processing for an audio signal comprising a plurality of audio components of different types. The method includes source separating the audio signal into at least a voice-related audio component and a residual audio component. The method further includes determining a dynamic audio gain based on the voice-related audio component and the residual audio component. The method also includes performing audio level adjustment for the audio signal based on the determined audio gain. Further described are corresponding apparatus, programs, and computer-readable storage media.
Reference images and source images in a working color space are determined. The reference images are derived from a reference camera. The source images are derived from a source camera. Initial gain values between the reference images and the source images are derived based on codeword values of the reference images and the source images. The initial gain values are adjusted into modified gain values based on results of noise characterization performed with the source images. The modified gain values are applied to the source images to generate leveled source images. Based on source-to-reference tone mappings, the leveled source images are converted to tone-matched source images.
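A hedged sketch of the gain derivation and application steps, assuming a single global gain from mean codeword values and a 10-bit working range (the disclosure may derive gains per region or per channel; all names here are illustrative):

```python
import numpy as np

def initial_gain(reference, source, eps=1e-6):
    """Simple initial gain between a reference and a source image,
    derived as a ratio of their mean codeword values."""
    return (float(np.mean(reference)) + eps) / (float(np.mean(source)) + eps)

def level_source(source, gain, code_max=1023):
    """Apply a (possibly noise-adjusted) gain and clip to the working
    codeword range to produce a leveled source image."""
    return np.clip(np.asarray(source, float) * gain, 0, code_max)
```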
The present invention relates to transposing signals in time and/or frequency and in particular to coding of audio signals. More particularly, the present invention relates to high frequency reconstruction (HFR) methods including a frequency domain harmonic transposer. A method and system for generating a transposed output signal from an input signal using a transposition factor T is described. The system comprises an analysis window of length La, which extracts a frame of the input signal, and an analysis transformation unit of order M transforming the samples into M complex coefficients, where M is a function of the transposition factor T. The system further comprises a nonlinear processing unit altering the phase of the complex coefficients by using the transposition factor T, a synthesis transformation unit of order M transforming the altered coefficients into M altered samples, and a synthesis window of length Ls, which generates a frame of the output signal.
G10L 19/022 - Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 19/24 - Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
G10L 21/038 - Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
Some methods for controlling a set of controllable actuators may involve obtaining actuator data for the set of controllable actuators and receiving object-based sensory data including a set of sensory objects. Some methods may involve rendering the object-based sensory data to produce actuator control signals, wherein the rendering is based at least in part on the actuator data. Some methods may involve obtaining playback environment data. The rendering may be based, at least in part, on the playback environment data. Some methods may involve providing, by the control system, the actuator control signals to one or more controllable actuators of the set of controllable actuators. The set of controllable actuators may include one or more light fixtures, one or more haptic devices, one or more air flow control devices, or combinations thereof.
Some disclosed examples involve receiving, by a control system, a content bitstream including encoded object-based sensory data, the encoded object-based sensory data including one or more sensory objects and corresponding sensory metadata, the encoded object-based sensory data corresponding to sensory effects including lighting, haptics, airflow, one or more positional actuators, or combinations thereof, to be provided by one or more sensory actuators in an environment. Some disclosed examples involve extracting, by the control system, object-based sensory metadata from the content bitstream and providing, by the control system, the object-based sensory metadata to a sensory renderer. In some examples, the content bitstream also may include one or more encoded audio objects and/or encoded video data synchronized with the encoded object-based sensory metadata.
Some methods involve receiving virtual world data, including virtual world object data corresponding to one or more virtual world objects, and receiving virtual world state data corresponding to a virtual world state. At least some of the virtual world data may correspond to virtual world object properties of one or more virtual world objects. Some methods involve generating object-based multi-sensory (MS) content based, at least in part, on the virtual world object data and the virtual world state data, and providing the object-based MS content to an MS renderer. Some methods involve rendering, by the MS renderer, the object-based MS content to control signals for one or more actuators residing in a real-world environment in which video data corresponding to a virtual world is being presented on one or more displays. The actuators may include one or more light fixtures, haptic devices, air flow control devices, or combinations thereof.
A63F 13/28 - Output arrangements for video game devices responding to control signals received from the game device for affecting ambient conditions, e.g. for vibrating players' seats, activating scent dispensers or affecting temperature or light
G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
Some methods may involve obtaining controllable actuator location information for a set of one or more controllable actuators in an environment, the controllable actuators including one or more light fixtures, one or more haptic sensors, one or more airflow control devices, or combinations thereof. Some methods may involve obtaining controllable actuator capability information for each controllable actuator of the set of controllable actuators and obtaining environment information corresponding to the environment. Some methods may involve generating, based at least in part on the controllable actuator location information, the controllable actuator capability information, and the environment information, an actuator-room response (ARR) that summarizes responses of the environment to controllable actuator activations. Some methods may involve modifying the ARR to produce an actuator map (AM). The modifying may involve regularization, filling gaps in the ARR, reducing one or more overlapping volumes of the environment affected by multiple actuator responses, or combinations thereof.
Light from an array of laser light sources is spread to cover the modulating face of a DMD or other modulator. The spread may be performed, for example, by a varying-curvature array of lenslets, each laser light directed at one of the lenslets. Light from neighboring and/or nearby light sources overlaps at the modulator. The lasers are energized at different energy/brightness levels, causing the light illuminating the modulator to itself be modulated (locally dimmed). The modulator then further modulates the locally dimmed light to produce a desired image. A projector according to the invention may utilize, for example, a single modulator sequentially illuminated or separate primary color modulators simultaneously illuminated.
G09G 3/34 - Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes for presentation of an assembly of a number of characters, e.g. a page, by composing the assembly by combination of individual elements arranged in a matrix by control of light from an independent source
H04N 9/31 - Projection devices for colour picture display
H04N 13/32 - Image reproducers for viewing without the aid of special glasses, i.e. using autostereoscopic displays using arrays of controllable light sources; Image reproducers for viewing without the aid of special glasses, i.e. using autostereoscopic displays using moving apertures or moving light sources
G03H 1/02 - Holographic processes or apparatus using light, infrared, or ultraviolet waves for obtaining holograms or for obtaining an image from them; Details peculiar thereto; Details
G03H 1/12 - Spatial modulation, e.g. ghost imaging
G09G 3/00 - Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes
60.
SYSTEM AND TOOLS FOR ENHANCED 3D AUDIO AUTHORING AND RENDERING
Improved tools for authoring and rendering audio reproduction data are provided. Some such authoring tools allow audio reproduction data to be generalized for a wide variety of reproduction environments. Audio reproduction data may be authored by creating metadata for audio objects. The metadata may be created with reference to speaker zones. During the rendering process, the audio reproduction data may be reproduced according to the reproduction speaker layout of a particular reproduction environment.
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
H04R 5/02 - Spatial or constructional arrangements of loudspeakers
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
H04S 5/00 - Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
A method for estimating a user's location in an environment may involve receiving output signals from each microphone of a plurality of microphones in the environment. At least two microphones of the plurality of microphones may be included in separate devices at separate locations in the environment and the output signals may correspond to a current utterance of a user. The method may involve determining multiple current acoustic features from the output signals of each microphone and applying a classifier to the multiple current acoustic features. Applying the classifier may involve applying a model trained on previously-determined acoustic features derived from a plurality of previous utterances made by the user in a plurality of user zones in the environment. The method may involve determining, based at least in part on output from the classifier, an estimate of the user zone in which the user is currently located.
G10L 21/0264 - Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
H04R 1/40 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
A method of compensating for environmental noise during a teleconference may involve: estimating, by a control system, a current speech spectrum corresponding to speech of remote teleconference participants; estimating, by the control system, a current noise spectrum corresponding to environmental noise in a local environment in which a local teleconference participant is located; calculating, by the control system, a current speech intelligibility index (SII) based, at least in part, on the current speech spectrum and the current noise spectrum; determining, by the control system and based at least in part on the current SII, whether to make an adjustment of a local audio system used by the local teleconference participant, wherein the determining involves evaluating the current SII according to one or more target SII parameters; and updating at least one of the one or more target SII parameters responsive to user input corresponding to a playback volume change.
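A heavily simplified index in the spirit of the SII, not the standardized computation: per-band speech-to-noise ratios are clipped to ±15 dB, mapped to [0, 1], and combined with band-importance weights; the target value, step size, and limits below are assumptions:

```python
import numpy as np

def simple_sii(speech_db, noise_db, band_importance):
    snr = np.clip(np.asarray(speech_db, float) - np.asarray(noise_db, float),
                  -15.0, 15.0)
    audibility = (snr + 15.0) / 30.0              # per-band value in 0..1
    w = np.asarray(band_importance, float)
    return float(np.sum(w * audibility) / np.sum(w))

def needed_boost_db(speech_db, noise_db, band_importance,
                    target=0.75, step=1.0, max_boost=12.0):
    """Raise the speech spectrum until the index reaches the target,
    mimicking an adjustment of the local audio system."""
    speech_db = np.asarray(speech_db, float)
    boost = 0.0
    while (simple_sii(speech_db + boost, noise_db, band_importance) < target
           and boost < max_boost):
        boost += step
    return boost
```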
H04M 9/08 - Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
H03G 3/32 - Automatic control in amplifiers having semiconductor devices the control being dependent upon ambient noise level or sound level
G10L 25/60 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
G10L 21/0364 - Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
63.
AUDIO ENCODING AND DECODING USING PRESENTATION TRANSFORM PARAMETERS
A method for encoding an input audio stream including the steps of obtaining a first playback stream presentation of the input audio stream intended for reproduction on a first audio reproduction system, obtaining a second playback stream presentation of the input audio stream intended for reproduction on a second audio reproduction system, determining a set of transform parameters suitable for transforming an intermediate playback stream presentation to an approximation of the second playback stream presentation, wherein the transform parameters are determined by minimization of a measure of a difference between the approximation of the second playback stream presentation and the second playback stream presentation, and encoding the first playback stream presentation and the set of transform parameters for transmission to a decoder.
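The "minimization of a measure of a difference" admits a standard least-squares realization; a sketch assuming multichannel presentations stored as (channels × samples) arrays and a linear matrix transform, which is one plausible parameterization:

```python
import numpy as np

def transform_parameters(intermediate, target):
    """W = argmin_W || target - W @ intermediate ||_F, in closed form.
    intermediate: (ch_in, samples); target: (ch_out, samples)."""
    W_T, *_ = np.linalg.lstsq(intermediate.T, target.T, rcond=None)
    return W_T.T                          # shape (ch_out, ch_in)

def apply_transform(W, intermediate):
    # Decoder side: approximate the second playback stream presentation
    # from the intermediate presentation using the transmitted parameters.
    return W @ intermediate
```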
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
Described herein is a method of processing media content. The method includes receiving the media content and side information for the media content; and generating or modifying an ISO Base Media File Format, ISOBMFF, file associated with the media content based on the side information. The side information includes rating information relating to a playback restriction associated with the media content. The ISOBMFF file includes a rating information specific box, the rating information specific box including a representation of the rating information. Further described are a respective method of processing an ISOBMFF file, respective apparatus and computer program products.
In a method to improve backwards compatibility when decoding high-dynamic range images coded in a wide color gamut (WCG) space which may not be compatible with legacy color spaces, hue and/or saturation values of images in an image database are computed for both a legacy color space (say, YCbCr-gamma) and a preferred WCG color space (say, IPT-PQ). Based on a cost function, a reshaped color space is computed so that the distance between the hue values in the legacy color space and rotated hue values in the preferred color space is minimized. HDR images are coded in the reshaped color space. Legacy devices can still decode standard dynamic range images assuming they are coded in the legacy color space, while updated devices can use color reshaping information to decode HDR images in the preferred color space at full dynamic range.
H04N 19/82 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
H04N 19/87 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving scene cut or scene change detection in combination with video compression
The present document describes a method (200) for determining N coefficients of an asymmetric prototype filter p0 for use in a low delay M-channel analysis and/or synthesis filter bank (101, 102) comprising M analysis filters hk (103) and/or M synthesis filters fk (106), k = 0, …, M−1, wherein M is greater than 1, and wherein subband signals which are processed by the analysis and/or synthesis filter bank (101, 102) are decimated by a decimation factor S, with S
Methods, systems, and bitstream syntax are described for determining a preferred processing order of metadata messaging, such as supplemental enhancement information (SEI) messaging in MPEG video coding. Examples are provided to address issues related to backwards compatibility with legacy systems. For example, proposed messaging allows encoders to isolate messages critical to a decoder's implementation and assign importance to each SEI message so that backwards compatibility is preserved.
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
Some disclosed methods involve obtaining sensor data from a sensor system during a content presentation and estimating user response events based on the sensor data. Some disclosed methods involve producing user attention analytics based at least in part on estimated user response events corresponding with estimated user attention to content intervals of the content presentation. Some disclosed methods involve causing the content presentation to be altered based, at least in part, on the user attention analytics and causing an altered content presentation to be provided.
H04N 21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed or the storage space available from the internal hard disk
H04N 21/8405 - Generation or processing of descriptive data, e.g. content descriptors represented by keywords
H04N 21/845 - Structuring of content, e.g. decomposing content into time segments
H04N 21/8549 - Creating video summaries, e.g. movie trailer
G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
H04H 60/65 - Arrangements for services using the result of monitoring, identification or recognition covered by groups or for using the result on users' side
70.
METHODS, APPARATUS AND SYSTEMS FOR SCENE BASED AUDIO MONO DECODING
Audio signal encoding and decoding methods are disclosed herein. Some disclosed methods for encoding an audio signal involve obtaining an audio signal that represents an input audio scene with a primary channel and side channels, analyzing the power of the primary channel and analyzing the powers of the side channels. Some such methods involve detecting a mono mode for encoding the audio signal based on analyzing the power of the primary channel and the powers of the side channels and computing one or more downmix channels and spatial metadata from the audio signal for a detected mono mode. Some such methods involve encoding the one or more downmix channels and spatial metadata in a bitstream for the detected mono mode and indicating the mono mode in the bitstream.
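A sketch of one plausible detector (the threshold value and power statistics are assumptions, not taken from the disclosure): compare the primary-channel power with the summed side-channel power and declare mono when the sides are negligible:

```python
import numpy as np

def detect_mono_mode(primary, side_channels, threshold_db=20.0):
    """True if the primary channel dominates the side channels by more
    than threshold_db, i.e. the scene is effectively mono."""
    p_primary = float(np.mean(np.square(np.asarray(primary, float))))
    p_side = sum(float(np.mean(np.square(np.asarray(s, float))))
                 for s in side_channels)
    ratio_db = 10.0 * np.log10((p_primary + 1e-12) / (p_side + 1e-12))
    return ratio_db > threshold_db
```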
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 19/20 - Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
71.
ARCHITECTURE FOR INTERACTIVE MULTI-VIEW MULTIPLANE-IMAGING VIDEO STREAMING
Methods and apparatus for multi-plane-image (MPI) video streaming. According to an example embodiment, a method of video streaming includes providing to a client device a media presentation description of an MPI streaming content stored in a storage container accessible via a server device and a respective initialization segment from the storage container. The respective initialization segment is configured to inform a selection of views of the MPI streaming content for which to request media segments for rendering. The method also includes receiving, from the client device, a request identifying the selection and indicating a respective recommended value of at least one of a bit rate, a resolution, a codec type, and a frame rate and transmitting to the client device one or more bitstreams carrying the media segments selected in the storage container based on the identified selection and further based on one or more of the respective recommended values.
H04N 21/218 - Source of audio or video content, e.g. local disk arrays
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
H04N 21/235 - Processing of additional data, e.g. scrambling of additional data or processing content descriptors
H04N 21/6587 - Control parameters, e.g. trick play commands or viewpoint selection
Methods and apparatus for multiplane-imaging (MPI) video streaming. According to an example embodiment, a method for streaming an MPI video includes generating a sequence of video frames, each of the video frames including a respective plurality of patches representing texture and transparency layers of one or more multiplane images of the MPI video and applying video compression to the sequence of video frames to generate a video sub-stream. The method also includes generating a sequence of representations of atlas frames corresponding to the sequence of video frames to specify at least a packing arrangement of the patches and applying compression to the sequence of representations to generate an atlas sub-stream. The method further includes multiplexing the video sub-stream and the atlas sub-stream to generate a first coded bitstream encoding at least a portion of the MPI video.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 21/218 - Source of audio or video content, e.g. local disk arrays
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
H04N 21/235 - Processing of additional data, e.g. scrambling of additional data or processing content descriptors
H04N 21/6587 - Control parameters, e.g. trick play commands or viewpoint selection
The disclosed systems and methods include a context detection module that detects a current context of an environment of a mobile device. Audio and video processing of audio and images captured by a microphone and camera of the device, respectively, in the environment is determined based on the detected context. The context detection module contains at least one audio classifier and at least one visual classifier. In some embodiments, the context detection module can be extended to use sensor information, in place of, or in addition to, the audio and visual information. The captured audio, visual and sensor information is aligned on a time axis based on outputs of the audio classifier, the visual classifier and timestamps associated with the sensor information. One or more fusion methods are used to combine the context detection results.
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
G06V 10/00 - Arrangements for image or video recognition or understanding
H04M 1/72454 - User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions
Methods, systems, and bitstream syntax are described for video coding and decoding using source picture timing information which is captured by an encoder and is signaled as metadata to a decoder to assist in decoding. The proposed methods include example syntax for signaling source picture timing metadata as supplemental enhancement information (SEI) messaging for both single-layer and multi-layer video sequences.
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 19/31 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the temporal domain
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
A spatial coding method and audio system configured to reduce the complexity of an audio scene via audio-object clustering. In at least some examples, the spatial coding method is implemented using only a limited set of basic matrix operations, which tends to significantly reduce the associated computational complexity. For example, the spatial coding method employs a cost-matrix construction approach, under which the object inter-product matrix is constructed first and then a plurality of cost matrices is derived therefrom by decimation and addition, with no advanced computational operations, such as multiplications or divisions, being needed. At least some embodiments can beneficially be used for reduction, simplification, or compression of complex audio content, with minimal impact on the audio quality, such that the audio content can be distributed through transmission systems that do not possess sufficient bandwidth to timely deliver all of the original audio-object data to the end users.
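The "inter-product matrix first, costs by addition" idea can be illustrated with the classic identity ||x_i - x_j||^2 = G_ii + G_jj - 2*G_ij, which derives a pairwise cost matrix from the Gram matrix using only additions of precomputed entries; the disclosure's actual cost matrices may differ:

```python
import numpy as np

def inter_product_matrix(objects):
    """Gram matrix of the audio objects: G[i, j] = <x_i, x_j>.
    objects: array of shape (num_objects, num_samples)."""
    X = np.asarray(objects, float)
    return X @ X.T

def pairwise_cost(G):
    """cost[i, j] = ||x_i - x_j||^2 = G[i,i] + G[j,j] - 2*G[i,j]:
    once G exists, no further multiplications of signals are needed."""
    d = np.diag(G)
    return d[:, None] + d[None, :] - 2.0 * G
```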
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
G10L 19/20 - Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
Techniques for enhancing audio signals are provided herein. In some embodiments, a method for enhancing audio segments may involve receiving an input audio segment to be enhanced. The method may further involve generating features indicative of a type of audio content of the input audio segment. The method may further involve generating features indicative of noise present in the input audio segment by providing at least the features indicative of the type of audio content to a trained denoising feature extraction model. The method may further involve generating a denoising mask based on the features indicative of noise present in the input audio segment. The method may further involve generating an enhanced audio segment at least by applying the denoising mask to the input audio segment.
G10L 21/0216 - Noise filtering characterised by the method used for estimating noise
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
Methods and systems are described for video coding and decoding using intra affine prediction. Two or three control point best vectors (CPBVs) to be used to derive motion vectors of affine predictions are generated by creating, for each CPBV, a list of candidate best vectors and selecting the two or three best ones according to a criterion. Methods to generate the list of candidate best vectors include a first method based on neighboring coded units and a distance criterion among two consecutive best vectors in the list, and a second method based on template matching between a coded unit to be encoded or decoded and prior-decoded coded units in a reference area.
H04N 19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
H04N 19/159 - Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
Methods and corresponding systems to process face regions are disclosed. The described methods include providing face bounding boxes and confidence levels for the faces, generating a histogram of the pixels and the faces, generating a probability of face, and generating a face probability map. A face contrast adjustment and a face saturation adjustment can be applied to the face probability map.
A noise management method and a noise management apparatus are provided. The noise management method includes: performing audio event segmentation on an input audio signal to generate content-aware segmentation information (504); estimating noise floor levels associated with the input audio signal based on the content-aware segmentation information and further based on sorting frequency-bin data corresponding to a fixed-length portion of the input audio signal (506); applying noise suppression to the input audio signal to generate an output audio signal, the noise suppression being performed in frequency bins having a selected frequency resolution and being based at least in part on the content-aware segmentation information and the estimated noise floor levels (508).
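The sorting-based noise floor estimate has a compact form: sort, or take a low percentile of, each frequency bin's values over the analyzed fixed-length portion, since the quietest frames approximate the floor. The percentile value below is an assumption:

```python
import numpy as np

def noise_floor_per_bin(spectrogram_db, percentile=10.0):
    """spectrogram_db: (num_bins, num_frames) magnitudes in dB for a
    fixed-length portion of the signal. Returns one floor estimate per
    frequency bin, taken from the low tail of the sorted bin data."""
    return np.percentile(np.asarray(spectrogram_db, float),
                         percentile, axis=1)
```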
A computer-implemented method for encoding video data according to predicted quality values generated by machine learning includes: providing a target video data package to a neural network to generate a plurality of predicted quality values for the target video data package, each of the plurality of predicted quality values associated with a different set of target encoding parameters from a range of encoding parameters, the neural network trained using training data including a plurality of reference video data packages and reference quality values calculated for each reference video data package as encoded according to different sets of reference encoding parameters from the range of encoding parameters; setting target encoding parameters for the target video data package based on the plurality of predicted quality values; and sending a control signal to an encoder module to encode the target video data package using the target encoding parameters.
H04N 19/154 - Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
H04N 19/189 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
H04N 19/147 - Data rate or code amount at the encoder output according to rate distortion criteria
H04N 19/149 - Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
81.
CODING TECHNIQUES AND METADATA FOR VIDEO COMMUNICATIONS USING GENERATIVE FACE VIDEO
Methods, systems, and metadata for video communications using generative face video (GFV) are described. In an encoder, a GFV bitstream is generated comprising multiplexed coded face video pictures and GFV metadata. Using supplemental enhancement information (SEI), a GFV SEI message comprises syntax elements describing face features and at least one or more of: presence of a single or multiple faces, spatial sampling, temporal sampling, primary coded picture characteristics and driving-picture handling, background handling, persistence of the SEI, and compression parameters for face features. In a decoder, information extracted from the GFV metadata is combined with the decoded face video pictures to generate a reconstructed output video.
H04N 19/17 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
H04N 19/587 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
H04N 19/59 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 21/80 - Generation or processing of content or additional data by content creator independently of the distribution processContent per se
82.
METHODS, APPARATUS AND SYSTEMS FOR CONTROLLING DOPPLER EFFECT MODELLING
Described is a method of modelling a Doppler effect when rendering audio content for a 6 degrees of freedom (6DoF) environment on a user side. In particular, the method may comprise obtaining first parameter values of one or more first parameters indicative of an allowable range of pitch factor modification values. The method may further comprise obtaining a second parameter value of a second parameter indicative of a desired strength of the to-be-modelled Doppler effect. The method may yet further comprise determining a pitch factor modification value based on a relative velocity between a listener and an audio source in the audio content, and the first and second parameter values, using a predefined pitch factor modification function. Particularly, the predefined pitch factor modification function may have the first and second parameters and may be a function for mapping relative velocities to pitch factor modification values. Finally, the method may comprise rendering the audio source based on the pitch factor modification value.
This disclosure falls into the field of audio coding, in particular it is related to the field of providing a framework for providing loudness consistency among differing audio output signals. In particular, the disclosure relates to methods, computer program products and apparatus for encoding and decoding of audio data bitstreams in order to attain a desired loudness level of an output audio signal.
G10L 19/24 - Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
Methods and systems for designing binaural room impulse responses (BRIRs) for use in headphone virtualizers, and methods and systems for generating a binaural signal in response to a set of channels of a multi-channel audio signal, including by applying a BRIR to each channel of the set, thereby generating filtered signals, and combining the filtered signals to generate the binaural signal, where each BRIR has been designed in accordance with an embodiment of the design method. Other aspects are audio processing units configured to perform any embodiment of the inventive method. In accordance with some embodiments, BRIR design is formulated as a numerical optimization problem based on a simulation model (which generates candidate BRIRs) and at least one objective function (which evaluates each candidate BRIR), and includes identification of a best one of the candidate BRIRs as indicated by performance metrics determined for the candidate BRIRs by each objective function.
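The selection step ("identification of a best one of the candidate BRIRs") reduces to scoring candidates with the objective functions; a minimal sketch in which the simulation model and the objectives are supplied by the caller and higher metric values are assumed better:

```python
def best_brir(candidate_brirs, objective_fns):
    """Return the candidate BRIR with the highest summed performance
    metric across all objective functions."""
    def score(brir):
        return sum(fn(brir) for fn in objective_fns)
    return max(candidate_brirs, key=score)
```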
A video encoding method according to an embodiment of the present invention includes generating header information that includes information about resolutions of motion vectors of respective blocks, determined based on motion prediction for a unit image. Here, the header information includes flag information indicating whether resolutions of all motion vectors included in the unit image are integer-pixel resolutions. Further, a video decoding method according to another embodiment of the present invention includes extracting information about resolutions of motion vectors of each unit image from header information included in a target bitstream to be decoded, and decoding the unit image based on the resolution information. Here, the header information includes flag information indicating whether resolutions of all motion vectors included in the unit image are integer-pixel resolutions.
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/17 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
H04N 19/27 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding involving both synthetic and natural picture components, e.g. synthetic natural hybrid coding [SNHC]
H04N 19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
H04N 19/51 - Motion estimation or motion compensation
H04N 19/52 - Processing of motion vectors by encoding by predictive encoding
H04N 19/523 - Motion estimation or motion compensation with sub-pixel accuracy
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
86.
LUMINANCE BASED CODING TOOLS FOR VIDEO COMPRESSION
Sample data and metadata related to spatial regions in images may be received from a coded video signal. It is determined whether specific spatial regions in the images correspond to a specific region of luminance levels. In response to determining the specific spatial regions correspond to the specific region of luminance levels, signal processing and video compression operations are performed on sets of samples in the specific spatial regions. The signal processing and video compression operations are at least partially dependent on the specific region of luminance levels.
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/196 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
87.
CONTENT-AWARE REAL-TIME LEVEL MANAGEMENT OF AUDIO CONTENT
Systems and methods for content-aware real-time level management of audio content. One example provides a method for real-time level management of audio content, the method comprising receiving an input audio signal, generating a short-time signal based on the input audio signal, identifying, with a sound source classifier, one or more sources of interest associated with the short-time signal, estimating, with a long-term energy estimator, a long-term Root Mean Square energy (long-term RMS) for respective sources of interest of the one or more sources of interest, estimating, with an automatic gain control, a set of short-time gains based at least in part on the long-term RMS of at least one of the respective sources of interest, and applying the short-time gains to the short-time signal.
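A sketch of the estimator/AGC pair under stated assumptions: exponential smoothing for the long-term estimate and a fixed loudness target, both illustrative choices rather than the disclosed design:

```python
import numpy as np

def long_term_rms_db(frames, prev=None, alpha=0.95, eps=1e-12):
    """Exponentially smoothed long-term RMS (in dB) over short-time
    frames of one source of interest; alpha sets the integration time."""
    rms_db = prev
    for frame in frames:
        inst = 10.0 * np.log10(float(np.mean(np.square(frame))) + eps)
        rms_db = inst if rms_db is None else alpha * rms_db + (1.0 - alpha) * inst
    return rms_db

def agc_gain_db(lt_rms_db, target_db=-23.0, max_gain_db=12.0):
    """Short-time gain nudging the tracked source toward the target level."""
    return float(np.clip(target_db - lt_rms_db, -max_gain_db, max_gain_db))
```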
This disclosure falls into the field of audio coding, in particular it is related to the field of providing a framework for providing loudness consistency among differing audio output signals. In particular, the disclosure relates to methods, computer program products and apparatus for encoding and decoding of audio data bitstreams in order to attain a desired loudness level of an output audio signal.
G10L 19/24 - Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
An input image represented in an input domain is received from an input video signal. Forward reshaping is performed on the input image to generate a forward reshaped image represented in a reshaped image domain. Non-reshaping encoding operations are performed to encode the reshaped image into an encoded video signal. At least one of the non-reshaping encoding operations is implemented with an ML model that has been previously trained with training images in one or more training datasets in a preceding training stage. A recipient device of the encoded video signal is caused to generate a reconstructed image from the forward reshaped image.
H04N 19/517 - Processing of motion vectors by encoding
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/82 - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
90.
METHODS, APPARATUS, AND SYSTEMS FOR PROCESSING AUDIO SCENE INFORMATION
The disclosure relates to methods of processing audio scene information. One such method comprises: obtaining a voxel-based audio scene representation of an audio scene; for first and second locations in a two-dimensional voxel grid, successively encoding items of path information, each item of path information specifying a first location, a second location, a path length of an acoustic path, and a corner voxel on the acoustic path; and, for a current item of path information, generating an encoded item of path information based on the item of path information. The encoded item of path information includes an indication of the respective first and second locations. If the corner voxel specified by the current item of path information is different from a corner voxel specified by a preceding item of path information, the encoded item of path information includes an indication of the corner voxel. If the corner voxel specified by the current item of path information is the same as the corner voxel specified by the preceding item of path information, the encoded item of path information includes an indication that the corner voxel specified by the current item of path information is the same as the corner voxel specified by the preceding item of path information, instead of the indication of the corner voxel. The disclosure further relates to corresponding apparatus, computer programs, and computer-readable storage media.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
H03M 7/30 - Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
Methods and apparatus implementing error protection for neural field streaming. According to an example embodiment, a method of neural-network streaming includes applying progressive coding to a plurality of coefficients representing a neural field to determine a priority order of coding units in a first bitstream including a base layer and one or more enhancement layers and generating a second bitstream by applying entropy coding to the coding units of the first bitstream. The second bitstream is an embedded bitstream having a plurality of data units thereof arranged in the priority order. The method also includes assigning subsets of the plurality of data units to packets of a packet set based on the priority order and generating an output stream of coded packets by applying rateless coding to the packet set, the output stream being a rateless embedded bitstream.
H04N 19/115 - Selection of the code volume for a coding unit prior to coding
H04N 19/147 - Data rate or code amount at the encoder output according to rate distortion criteria
H04N 19/154 - Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
H04N 19/169 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
H04N 19/187 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scalable video layer
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
H04N 19/37 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability with arrangements for assigning different transmission priorities to video input data or to video coded data
An audio processing method may involve receiving audio signals and associated spatial data, listener position data, loudspeaker position data and loudspeaker orientation data, and rendering the audio data for reproduction, based, at least in part, on the spatial data, the listener position data, the loudspeaker position data and the loudspeaker orientation data, to produce rendered audio signals. The rendering may involve applying a loudspeaker orientation factor that tends to reduce a relative activation of a loudspeaker based, at least in part, on an increased loudspeaker orientation angle. In some examples, the rendering may involve modifying an effect of the loudspeaker orientation factor based, at least in part, on a loudspeaker importance metric. The loudspeaker importance metric may correspond to a loudspeaker's importance for rendering an audio signal at the audio signal's intended perceived spatial position.
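One way such an orientation factor could look; the cosine-power shape and the importance blend are assumptions, not the disclosed formula. The factor is a penalty in [0, 1] that falls off as the angle between the loudspeaker's facing direction and the listener grows, relaxed for speakers deemed important for the intended perceived position:

```python
import numpy as np

def orientation_factor(angle_deg, importance=0.0, sharpness=2.0):
    """Multiplier in [0, 1] for a speaker's relative activation: 1 when
    facing the listener (0 deg), approaching 0 at 180 deg, with high
    importance (near 1) pulling the penalty back toward 1."""
    a = np.radians(np.clip(angle_deg, 0.0, 180.0))
    penalty = np.cos(a / 2.0) ** sharpness
    return float(importance + (1.0 - importance) * penalty)
```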
Methods and systems are described for processing a stereo video stream with one or more post-processing factors that can adjust the sense of depth for a user of the stereo video stream in a head mounted display. In one embodiment, a decoder receives a stream that includes the stereo video stream and disparity statistics for the stream. The decoder demultiplexes the stream to recover the stereo video stream and the disparity statistics. Using the disparity statistics, the decoder computes a post-processing factor that can be one or more of a scale and a shift factor. The decoder applies the post-processing factor to the decoded stereo stream, where the resulting stereo video stream provides the desired level of sense of depth for the user of the head mounted display.
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
95.
IMAGE SEGMENTATION AND FEATURE EXTRACTION USING NEURAL NETWORKS AND A SPATIO-TEMPORAL FILTER
Methods and systems for image segmentation are described. Given an input sequence of video pictures, a feature extraction neural network generates from its early layers a preliminary set of feature maps at full resolution. The output of the feature extraction network is down-sampled before being fed to a segmentation neural network, which generates a source-segmentation map at a lower resolution. After upsampling the source-segmentation map to full resolution, given filtering kernels that are based on the preliminary set of feature maps, a spatiotemporal trilateral filter is applied to the upsampled source-segmentation map to generate an output segmentation map.
Methods are described for dialogue enhancement of audio content, comprising: providing a first audio signal presentation of the audio components; providing a second audio signal presentation; receiving a set of dialogue estimation parameters configured to enable estimation of dialogue components from the first audio signal presentation; applying said set of dialogue estimation parameters to said first audio signal presentation to form a dialogue presentation of the dialogue components; and combining the dialogue presentation with said second audio signal presentation to form a dialogue enhanced audio signal presentation for reproduction on a second audio reproduction system, wherein at least one of said first and second audio signal presentations is a binaural audio signal presentation.
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
H04S 3/02 - Systems employing more than two channels, e.g. quadraphonic of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
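A compact sketch of the dialogue-enhancement combination, assuming the transmitted dialogue estimation parameters reduce to a single mixing matrix W (in practice such parameters would typically be time- and frequency-varying); the gain and shapes are illustrative:

```python
import numpy as np

def enhance_dialogue(first_pres, second_pres, W, gain_db=6.0):
    """first_pres: (n1, T) channels of the first presentation.
    second_pres: (n2, T), e.g. a binaural presentation.
    W: (n2, n1) dialogue estimation matrix carried in the stream."""
    dialogue = W @ first_pres                  # estimated dialogue presentation
    g = 10.0 ** (gain_db / 20.0)
    # Add the boosted dialogue on top of the second presentation.
    return second_pres + (g - 1.0) * dialogue

rng = np.random.default_rng(1)
out = enhance_dialogue(rng.standard_normal((2, 480)),
                       rng.standard_normal((2, 480)),
                       W=np.array([[0.5, 0.5], [0.5, 0.5]]))
```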
97.
METHOD FOR DATA RATE AND BUFFER ESTIMATION FOR MULTI-SOURCE DELIVERY
The present disclosure relates to a method and variable-quality playback system for selecting a quality of media content. The method comprises receiving (S4001) media content of a data segment (1010) over at least one network path (1031a, 1031b, 1031c), the media content being encoded with a network or application-layer code, and storing (S4002) the media content in a network or application-layer decoder (1050). The network or application-layer decoder (1050) is configured to decode the media content and provide decoded media content to a buffer (5061) associated with a media renderer (1060). The method further comprises obtaining a decoding metric of the network or application-layer decoder (1050), the decoding metric indicating a property of the decoding process, and selecting the quality of the media content of subsequent data segments based on the decoding metric.
H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
H04N 21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed or the storage space available from the internal hard disk
H04N 21/462 - Content or additional data management, e.g. creating a master electronic program guide from data received from the Internet and a Head-end or controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities
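One plausible, purely illustrative policy for the quality-selection step: treat the decoding metric as the effective goodput observed by the application-layer (e.g. fountain-code) decoder across the network paths and pick the highest sustainable bitrate for the next segment. The function name, the buffer heuristic, and the overhead factor are invented for this sketch; the claimed metric is any property of the decoding process.

```python
def select_quality(bitrates_bps, decode_metric_bps, buffer_s,
                   target_buffer_s=10.0, overhead=1.1):
    """Return the bitrate for the next segment.
    decode_metric_bps: effective goodput reported by the network or
    application-layer decoder. buffer_s: current renderer buffer level."""
    # Scale the budget by buffer health: a low buffer forces caution.
    budget = decode_metric_bps * max(buffer_s / target_buffer_s, 0.5)
    feasible = [b for b in sorted(bitrates_bps) if b * overhead <= budget]
    return feasible[-1] if feasible else min(bitrates_bps)

ladder = [500_000, 1_500_000, 4_000_000, 8_000_000]
print(select_quality(ladder, decode_metric_bps=5_000_000, buffer_s=12.0))
```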
98.
CONTEXT-DEPENDENT COLOR-MAPPING OF IMAGE AND VIDEO DATA
Systems and methods for performing color mapping operations. One system includes a processor to perform post-production editing of image data. The processor is configured to identify a first region of an image and identify a second region of the image. The first region includes a first white point having a first tone, and the second region includes a second white point having a second tone. The processor is further configured to determine a color mapping function based on the first tone, apply the color mapping function to the second region of the image, and generate an output image.
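As an illustration of the general idea, the sketch below estimates a white point per region and applies a von-Kries-style diagonal mapping that takes the second region's white point to the first's. The "brightest 5%" heuristic and the RGB-domain mapping are simplifying assumptions; a real grading tool would work in an adapted cone or chromaticity space.

```python
import numpy as np

def white_point(region_rgb):
    # Crude white-point estimate: mean of the brightest 5% of pixels.
    flat = region_rgb.reshape(-1, 3)
    luma = flat @ np.array([0.2126, 0.7152, 0.0722])
    return flat[luma >= np.quantile(luma, 0.95)].mean(axis=0)

def map_region(region2_rgb, wp1, wp2):
    # Diagonal (per-channel gain) mapping of region 2's white point
    # onto region 1's white point.
    gains = wp1 / np.maximum(wp2, 1e-6)
    return np.clip(region2_rgb * gains, 0.0, 1.0)

img = np.random.default_rng(2).random((64, 64, 3))
r1, r2 = img[:, :32], img[:, 32:]
mapped = map_region(r2, white_point(r1), white_point(r2))
```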
Methods are described for determining thresholds to skip entropy coding of latent features in image and video coding using a neural network. A threshold based on a mean of estimated standard deviations of all latents is proposed. For autoregressive neural networks, to avoid drift between context model parameter estimates computed during training and inference, it is proposed to have two separate entropy parameter estimation networks: an initial entropy-estimation network, to be used in entropy skip encoding and decoding of quantized latents, and a refined entropy-estimation network, to be used in arithmetic encoding and decoding of the quantized latents. Methods and systems for multi-stage entropy skipping are also proposed.
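A minimal sketch of the mean-of-sigmas skip threshold, assuming a per-latent predicted mean mu and scale sigma from an entropy parameter estimation network; the beta factor and the round-to-mean reconstruction of skipped latents are illustrative choices:

```python
import numpy as np

def entropy_skip_mask(sigma, beta=1.0):
    # Skip latents whose estimated std-dev falls below a threshold set
    # from the mean of all estimated std-devs.
    tau = beta * sigma.mean()
    return sigma < tau          # True -> skip entropy coding

def encode_latents(y_hat, mu, sigma, beta=1.0):
    skip = entropy_skip_mask(sigma, beta)
    # Skipped latents are not entropy coded; the decoder reconstructs
    # them as the (rounded) predicted mean, so no bits are spent.
    coded = y_hat[~skip]        # values sent to the arithmetic coder
    reconstructed = np.where(skip, np.round(mu), y_hat)
    return coded, skip, reconstructed

rng = np.random.default_rng(3)
y_hat = np.round(rng.standard_normal(16))
coded, skip, recon = encode_latents(y_hat, mu=np.zeros(16),
                                    sigma=np.abs(rng.standard_normal(16)))
```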
A method for decoding video includes receiving quantized coefficients representative of a block of video representative of a plurality of pixels. The quantized coefficients are dequantized based upon a function of a remainder. The dequantized coefficients are inverse transformed to determine a decoded residue.
H04N 19/132 - Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
H04N 19/184 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
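The abstract leaves the "function of a remainder" open; as one familiar construction, the sketch below uses an HEVC-style dequantizer in which qp % 6 selects a base scale from a table and qp // 6 contributes a left shift. The claimed method may differ in its details, and the shift derivation here is illustrative.

```python
LEVEL_SCALE = [40, 45, 51, 57, 64, 72]   # HEVC-style levelScale table

def dequantize(levels, qp, bit_depth=8, log2_size=3):
    """Dequantize a list of coefficient levels for a 2**log2_size block."""
    shift = bit_depth + log2_size - 5          # scaling shift (HEVC-style)
    scale = LEVEL_SCALE[qp % 6] << (qp // 6)   # the remainder picks the scale
    offset = 1 << (shift - 1)                  # rounding offset
    return [(lvl * scale + offset) >> shift for lvl in levels]

# The dequantized coefficients would then be inverse transformed to
# recover the decoded residue.
print(dequantize([3, -2, 0, 7], qp=27))
```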