Methods, apparatus, systems and articles of manufacture are disclosed to identify media based on historical data. An example method includes: comparing (a) a pitch shifted fingerprint, (b) a time shifted fingerprint, or (c) a resampled fingerprint to a reference fingerprint; in response to a match between any of (a) the pitch shifted fingerprint, (b) the time shifted fingerprint, or (c) the resampled fingerprint and the reference fingerprint, generating indications of (a) a pitch shift value, (b) a time shift value, or (c) a resample ratio that caused the match; in response to collecting broadcast media for a threshold period of time, processing the one or more indications; and in response to a request for a recommendation for information associated with a query, transmitting the recommendation including one or more frequencies of occurrence of (a) the pitch shift value, (b) the time shift value, or (c) the resample ratio.
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
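A minimal Python sketch of the match-and-tally flow the abstract describes. The candidate shift values, the match threshold, and the use of np.roll as a stand-in for recomputing a genuinely pitch-shifted fingerprint are illustrative assumptions, not the patented method:

```python
import numpy as np
from collections import Counter

# Candidate pitch-shift values the matcher will try (illustrative).
PITCH_SHIFTS = [-2, -1, 0, 1, 2]

def similarity(a, b):
    """Fraction of agreeing fingerprint elements."""
    n = min(len(a), len(b))
    return float(np.mean(a[:n] == b[:n]))

def match_with_shifts(query, reference, threshold=0.9):
    """Return the shift value that caused a match, or None.

    np.roll stands in for recomputing a genuinely pitch-shifted fingerprint."""
    for shift in PITCH_SHIFTS:
        if similarity(np.roll(query, shift), reference) >= threshold:
            return shift
    return None

# Indications accumulated while broadcast media is collected.
shift_counts = Counter()

def process_segment(query, reference):
    shift = match_with_shifts(query, reference)
    if shift is not None:
        shift_counts[shift] += 1      # record the value that caused the match

def recommend():
    """A recommendation response can carry the tallied frequencies of
    occurrence of each shift value once enough media has been collected."""
    return shift_counts.most_common()

reference = np.random.randint(0, 2, 1024)
process_segment(np.roll(reference, -1), reference)   # a pitch-shifted copy
print(recommend())                                   # e.g. [(1, 1)]
```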
Example systems and methods for audio identification based on data structures are disclosed. An example apparatus includes memory, and one or more processors to execute instructions to execute a constant Q transform on query time slices of query audio, binarize the constant Q transformed query time slices, execute a two-dimensional Fourier transform on query time windows within the binarized and constant Q transformed query time slices to generate two-dimensional Fourier transforms of the query time windows, sequentially order the two-dimensional Fourier transforms in a query data structure, and identify the query audio as a cover rendition of reference audio based on a comparison between the query data structure and a reference data structure associated with the reference audio.
G06F 16/68 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
G06F 16/61 - Indexing; Data structures therefor; Storage structures
G06F 17/14 - Fourier, Walsh or analogous domain transformations
G10L 25/27 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
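A rough sketch of the data-structure construction, assuming random matrices as stand-ins for real constant Q output, a percentile binarization, and a mean-squared distance for the final comparison (the abstract specifies the pipeline only in outline):

```python
import numpy as np

def binarize(cqt, percentile=75.0):
    """Keep only the strongest time-frequency cells."""
    return (cqt >= np.percentile(cqt, percentile)).astype(np.float32)

def build_structure(cqt, window=32, hop=16):
    """2-D Fourier transforms over sliding time windows, in sequential order.

    The 2-D FFT magnitude is tolerant of small tempo and key deviations,
    which is what makes the structure usable for cover-rendition matching."""
    binary = binarize(cqt)
    transforms = [np.abs(np.fft.fft2(binary[:, s:s + window]))
                  for s in range(0, binary.shape[1] - window + 1, hop)]
    return np.stack(transforms)

def distance(query_ds, ref_ds):
    """Mean squared distance over the overlapping sequential windows."""
    n = min(len(query_ds), len(ref_ds))
    return float(np.mean((query_ds[:n] - ref_ds[:n]) ** 2))

# Random matrices stand in for real constant Q transforms (bins x time slices).
query_cqt, ref_cqt = (np.abs(np.random.randn(84, 256)) for _ in range(2))
print(distance(build_structure(query_cqt), build_structure(ref_cqt)))
```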
3.
Methods and Apparatus for Efficient Media Indexing
Methods, apparatus, systems and articles of manufacture are disclosed for efficient media indexing. An example apparatus disclosed herein includes means for initiating a list of hash seeds, the list of hash seeds including at least a first hash seed value and a second hash seed value among other hash seed values, means for generating to generate a first bucket distribution based on the first hash seed value and a first hash function and generate a second bucket distribution based on the second hash seed value used in combination with the first hash seed value, and means for determining to determine a first entropy value of the first bucket distribution, wherein data associated with the first bucket distribution is stored in a first hash table, and determine a second entropy value of the second bucket distribution.
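The bucket-entropy selection lends itself to a short sketch. Here Python's built-in hash over a (seed, key) tuple stands in for the hash function, and preferring the highest-entropy (most uniform) distribution is an assumed selection rule:

```python
import numpy as np

def bucket_distribution(keys, seed, num_buckets=1024):
    """Distribute keys into buckets using a seeded hash (built-in hash over
    a (seed, key) tuple stands in for the hash function)."""
    counts = np.zeros(num_buckets, dtype=np.int64)
    for key in keys:
        counts[hash((seed, key)) % num_buckets] += 1
    return counts

def entropy(counts):
    """Shannon entropy of the bucket distribution; higher means more uniform."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

keys = [f"media-hash-{i}" for i in range(10_000)]
seeds = [17, 42, 1009, 65537]
# Prefer the seed whose bucket distribution has the highest entropy, i.e. the
# most even spread across the hash table, which keeps lookups cheap.
best_seed = max(seeds, key=lambda s: entropy(bucket_distribution(keys, s)))
print("selected hash seed:", best_seed)
```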
Techniques of providing motion video content along with audio content are disclosed. In some example embodiments, a computer-implemented system is configured to perform operations comprising: receiving primary audio content; determining that at least one reference audio content satisfies a predetermined similarity threshold based on a comparison of the primary audio content with the at least one reference audio content; for each one of the at least one reference audio content, identifying motion video content based on the motion video content being stored in association with the one of the at least one reference audio content and not stored in association with the primary audio content; and causing the identified motion video content to be displayed on a device concurrently with a presentation of the primary audio content on the device.
H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronizing decoder's clock; Client middleware
H04N 21/439 - Processing of audio elementary streams
H04N 21/45 - Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies
Example systems and methods for automated cover song identification are disclosed. An example apparatus includes at least one memory, machine-readable instructions, and one or more processors to execute the machine-readable instructions to at least execute a constant Q transform on time slices of first audio data to output constant Q transformed time slices, binarize the constant Q transformed time slices to output binarized and constant Q transformed time slices, execute a two-dimensional Fourier transform on time windows within the binarized and constant Q transformed time slices to output two-dimensional Fourier transforms of the time windows, generate a reference data structure based on a sequential order of the two-dimensional Fourier transforms, store the reference data structure in a database, and identify a query data structure associated with query audio data as a cover rendition of the first audio data based on a comparison of the query and reference data structures.
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
A method implemented by a computing system comprises generating, by the computing system, a fingerprint comprising a plurality of bin samples associated with audio content. Each bin sample is specified within a frame of the fingerprint and is associated with one of a plurality of non-overlapping frequency ranges and a value indicative of a magnitude of energy associated with a corresponding frequency range. The computing system removes, from the fingerprint, a plurality of bin samples associated with a frequency sweep in the audio content.
G10L 25/54 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for retrieval
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
G10L 19/028 - Noise substitution, e.g. substituting non-tonal spectral components by noisy source
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
G10L 25/27 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique
G10L 25/72 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for transmitting results of analysis
7.
Methods and Apparatus to Fingerprint an Audio Signal Via Exponential Normalization
Methods, apparatus, systems, and articles of manufacture are disclosed to fingerprint an audio signal via exponential normalization. An example apparatus includes an audio segmenter to divide an audio signal into a plurality of audio segments including a first audio segment and a second audio segment, the first audio segment including a first time-frequency bin, the second audio segment including a second time-frequency bin, a mean calculator to determine a first exponential mean value associated with the first time-frequency bin based on a first magnitude of the audio signal associated with the first time-frequency bin and a second exponential mean value associated with the second time-frequency bin based on a second magnitude of the audio signal associated with the second time-frequency bin and the first exponential mean value. The example apparatus further includes a bin normalizer to normalize the first time-frequency bin based on the second exponential mean value and a fingerprint generator to generate a fingerprint of the audio signal based on the normalized first time-frequency bins.
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G10L 25/21 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being power information
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
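A compact sketch of the exponential-mean normalization, assuming the recurrence mean[t] = alpha * magnitude[t] + (1 - alpha) * mean[t-1] and a thresholding step to turn normalized bins into fingerprint bits; alpha and the thresholding are illustrative:

```python
import numpy as np

def exponential_normalize(magnitudes, alpha=0.3):
    """Exponential-mean normalization sketch. `magnitudes` is
    (frequency bins x audio segments). The mean for segment t folds the
    segment-t magnitude into the mean for segment t-1; each bin is then
    normalized by the mean from the *next* segment, mirroring the abstract's
    first-bin/second-mean pairing."""
    means = np.empty_like(magnitudes, dtype=float)
    means[:, 0] = magnitudes[:, 0]
    for t in range(1, magnitudes.shape[1]):
        means[:, t] = alpha * magnitudes[:, t] + (1 - alpha) * means[:, t - 1]
    normalized = magnitudes[:, :-1] / (means[:, 1:] + 1e-12)
    # A fingerprint can then be derived from the normalized bins, e.g. by
    # thresholding them into bits (an assumed final step).
    return (normalized > 1.0).astype(np.uint8)

fingerprint = exponential_normalize(np.abs(np.random.randn(32, 100)))
print(fingerprint.shape)   # (32, 99)
```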
Methods, apparatus, systems and articles of manufacture are disclosed for audio equalization. Example instructions disclosed herein cause one or more processors to at least: detect an irregularity in a frequency representation of an audio signal in response to a change in volume between a set of frequency values exceeding a threshold; and adjust a volume at a first frequency value of the set of frequency values to reduce the irregularity.
H04N 21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed or the storage space available from the internal hard disk
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
H03F 3/181 - Low-frequency amplifiers, e.g. audio preamplifiers
H04N 9/87 - Regeneration of colour television signals
H04N 21/439 - Processing of audio elementary streams
H04N 21/45 - Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies
H04R 3/04 - Circuits for transducers for correcting frequency response
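A small sketch of the irregularity detection and adjustment on a frequency representation in dB; the 12 dB threshold and the neighbour-averaging repair are illustrative choices:

```python
import numpy as np

def detect_irregularities(spectrum_db, threshold=12.0):
    """Indices where the volume change between adjacent frequency bins
    exceeds the threshold -- the abstract's 'irregularity'."""
    return np.where(np.abs(np.diff(spectrum_db)) > threshold)[0]

def reduce_irregularities(spectrum_db, threshold=12.0):
    """Adjust the volume at each flagged frequency value toward the average
    of its neighbours, shrinking the jump."""
    out = spectrum_db.astype(float).copy()
    for i in detect_irregularities(out, threshold):
        left = out[i - 1] if i > 0 else out[i]
        out[i] = 0.5 * (left + out[i + 1])
    return out

spectrum = np.array([0.0, 1.0, 0.5, 20.0, 1.2, 0.8])   # synthetic dB values
print(reduce_irregularities(spectrum))
```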
A cover song identification method implemented by a computing system comprises receiving, by a computing system and from a user device, harmonic pitch class profile (HPCP) information that specifies one or more HPCP features associated with target audio content. A major chord profile feature and a minor chord profile feature associated with the target audio content are derived from the HPCP features. Machine learning logic of the computing system determines, based on the major chord profile feature and the minor chord profile feature, a relatedness between the target audio content and each of a plurality of audio content items specified in records of a database. Each audio content item is associated with cover song information. Cover song information associated with an audio content item having a highest relatedness to the target audio content is communicated to the user device.
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
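A sketch of deriving major and minor chord profile features from a 12-bin HPCP vector by correlating it with rotated triad templates; cosine similarity stands in for the machine-learning relatedness model:

```python
import numpy as np

MAJOR = np.zeros(12); MAJOR[[0, 4, 7]] = 1.0   # root, major third, fifth
MINOR = np.zeros(12); MINOR[[0, 3, 7]] = 1.0   # root, minor third, fifth

def chord_profile(hpcp, template):
    """Correlate a 12-bin HPCP vector with all 12 rotations of a triad template."""
    return np.array([np.dot(np.roll(template, k), hpcp) for k in range(12)])

def chord_features(hpcp):
    """Major and minor chord profile features derived from HPCP, as inputs
    to a downstream relatedness model."""
    return chord_profile(hpcp, MAJOR), chord_profile(hpcp, MINOR)

def relatedness(feat_a, feat_b):
    """Cosine similarity as a stand-in for the learned relatedness measure."""
    a, b = np.concatenate(feat_a), np.concatenate(feat_b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

hpcp_query, hpcp_ref = np.random.rand(12), np.random.rand(12)
print(relatedness(chord_features(hpcp_query), chord_features(hpcp_ref)))
```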
Systems and methods are disclosed for dynamic content delivery based on vehicle navigational attributes. An example apparatus includes at least one memory, machine readable instructions, and processor circuitry to execute the machine readable instructions to at least obtain navigational attributes from an electronic device of a vehicle via a network, determine a relevancy score for respective ones of first sporting event data items based on the navigational attributes, based on a determination that the navigational attributes correspond to a driving condition, identify a second sporting event data item of the first sporting event data items based on a relevancy score of the second sporting event data item corresponding to the driving condition, and transmit the second sporting event data item to the electronic device of the vehicle to cause the second sporting event data item to be presented.
G01C 21/26 - Navigation; Navigational instruments not provided for in groups specially adapted for navigation in a road network
B60W 40/08 - Estimation or calculation of driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit related to drivers or passengers
G01C 21/36 - Input/output arrangements for on-board computers
G06F 16/2457 - Query processing with adaptation to user needs
G06F 16/9535 - Search customisation based on user profiles and personalisation
G06F 16/9537 - Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
H04L 67/12 - Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
Methods, apparatus, systems and articles of manufacture are disclosed to identify media that has been pitch shifted, time shifted, and/or resampled. An example apparatus includes: memory; instructions in the apparatus; and processor circuitry to execute the instructions to: transmit a fingerprint of an audio signal and adjusting instructions to a central facility to facilitate a query, the adjusting instructions identifying at least one of a pitch shift, a time shift, or a resample ratio; obtain a response including an identifier for the audio signal and information corresponding to how the audio signal was adjusted; and change the adjusting instructions based on the information.
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
Methods, apparatus, systems and articles of manufacture are disclosed for dynamic volume adjustment via audio classification. Example apparatus include at least one memory; instructions; and at least one processor to execute the instructions to: analyze, with a neural network, a parameter of an audio signal associated with a first volume level to determine a classification group associated with the audio signal; determine an input volume of the audio signal; determine a classification gain value based on the classification group; determine an intermediate gain value as an intermediate between the input volume and the classification gain value by applying a first weight to the input volume and a second weight to the classification gain value; apply the intermediate gain value to the audio signal, the intermediate gain value to modify the first volume level to a second volume level; and apply a compression value to the audio signal, the compression value to modify the second volume level to a third volume level that satisfies a target volume threshold.
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
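The gain arithmetic can be sketched directly in the dB domain. The weights, target threshold, and halfway-to-target compression rule are illustrative; the abstract fixes only the weighted blend and the target-satisfying compression step:

```python
def dynamic_volume_gains(input_volume_db, classification_gain_db,
                         w_input=0.4, w_class=0.6, target_db=-16.0):
    """Sketch of the gain arithmetic: a weighted intermediate gain moves the
    signal to a second volume level, and a compression value moves that level
    toward the target threshold."""
    intermediate_gain = (w_input * input_volume_db
                         + w_class * classification_gain_db)
    second_level = input_volume_db + intermediate_gain
    compression = 0.5 * (target_db - second_level)   # ratio is illustrative
    third_level = second_level + compression
    return intermediate_gain, compression, third_level

print(dynamic_volume_gains(-23.0, 5.0))
```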
Example systems and methods for automated generation of banner images are disclosed. A program identifier associated with a particular media program may be received by a system, and used for accessing a set of iconic digital images and corresponding metadata associated with the particular media program. The system may select a particular iconic digital image for placing a banner of text associated with the particular media program, by applying an analytical model of banner-placement criteria to the iconic digital images. The system may apply another analytical model for banner generation to the particular iconic image to determine (i) dimensions and placement of a bounding box for containing the text, (ii) segmentation of the text for display within the bounding box, and (iii) selection of font, text size, and font color for display of the text. The system may store the particular iconic digital image and banner metadata specifying the banner.
An example method may include receiving, at a computing device, a digital image associated with a particular media content program, the digital image containing one or more faces of particular people associated with the particular media content program. A computer-implemented automated face recognition program may be applied to the digital image to recognize, based on at least one feature vector from a prior-determined set of feature vectors, one or more of the particular people in the digital image, together with respective geometric coordinates for each of the one or more detected faces. At least a subset of the prior-determined set of feature vectors may be associated with a respective one of the particular people. The digital image together may be stored in non-transitory computer-readable memory, together with information assigning respective identities of the recognized particular people, and associating with each respective assigned identity geometric coordinates in the digital image.
G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
G06F 16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
15.
SELECTING BALANCED CLUSTERS OF DESCRIPTIVE VECTORS
A clustering machine can cluster descriptive vectors in a balanced manner. The clustering machine calculates distances between pairs of descriptive vectors and generates clusters of vectors arranged in a hierarchy. The clustering machine determines centroid vectors of the clusters, such that each cluster is represented by its corresponding centroid vector. The clustering machine calculates a sum of inter-cluster vector distances between pairs of centroid vectors, as well as a sum of intra-cluster vector distances between pairs of vectors in the clusters. The clustering machine calculates multiple scores of the hierarchy by varying a scalar and calculating a separate score for each scalar. The calculation of each score is based on the two sums previously calculated for the hierarchy. The clustering machine may select or otherwise identify a balanced subset of the hierarchy by finding an extremum in the calculated scores.
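A sketch of the scoring idea using SciPy's hierarchical clustering. The abstract leaves the exact combination of the two sums open, so this sketch scores cuts of the hierarchy as inter-cluster spread minus a scalar-weighted intra-cluster spread and takes the extremum:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def balance_score(vectors, labels, scalar):
    """Inter-cluster spread minus a scalar-weighted intra-cluster spread;
    the exact combination of the two sums is an assumption."""
    ids = np.unique(labels)
    centroids = np.array([vectors[labels == c].mean(axis=0) for c in ids])
    inter = pdist(centroids).sum() if len(ids) > 1 else 0.0
    intra = sum(pdist(vectors[labels == c]).sum()
                for c in ids if np.sum(labels == c) > 1)
    return inter - scalar * intra

vectors = np.random.rand(60, 8)                 # descriptive vectors
hierarchy = linkage(vectors, method="average")  # clusters arranged in a hierarchy
# Score several cuts of the hierarchy and pick the extremum as the balanced subset.
scores = {k: balance_score(vectors,
                           fcluster(hierarchy, k, criterion="maxclust"),
                           scalar=0.1)
          for k in range(2, 13)}
print("balanced cluster count:", max(scores, key=scores.get))
```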
Methods, apparatus, systems and articles of manufacture are disclosed to identify media. An example method includes: in response to a query, generating an adjusted sample media fingerprint by applying an adjustment to a sample media fingerprint; comparing the adjusted sample media fingerprint to a reference media fingerprint; and in response to the adjusted sample media fingerprint matching the reference media fingerprint, transmitting information associated with the reference media fingerprint and the adjustment.
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
Methods, apparatus, systems and articles of manufacture are disclosed to improve detection of audio signatures. An example apparatus includes at least one memory, instructions in the apparatus, and processor circuitry to execute the instructions to: determine a first time difference of arrival for a first audio sensor of a meter and a second audio sensor of the meter based on a first audio recording from the first audio sensor and a second audio recording from the second audio sensor; determine a second time difference of arrival for the first audio sensor and a third audio sensor of the meter based on the first audio recording and a third audio recording from the third audio sensor; determine a match by comparing the first time difference of arrival to i) a first virtual source time difference of arrival and ii) a second virtual source time difference of arrival; in response to determining that the first time difference of arrival matches the first virtual source time difference of arrival, identify a first virtual source location as the location of a media presentation device presenting media; and remove the second audio recording to reduce a computational burden on the processor.
G01S 5/22 - Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements
G01S 5/24 - Position of single direction-finder fixed by determining direction of a plurality of spaced sources of known location
H04R 1/40 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
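A sketch of the TDOA computation and virtual-source matching, with cross-correlation standing in for whatever arrival-time estimator the meter uses and an illustrative tolerance:

```python
import numpy as np

def tdoa(sig_a, sig_b, sample_rate):
    """Time difference of arrival between two sensor recordings, from the lag
    maximizing their cross-correlation (positive when sig_b arrives later)."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_a) - 1)
    return lag / sample_rate

def match_virtual_source(measured, virtual_tdoas, tolerance=1e-3):
    """Compare a measured TDOA against each virtual source's expected TDOA."""
    for location, expected in virtual_tdoas.items():
        if abs(measured - expected) <= tolerance:
            return location
    return None

sr = 8_000
source = np.random.randn(sr)                     # one second of source audio
delay = 20                                       # extra travel time, in samples
mic1 = source
mic2 = np.concatenate([np.zeros(delay), source[:-delay]])
measured = tdoa(mic1, mic2, sr)
print(match_virtual_source(measured, {"couch": 20 / sr, "doorway": -35 / sr}))
```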
Example methods and systems for indexing fingerprints are described. Fingerprints may be made up of sub-fingerprints, each of which corresponds to a frame of the media, which is a smaller unit of time than the fingerprint. In some example embodiments, multiple passes are performed. For example, a first pass may be performed that compares the sub-fingerprints of the query fingerprint with every thirty-second sub-fingerprint of the reference material to identify likely matches. In this example, a second pass is performed that compares the sub-fingerprints of the query fingerprint with every fourth sub-fingerprint of the likely matches to provide a greater degree of confidence. A third pass may be performed that uses every sub-fingerprint of the most likely matches, to help distinguish between similar references or to identify with greater precision the timing of the match. Each of these passes is amenable to parallelization.
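A sketch of the coarse-to-fine passes, with sub-fingerprints modeled as integer arrays and simple elementwise agreement as the comparison; the keep-list sizes are illustrative:

```python
import numpy as np

def match_pass(query, references, stride, candidates, keep):
    """One pass: score candidates using every `stride`-th sub-fingerprint,
    then keep the `keep` best. Each candidate is scored independently, so a
    pass parallelizes cleanly."""
    scores = {}
    for ref_id in candidates:
        ref = references[ref_id]
        idx = np.arange(0, min(len(query), len(ref)), stride)
        scores[ref_id] = float(np.mean(query[idx] == ref[idx]))
    return sorted(scores, key=scores.get, reverse=True)[:keep]

def identify(query, references):
    """Coarse-to-fine passes with strides 32, 4, and 1, as in the example."""
    candidates = match_pass(query, references, 32, list(references), keep=100)
    candidates = match_pass(query, references, 4, candidates, keep=10)
    return match_pass(query, references, 1, candidates, keep=1)[0]

# Sub-fingerprints modeled as integers, one per frame of media.
references = {f"ref-{i}": np.random.randint(0, 2**16, 4096) for i in range(500)}
query = references["ref-7"].copy()
print(identify(query, references))               # -> ref-7
```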
Methods, apparatus, systems, and articles of manufacture are disclosed to improve media identification. An example apparatus includes a hash handler to generate a first set of reference matches by performing hash functions on a subset of media data associated with media to generate hashed media data based on a first bucket size, a candidate determiner to identify a second set of reference matches that include ones of the first set, the second set including ones having first quantities of hits that did not satisfy a threshold, determine second quantities of hits for ones of the second set by matching ones to the hash tables based on a second bucket size, and identify one or more candidate matches based on at least one of (1) ones of the first set or (2) ones of the second set, and a report generator to generate a report including a media identification.
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G06F 18/2115 - Selection of the most significant subset of features by evaluating different subsets according to an optimisation criterion, e.g. class separability, forward selection or backward elimination
Techniques of providing an interactive programming guide with a personalized lineup are disclosed. In some embodiments, a profile is accessed, and a personalized lineup is determined based on the profile. The personalized lineup may include a corresponding media content identification assigned to each one of a plurality of sequential time slots, where each media content identification identifies media content for the corresponding time slot. A first interactive programming guide may be caused to be displayed on a first media content device associated with the profile, where the first interactive programming guide includes the personalized lineup.
H04N 21/482 - End-user interface for program selection
H04N 21/2668 - Creating a channel for a dedicated end-user group, e.g. by inserting targeted commercials into a video stream based on end-user profiles
H04N 21/431 - Generation of visual interfaces; Content or additional data rendering
H04N 21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed or the storage space available from the internal hard disk
H04N 21/45 - Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies
H04N 21/454 - Content filtering, e.g. blocking advertisements
H04N 21/466 - Learning process for intelligent management, e.g. learning user preferences for recommending movies
H04N 21/472 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification or for manipulating displayed content
H04N 21/475 - End-user interface for inputting end-user data, e.g. PIN [Personal Identification Number] or preference data
H04N 21/845 - Structuring of content, e.g. decomposing content into time segments
21.
METHODS AND APPARATUS TO CONTROL LIGHTING EFFECTS BASED ON MEDIA CONTENT
Methods, apparatus, systems and articles of manufacture are disclosed to adjust device control information. The example apparatus comprises a light drive waveform generator to obtain metadata corresponding to media and generate device control information based on the metadata, the device control information to inform a lighting device to enable consecutive light pulses; an effect engine to apply an attack parameter and a decay parameter to consecutive light pulses corresponding to the device control information, the attack parameter and the decay parameter based on the metadata to affect a shape of the consecutive light pulses; and a color timeline generator to generate color information based on the metadata, the color information to inform the lighting device to change a color state.
A method and system for computer-based generation of podcast metadata, to facilitate operations such as searching for and recommending podcasts based on the generated metadata. In an example method, a computing system obtains a text representation of a podcast episode and obtains person data defining a list of person names such as celebrity names. The computing system then correlates the person data with the text representation, to find a match between a listed person name and a text string in the text representation. Further, the computing system predicts a named-entity span in the text representation and determines that the predicted named-entity span matches a location of the text string in the text representation of the podcast episode, and based on this determination, the computing system generates and outputs metadata that associates the person name with the podcast episode.
G06F 16/383 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G06F 16/335 - Filtering based on additional data, e.g. user or group profiles
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G06F 16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
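The correlate-then-confirm step can be sketched with plain string matching; the span representation as (start, end) character offsets is an assumption:

```python
import re

def person_metadata(transcript, person_names, predicted_spans):
    """Correlate a person-name list with the transcript, keeping only names
    whose text location coincides with a predicted named-entity span."""
    spans = set(predicted_spans)
    matches = []
    for name in person_names:
        for m in re.finditer(re.escape(name), transcript):
            if (m.start(), m.end()) in spans:
                matches.append({"person": name, "span": (m.start(), m.end())})
    return matches

text = "Today we interview Jane Doe about her new album."
print(person_metadata(text, ["Jane Doe"], [(19, 27)]))
```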
A computing system automatically detects, within a digital video frame, a video frame region that depicts a textual expression of a scoreboard. The computing system (a) engages in an edge-detection process to detect edges of at least scoreboard image elements depicted by the digital video frame, with at least some of these edges being of the textual expression and defining alphanumeric shapes; (b) applies pattern-recognition to identify the alphanumeric shapes; (c) establishes a plurality of minimum bounding rectangles each bounding a respective one of the identified alphanumeric shapes; (d) establishes, based on at least two of the minimum bounding rectangles, a composite shape that encompasses the identified alphanumeric shapes that were bounded by the at least two minimum bounding rectangles; and (e) based on the composite shape occupying a particular region, deems the particular region to be the video frame region that depicts the textual expression.
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 20/40 - Scenes; Scene-specific elements in video content
G06V 20/62 - Text, e.g. of license plates, overlay texts or captions on TV images
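A sketch of steps (a) through (e) using SciPy, with a gradient threshold standing in for the edge detector and a glyph-size filter standing in for alphanumeric pattern recognition; all constants are illustrative:

```python
import numpy as np
from scipy import ndimage

def scoreboard_region(frame_gray):
    """Edge detection, minimum bounding rectangles around connected edge
    shapes, and a composite rectangle over boxes plausibly sized like
    scoreboard glyphs."""
    gy, gx = np.gradient(frame_gray.astype(float))
    edges = np.hypot(gx, gy) > 30.0                  # illustrative threshold
    labeled, _ = ndimage.label(edges)
    boxes = []
    for sl in ndimage.find_objects(labeled):
        h, w = sl[0].stop - sl[0].start, sl[1].stop - sl[1].start
        if 5 <= w <= 40 and 8 <= h <= 40:            # plausible glyph size
            boxes.append((sl[1].start, sl[0].start, sl[1].stop, sl[0].stop))
    if len(boxes) < 2:
        return None
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))  # composite

frame = np.zeros((120, 200))
frame[20:32, 40:48] = 255.0      # two synthetic glyph-sized bright patches
frame[20:32, 55:63] = 255.0
print(scoreboard_region(frame))
```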
24.
Methods and Apparatus to Fingerprint an Audio Signal
Methods, apparatus, systems, and articles of manufacture to fingerprint an audio signal are disclosed. An example apparatus disclosed herein includes an audio segmenter to divide an audio signal into a plurality of audio segments, a bin normalizer to normalize a second audio segment of the plurality of audio segments to thereby create a first normalized audio segment, a subfingerprint generator to generate a first subfingerprint from the first normalized audio segment, the first subfingerprint including a first portion corresponding to a location of an energy extremum in the first normalized audio segment, a portion strength evaluator to determine a likelihood of the first portion to change, and a portion replacer to, in response to determining the likelihood does not satisfy a threshold, replace the first portion with a second portion to thereby generate a second subfingerprint.
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
G11B 27/28 - Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
25.
System and Method for Podcast Repetitive Content Detection
In one aspect, a method includes detecting a fingerprint match between query fingerprint data representing at least one audio segment within podcast content and reference fingerprint data representing known repetitive content within other podcast content, detecting a feature match between sets of audio features across multiple time-windows of the podcast content, and detecting a text match between at least one query text sentence from a transcript of the podcast content and reference text sentences, the reference text sentences comprising text sentences from the known repetitive content within the other podcast content. The method also includes, responsive to the detections, generating sets of labels identifying potential repetitive content within the podcast content. The method also includes selecting, from the sets of labels, a consolidated set of labels identifying segments of repetitive content within the podcast content, and, responsive to selecting the consolidated set of labels, performing an action.
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
G10L 17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
G10L 17/06 - Decision making techniques; Pattern matching strategies
G10L 25/90 - Pitch determination of speech signals
26.
Methods and Systems for Extracting Sport-Related Information from Digital Video Frames
A computing system automatically extracts, from a digital video frame, scoreboard information including a first team name, a second team name, a first score, and a second score. The computing system (a) detects, within the digital video frame, a plurality of frame regions based on each detected frame region depicting text; (b) selects, from the detected frame regions, a set of frame regions based on the frame regions of the selected set cooperatively having a geometric arrangement that corresponds with a candidate geometric arrangement of the scoreboard information; (c) recognizes characters respectively within each of the frame regions of the selected set of frame regions; (d) based at least on the recognized characters in the frame regions of the selected set, detects the scoreboard information; and (e) records the detected scoreboard information.
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 20/40 - Scenes; Scene-specific elements in video content
G06V 20/62 - Text, e.g. of license plates, overlay texts or captions on TV images
Techniques of content unification are disclosed. In some example embodiments, a computer-implemented method comprises: determining a plurality of clusters based on a comparison of a plurality of audio content using a first matching criteria, each cluster of the plurality of clusters comprising at least two audio content from the plurality of audio content; for each cluster of the plurality of clusters, determining a representative audio content for the cluster from the at least two audio content of the cluster; loading the corresponding representative audio content of each cluster into an index; matching query audio content to one of the representative audio contents using the first matching criteria; determining the corresponding cluster of the matched representative audio content; and identifying a match between the query audio content and at least one of the audio content of the cluster of the matched representative audio content based on a comparison using a second matching criteria.
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G06F 16/61 - Indexing; Data structures therefor; Storage structures
Methods, apparatus, systems and articles of manufacture are disclosed to select reference sub-fingerprints for comparison to query sub-fingerprints based on a determination that a query sub-fingerprint is a match with a reference sub-fingerprint; generate a count vector that stores total counts of matches between the query sub-fingerprints and different subsets of the reference sub-fingerprints, each of the different subsets being aligned to the query sub-fingerprints at a different offset from a reference point, each of the different offsets being mapped by the count vector to a different total count; calculate a maximum count among the total counts, a median of the total counts, and a difference between the maximum count and the median of the total counts; and classify the reference sub-fingerprints as a match with the query sub-fingerprints based on the difference between the maximum count in the count vector and the median.
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G06F 16/61 - Indexing; Data structures therefor; Storage structures
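A sketch of the count-vector classification, with sub-fingerprints modeled as integers and an illustrative margin on the max-minus-median test:

```python
import numpy as np

def classify_match(query_subs, ref_subs, margin=8):
    """Count matches at each alignment offset, then compare the peak count
    against the median count; a large gap indicates a true alignment."""
    offsets = len(ref_subs) - len(query_subs) + 1
    counts = np.array([int(np.sum(query_subs == ref_subs[o:o + len(query_subs)]))
                       for o in range(offsets)])
    peak, median = counts.max(), np.median(counts)
    return (peak - median) >= margin, int(np.argmax(counts))

ref = np.random.randint(0, 2**16, 2000)
query = ref[512:768].copy()                 # a true excerpt of the reference
matched, offset = classify_match(query, ref)
print(matched, offset)                      # -> True 512
```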
Methods and systems for automated video segmentation are disclosed. A sequence of video frames having video segments of contextually-related sub-sequences may be received. Each frame may be labeled according to segment and segment class. A video graph may be constructed in which each node corresponds to a different frame, and each edge connects a different pair of nodes, and is associated with a time between video frames and a similarity metric of the connected frames. An artificial neural network (ANN) may be trained to predict both labels for the nodes and clusters of the nodes corresponding to predicted membership among the segments, using the video graph as input to the ANN, and ground-truth clusters of ground-truth labeled nodes. The ANN may be further trained to predict segment classes of the predicted clusters, using the segment classes as ground truths. The trained ANN may be configured for application to runtime video sequences.
Methods, apparatus, systems and articles of manufacture are disclosed to determine audio quality. Example apparatus disclosed herein include an equalization (EQ) model query generator to generate a query to a neural network, the query including a representation of a sample of an audio signal. Example apparatus disclosed herein also include an EQ analyzer to access a plurality of equalization settings determined by the neural network based on the query; and compare the equalization settings to an equalization threshold to determine if the audio signal is to be removed from subsequent processing.
G10L 25/60 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G06N 3/04 - Architecture, e.g. interconnection topology
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
H04R 3/04 - Circuits for transducers for correcting frequency response
31.
Methods, apparatus, and articles of manufacture to identify sources of network streaming services
Methods, apparatus and articles of manufacture to identify sources of network streaming services are disclosed. An example method includes receiving a first audio signal that represents a decompressed second audio signal, identifying, from the first audio signal, a parameter of an audio compression configuration used to form the decompressed second audio signal, and identifying a source of the decompressed second audio signal based on the identified audio compression configuration.
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 25/03 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
H04H 60/58 - Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups or of audio
32.
Automated Cropping of Images Using a Machine Learning Predictor
Example systems and methods for automated cropping of images using a machine learning (ML) predictor program are disclosed. The ML predictor program may generate predicted cropping boundaries for any given input image. Training raw images associated with respective sets of training master images indicative of cropping characteristics for the training raw images may be input to the ML predictor program, and the ML predictor program may be trained to predict cropping boundaries for a raw image based on expected cropping boundaries associated with the training master images. At runtime, the trained ML predictor program may be applied to runtime raw images in order to generate respective sets of runtime cropping boundaries corresponding to different cropped versions of each runtime raw image. The runtime raw images may be stored with information indicative of the respective sets of runtime cropping boundaries.
G06T 7/174 - Segmentation; Edge detection involving the use of two or more images
G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
33.
MACHINE-CONTROL OF A DEVICE BASED ON MACHINE-DETECTED TRANSITIONS
Apparatus, methods, and systems that operate to provide interactive streaming content identification and processing are disclosed. An example apparatus includes a classifier to determine an audio characteristic value representative of an audio characteristic in audio; a transition detector to detect a transition between a first category and a second category by comparing the audio characteristic value to a threshold value among a set of threshold values, the set of threshold values corresponding to the first category and the second category; and a context manager to control a device to switch from a first fingerprinting algorithm to a second fingerprinting algorithm different than the first fingerprinting algorithm, responsive to the detected transition between the first category and the second category.
G10H 1/00 - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE - Details of electrophonic musical instruments
H04L 65/612 - Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio for unicast
H04M 1/72454 - User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions
H04N 21/414 - Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
H04N 21/422 - Input-only peripherals, e.g. global positioning system [GPS]
H04N 21/439 - Processing of audio elementary streams
H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs
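A sketch of the transition-driven switch, with a single scalar audio characteristic, one threshold, and hypothetical algorithm names; the real apparatus compares against a set of per-category thresholds:

```python
class FingerprintContext:
    """Sketch of the context manager: classify an audio characteristic value
    against a threshold and switch fingerprinting algorithms on a category
    transition. Categories, threshold, and algorithm names are illustrative."""

    def __init__(self):
        self.category = "speech"
        self.algorithm = "speech_fingerprint_v1"

    def classify(self, audio_characteristic):
        # Values above the threshold are treated as music, below as speech.
        return "music" if audio_characteristic > 0.6 else "speech"

    def update(self, audio_characteristic):
        new_category = self.classify(audio_characteristic)
        if new_category != self.category:            # detected transition
            self.category = new_category
            self.algorithm = ("music_fingerprint_v1"
                              if new_category == "music"
                              else "speech_fingerprint_v1")
        return self.algorithm

ctx = FingerprintContext()
for value in [0.2, 0.3, 0.8, 0.9, 0.4]:
    print(ctx.update(value))
```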
Example systems and methods for audio identification based on data structures are disclosed. An example apparatus includes memory, and one or more processors to execute instructions to execute a constant Q transform on query time slices of query audio, binarize the constant Q transformed query time slices, execute a two-dimensional Fourier transform on query time windows within the binarized and constant Q transformed query time slices to generate two-dimensional Fourier transforms of the query time windows, sequentially order the two-dimensional Fourier transforms in a query data structure, and identify the query audio as a cover rendition of reference audio based on a comparison between the query data structure and a reference data structure associated with the reference audio.
G06F 16/68 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
G06F 16/61 - Indexing; Data structures therefor; Storage structures
G06F 17/14 - Fourier, Walsh or analogous domain transformations
G10L 25/27 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
35.
Identifying and Labeling Segments within Video Content
In one aspect, an example method includes (i) obtaining fingerprint repetition data for a portion of video content, with the fingerprint repetition data including a list of other portions of video content matching the portion of video content and respective reference identifiers for the other portions of video content; (ii) identifying the portion of video content as a program segment rather than an advertisement segment based at least on a number of unique reference identifiers within the list of other portions of video content relative to a total number of reference identifiers within the list of other portions of video content; (iii) determining that the portion of video content corresponds to a program specified in an electronic program guide using a time stamp of the portion of video content; and (iv) storing an indication of the portion of video content in a data file for the program.
H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs
G06V 20/40 - Scenes; Scene-specific elements in video content
H04N 21/8352 - Generation of protective data, e.g. certificates involving content or source identification data, e.g. UMID [Unique Material Identifier]
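Step (ii)'s decision rule can be sketched as a ratio test; reading a high share of unique reference identifiers as advertisement-like repetition (ads recur across many different references, program segments mostly within reruns of the same reference) is an assumption, as is the threshold:

```python
def classify_portion(matching_reference_ids, uniqueness_threshold=0.5):
    """Classify a portion from its fingerprint-repetition match list: few
    unique reference IDs relative to total matches suggests a program
    segment; many unique IDs suggests an advertisement."""
    if not matching_reference_ids:
        return "program"     # no repetition evidence; default is assumed
    ratio = len(set(matching_reference_ids)) / len(matching_reference_ids)
    return "advertisement" if ratio > uniqueness_threshold else "program"

print(classify_portion(["ep12", "ep12", "ep12"]))        # -> program
print(classify_portion(["adA", "adB", "adC", "adD"]))    # -> advertisement
```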
36.
Separating Media Content Into Program Segments and Advertisement Segments
In one aspect, an example method includes (i) extracting, by a computing system, features from media content; (ii) generating, by the computing system, repetition data for respective portions of the media content using the features, with repetition data for a given portion including a list of other portions of the media content matching the given portion; (iii) determining, by the computing system, transition data for the media content; (iv) selecting, by the computing system, a portion within the media content using the transition data; (v) classifying, by the computing system, the portion as either an advertisement segment or a program segment using the repetition data for the portion; and (vi) outputting, by the computing system, data indicating a result of the classifying for the portion.
H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs
In one aspect, an example method for generating a candidate image for use as backdrop imagery for a graphical user interface is disclosed. The method includes receiving a raw image and determining an edge image from the raw image using edge detection. The method also includes identifying a candidate region of interest (ROI) in the raw image based on the candidate ROI enclosing a portion of the edge image having edge densities exceeding a threshold edge density. The method also includes manipulating the raw image relative to a backdrop imagery canvas for a graphical user interface based on a location of the candidate ROI within the raw image. The method also includes generating, based on the manipulating, a set of candidate backdrop images in which at least a portion of the candidate ROI occupies a preselected area of the backdrop imagery canvas, and storing the set of candidate backdrop images.
Methods, apparatus, systems and articles of manufacture are disclosed for audio equalization based on variant selection. An example apparatus to equalize audio includes at least one memory, machine readable instructions, and processor circuitry to at least one of instantiate or execute the machine readable instructions to train a neural network model to apply a first audio equalization profile to first audio associated with a first variant of media, and apply a second audio equalization profile to second audio associated with a second variant of media. The processor circuitry is to at least one of instantiate or execute the machine readable instructions to at least one of dispatch or execute the neural network model.
Example methods and systems for generating a video presentation to accompany audio are described. The video presentation to accompany the audio track is generated from one or more video sequences. In some example embodiments, the video sequences are divided into video segments that correspond to discontinuities between frames. Video segments are concatenated to form a video presentation to which the audio track is added. In some example embodiments, only video segments having a duration equal to an integral number of beats of music in the audio track are used to form the video presentation. In these example embodiments, transitions between video segments in the video presentation that accompanies the audio track are aligned with the beats of the music.
G06F 16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
40.
User Profile Based on Clustering Tiered Descriptors
A user of a network-based system may correspond to a user profile that describes the user. The user profile may describe the user using one or more descriptors of items that correspond to the user (e.g., items owned by the user, items liked by the user, or items rated by the user). In some situations, such a user profile may be characterized as a “taste profile” that describes an array or distribution of one or more tastes, preferences, or habits of the user. Accordingly, the user profile machine within the network-based system may generate the user profile by accessing descriptors of items that correspond to the user, clustering one or more of the descriptors, and generating the user profile based on one or more clusters of the descriptors.
An embodiment may involve, based on a profile associated with a client device, selecting an audio file containing music. Based on an attribute of the audio file containing the music, an audio file containing a story may be selected. A playlist for the client device may be generated, where the playlist includes (i) a reference to the audio file containing the music, and (ii) a reference to the audio file containing the story. A server device may transmit the playlist to the client device over a wide area network. Reception of the playlist at the client device may cause an audio player application to retrieve and play out each of the audio file containing the music and the audio file containing the story.
H04L 67/06 - Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
H04L 67/10 - Protocols in which an application is distributed across nodes in the network
H04L 67/60 - Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
H04N 21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed or the storage space available from the internal hard disk
42.
Predictive Measurement of End-User Activities at Specified Times
Methods and systems for determining if end-users are expected to be receiving transmissions from a multimedia network at a particular time are disclosed. Data including end-user type, a multimedia network, a particular time slot of repeating cycles of time slots, and a network reach descriptor may be received. End-users may be identified by end-user type. For each end-user, a probability of receiving transmissions from the multimedia network during time slots prior to the particular time slot may be determined, based on previous viewing activities. Each probability may be adjusted by an offset such that an average of the adjusted probabilities corresponds to the network reach descriptor. A determination may be made of whether or not each end-user is expected to have been receiving transmissions from the multimedia network at the particular time slot, based on the adjusted respective probability.
H04N 21/262 - Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission or generating play-lists
H04N 21/25 - Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication or learning user preferences for recommending movies
H04N 21/258 - Client or end-user data management, e.g. managing client capabilities, user preferences or demographics or processing of multiple end-users preferences to derive collaborative data
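The offset adjustment is simple arithmetic, sketched below; the clipping to [0, 1] and the 0.5 cutoff for the final expectation are assumptions:

```python
import numpy as np

def adjust_probabilities(probs, network_reach):
    """Shift all probabilities by a common offset so their average equals
    the network reach descriptor (clipping may perturb the average slightly
    and would need a re-balancing pass in practice)."""
    probs = np.asarray(probs, dtype=float)
    offset = network_reach - probs.mean()
    return np.clip(probs + offset, 0.0, 1.0)

def expected_receiving(probs, network_reach, cutoff=0.5):
    """End-users whose adjusted probability clears the cutoff are deemed to
    be receiving the network's transmissions at the time slot."""
    return adjust_probabilities(probs, network_reach) >= cutoff

print(expected_receiving([0.2, 0.7, 0.4, 0.9], network_reach=0.6))
```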
43.
Methods and Systems for Determining Accuracy of Sport-Related Information Extracted from Digital Video Frames
A computing system determines accuracy of sport-related information extracted from a time sequence of digital video frames that represent a sport event, the extracted sport-related information including an attribute that changes over the time sequence. The computing system (a) detects, based on the extracted sport-related information, a pattern of change of the attribute over the time sequence and (b) makes a determination of whether the detected pattern is an expected pattern of change associated with the sport event. If the determination is that the detected pattern is the expected pattern, then, responsive to making the determination, the computing system takes a first action that corresponds to the sport-related information being accurate. If instead the determination is that the detected pattern is not the expected pattern, then, responsive to making the determination, the computing system takes a second action that corresponds to the sport-related information being inaccurate.
Methods and systems for determining projected amounts of viewing time of a TV program by end-users are disclosed. Data including end-user type, a TV program descriptor, TV network, and start time of transmission may be received. End-users may be identified by end-user type. A machine-learning model applied to the data and viewing history data may generate parameters for determining how much of the TV program each end-user is expected to view during a sequence of time intervals. For each end-user, the parameters may be applied to make a determination of temporal-fraction values of the TV program the end-user is expected to view during each time interval, and, for each time interval, conditioning values used to condition the determination for the next time interval. Projected subtotals of viewing time may be determined, based on the temporal-fraction values. A projected total amount of viewing time of the TV program may then be determined.
H04N 21/25 - Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication or learning user preferences for recommending movies
H04N 21/24 - Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth or upstream requests
H04N 21/258 - Client or end-user data management, e.g. managing client capabilities, user preferences or demographics or processing of multiple end-users preferences to derive collaborative data
H04N 21/262 - Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission or generating play-lists
A machine may form all or part of a network-based system configured to provide media service to one or more user devices. The machine may be configured to define a station library within a larger collection of media files. In particular, the machine may access metadata that describes a seed that forms the basis on which the station library is to be defined. The machine may determine a genre composition for the station library based on the metadata. The machine may generate a list of media files from the metadata based on a relevance of each media file to the station library. The machine may determine the relevance of each media file based on a similarity of the media file to the genre composition of the station library as well as a comparison of metadata describing the media file to the accessed metadata that describes the seed.
In one aspect, an example method includes (i) extracting a sequence of audio features from a portion of a sequence of media content; (ii) extracting a sequence of video features from the portion of the sequence of media content; (iii) providing the sequence of audio features and the sequence of video features as an input to a transition detector neural network that is configured to classify whether or not a given input includes a transition between different content segments; (iv) obtaining from the transition detector neural network classification data corresponding to the input; (v) determining that the classification data is indicative of a transition between different content segments; and (vi) based on determining that the classification data is indicative of a transition between different content segments, outputting transition data indicating that the portion of the sequence of media content includes a transition between different content segments.
G06F 18/2413 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
G06N 3/049 - Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
G06V 20/40 - Scenes; Scene-specific elements in video content
H04N 21/234 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs
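The following toy sketch illustrates the fusion-and-classification shape of steps (i) through (v) above, with a fixed linear layer standing in for the trained transition detector neural network; every name and dimension here is an assumption for illustration:

```python
import numpy as np

# Toy stand-in for the transition detector: fuse audio and video feature
# sequences, pool over time, and apply a sigmoid classifier.

def detect_transition(audio_feats, video_feats, weights, bias, threshold=0.5):
    """audio_feats: (T, Da); video_feats: (T, Dv). Returns (flag, score),
    where flag is True if the fused input is classified as a transition."""
    fused = np.concatenate([audio_feats, video_feats], axis=1)  # (T, Da + Dv)
    pooled = fused.mean(axis=0)                                 # crude temporal pooling
    score = 1 / (1 + np.exp(-(pooled @ weights + bias)))        # sigmoid classification
    return score > threshold, score

rng = np.random.default_rng(0)
audio = rng.normal(size=(32, 8))   # 32 time steps of 8-dim audio features
video = rng.normal(size=(32, 16))  # 32 time steps of 16-dim video features
w = rng.normal(size=24)            # hypothetical trained weights
flag, score = detect_transition(audio, video, w, bias=0.0)
print(flag, round(float(score), 3))
```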
Systems and methods for monitoring an icon in an external display device are disclosed. Images of an icon displayed in a display device may be continually captured as video frames by a video camera of an icon monitoring system. While operating in a first mode, video frames may be continually analyzed to determine if the captured image matches an active template icon known to match the captured image of the icon. While the captured image matches the active template icon, operation in the first mode continues. Upon detecting a failed match to the active template icon, the system starts operating in a second mode to search among known template icons for a new match. Upon finding a new match, the active template icon may be updated to the new match, and operation switches back to the first mode. Times of transitions between the first and second modes may be recorded.
G06N 3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
G06F 3/04817 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance using icons
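A minimal state-machine sketch of the two operating modes described above; matching is reduced to equality of placeholder strings, where a real system would compare captured camera frames against image templates:

```python
# Two-mode icon monitor: mode 1 tracks the active template, mode 2 searches
# the known templates for a new match after a failed comparison.

def monitor(frames, templates, active):
    """frames: iterable of captured icon images; templates: known icons.
    Yields (frame_index, mode, active_template) after each frame is handled."""
    mode = 1
    for i, frame in enumerate(frames):
        if mode == 1 and frame != active:
            mode = 2                      # failed match: switch to search mode
        if mode == 2:
            for t in templates:
                if frame == t:            # found a new match among known icons
                    active, mode = t, 1   # update active template, back to mode 1
                    break
        yield i, mode, active

frames = ["iconA", "iconA", "iconB", "iconB"]
for step in monitor(frames, templates=["iconA", "iconB"], active="iconA"):
    print(step)   # transition to the new template is visible at frame 2
```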
Methods and systems for predicting audience ratings are disclosed. A database of television (TV) viewing data may include program records for a multiplicity of existing TV programs. A system may receive a training plurality of program records from the TV viewing data, and for each program record a most similar TV program based on content characteristics may be identified. A synthetic program record may be constructed by merging features of each record and its most similar record. Audience performance metrics may be omitted from synthetic records. An aggregate of the training plurality of program records and the synthetic program records may be used to train a machine-learning (ML) model to predict audience performance metrics of new or hypothetical TV programs not yet available for viewing and/or not yet transmitted or streamed.
H04N 21/25 - Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication or learning user preferences for recommending movies
H04N 21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed or the storage space available from the internal hard disk
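A hedged sketch of the synthetic-record construction from the preceding abstract, assuming each program record is a dictionary of content features plus audience metrics; the merge rules (average numeric features, borrow categorical features from the most similar record) are illustrative choices:

```python
# Build a synthetic program record by merging a record with its most similar
# record, omitting audience performance metrics as the abstract describes.

def make_synthetic(record, most_similar, metric_keys=("rating", "reach")):
    synthetic = {}
    for key, value in record.items():
        if key in metric_keys:
            continue                               # audience metrics omitted
        other = most_similar[key]
        if isinstance(value, (int, float)):
            synthetic[key] = (value + other) / 2   # merge numeric features
        else:
            synthetic[key] = other                 # borrow categorical feature
    return synthetic

rec = {"genre": "drama", "duration": 60, "network": "X", "rating": 4.2}
sim = {"genre": "crime", "duration": 45, "network": "Y", "rating": 3.9}
print(make_synthetic(rec, sim))  # {'genre': 'crime', 'duration': 52.5, 'network': 'Y'}
```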
Accurate detection of logos in media content on media presentation devices is addressed. Logos and products are detected in media content produced in retail deployments using a camera. Logo recognition uses saliency analysis, segmentation techniques, and stroke analysis to segment likely logo regions. Logo recognition may suitably employ feature extraction, signature representation, and logo matching. These three approaches make use of neural network based classification and optical character recognition (OCR). One method for OCR recognizes individual characters then performs string matching. Another OCR method uses segment level character recognition with N-gram matching. Synthetic image generation for training of a neural net classifier and utilizing transfer learning features of neural networks are employed to support fast addition of new logos for recognition.
G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
50.
Methods and apparatus for efficient media indexing
Methods, apparatus, systems and articles of manufacture are disclosed for efficient media indexing. An example method disclosed herein includes means for initiating a list of hash seeds, the list of hash seeds including at least a first hash seed value and a second hash seed value among other hash seed values, means for generating to generate a first bucket distribution based on the first hash seed value and a first hash function and generate a second bucket distribution based on the second hash seed value used in combination with the first hash seed value, means for determining to determine a first entropy value of the first bucket distribution, wherein data associated with the first bucket distribution is stored in a first hash table and determine a second entropy value of the second bucket distribution.
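The entropy comparison at the heart of this entry can be sketched briefly; the hash function, bucket count, and seed values below are illustrative assumptions:

```python
import math
from collections import Counter

# Evaluate hash seeds by the entropy of the bucket distributions they
# produce: a more uniform distribution (higher entropy) spreads keys better.

def bucket_distribution(keys, seed, num_buckets=16):
    """Illustrative first hash function: Python's hash salted with a seed."""
    return Counter(hash((seed, key)) % num_buckets for key in keys)

def entropy(dist):
    """Shannon entropy of a bucket distribution; higher means more uniform."""
    total = sum(dist.values())
    return -sum((n / total) * math.log2(n / total) for n in dist.values())

keys = [f"media-{i}" for i in range(1000)]
first = bucket_distribution(keys, seed=1)    # data stored in a first hash table
second = bucket_distribution(keys, seed=2)
# Prefer the seed whose distribution has higher entropy (more even buckets).
print(entropy(first), entropy(second))
```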
In one aspect, an example method to be performed by a vehicle-based media system includes (a) receiving audio content; (b) causing one or more speakers to output the received audio content; (c) using a microphone of the vehicle-based media system to capture the output audio content; (d) identifying reference audio content that has at least a threshold extent of similarity with the captured audio content; (e) identifying visual content based at least on the identified reference audio content; and (f) outputting, via a user interface of the vehicle-based media system, the identified visual content.
H04H 20/62 - Arrangements specially adapted for specific applications, e.g. for traffic information or for mobile receivers for local area broadcast, e.g. instore broadcast for transportation systems, e.g. in vehicles
G01C 21/36 - Input/output arrangements for on-board computers
G06Q 30/0207 - Discounts or incentives, e.g. coupons or rebates
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
H04N 21/41 - Structure of client; Structure of client peripherals
H04N 21/414 - Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
H04N 21/422 - Input-only peripherals, e.g. global positioning system [GPS]
H04R 3/12 - Circuits for transducers for distributing signals to two or more loudspeakers
H04W 4/02 - Services making use of location information
H04W 4/44 - Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for communication between vehicles and infrastructures, e.g. vehicle-to-cloud [V2C] or vehicle-to-home [V2H]
52.
Synthesizing A Presentation From Multiple Media Clips
In an example implementation, a method is described. The implementation accesses first and second media clips. The implementation also matches a first fingerprint of the first media clip with a second fingerprint of the second media clip and determines an overlap of the first media clip with the second media clip. The implementation also, based on the overlap, merges the first and second media clips into a group of overlapping media clips, transmits, to a client device, data identifying the group of overlapping media clips and specifying a synchronization of the first media clip with the second media clip, and generates for display on a display device of the client device, a graphical user interface that identifies the group of overlapping media clips, specifies the synchronization of the first media clip with the second media clip, and allows access to, and manipulation of, the first and second media clips.
G11B 27/28 - Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
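A toy sketch of the overlap matching in the preceding abstract, assuming fingerprints are sequences of per-second hash values so that a sufficiently long common run marks the overlap and yields the synchronization offsets:

```python
# Find the longest common run of fingerprint values between two clips; its
# offsets give the synchronization used to merge the clips into a group.

def find_overlap(fp_a, fp_b, min_run=3):
    """Return (offset_a, offset_b, length) of the longest common run of
    fingerprint values, or None if shorter than `min_run` seconds."""
    best = None
    for i in range(len(fp_a)):
        for j in range(len(fp_b)):
            k = 0
            while (i + k < len(fp_a) and j + k < len(fp_b)
                   and fp_a[i + k] == fp_b[j + k]):
                k += 1
            if k >= min_run and (best is None or k > best[2]):
                best = (i, j, k)
    return best

clip1 = [11, 12, 13, 14, 15, 16]
clip2 = [13, 14, 15, 16, 17]
# Offsets give the synchronization: clip2 starts 2 s into clip1.
print(find_overlap(clip1, clip2))   # (2, 0, 4)
```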
In one aspect, an example method to be performed by a vehicle-based media system includes (a) receiving audio content; (b) causing one or more speakers to output the received audio content; (c) using a microphone of the vehicle-based media system to capture the output audio content; (d) identifying reference audio content that has at least a threshold extent of similarity with the captured audio content; (e) identifying a geographic location associated with the identified reference audio content; and (f) based at least on the identified geographic location associated with the identified reference audio content, outputting, via the user interface of the vehicle-based media system, a prompt to navigate to the identified geographic location.
H04H 20/62 - Arrangements specially adapted for specific applications, e.g. for traffic information or for mobile receivers for local area broadcast, e.g. instore broadcast for transportation systems, e.g. in vehicles
G01C 21/36 - Input/output arrangements for on-board computers
G06Q 30/0207 - Discounts or incentives, e.g. coupons or rebates
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
H04N 21/41 - Structure of client; Structure of client peripherals
H04N 21/414 - Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
H04N 21/422 - Input-only peripherals, e.g. global positioning system [GPS]
H04R 3/12 - Circuits for transducers for distributing signals to two or more loudspeakers
H04W 4/02 - Services making use of location information
H04W 4/44 - Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for communication between vehicles and infrastructures, e.g. vehicle-to-cloud [V2C] or vehicle-to-home [V2H]
54.
Selection of Video Frames Using a Machine Learning Predictor
Example systems and methods of selection of video frames using a machine learning (ML) predictor program are disclosed. The ML predictor program may generate predicted cropping boundaries for any given input image. Training raw images associated with respective sets of training master images indicative of cropping characteristics for the training raw images may be input to the ML predictor, and the ML predictor program trained to predict cropping boundaries for a raw image based on expected cropping boundaries associated with the training master images. At runtime, the trained ML predictor program may be applied to a sequence of video image frames to determine for each respective video image frame a respective score corresponding to a highest statistical confidence associated with one or more subsets of cropping boundaries predicted for the respective video image frame. Information indicative of the respective video image frame having the highest score may be stored or recorded.
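The runtime selection step can be sketched as follows, with a stub standing in for the trained ML predictor; each frame's score is the highest confidence among its predicted cropping boundaries:

```python
# Score each frame by the top confidence among its predicted crop boxes,
# then record the frame with the highest score.

def best_frame(frames, predictor):
    """frames: list of frame ids; predictor(frame) -> list of
    (crop_box, confidence) pairs. Returns (frame, score, crop_box)."""
    scored = []
    for frame in frames:
        box, conf = max(predictor(frame), key=lambda bc: bc[1])
        scored.append((frame, conf, box))      # frame score = top box confidence
    return max(scored, key=lambda s: s[1])     # frame with the highest score

def stub_predictor(frame):
    # Hypothetical predictor output: two candidate crop boxes with confidences.
    return [((0, 0, 100, 80), 0.6 + 0.01 * frame), ((10, 10, 90, 70), 0.5)]

print(best_frame(frames=[0, 1, 2], predictor=stub_predictor))
```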
Systems and methods are provided for filtering at least one media content catalog based on criteria for a station library to generate a first list of candidate tracks for the station library, combining a similarity score and a popularity score for each track of the first list of candidate tracks to generate a total score for each track of the first list of candidate tracks, generating a list of top ranked tracks for a first genre, and returning the list of top ranked tracks of the first genre as part of the station library.
A machine may be configured to generate one or more audio fingerprints of one or more segments of audio data. The machine may access audio data to be fingerprinted and divide the audio data into segments. For any given segment, the machine may generate a spectral representation from the segment; generate a vector from the spectral representation; generate an ordered set of permutations of the vector; generate an ordered set of numbers from the permutations of the vector; and generate a fingerprint of the segment of the audio data, which may be considered a sub-fingerprint of the audio data. In addition, the machine or a separate device may be configured to determine a likelihood that candidate audio data matches reference audio data.
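The permutation-and-number scheme above reads like a MinHash-style construction; the sketch below is a hedged interpretation under that assumption, with the "number from a permutation" taken as the index of the first above-average spectral bin:

```python
import numpy as np

# MinHash-like sub-fingerprint: permute a spectral vector several times and
# record, for each permutation, the index of its first above-average bin.

def sub_fingerprint(spectral_vector, num_perms=8, seed=7):
    rng = np.random.default_rng(seed)        # fixed seed: an ordered set of permutations
    fingerprint = []
    for _ in range(num_perms):
        perm = rng.permutation(len(spectral_vector))
        permuted = spectral_vector[perm]
        # Number derived from the permutation: position of the first
        # above-average energy bin in the permuted vector.
        fingerprint.append(int(np.argmax(permuted > permuted.mean())))
    return tuple(fingerprint)

# Toy "segment": spectrum of a short sine sweep stands in for real audio.
segment_spectrum = np.abs(np.fft.rfft(np.sin(np.linspace(0, 40, 256))))
print(sub_fingerprint(segment_spectrum))
```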
Example methods and systems for inserting information into playing content are described. In some example embodiments, the methods and systems may identify a break in content playing via a playback device, select an information segment representative of information received by the playback device to present during the identified break, and insert the information segment into the content playing via the playback device upon an occurrence of the identified break.
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
A machine is configured to identify a media file that, when played to a user, is likely to modify an emotional or physical state of the user to or towards a target emotional or physical state. The machine accesses play counts that quantify playbacks of media files for the user. The playbacks may be locally performed or detected by the machine from ambient sound. The machine accesses arousal scores of the media files and determines a distribution of the play counts over the arousal scores. The machine uses one or more relative maxima in the distribution in selecting a target arousal score for the user based on contextual data that describes an activity of the user. The machine selects one or more media files based on the target arousal score. The machine may then cause the selected media file to be played to the user.
Methods and systems for modification of electronic system operation based on acoustic ambience classification are presented. In an example method, at least one audio signal present in a physical environment of a user is detected. The at least one audio signal is analyzed to extract at least one audio feature from the audio signal. The audio signal is classified based on the audio feature to produce at least one classification of the audio signal. Operation of an electronic system interacting with the user in the physical environment is modified based on the classification of the audio signal.
Methods, apparatus, and systems are disclosed for synchronizing streaming media content. An example apparatus includes a storage device, and a processor to execute instructions to identify a first source streaming broadcast media to a first computing device based on an audio fingerprint of audio associated with the broadcast media, identify sources broadcasting the broadcast media streaming to the first computing device, the sources available to a second computing device including the processor, select a second source of the identified sources for streaming the broadcast media to the second computing device, the second source different than the first source, detect termination of the streaming of the broadcast media on the first computing device, the termination corresponding to a termination time of the broadcast media, and automatically start, by using the selected second source, streaming of the broadcast media to the second computing device at the termination time.
H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronizing decoder's clock; Client middleware
H04N 21/439 - Processing of audio elementary streams
H04H 60/40 - Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users for identifying broadcast time or space for identifying broadcast time
H04H 60/58 - Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups or of audio
H04H 60/65 - Arrangements for services using the result of monitoring, identification or recognition covered by groups or for using the result on users' side
H04L 65/611 - Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio for multicast or broadcast
In one aspect, an example method to be performed by a computing device includes (a) determining that a ride-sharing session is active; (b) in response to determining the ride-sharing session is active, using a microphone of the computing device to capture audio content; (c) identifying reference audio content that has at least a threshold extent of similarity with the captured audio content; (d) determining that the ride-sharing session is inactive; and (e) outputting an indication of the identified reference audio content.
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
G06F 16/68 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
62.
Obtaining artist imagery from video content using facial recognition
An example method may include receiving, at a computing device, a digital image associated with a particular media content program, the digital image containing one or more faces of particular people associated with the particular media content program. A computer-implemented automated face recognition program may be applied to the digital image to recognize, based on at least one feature vector from a prior-determined set of feature vectors, one or more of the particular people in the digital image, together with respective geometric coordinates for each of the one or more detected faces. Each of at least a subset of the prior-determined set of feature vectors may be associated with a respective one of the particular people. The digital image may be stored in non-transitory computer-readable memory, together with information assigning respective identities to the recognized particular people, and associating with each respective assigned identity geometric coordinates in the digital image.
G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
G06F 16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
63.
Methods and Apparatus for Harmonic Source Enhancement
Methods and apparatus for harmonic source enhancement are disclosed herein. An example apparatus includes an interface to receive a media signal. The example apparatus also includes a harmonic source enhancer to determine a magnitude spectrogram of audio corresponding to the media signal; generate a time-frequency mask based on the magnitude spectrogram; and apply the time-frequency mask to the magnitude spectrogram to enhance a harmonic source of the media signal.
G10K 11/175 - Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
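One common way to realize the time-frequency mask described in the preceding abstract is median filtering of the magnitude spectrogram, as sketched below with SciPy; this specific masking recipe is an assumption, not necessarily the patented method:

```python
import numpy as np
from scipy.ndimage import median_filter

# Median filtering along time favours harmonic (horizontally sustained)
# energy; the soft mask built from it is applied back to the spectrogram.

def enhance_harmonics(mag_spec, width=9):
    """mag_spec: (freq_bins, time_frames) magnitude spectrogram."""
    harmonic = median_filter(mag_spec, size=(1, width))    # smooth along time
    percussive = median_filter(mag_spec, size=(width, 1))  # smooth along frequency
    mask = harmonic / (harmonic + percussive + 1e-10)      # soft harmonic mask
    return mag_spec * mask                                 # apply mask to spectrogram

spec = np.random.rand(128, 200)
spec[40, :] += 5.0            # a sustained harmonic partial
print(enhance_harmonics(spec).shape)
```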
64.
Methods and apparatus for playback using pre-processed information and personalization
Methods, apparatus, systems and articles of manufacture are disclosed for playback using pre-processed profile information and personalization. Example apparatus disclosed herein include a synchronizer to, in response to receiving a media signal to be played on a playback device, access an equalization (EQ) profile corresponding to the media signal; an EQ personalization manager to generate a personalized EQ setting; and an EQ adjustment implementor to modify playback of the media signal on the playback device based on a blended equalization generated from the EQ profile and the personalized EQ setting.
H04N 21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed or the storage space available from the internal hard disk
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
H04N 9/87 - Regeneration of colour television signals
H04N 21/439 - Processing of audio elementary streams
H04N 21/45 - Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies
H04R 3/04 - Circuits for transducers for correcting frequency response
H03F 3/181 - Low-frequency amplifiers, e.g. audio preamplifiers
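A minimal sketch of the blending step from the preceding abstract, assuming per-band gains in dB and a fixed blend weight; the 10-band layout and the weight are illustrative:

```python
import numpy as np

# Blend the media's pre-processed EQ profile with the user's personalized EQ
# setting into a single gain curve applied at playback.

def blend_eq(profile_gains_db, personal_gains_db, profile_weight=0.6):
    """Both inputs are per-band gains in dB; returns the blended curve."""
    profile = np.asarray(profile_gains_db, dtype=float)
    personal = np.asarray(personal_gains_db, dtype=float)
    return profile_weight * profile + (1 - profile_weight) * personal

media_profile = [3, 2, 1, 0, 0, 0, 1, 2, 3, 4]   # EQ profile delivered with the media
user_setting = [0, 0, 0, 0, 2, 2, 2, 0, 0, 0]    # personalized EQ setting
print(blend_eq(media_profile, user_setting))      # blended equalization
```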
65.
GENERATION OF MEDIA STATION PREVIEWS USING A REFERENCE DATABASE
In one aspect, an example method includes (i) while a media playback device of a vehicle is playing back content received on a first channel, sending, by the media playback device to a server, a preview request, the preview request identifying a second channel that is different from the first channel; (ii) receiving, by the media playback device from the server, a response to the preview request, the response including identifying information corresponding to content being provided on the second channel; and (iii) while the media playback device is playing back the content received on the first channel, providing, by the media playback device for display, at least a portion of the identifying information corresponding to content being provided on the second channel.
H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs
H04N 21/472 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification or for manipulating displayed content
H04N 21/2387 - Stream processing in response to a playback request from an end-user, e.g. for trick-play
H04N 21/278 - Content descriptor database or directory service for end-user access
H04N 21/45 - Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies
Techniques of providing motion video content along with audio content are disclosed. In some example embodiments, a computer-implemented system is configured to perform operations comprising: receiving primary audio content; determining that at least one reference audio content satisfies a predetermined similarity threshold based on a comparison of the primary audio content with the at least one reference audio content; for each one of the at least one reference audio content, identifying motion video content based on the motion video content being stored in association with the one of the at least one reference audio content and not stored in association with the primary audio content; and causing the identified motion video content to be displayed on a device concurrently with a presentation of the primary audio content on the device.
H04N 21/439 - Processing of audio elementary streams
H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronizing decoder's clock; Client middleware
H04N 21/45 - Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies
67.
Methods and apparatus for dynamic volume adjustment via audio classification
Methods, apparatus, systems and articles of manufacture are disclosed for dynamic volume adjustment via audio classification. Example apparatus include at least one memory; instructions; and at least one processor to execute the instructions to: analyze, with a neural network, a parameter of an audio signal associated with a first volume level to determine a classification group associated with the audio signal; determine an input volume of the audio signal; determine a classification gain value based on the classification group; determine an intermediate gain value as an intermediate between the input volume and the classification gain value by applying a first weight to the input volume and a second weight to the classification gain value; apply the intermediate gain value to the audio signal, the intermediate gain value to modify the first volume level to a second volume level; and apply a compression value to the audio signal, the compression value to modify the second volume level to a third volume level that satisfies a target volume threshold.
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
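The gain arithmetic in the preceding abstract can be traced numerically; the sketch below uses an illustrative weighted blend and a simple ratio-based compressor, with all dB values hypothetical:

```python
# Trace the three volume levels the abstract describes: input level, level
# after the intermediate gain, and level after compression toward a target.

def adjust_volume(input_volume_db, classification_gain_db,
                  w_input=0.3, w_class=0.7, target_db=-14.0, ratio=2.0):
    # Intermediate gain: weighted blend of input volume and classification gain.
    intermediate = w_input * input_volume_db + w_class * classification_gain_db
    second_level = input_volume_db + intermediate     # first -> second volume level
    # Compression: reduce the excursion from the target by the ratio,
    # yielding a third level that approaches the target volume threshold.
    third_level = target_db + (second_level - target_db) / ratio
    return second_level, third_level

print(adjust_volume(input_volume_db=-20.0, classification_gain_db=4.0))
```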
Apparatus, systems, articles of manufacture, and methods for volume adjustment are disclosed herein. An example method includes collecting data corresponding to a volume of an audio signal as the audio signal is output through a device, when an average volume of the audio signal does not satisfy a volume threshold for a specified timespan, determining a difference between the average volume and a desired volume, and applying a gain to the audio signal to adjust the volume of the audio signal to the desired volume, the gain determined based on the difference between the average volume and the desired volume.
A method for controlling presentation of metadata regarding media. A system could generate query fingerprints representing media content being presented, the media content having been identified as being a first media-content item. The system could further detect a threshold mismatch comprising at least one of the query fingerprints not matching any of first reference fingerprints known to represent the first media-content item. In response, the system could engage in new media identification, establishing that the media content is a second media-content item, and could obtain both second reference fingerprints known to represent the second media-content item and metadata regarding the second media-content item. Further, the system could validate the new identification as a condition precedent to presenting the obtained metadata, the validating including comparing the at least one query fingerprint that did not match any of the first reference fingerprints with the obtained second reference fingerprints.
G06F 16/483 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
41 - Education, entertainment, sporting and cultural services
Goods & Services
Entertainment services, namely, providing information related to entertainment media, namely, music, music artists, entertainers, radio, podcasts, movies, television, entertainment video, and sports via a global communication network; providing an online database accessible to facilitate search and discovery of entertainment media, and to provide information related to entertainment media for presentation in connection with an end user playing entertainment media, selecting entertainment media for playback, and requesting information related to entertainment media
71.
Selecting balanced clusters of descriptive vectors
A clustering machine can cluster descriptive vectors in a balanced manner. The clustering machine calculates distances between pairs of descriptive vectors and generates clusters of vectors arranged in a hierarchy. The clustering machine determines centroid vectors of the clusters, such that each cluster is represented by its corresponding centroid vector. The clustering machine calculates a sum of inter-cluster vector distances between pairs of centroid vectors, as well as a sum of intra-cluster vector distances between pairs of vectors in the clusters. The clustering machine calculates multiple scores of the hierarchy by varying a scalar and calculating a separate score for each scalar. The calculation of each score is based on the two sums previously calculated for the hierarchy. The clustering machine may select or otherwise identify a balanced subset of the hierarchy by finding an extremum in the calculated scores.
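A hedged sketch of the scoring loop just described: two sums per candidate subset of the hierarchy, one score per scalar, and selection at the score extremum. The particular score form (inter-cluster sum minus scalar-weighted intra-cluster sum) is an illustrative assumption:

```python
import itertools, math

# Score candidate cuts of a cluster hierarchy from two sums: inter-cluster
# centroid distances and intra-cluster vector distances.

def sums_for_clusters(clusters):
    centroids = [tuple(sum(vals) / len(c) for vals in zip(*c)) for c in clusters]
    inter = sum(math.dist(p, q) for p, q in itertools.combinations(centroids, 2))
    intra = sum(math.dist(p, q)
                for c in clusters for p, q in itertools.combinations(c, 2))
    return inter, intra

def select_balanced_cut(cuts, scalars):
    """cuts: candidate partitions drawn from the hierarchy. Returns the
    (score, scalar, cut) triple at the maximum score."""
    best = None
    for cut, s in itertools.product(cuts, scalars):
        inter, intra = sums_for_clusters(cut)
        score = inter - s * intra          # one score per scalar, from the two sums
        if best is None or score > best[0]:
            best = (score, s, cut)
    return best

vectors = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
cut_a = [vectors[:2], vectors[2:4], vectors[4:]]   # three balanced clusters
cut_b = [vectors[:4], vectors[4:]]                 # two lopsided clusters
print(select_balanced_cut([cut_a, cut_b], scalars=[0.5, 1.0, 2.0])[:2])
```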
Techniques of providing an interactive programming guide with a personalized lineup are disclosed. In some embodiments, a profile is accessed, and a personalized lineup is determined based on the profile. The personalized lineup may include a corresponding media content identification assigned to each one of a plurality of sequential time slots, where each media content identification identifies media content for the corresponding time slot. A first interactive programming guide may be caused to be displayed on a first media content device associated with the profile, where the first interactive programming guide includes the personalized lineup.
H04N 21/482 - End-user interface for program selection
H04N 21/2668 - Creating a channel for a dedicated end-user group, e.g. by inserting targeted commercials into a video stream based on end-user profiles
H04N 21/431 - Generation of visual interfaces; Content or additional data rendering
H04N 21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed or the storage space available from the internal hard disk
H04N 21/45 - Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies
H04N 21/454 - Content filtering, e.g. blocking advertisements
H04N 21/466 - Learning process for intelligent management, e.g. learning user preferences for recommending movies
H04N 21/472 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification or for manipulating displayed content
H04N 21/475 - End-user interface for inputting end-user data, e.g. PIN [Personal Identification Number] or preference data
H04N 21/845 - Structuring of content, e.g. decomposing content into time segments
73.
Methods and apparatus to identify media that has been pitch shifted, time shifted, and/or resampled
Methods, apparatus, systems and articles of manufacture are disclosed to identify media that has been pitch shifted, time shifted, and/or resampled. An example apparatus includes: memory; instructions in the apparatus; and processor circuitry to execute the instructions to: transmit a fingerprint of an audio signal and adjusting instructions to a central facility to facilitate a query, the adjusting instructions identifying at least one of a pitch shift, a time shift, or a resample ratio; obtain a response including an identifier for the audio signal and information corresponding to how the audio signal was adjusted; and change the adjusting instructions based on the information.
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
In one aspect, an example method includes (i) retrieving, from a text index, closed captioning repetition data for a segment of a sequence of media content; (ii) generating features using the closed captioning repetition data; (iii) providing the features as input to a classification model, wherein the classification model is configured to output classification data indicative of a likelihood of the features being characteristic of a program segment; (iv) obtaining the classification data output by the classification model; (v) determining a prediction of whether the segment is a program segment using the classification data; and (vi) storing the prediction for the segment in a database.
A method implemented by a computing system comprises generating, by the computing system, a fingerprint comprising a plurality of bin samples associated with audio content. Each bin sample is specified within a frame of the fingerprint and is associated with one of a plurality of non-overlapping frequency ranges and a value indicative of a magnitude of energy associated with a corresponding frequency range. The computing system removes, from the fingerprint, a plurality of bin samples associated with a frequency sweep in the audio content.
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
G10L 25/27 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique
G10L 25/54 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for retrieval
G10L 25/72 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for transmitting results of analysis
G10L 19/028 - Noise substitution, e.g. substituting non-tonal spectral components by noisy source
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
76.
Dynamic content delivery based on vehicle navigational attributes
Systems and methods are disclosed for dynamic content delivery based on vehicle navigational attributes. An example apparatus includes at least one memory, machine readable instructions, and processor circuitry to execute the machine readable instructions to at least obtain navigational attributes from an electronic device of a vehicle via a network, determine a relevancy score for respective ones of first sporting event data items based on the navigational attributes, based on a determination that the navigational attributes correspond to a driving condition, identify a second sporting event data item of the first sporting event data items based on a relevancy score of the second sporting event data item corresponding to the driving condition, and transmit the second sporting event data item to the electronic device of the vehicle to cause the second sporting event data item to be presented.
G06F 15/16 - Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
B60W 40/08 - Estimation or calculation of driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit related to drivers or passengers
G01C 21/26 - Navigation; Navigational instruments not provided for in groups specially adapted for navigation in a road network
G01C 21/36 - Input/output arrangements for on-board computers
G06F 16/2457 - Query processing with adaptation to user needs
G06F 16/9535 - Search customisation based on user profiles and personalisation
G06F 16/9537 - Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
H04L 67/12 - Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
In one aspect, an example method includes (i) determining, by a computing system, a mean image of a set of frames of video content; (ii) extracting, by the computing system, a reference template of static content from the mean image; (iii) identifying, by the computing system, the extracted reference template of static content in a frame of the set of frames of the video content; (iv) labeling a segment within the video content as either a program segment or an advertisement segment based on the identifying of the extracted reference template of static content in the frame of the video content; and (v) generating data identifying the labeled segment.
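Steps (i) and (ii) above, plus the identification in step (iii), can be sketched with a variance threshold marking static pixels; the threshold and match criterion are illustrative assumptions:

```python
import numpy as np

# Average the frames, keep low-variance pixels as the static reference
# template, then test whether a frame still contains that template.

def mean_and_template(frames, var_thresh=10.0):
    stack = np.stack(frames).astype(float)         # (N, H, W)
    mean_img = stack.mean(axis=0)                  # (i) mean image of the frames
    static_mask = stack.var(axis=0) < var_thresh   # (ii) static content only
    return mean_img, static_mask

def template_present(frame, mean_img, static_mask, tol=5.0):
    # (iii) the template is identified where the static pixels still match.
    diff = np.abs(frame.astype(float) - mean_img)[static_mask]
    return diff.mean() < tol

frames = [np.full((4, 4), 100) for _ in range(10)]   # toy static video
mean_img, mask = mean_and_template(frames)
print(template_present(frames[0], mean_img, mask))   # True
```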
A method and system for computer-based generation of podcast metadata, to facilitate operations such as searching for and recommending podcasts based on the generated metadata. In an example method, a computing system obtains a text representation of a podcast episode and obtains person data defining a list of person names such as celebrity names. The computing system then correlates the person data with the text representation to find a match between a listed person name and a text string in the text representation. Further, the computing system predicts a named-entity span in the text representation and determines that the predicted named-entity span matches a location of the text string in the text representation of the podcast episode, and based on this determination, the computing system generates and outputs metadata that associates the person name with the podcast episode.
G06F 16/383 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G06F 16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
Example systems and methods for automated generation of banner images are disclosed. A program identifier associated with a particular media program may be received by a system, and used for accessing a set of iconic digital images and corresponding metadata associated with the particular media program. The system may select a particular iconic digital image for placing a banner of text associated with the particular media program, by applying an analytical model of banner-placement criteria to the iconic digital images. The system may apply another analytical model for banner generation to the particular iconic image to determine (i) dimensions and placement of a bounding box for containing the text, (ii) segmentation of the text for display within the bounding box, and (iii) selection of font, text size, and font color for display of the text. The system may store the particular iconic digital image and banner metadata specifying the banner.
Methods, apparatus, systems and articles of manufacture are disclosed for audio equalization. Example instructions disclosed herein cause one or more processors to at least: detect an irregularity in a frequency representation of an audio signal in response to a change in volume between a set of frequency values exceeding a threshold; and adjust a volume at a first frequency value of the set of frequency values to reduce the irregularity.
H04N 21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed or the storage space available from the internal hard disk
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
H03F 3/181 - Low-frequency amplifiers, e.g. audio preamplifiers
H04N 9/87 - Regeneration of colour television signals
H04N 21/439 - Processing of audio elementary streams
H04N 21/45 - Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies
H04R 3/04 - Circuits for transducers for correcting frequency response
Methods and systems for automated video segmentation are disclosed. A sequence of video frames having video segments of contextually-related sub-sequences may be received. Each frame may be labeled according to segment and segment class. A video graph may be constructed in which each node corresponds to a different frame, and each edge connects a different pair of nodes and is associated with a time between video frames and a similarity metric of the connected frames. An artificial neural network (ANN) may be trained to predict both labels for the nodes and clusters of the nodes corresponding to predicted membership among the segments, using the video graph as input to the ANN, and ground-truth clusters of ground-truth labeled nodes. The ANN may be further trained to predict segment classes of the predicted clusters, using the segment classes as ground truths. The trained ANN may then be applied to runtime video sequences.
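The video-graph construction can be sketched directly from this description; cosine similarity and full pairwise connectivity are illustrative choices, and the ANN training itself is out of scope here:

```python
import numpy as np

# One node per frame; each edge carries the time between the two frames and
# a feature-similarity metric of the connected frames.

def build_video_graph(frame_features, fps=30.0):
    """frame_features: (N, D) array. Returns a list of edges
    (i, j, seconds_between, similarity)."""
    feats = np.asarray(frame_features, dtype=float)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.clip(norms, 1e-10, None)
    edges = []
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            dt = (j - i) / fps                 # time between the two frames
            sim = float(unit[i] @ unit[j])     # cosine similarity of the frames
            edges.append((i, j, dt, sim))
    return edges

rng = np.random.default_rng(1)
print(len(build_video_graph(rng.normal(size=(5, 16)))))   # 10 pairwise edges
```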
A computing system engages in digital image processing of received video frames to generate sport data that indicates a score and/or a time associated with a sport event. The digital image processing includes: (i) identifying a first frame region of the video frames based on the first frame region depicting a scoreboard; (ii) executing a first procedure that analyzes the identified first frame region to detect, within the identified first frame region, second frame region(s) based on the second frame region(s) depicting text of the scoreboard; (iii) in response to detecting the second frame region(s), executing a second procedure to recognize the text in at least one of the second frame region(s); and (iv) based at least on the recognizing of the text, generating the sport data. In response to completing the digital image processing, the computing system then carries out an action based on the generated sport data.
A computing system automatically detects, within a digital video frame, a video frame region that depicts a textual expression of a scoreboard. The computing system (a) engages in an edge-detection process to detect edges of at least scoreboard image elements depicted by the digital video frame, with at least some of these edges being of the textual expression and defining alphanumeric shapes; (b) applies pattern-recognition to identify the alphanumeric shapes; (c) establishes a plurality of minimum bounding rectangles each bounding a respective one of the identified alphanumeric shapes; (d) establishes, based on at least two of the minimum bounding rectangles, a composite shape that encompasses the identified alphanumeric shapes that were bounded by the at least two minimum bounding rectangles; and (e) based on the composite shape occupying a particular region, deems the particular region to be the video frame region that depicts the textual expression.
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 20/40 - Scenes; Scene-specific elements in video content
G06V 20/62 - Text, e.g. of license plates, overlay texts or captions on TV images
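Steps (c) and (d) of the preceding abstract reduce to simple rectangle arithmetic, sketched below with hypothetical coordinates:

```python
# Merge the minimum bounding rectangles of recognized alphanumeric shapes
# into one composite rectangle; its region is deemed to depict the
# scoreboard's textual expression.

def min_bounding_rect(points):
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)        # (x0, y0, x1, y1)

def composite_rect(rects):
    """Smallest rectangle encompassing every input bounding rectangle."""
    x0 = min(r[0] for r in rects)
    y0 = min(r[1] for r in rects)
    x1 = max(r[2] for r in rects)
    y1 = max(r[3] for r in rects)
    return x0, y0, x1, y1

digit_1 = min_bounding_rect([(10, 5), (14, 5), (10, 15), (14, 15)])
digit_2 = min_bounding_rect([(18, 5), (22, 5), (18, 15), (22, 15)])
print(composite_rect([digit_1, digit_2]))            # (10, 5, 22, 15)
```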
84.
GENERATION OF MEDIA STATION PREVIEWS USING A SECONDARY TUNER
In one aspect, an example method includes (i) while a media playback device of a vehicle is playing back content received on a first channel, generating, by the media playback device, a query fingerprint using second content received on a second channel; (ii) sending, by the media playback device, the query fingerprint to a server that maintains a reference database containing a plurality of reference fingerprints; (iii) receiving, by the media playback device from the server, identifying information corresponding to a reference fingerprint of the plurality of reference fingerprints that matches the query fingerprint; and (iv) while the media playback device is playing back the first content received on the first channel, providing, by the media playback device for display, at least a portion of the identifying information.
G05B 19/42 - Recording and playback systems, i.e. in which the programme is recorded from a cycle of operations, e.g. the cycle of operations being manually controlled, after which this record is played back on the same machine
G06F 3/0481 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
H04N 5/50 - Tuning indicators; Automatic tuning control
85.
Methods and apparatus to segment audio and determine audio segment similarities
Methods, apparatus, and systems are disclosed to segment audio and determine audio segment similarities. An example apparatus includes at least one memory storing instructions and processor circuitry to execute the instructions to at least select an anchor index beat of digital audio, identify a first segment of the digital audio based on the anchor index beat to analyze, the first segment having at least two beats and a respective center beat, concatenate time-frequency data of the at least two beats and the respective center beat to form a matrix of the first segment, generate a first deep feature based on the first segment, the first deep feature indicative of a descriptor of the digital audio, and train internal coefficients to classify the first deep feature as similar to a second deep feature based on the descriptor of the first deep feature and a descriptor of the second deep feature.
Methods, apparatus, systems and articles of manufacture are disclosed to improve detection of audio signatures. An example apparatus includes at least one memory, instructions in the apparatus, and processor circuitry to execute the instructions to: determine a first time difference of arrival for a first audio sensor of a meter and a second audio sensor of the meter based on a first audio recording from the first audio sensor and a second audio recording from the second audio sensor; determine a second time difference of arrival for the first audio sensor and a third audio sensor of the meter based on the first audio recording and a third audio recording from the third audio sensor; determine a match by comparing the first time difference of arrival to i) a first virtual source time difference of arrival and ii) a second virtual source time difference of arrival; in response to determining that the first time difference of arrival matches the first virtual source time difference of arrival, identify a first virtual source location as the location of a media presentation device presenting media; and remove the second audio recording to reduce a computational burden on the processor.
G01S 5/22 - Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements
G01S 5/24 - Position of single direction-finder fixed by determining direction of a plurality of spaced sources of known location
H04R 1/40 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
Example methods and systems for inserting information into playing content are described. In some example embodiments, the methods and systems may identify a break in content playing via a playback device, select an information segment representative of information received by the playback device to present during the identified break, and insert the information segment into the content playing via the playback device upon an occurrence of the identified break.
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G11B 27/11 - Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information not detectable on the record carrier
Apparatus, systems, articles of manufacture, and methods for volume adjustment are disclosed herein. An example method includes collecting data corresponding to a volume of an audio signal as the audio signal is output through a device, when an average volume of the audio signal does not satisfy a volume threshold for a specified timespan, determining a difference between the average volume and a desired volume, and applying a gain to the audio signal to adjust the volume of the audio signal to the desired volume, the gain determined based on the difference between the average volume and the desired volume.
Methods and apparatus for audio identification during a performance are disclosed herein. An example apparatus includes at least one memory and at least one processor to transform a segment of audio into a log-frequency spectrogram based on a constant Q transform using a logarithmic frequency resolution, transform the log-frequency spectrogram into a binary image, each pixel of the binary image corresponding to a time frame and frequency channel pair, each frequency channel representing a corresponding quarter tone frequency channel in a range from C3 to C8, generate a matrix product of the binary image and a plurality of reference fingerprints, normalize the matrix product to form a similarity matrix, select an alignment of a line in the similarity matrix that intersects one or more bins in the similarity matrix with the largest calculated Hamming similarities, and select a reference fingerprint based on the alignment.
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
90.
Methods and Systems for Scoreboard Region Detection
A computing system automatically detects, in a sequence of video frames, a video frame region that depicts a scoreboard. The video frames of the sequence depict image elements including (i) scoreboard image elements that are unchanging across the video frames of the sequence and (ii) other image elements that change across the video frames of the sequence. Given this, the computing system (a) receives the sequence, (b) engages in an edge-detection process to detect, in the video frames of the sequence, a set of edges of the depicted image elements, (c) identifies a subset of the detected set of edges based on each edge of the subset being unchanging across the video frames of the sequence, and (d) detects, based on the edges of the identified subset, the video frame region that depicts the scoreboard.
G06V 20/40 - Scenes; Scene-specific elements in video content
G06V 20/62 - Text, e.g. of license plates, overlay texts or captions on TV images
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/75 - Image or video pattern matching; Proximity measures in feature spaces using context analysis; Selection of dictionaries
91.
Vehicle-based media system with audio ad and visual content synchronization feature
In one aspect, an example method to be performed by a vehicle-based media system includes (a) receiving audio content; (b) causing one or more speakers to output the received audio content; (c) using a microphone of the vehicle-based media system to capture the output audio content; (d) identifying reference audio content that has at least a threshold extent of similarity with the captured audio content; (e) identifying visual content based at least on the identified reference audio content; and (f) outputting, via a user interface of the vehicle-based media system, the identified visual content.
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
H04H 20/62 - Arrangements specially adapted for specific applications, e.g. for traffic information or for mobile receivers for local area broadcast, e.g. instore broadcast for transportation systems, e.g. in vehicles
H04N 21/41 - Structure of client; Structure of client peripherals
H04N 21/422 - Input-only peripherals, e.g. global positioning system [GPS]
H04R 3/12 - Circuits for transducers for distributing signals to two or more loudspeakers
H04W 4/02 - Services making use of location information
H04W 4/44 - Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for communication between vehicles and infrastructures, e.g. vehicle-to-cloud [V2C] or vehicle-to-home [V2H]
In one aspect, an example method to be performed by a computing device includes (a) determining that a ride-sharing session is active; (b) in response to determining the ride-sharing session is active, using a microphone of the computing device to capture audio content; (c) identifying reference audio content that has at least a threshold extent of similarity with the captured audio content; (d) determining that the ride-sharing session is inactive; and (e) outputting an indication of the identified reference audio content.
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
G06F 16/68 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
G06F 16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
Example methods and systems for modifying the playback of content using pre-processed profile information are described. Example instructions, when executed, cause at least one processor to access a media stream that includes media and a profile of equalization parameters, the media stream provided to a device via a network, the profile of equalization parameters included in the media stream selected based on a comparison of a reference fingerprint to a query fingerprint generated based on the media, the profile of equalization parameters including an equalization parameter for the media; and modify playback of the media based on the equalization parameter specified in the accessed profile.
H04H 60/47 - Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users for recognising genres
H04R 3/04 - Circuits for transducers for correcting frequency response
H04H 60/58 - Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups or of audio
H04H 60/65 - Arrangements for services using the result of monitoring, identification or recognition covered by groups or for using the result on users' side
H04N 21/233 - Processing of audio elementary streams
H04N 21/234 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs
H04N 21/266 - Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system or merging a VOD unicast channel into a multicast channel
H04N 21/414 - Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
H04N 21/432 - Content retrieval operation from a local storage medium, e.g. hard-disk
H04N 21/654 - Transmission by server directed to the client
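The playback-modification step could look roughly like the following frequency-domain sketch. The profile format used here — a list of (low Hz, high Hz, gain dB) triples — is an assumed simplification of the abstract's "profile of equalization parameters".

```python
import numpy as np

def apply_eq_profile(samples: np.ndarray, rate: int,
                     profile: list[tuple[float, float, float]]) -> np.ndarray:
    """Scale each profile band of the signal's spectrum by its specified gain."""
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    for low, high, gain_db in profile:
        band = (freqs >= low) & (freqs < high)
        spectrum[band] *= 10.0 ** (gain_db / 20.0)   # dB -> linear gain
    return np.fft.irfft(spectrum, n=len(samples))

# e.g., a fingerprint-selected profile boosting bass and cutting harsh highs:
# out = apply_eq_profile(pcm, 44100, [(20, 250, +3.0), (4000, 8000, -2.0)])
```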
An example method involves comparing a primary element of a first piece of audio data to a primary element of a second piece of audio data; based on the comparing of the primary elements, determining that the first and second pieces of audio data have the same predominant mood category; in response to determining that the first and second pieces of audio data have the same predominant mood category, comparing a first mood score of the primary element of the first piece of audio data to a second mood score of the primary element of the second piece of audio data; determining that an output of the comparison of the two mood scores exceeds a threshold value; and in response to determining that the output of the comparison of the two mood scores exceeds the threshold value, providing an indicator to an application.
G10H 1/00 - Details of electrophonic musical instruments
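A compact sketch of the mood-comparison flow just described. The absolute-difference comparison and the default threshold are assumptions, since the abstract leaves the exact form of the score comparison open.

```python
from dataclasses import dataclass

@dataclass
class MoodElement:
    category: str    # predominant mood category, e.g. "melancholy"
    score: float     # mood score for that category

def mood_indicator(a: MoodElement, b: MoodElement, threshold: float = 0.2) -> bool:
    """Only if both pieces share a predominant mood category are their scores
    compared; the indicator fires when the comparison output exceeds the threshold."""
    if a.category != b.category:
        return False
    return abs(a.score - b.score) > threshold
```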
96.
Radio Head Unit with Dynamically Updated Tunable Channel Listing
In one aspect, an example method includes (i) encountering, by a media playback device of a vehicle, a trigger to update a list of currently tunable radio stations; (ii) based on encountering the trigger to update the list of currently tunable radio stations, updating, by the media playback device, the list of currently tunable radio stations using a location of the vehicle and radio station contour data stored in a local database of the media playback device; and (iii) displaying, by the media playback device, a station list using the list of currently tunable radio stations.
H04H 60/73 - Systems specially adapted for using specific information, e.g. geographical or meteorological information using meta-information
H04H 40/18 - Arrangements characterised by circuits or components specially adapted for receiving
H04H 60/41 - Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users for identifying broadcast time or space for identifying broadcast space, i.e. broadcast channels, broadcast stations or broadcast areas
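The contour-filtering core of step (ii) might be sketched as follows, modeling each station's contour as a center point plus service radius (a simplification; real broadcast contours are polygons) and assuming a hypothetical `stations` table in the head unit's local SQLite database.

```python
import math
import sqlite3  # stand-in for the media playback device's local database

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two lat/lon points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def tunable_stations(db: sqlite3.Connection, lat: float, lon: float):
    """Return (frequency, name) rows whose stored contour covers the vehicle."""
    rows = db.execute(
        "SELECT freq_mhz, name, lat, lon, radius_km FROM stations").fetchall()
    return [(f, n) for f, n, slat, slon, r in rows
            if haversine_km(lat, lon, slat, slon) <= r]
```

Whenever the update trigger fires (e.g., ignition, elapsed distance, or a timer), the device would rerun `tunable_stations` with a fresh GPS fix and redraw the station list.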
97.
Radio Head Unit with Dynamically Updated Tunable Channel Listing
In one aspect, an example method includes (i) encountering, by a media playback device of a vehicle, a trigger to update a list of currently tunable radio stations; (ii) based on encountering the trigger to update the list of currently tunable radio stations, updating, by the media playback device, the list of currently tunable radio stations using a location of the vehicle and radio station contour data stored in a local database of the media playback device; and (iii) displaying, by the media playback device, a station list using the list of currently tunable radio stations.
Methods and systems are disclosed for generating general feature vectors (GFVs), each simultaneously constructed for the separate tasks of image reconstruction and fingerprint-based image discrimination. The computing system may include machine-learning-based components configured for extracting GFVs from images, for signal processing covering transmission, reception, and recovery of the extracted GFVs, for generating reconstructed images from the recovered GFVs, and for discriminating between fingerprints generated from the recovered GFVs and query fingerprints generated from query GFVs. A set of training images may be received at the computing system. In each of one or more training iterations over the set of training images, the components may be jointly trained on each training image of the set by minimizing a joint loss function computed as a sum of losses due to signal processing and recovery, image reconstruction, and fingerprint discrimination. The trained components may be configured for runtime implementation among one or more computing devices.
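The joint training objective — a sum of signal-recovery, reconstruction, and discrimination losses — might be sketched as below. The MSE terms and the margin-based discrimination term are generic stand-ins, since the abstract does not fix the individual loss functions.

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error, used here as a placeholder loss."""
    return float(np.mean((a - b) ** 2))

def joint_loss(gfv_sent: np.ndarray, gfv_recovered: np.ndarray,
               image: np.ndarray, image_reconstructed: np.ndarray,
               fp_distance_same: float, fp_distance_other: float,
               margin: float = 1.0) -> float:
    signal_loss = mse(gfv_sent, gfv_recovered)      # transmission/recovery term
    recon_loss = mse(image, image_reconstructed)    # image reconstruction term
    # Contrastive-style discrimination term (an assumed formulation): pull
    # fingerprints of the same image together, push others at least `margin` apart.
    disc_loss = fp_distance_same + max(0.0, margin - fp_distance_other)
    return signal_loss + recon_loss + disc_loss
```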
In one aspect, an example method includes (i) receiving a first group of video content items; (ii) identifying from among the first group of video content items, a second group of video content items having a threshold extent of similarity with each other; (iii) determining a quality score for each video content item of the second group; (iv) identifying from among the second group of video content items, a third group of video content items each having a quality score that exceeds a quality score threshold; and (v) based on the identifying of the third group, transmitting at least a portion of at least one video content item of the identified third group to a digital video-effect (DVE) system, wherein the DVE system is configured to use the at least a portion of the at least one video content item of the identified third group to generate a video content item.
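A condensed sketch of steps (i)-(v). Here `similar` and `quality` are hypothetical callables, and grouping by similarity to a seed item is a simplification of "a threshold extent of similarity with each other".

```python
def select_for_dve(items: list, similar, quality, q_threshold: float) -> list:
    """items: received video content items (step i);
    similar(a, b) -> bool and quality(v) -> float are assumed helpers."""
    if not items:
        return []
    # (ii) second group: items mutually similar, approximated via a seed item.
    seed = items[0]
    group2 = [v for v in items if similar(seed, v)]
    # (iii)/(iv) third group: members whose quality score clears the threshold.
    group3 = [v for v in group2 if quality(v) > q_threshold]
    # (v) hand the survivors to the DVE system (transmission itself elided here).
    return group3
```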
Methods and apparatus for harmonic source enhancement are disclosed herein. An example apparatus includes an interface to receive a media signal. The example apparatus also includes a harmonic source enhancer to determine a magnitude spectrogram of audio corresponding to the media signal; generate a time-frequency mask based on the magnitude spectrogram; and apply the time-frequency mask to the magnitude spectrogram to enhance a harmonic source of the media signal.
H03G 5/00 - Tone control or bandwidth control in amplifiers
H04R 1/32 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
G10K 11/175 - Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
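One plausible concrete realization of the masking pipeline above (determine a magnitude spectrogram, generate a time-frequency mask, apply it to enhance the harmonic source) is median-filtering harmonic/percussive separation in the style of Fitzgerald (2010). This is an assumed technique for illustration; the patent's actual mask may differ.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import stft, istft

def enhance_harmonics(x: np.ndarray, fs: int, nperseg: int = 2048) -> np.ndarray:
    f, t, Z = stft(x, fs=fs, nperseg=nperseg)
    mag = np.abs(Z)                               # magnitude spectrogram
    harm = median_filter(mag, size=(1, 17))       # smooth along time -> harmonic energy
    perc = median_filter(mag, size=(17, 1))       # smooth along frequency -> percussive energy
    mask = harm**2 / (harm**2 + perc**2 + 1e-10)  # soft time-frequency mask
    _, y = istft(Z * mask, fs=fs, nperseg=nperseg)
    return y[:len(x)]                             # harmonically enhanced signal
```

Applying the soft mask attenuates broadband percussive frames while preserving the sustained spectral ridges that characterize harmonic sources.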