A method of building a new voice having a new timbre using a timbre vector space includes receiving timbre data filtered using a temporal receptive field. The timbre data is mapped in the timbre vector space. The timbre data is related to a plurality of different voices. Each of the plurality of different voices has respective timbre data in the timbre vector space. The method builds the new timbre using the timbre data of the plurality of different voices using a machine learning system.
G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
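As a rough illustration of building a new timbre from the timbre data of several voices, the sketch below blends timbre vectors with a weighted average. This is an assumption made for illustration only: the abstract attributes the construction to a machine learning system and does not state how the vectors are combined. The names `blend_timbres`, `voices`, and `weights` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def blend_timbres(voices: Dict[str, Vector], weights: Dict[str, float]) -> Vector:
    """Return a weighted average of the selected voices' timbre vectors.

    Assumption: a plain weighted average stands in for the machine learning
    system described in the abstract.
    """
    if not weights:
        raise ValueError("at least one voice weight is required")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # All selected voices must live in the same vector space (same dimension).
    dim = len(voices[next(iter(weights))])
    new_timbre = [0.0] * dim
    for name, weight in weights.items():
        vector = voices[name]
        if len(vector) != dim:
            raise ValueError(f"voice {name!r} has a different dimension")
        for i, value in enumerate(vector):
            new_timbre[i] += (weight / total) * value
    return new_timbre


# Hypothetical two-dimensional timbre space with two known voices.
known_voices = {"alice": [0.0, 2.0], "bob": [2.0, 0.0]}
midpoint = blend_timbres(known_voices, {"alice": 1.0, "bob": 1.0})
```

With equal weights the result sits halfway between the two voices in the vector space; unequal weights pull the new timbre toward one of them.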
2.
USER INTERFACE FOR CONTENT MODERATION FOR VOICE CHAT
A content moderation system analyzes speech, or characteristics thereof, and determines a toxicity score representing the likelihood that a given clip of speech is toxic. A user interface displays a timeline with various instances of toxicity by one or more users for a given session. The user interface is optimized for moderation interaction, and shows how the conversation containing toxicity evolves over the time domain of a conversation.
H04L 12/18 - Arrangements for providing special services to substations for broadcast or conference
G06F 3/0482 - Interaction with lists of selectable items, e.g. menus
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
G10L 25/27 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique
G10L 25/63 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for estimating an emotional state
H04L 65/403 - Arrangements for multi-party communication, e.g. for conferences
3.
MULTI-STAGE ADAPTIVE SYSTEM FOR CONTENT MODERATION
A toxicity moderation system has an input configured to receive speech from a speaker. The system includes a multi-stage toxicity machine learning system having a first stage and a second stage. The first stage is trained to analyze the received speech to determine whether a toxicity level of the speech meets a toxicity threshold. The first stage is also configured to filter-through, to the second stage, speech that meets the toxicity threshold, and is further configured to filter-out speech that does not meet the toxicity threshold.
G10L 25/63 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for estimating an emotional state
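The first-stage behavior described above (filter through what meets the threshold, filter out what does not) can be sketched as follows. This is a minimal sketch under stated assumptions: clips are represented as text rather than audio, and `first_stage_score` is a hypothetical watchlist-based scorer standing in for the trained first stage. None of these names come from the source.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Clip:
    speaker: str
    text: str  # assumption: text stands in for the received speech


def first_stage_score(clip: Clip) -> float:
    """Hypothetical cheap scorer: fraction of words found on a small watchlist."""
    watchlist = {"idiot", "trash"}
    words = clip.text.lower().split()
    if not words:
        return 0.0
    return sum(word in watchlist for word in words) / len(words)


def run_first_stage(
    clips: List[Clip],
    threshold: float,
    scorer: Callable[[Clip], float] = first_stage_score,
) -> Tuple[List[Clip], List[Clip]]:
    """Split clips into (filtered through to stage two, filtered out)."""
    filtered_through: List[Clip] = []
    filtered_out: List[Clip] = []
    for clip in clips:
        if scorer(clip) >= threshold:
            filtered_through.append(clip)
        else:
            filtered_out.append(clip)
    return filtered_through, filtered_out


clips = [Clip("a", "you are trash"), Clip("b", "good game everyone")]
to_second_stage, discarded = run_first_stage(clips, threshold=0.25)
```

The point of the split is that a second, presumably more expensive, stage only sees the clips that the first stage filters through.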
A method of building a new voice having a new timbre using a timbre vector space includes receiving timbre data filtered using a temporal receptive field. The timbre data is mapped in the timbre vector space. The timbre data is related to a plurality of different voices. Each of the plurality of different voices has respective timbre data in the timbre vector space. The method builds the new timbre using the timbre data of the plurality of different voices using a machine learning system.
G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
A method for online voice content moderation provides a multi-stage voice content analysis system. The system includes a pre-moderator stage having a toxicity scorer configured to provide a toxicity score for a given toxic speech content from a user. The toxicity score is a function of a platform content policy. The method generates a toxicity score for the given toxic speech content. The toxic speech content is provided to a moderator as a function of the toxicity score.
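One way to read "provided to a moderator as a function of the toxicity score" is a review queue ordered by score. The sketch below assumes that reading. It also assumes the platform content policy can be expressed as per-category weights. `policy_score`, `ModeratorQueue`, and the category names are hypothetical and do not come from the source.

```python
import heapq
from typing import Dict, List, Tuple


def policy_score(confidences: Dict[str, float], policy_weights: Dict[str, float]) -> float:
    """Combine per-category confidences with policy weights (assumed scheme).

    The score is the largest weighted confidence; categories the policy does
    not mention contribute nothing.
    """
    return max(
        (conf * policy_weights.get(category, 0.0) for category, conf in confidences.items()),
        default=0.0,
    )


class ModeratorQueue:
    """Hands clips to a moderator in order of highest toxicity score first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = 0  # tie-breaker that preserves submission order

    def submit(self, clip_id: str, score: float) -> None:
        # heapq is a min-heap, so the score is negated to pop the largest first.
        heapq.heappush(self._heap, (-score, self._counter, clip_id))
        self._counter += 1

    def next_clip(self) -> Tuple[str, float]:
        """Pop the most toxic pending clip; raises IndexError when empty."""
        neg_score, _, clip_id = heapq.heappop(self._heap)
        return clip_id, -neg_score


policy = {"hate": 1.0, "profanity": 0.2}  # hypothetical platform policy
queue = ModeratorQueue()
queue.submit("clip-1", policy_score({"profanity": 0.9}, policy))
queue.submit("clip-2", policy_score({"hate": 0.6}, policy))
first_id, first_score = queue.next_clip()
```

Because the policy weights feed into the score, two platforms with different policies would order the same clips differently.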
A content moderation system analyzes speech, or characteristics thereof, and determines a toxicity score representing the likelihood that a given clip of speech is toxic. A user interface displays a timeline with various instances of toxicity by one or more users for a given session. The user interface is optimized for moderation interaction, and shows how the conversation containing toxicity evolves over the time domain of a conversation.
H04L 12/18 - Arrangements for providing special services to substations for broadcast or conference
G06F 3/0482 - Interaction with lists of selectable items, e.g. menus
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
G10L 25/27 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique
G10L 25/63 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for estimating an emotional state
H04L 65/403 - Arrangements for multi-party communication, e.g. for conferences
A method for online voice content moderation provides a multi-stage voice content analysis system. The system includes a pre-moderator stage having a toxicity scorer configured to provide a toxicity score for a given toxic speech content from a user. The toxicity score is a function of a platform content policy. The method generates a toxicity score for the given toxic speech content. The toxic speech content is provided to a moderator as a function of the toxicity score.
Illustrative embodiments employ trained artificial intelligence to provide real-time (e.g., zero introduced latency), or near-real-time (e.g., less than 500 ms of introduced latency), moderation of a verbal communication, without the need for human moderators.
By using predictive technology with pre-defined knowledge of undesirable content (e.g., speech to be redacted from a verbal communication), undesirable content of a verbal communication (e.g., human speech or text-to-speech communication) may be censored, as the verbal communication is created. Prediction of undesirable content may be based on context of the initial audio communication (e.g., words preceding the offensive language) and/or the phonetic content of the verbal communication preceding the undesirable content, and/or the phonetic content of the undesirable content itself (e.g., the first sounds of offensive language).
A63F 13/67 - Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
G10L 15/187 - Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
G10L 15/197 - Probabilistic grammars, e.g. word n-grams
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
Illustrative embodiments employ trained artificial intelligence to provide real-time (e.g., zero introduced latency), or near-real-time (e.g., less than 500 ms of introduced latency), moderation of a verbal communication, without the need for human moderators. By using predictive technology with pre-defined knowledge of undesirable content (e.g., speech to be redacted from a verbal communication), undesirable content of a verbal communication (e.g., human speech or text-to-speech communication) may be censored, as the verbal communication is created. Prediction of undesirable content may be based on context of the initial audio communication (e.g., words preceding the offensive language) and/or the phonetic content of the verbal communication preceding the undesirable content, and/or the phonetic content of the undesirable content itself (e.g., the first sounds of offensive language).
A toxicity moderation system has an input configured to receive speech from a speaker. The system includes a multi-stage toxicity machine learning system having a first stage and a second stage. The first stage is trained to analyze the received speech to determine whether a toxicity level of the speech meets a toxicity threshold. The first stage is also configured to filter-through, to the second stage, speech that meets the toxicity threshold, and is further configured to filter-out speech that does not meet the toxicity threshold.
G10L 25/63 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for estimating an emotional state
09 - Scientific and electric apparatus and instruments
Goods & Services
Downloadable computer programs for editing and altering sound; downloadable application software that alters and modifies the properties of a sound recording; downloadable application software for adding sound effects to sound recordings; downloadable software applications for enhancing audio recordings; downloadable computer software for use in sound database management, system administration, for generating and processing sound signals, and for converting analog and digital sound signals.
A method of building a new voice having a new timbre using a timbre vector space includes receiving timbre data filtered using a temporal receptive field. The timbre data is mapped in the timbre vector space. The timbre data is related to a plurality of different voices. Each of the plurality of different voices has respective timbre data in the timbre vector space. The method builds the new timbre using the timbre data of the plurality of different voices using a machine learning system.
G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
09 - Scientific and electric apparatus and instruments
Goods & Services
(1) Downloadable computer programs for editing and altering sound; downloadable application software that alters and modifies the properties of a sound recording; downloadable application software for adding sound effects to sound recordings; downloadable software applications for enhancing audio recordings; downloadable computer software for use in sound database management, system administration, for generating and processing sound signals, and for converting analog and digital sound signals.
15.
Generation and detection of watermark for real-time voice conversion
A method watermarks speech data by using a generator to generate speech data including a watermark. The generator is trained to generate the speech data including the watermark. The training process generates first speech data from the generator. The first speech data is configured to represent speech and includes a candidate watermark. The training also produces an inconsistency message as a function of at least one difference between the first speech data and at least authentic speech data. The training further includes transforming the first speech data, including the candidate watermark, using a watermark robustness module to produce transformed speech data including a transformed candidate watermark. The training further produces a watermark-detectability message, using a watermark detection machine learning system, relating to one or more desirable watermark features of the transformed candidate watermark.
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
16.
GENERATION AND DETECTION OF WATERMARK FOR REAL-TIME VOICE CONVERSION
A method watermarks speech data by using a generator to generate speech data including a watermark. The generator is trained to generate the speech data including the watermark. The training process generates first speech data from the generator. The first speech data is configured to represent speech and includes a candidate watermark. The training also produces an inconsistency message as a function of at least one difference between the first speech data and at least authentic speech data. The training further includes transforming the first speech data, including the candidate watermark, using a watermark robustness module to produce transformed speech data including a transformed candidate watermark. The training further produces a watermark-detectability message, using a watermark detection machine learning system, relating to one or more desirable watermark features of the transformed candidate watermark.
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
42 - Scientific, technological and industrial services, research and design
Goods & Services
Providing temporary use of on-line non-downloadable software for monitoring, analyzing and managing online platform user communications and interactions and policing online platform behavior in view of user community behavior standards
09 - Scientific and electric apparatus and instruments
Goods & Services
Downloadable software featuring computer programs for editing and altering sound; downloadable software for creating, enhancing and supplementing audio effects in online games and entertainment platforms
A method of building a new voice having a new timbre using a timbre vector space includes receiving timbre data filtered using a temporal receptive field. The timbre data is mapped in the timbre vector space. The timbre data is related to a plurality of different voices. Each of the plurality of different voices has respective timbre data in the timbre vector space. The method builds the new timbre using the timbre data of the plurality of different voices using a machine learning system.
G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
09 - Scientific and electric apparatus and instruments
Goods & Services
Downloadable computer programs for editing and altering sound; downloadable application software that alters and modifies the properties of a sound recording; downloadable application software for adding sound effects to sound recordings; downloadable software applications for enhancing audio recordings; downloadable computer software for use in sound database management, system administration, for generating and processing sound signals, and for converting analog and digital sound signals
A method of building a new voice having a new timbre using a timbre vector space includes receiving timbre data filtered using a temporal receptive field. The timbre data is mapped in the timbre vector space. The timbre data is related to a plurality of different voices. Each of the plurality of different voices has respective timbre data in the timbre vector space. The method builds the new timbre using the timbre data of the plurality of different voices using a machine learning system.
G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
A method of building a speech conversion system uses target information from a target voice and source speech data. The method receives the source speech data and the target timbre data, which is within a timbre space. A generator produces first candidate data as a function of the source speech data and the target timbre data. A discriminator compares the first candidate data to the target timbre data with reference to timbre data of a plurality of different voices. The discriminator determines inconsistencies between the first candidate data and the target timbre data. The discriminator produces an inconsistency message containing information relating to the inconsistencies. The inconsistency message is fed back to the generator, and the generator produces a second candidate data. The target timbre data in the timbre space is refined using information produced by the generator and/or discriminator as a result of the feeding back.
G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
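The feedback loop in the speech conversion abstract above (candidate, inconsistency message, second candidate) can be illustrated with a toy numeric loop. This is not a trained generator or discriminator. It assumes candidates and the target timbre are plain vectors, that the inconsistency message is the per-dimension difference, and that the generator simply moves toward the target by a fixed fraction. All function names are hypothetical.

```python
from typing import List

Vector = List[float]


def discriminator(candidate: Vector, target: Vector) -> Vector:
    """Return a toy inconsistency message: target minus candidate, per dimension."""
    return [t - c for c, t in zip(candidate, target)]


def generator_step(candidate: Vector, message: Vector, rate: float = 0.5) -> Vector:
    """Produce the next candidate by applying a fraction of the message."""
    return [c + rate * m for c, m in zip(candidate, message)]


def refine(source: Vector, target: Vector, rounds: int = 10, rate: float = 0.5) -> Vector:
    """Repeat the compare-and-feed-back cycle a fixed number of times."""
    candidate = list(source)
    for _ in range(rounds):
        message = discriminator(candidate, target)            # compare candidate to target
        candidate = generator_step(candidate, message, rate)  # feed the message back
    return candidate


second_candidate = refine([0.0, 0.0], [1.0, -1.0], rounds=1)
converged = refine([0.0, 0.0], [1.0, -1.0], rounds=20)
```

Each round halves the remaining difference when `rate` is 0.5, so the candidate approaches the target timbre without ever being copied from it directly.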
A timbre vector space construction system for building a timbre vector space has an input. The input is configured to receive a first speech segment in a first voice and a second speech segment in a second voice. The system also includes a temporal receptive field to transform the first speech segment into a first plurality of analytical segments, and the second speech segment into a second plurality of analytical segments. Each of the first plurality of analytical segments, and each of the second plurality of analytical segments, has a frequency distribution that represents a different portion of the timbre data of the respective voices. The system also includes a machine learning system configured to map the first voice relative to the second voice in the timbre vector space as a function of the frequency distribution of the first plurality of analytical segments and the second plurality of analytical segments.
G10L 21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
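The segmentation step in the timbre vector space construction abstract above can be sketched as a sliding window over the samples, with a magnitude spectrum computed for each window. This is a minimal sketch under assumptions: the temporal receptive field is modeled as a fixed-length window with a fixed hop, and the frequency distribution as a plain discrete Fourier transform magnitude. The source specifies neither, and the mapping into the timbre vector space by the machine learning system is not shown. `analytical_segments` and `magnitude_spectrum` are hypothetical names.

```python
import cmath
import math
from typing import List


def analytical_segments(samples: List[float], window: int, hop: int) -> List[List[float]]:
    """Split a speech segment into fixed-length windows taken every `hop` samples."""
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    return [samples[i:i + window] for i in range(0, len(samples) - window + 1, hop)]


def magnitude_spectrum(segment: List[float]) -> List[float]:
    """Return DFT magnitudes for bins 0..n/2 of one analytical segment."""
    n = len(segment)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(segment)))
        for k in range(n // 2 + 1)
    ]


# A synthetic tone with two cycles per 8-sample window stands in for speech.
samples = [math.sin(2 * math.pi * 2 * t / 8) for t in range(16)]
segments = analytical_segments(samples, window=8, hop=8)
spectra = [magnitude_spectrum(segment) for segment in segments]
```

Each entry of `spectra` is the frequency distribution of one analytical segment, which is the kind of per-segment input the abstract says the machine learning system uses to place one voice relative to another.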