Soundhound AI IP, LLC

United States of America

1-100 of 190 for Soundhound AI IP, LLC

Sort by

Query


Aggregations
IP Type
Patent	186
Trademark	4

Jurisdiction
United States	187
World	3

Date
New (last 4 weeks)	2
2025 February	2
2025 January	4
2024 December	2
2024 November	3
2025 (YTD)	6
2024	31
2023	17
2022	26
2021	37
2020	10
Before 2020	63
See more See less
IPC Class
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog	75
G10L 15/18 - Speech classification or search using natural language modelling	54
G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice	32
G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications	32
G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit	29
G10L 15/26 - Speech to text systems	28
G10L 15/08 - Speech classification or search	23
G06F 3/16 - Sound inputSound output	20
G06F 17/30 - Information retrieval; Database structures therefor	19
G06F 40/30 - Semantic analysis	17
G10L 15/00 - Speech recognition	16
G10L 15/19 - Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules	15
G06Q 30/02 - MarketingPrice estimation or determinationFundraising	14
G10L 15/16 - Speech classification or search using artificial neural networks	14
G06F 17/27 - Automatic analysis, e.g. parsing, orthograph correction	11
G06F 40/205 - Parsing	11
G06F 40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars	9
G06F 40/253 - Grammatical analysisStyle critique	9
G06F 16/242 - Query formulation	8
G06F 40/40 - Processing or translation of natural language	8
G10L 15/04 - SegmentationWord boundary detection	8
G06F 17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions	7
G06F 16/2457 - Query processing with adaptation to user needs	5
G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates	5
G06Q 30/0251 - Targeted advertisements	5
G10L 15/197 - Probabilistic grammars, e.g. word n-grams	5
G10L 17/00 - Speaker identification or verification techniques	5
G10L 17/04 - Training, enrolment or model building	5
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination	5
G06N 20/00 - Machine learning	4
See more See less
NICE Class
09 - Scientific and electric apparatus and instruments	3
42 - Scientific, technological and industrial services, research and design	3
38 - Telecommunications services	1

Status
Pending	52
Registered / In Force	138

1 2 Next Page

1. DERIVING ACOUSTIC FEATURES AND LINGUISTIC FEATURES FROM RECEIVED SPEECH AUDIO

Application Number	18945442
Status	Pending
Filing Date	2024-11-12
First Publication Date	2025-02-27
Owner	SoundHound AI IP, LLC (USA)
Inventor	Lokeswarappa, Kiran Garaga Gedalius, Joel Mont-Reynaud, Bernard Huang, Jun

Abstract

A computer-implemented method is provided. The method including receiving speech audio of dictation associated with a user ID, deriving acoustic features from the speech audio, storing the derived acoustic features in a user profile associated with the user ID, receiving a request for acoustic features through an application programming interface (API), the request including the user ID, and sending the derived acoustic features through the API.

IPC Classes ?

G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit
G06F 40/205 - Parsing
G06F 40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
G06F 40/253 - Grammatical analysisStyle critique
G06N 20/00 - Machine learning
G06Q 30/0241 - Advertisements
G06Q 30/0251 - Targeted advertisements
G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/18 - Speech classification or search using natural language modelling
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/26 - Speech to text systems
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
G10L 25/90 - Pitch determination of speech signals
H04L 67/306 - User profiles

2. METHOD AND SYSTEM FOR ACOUSTIC MODEL CONDITIONING ON NON-PHONEME INFORMATION FEATURES

Application Number	18928627
Status	Pending
Filing Date	2024-10-28
First Publication Date	2025-02-13
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Gowayyed, Zizu Mohajer, Keyvan

Abstract

A method and system for acoustic model conditioning on non-phoneme information features for optimized automatic speech recognition is provided. The method includes using an encoder model to encode sound embedding from a known key phrase of speech and conditioning an acoustic model with the sound embedding to optimize its performance in inferring the probabilities of phonemes in the speech. The sound embedding can comprise non-phoneme information related to the key phrase and the following utterance. Further, the encoder model and the acoustic model can be neural networks that are jointly trained with audio data.

IPC Classes ?

G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit
G10L 15/04 - SegmentationWord boundary detection
G10L 15/08 - Speech classification or search
G10L 15/16 - Speech classification or search using artificial neural networks
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks

3. ARTIFICIAL INTELLIGENCE SMART ANSWERING ARCHITECTURE

Application Number	US2024038711
Publication Number	2025/024260
Status	In Force
Filing Date	2024-07-19
Publication Date	2025-01-30
Owner	SOUNDHOUND AI IP, LLC (USA)
Inventor	Stonehocker, Timothy P. Mohajer, Kamyar

Abstract

An automated answering system and method are disclosed for use in providing automated customer service. The automated answering system uses generative artificial intelligence to aid in forming a knowledgebase of information regarding a merchant's business that is used in answering the customer queries. The automated answering system of the present technology also uses generative artificial intelligence to aid in formulating a response to queries using the formed knowledgebase.

IPC Classes ?

G06F 16/9032 - Query formulation

4. ARTIFICIAL INTELLIGENCE SMART ANSWERING ARCHITECTURE

Application Number	18356659
Status	Pending
Filing Date	2023-07-21
First Publication Date	2025-01-23
Owner	SoundHound AI IP, LLC (USA)
Inventor	Stonehocker, Timothy P. Mohajer, Kamyar

Abstract

IPC Classes ?

G06Q 30/016 - After-sales
G06N 5/046 - Forward inferencingProduction systems

5. METHOD AND SYSTEM FOR CONVERSATION TRANSCRIPTION WITH METADATA

Application Number	18889219
Status	Pending
Filing Date	2024-09-18
First Publication Date	2025-01-09
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Bradley, Kiersten L. Coeytaux, Ethan Yin, Ziming

Abstract

Methods and systems for enabling an efficient review of meeting content via a metadata-enriched, speaker-attributed and multiuser-editable transcript are disclosed. By incorporating speaker diarization and other metadata, the system can provide a structured and effective way to review and/or edit the transcript by one or more editors. One type of metadata can be image or video data to represent the meeting content. Furthermore, the present subject matter utilizes a multimodal diarization model to identify and label different speakers. The system can synchronize various sources of data, e.g., audio channel data, voice feature vectors, acoustic beamforming, image identification, and extrinsic data, to implement speaker diarization.

IPC Classes ?

G10L 15/26 - Speech to text systems
G06F 40/134 - Hyperlinking
G06F 40/166 - Editing, e.g. inserting or deleting
G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit
G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/07 - Adaptation to the speaker

6. USING A SPECIALIST GRAMMAR TO ENABLE ORDERING FROM A MENU USING NATURAL LANGUAGE

Application Number	18891119
Status	Pending
Filing Date	2024-09-20
First Publication Date	2025-01-09
Owner	SOUNDHOUND AI IP, LLC (USA)
Inventor	Aung, Joe Kyaw Soe Garcia, Vincent Ren, Junru

Abstract

A computer system ingests a catalog of a plurality of items. The catalog is specific to a particular domain and including names for individual items of the plurality of items. One or more attributes are respectively associated to the individual items of the plurality of items. A specialist grammar specific to the particular domain of the catalog is obtained and used to interpret natural language input related to the catalog based on the names for the individual items of the plurality of items and their associated one or more attributes.

IPC Classes ?

G06F 40/295 - Named entity recognition
G06F 40/40 - Processing or translation of natural language
G06N 5/022 - Knowledge engineeringKnowledge acquisition
G10L 15/26 - Speech to text systems

7. CONTENT FILTERING IN MEDIA PLAYING DEVICES

Application Number	18823308
Status	Pending
Filing Date	2024-09-03
First Publication Date	2024-12-26
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Khov, Thor S. Kong, Terry

Abstract

Various approaches relate to user defined content filtering in media playing devices of undesirable content represented in stored and real-time content from content providers. For example, video, image, and/or audio data can be analyzed to identify and classify content included in the data using various classification models and object and text recognition approaches. Thereafter, the identification and classification can be used to control presentation and/or access to the content and/or portions of the content. For example, based on the classification, portions of the content can be modified (e.g., replaced, removed, degraded, etc.) using one or more techniques (e.g., media replacement, media removal, media degradation, etc.) and then presented.

IPC Classes ?

H04N 21/454 - Content filtering, e.g. blocking advertisements
G06N 3/045 - Combinations of networks
G06V 20/40 - ScenesScene-specific elements in video content
H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
H04N 21/466 - Learning process for intelligent management, e.g. learning user preferences for recommending movies

8. QUERY-SPECIFIC TARGETED AD DELIVERY

Application Number	18811530
Status	Pending
Filing Date	2024-08-21
First Publication Date	2024-12-12
Owner	SoundHound AI IP, LLC (USA)
Inventor	Master, Aaron Mohajer, Keyvan

Abstract

An audio recognition system provides for delivery of promotional content to its user. A user interface device, locally or with the assistance of a network-connected server, performs recognition of audio in response to queries. Recognition can be through a method such as processing features extracted from the audio. Audio can comprise recorded music, singing or humming, instrumental music, vocal music, spoken voice, or other recognizable types of audio. Campaign managers provide promotional content for delivery in response to audio recognized in queries.

IPC Classes ?

G06Q 30/0251 - Targeted advertisements
G06F 16/60 - Information retrievalDatabase structures thereforFile system structures therefor of audio data
G06Q 30/0241 - Advertisements
G06Q 30/0273 - Determination of fees for advertising

9. MACHINE LEARNING SYSTEM FOR DIGITAL ASSISTANTS

Application Number	18780970
Status	Pending
Filing Date	2024-07-23
First Publication Date	2024-11-14
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Singh, Pranav Zhang, Yilun Mohajer, Keyvan Fazeli, Mohammadreza

Abstract

A machine learning system for a digital assistant is described, together with a method of training such a system. The machine learning system is based on an encoder-decoder sequence-to-sequence neural network architecture trained to map input sequence data to output sequence data, where the input sequence data relates to an initial query and the output sequence data represents canonical data representation for the query. The method of training involves generating a training dataset for the machine learning system. The method involves clustering vector representations of the query data samples to generate canonical-query original-query pairs in training the machine learning system.

IPC Classes ?

G06F 16/242 - Query formulation
G06N 3/045 - Combinations of networks
G06N 3/088 - Non-supervised learning, e.g. competitive learning

10. AUTOMATIC LEARNING OF ENTITIES, WORDS, PRONUNCIATIONS, AND PARTS OF SPEECH

Application Number	18783423
Status	Pending
Filing Date	2024-07-25
First Publication Date	2024-11-14
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Relin, Anton V.

Abstract

Systems for automatic speech recognition and/or natural language understanding automatically learn new words by finding subsequences of phonemes that, if they were a new word, would enable a successful tokenization of a phoneme sequence. Systems can learn alternate pronunciations of words by finding phoneme sequences with a small edit distance to existing pronunciations. Systems can learn the part of speech of words by finding part-of-speech variations that would enable parses by syntactic grammars. Systems can learn what types of entities a word describes by finding sentences that could be parsed by a semantic grammar but for the words not being on an entity list.

IPC Classes ?

G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit
G10L 15/14 - Speech classification or search using statistical models, e.g. Hidden Markov Models [HMM]
G10L 15/19 - Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules

11. SYSTEM AND METHOD FOR VOICE MORPHING IN A DATA ANNOTATOR TOOL

Application Number	18778301
Status	Pending
Filing Date	2024-07-19
First Publication Date	2024-11-07
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Ross, Dylan H.

Abstract

A system and method for masking an identity of a speaker of natural language speech, such as speech clips to be labeled by humans in a system generating voice transcriptions for training an automatic speech recognition model. The natural language speech is morphed prior to being presented to the human for labeling. In one embodiment, morphing comprises pitch shifting the speech randomly either up or down, then frequency shifting the speech, then pitch shifting the speech in a direction opposite the first pitch shift. Labeling the morphed speech comprises at least one or more of transcribing the morphed speech, identifying a gender of the speaker, identifying an accent of the speaker, and identifying a noise type of the morphed speech.

IPC Classes ?

G06F 40/56 - Natural language generation
G06F 40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/18 - Speech classification or search using natural language modelling
G10L 19/125 - Pitch excitation, e.g. pitch synchronous innovation CELP [PSI-CELP]
G10L 19/26 - Pre-filtering or post-filtering
G10L 21/013 - Adapting to target pitch

12. SERVER SUPPORTED RECOGNITION OF WAKE PHRASES

Application Number	18771489
Status	Pending
Filing Date	2024-07-12
First Publication Date	2024-10-31
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Jain, Newton Zaheer, Sameer Syed

Abstract

A server supports multiple virtual assistants. It receives requests that include wake phrase audio and an identification of the source of the request, such as a virtual assistant device. Based on the identification, the server searches a database for a wake phrase detector appropriate for the identified source. The server then applies the wake phrase detector to the received wake phrase audio. If the wake phrase audio triggers the wake phrase detector, the server provides an appropriate response to the source.

IPC Classes ?

G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G06F 8/41 - Compilation
G10L 15/08 - Speech classification or search
G10L 15/16 - Speech classification or search using artificial neural networks

13.

14. AUTOMATIC SYNCHRONIZATION FOR AN OFFLINE VIRTUAL ASSISTANT

Application Number	18752481
Status	Pending
Filing Date	2024-06-24
First Publication Date	2024-10-17
Owner	SoundHound AI IP, LLC (USA)
Inventor	Stahl, Karl

Abstract

[Object] Technology is provided to enable a mobile terminal to function as a digital assistant even when the mobile terminal is in a state where it cannot communicate with a server apparatus. [Solution] When a user terminal 200 receives a query A from a user, user terminal 200 sends query A to a server 100. Server 100 interprets the meaning of query A using a grammar A. Server 100 obtains a response to query A based on the meaning of query A and sends the response to user terminal 200. Server 100 further sends grammar A to user terminal 200. That is, server 100 sends to user terminal 200 a grammar used to interpret the query received from user terminal 200.

IPC Classes ?

G10L 15/19 - Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules
G06F 16/242 - Query formulation
G06F 40/253 - Grammatical analysisStyle critique
G10L 15/07 - Adaptation to the speaker
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

15. ENABLING NATURAL LANGUAGE INTERACTIONS WITH USER INTERFACES FOR USERS OF A SOFTWARE APPLICATION

Application Number	18739011
Status	Pending
Filing Date	2024-06-10
First Publication Date	2024-10-03
Owner	SoundHound AI IP, LLC (USA)
Inventor	Yabas, Utku Hubert, Philipp Stahl, Karl

Abstract

A user specifies a natural language command to a device. Software on the device generates contextual metadata about the user interface of the device, such as data about all visible elements of the user interface, and sends the contextual metadata along with the natural language command to a natural language understanding engine. The natural language understanding engine parses the natural language query using a stored grammar (e.g., a grammar provided by a maker of the device) and as a result of the parsing identifies information about the command (e.g., the user interface elements referenced by the command) and provides that information to the device. The device uses that provided information to respond to the command.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G06F 40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
G10L 15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models
G10L 15/26 - Speech to text systems

16. METHOD AND SYSTEM FOR CONVERSATION TRANSCRIPTION WITH METADATA

Application Number	18743562
Status	Pending
Filing Date	2024-06-14
First Publication Date	2024-10-03
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Bradley, Kiersten L. Coeytaux, Ethan Yin, Ziming

Abstract

Methods and systems for enabling an efficient review of meeting content via a metadata-enriched, speaker-attributed transcript are disclosed. By incorporating speaker diarization and other metadata, the system can provide a structured and effective way to review and/or edit the transcript. One type of metadata can be image or video data to represent the meeting content. Furthermore, the present subject matter utilizes a multimodal diarization model to identify and label different speakers. The system can synchronize various sources of data, e.g., audio channel data, voice feature vectors, acoustic beamforming, image identification, and extrinsic data, to implement speaker diarization.

IPC Classes ?

G10L 15/26 - Speech to text systems
G06F 40/134 - Hyperlinking
G06F 40/166 - Editing, e.g. inserting or deleting
G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit
G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/07 - Adaptation to the speaker

17. DYNAMIC SERVICE LEVEL ASSIGNMENT SYSTEM FOR DATA PROCESSING MANAGER

Application Number	18637771
Status	Pending
Filing Date	2024-04-17
First Publication Date	2024-09-05
Owner	SOUNDHOUND AI IP, LLC (USA)
Inventor	Stonehocker, Tim Gowayyed, Zizo Emami, Mijad Eichstaedt, Matthias Jiang, Evelyn Berryhill, Ryan Ramona, Mathieu Veira, Neil

Abstract

A data processing system includes a queue manager receiving data processing requests and determining a queue depth representing the number of pending requests. A load supervisor assigns a service level to each request based on the queue depth when the request is at the head of the queue. The system offers two service levels, with the second level requiring fewer computing resources than the first. This dynamic management system optimizes resource allocation by adjusting service levels based on the workload, ensuring efficient processing of data requests.

IPC Classes ?

G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
G10L 15/16 - Speech classification or search using artificial neural networks
G10L 15/26 - Speech to text systems

18. METHOD FOR PROVIDING INFORMATION, METHOD FOR GENERATING DATABASE, AND PROGRAM

Application Number	18662973
Status	Pending
Filing Date	2024-05-13
First Publication Date	2024-09-05
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Naito, Masaki Tsuchida, Keisuke Yoneyama, Jun Sawada, Kaku

Abstract

As audio (1) is input to an extension of a browser, the extension transmits the audio (1) to a language processing server. A speech recognition unit obtains a text (1) corresponding to the audio (1), and transmits the text (1) to a natural language understanding unit. In the natural language understanding unit, an information processing unit identifies a URL (1) corresponding to the text (1), and transmits the URL (1) to the browser. The extension passes the URL (1) to a browsing function. The browsing function uses the URL (1) to access a web server. The web server transmits a web page (1) corresponding to the URL (1) to the browser. The browsing function shows a screen corresponding to the web page (1) on a display.

IPC Classes ?

G06F 16/955 - Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
G06F 16/33 - Querying
G06F 40/40 - Processing or translation of natural language
G10L 15/26 - Speech to text systems

19. MULTI-MODAL AUDIO PROCESSING

Application Number	18642492
Status	Pending
Filing Date	2024-04-22
First Publication Date	2024-08-15
Owner	SOUNDHOUND AI IP, LLC (USA)
Inventor	Stahl, Karl

Abstract

A method for processing an audio signal involves receiving sound waves at a microphone, converting them into a first audio signal, and extracting a second audio signal from an electromagnetic signal received at a receiver. The first audio signal is correlated with the second audio signal to calculate a correlation value. If the correlation value exceeds a threshold, the first audio signal is processed using the second audio signal to reduce unwanted sound contributions, resulting in a processed audio signal. Further processing is then performed on the processed audio signal to determine a characteristic of the desired sound.

IPC Classes ?

H04R 1/10 - EarpiecesAttachments therefor
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 21/0316 - Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
G10L 25/06 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being correlation coefficients
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
H04R 1/08 - MouthpiecesAttachments therefor
H04R 5/033 - Headphones for stereophonic communication

20. SEMANTICALLY CONDITIONED VOICE ACTIVITY DETECTION

Application Number	18047650
Status	Pending
Filing Date	2022-10-19
First Publication Date	2024-07-11
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Leitman, Victor

Abstract

A method includes recognizing words comprised by a first utterance; interpreting the recognized words according to a grammar comprised by a domain; from the interpreting of the recognized words, determining a timeout period for the first utterance based on the domain of the first utterance; detecting end of voice activity in the first utterance; executing an instruction following an amount of time after detecting end of voice activity of the first utterance in response to the amount of time exceeding the timeout period, the executed instruction based at least in part on interpreting the recognized words.

IPC Classes ?

G10L 15/197 - Probabilistic grammars, e.g. word n-grams
G10L 15/18 - Speech classification or search using natural language modelling
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 25/78 - Detection of presence or absence of voice signals

21. MULTI-PARTICIPANT VOICE ORDERING

Application Number	18391886
Status	Pending
Filing Date	2023-12-21
First Publication Date	2024-06-27
Owner	SoundHound AI IP, LLC (USA)
Inventor	Macrae, Robert Grossman, Jon Halstvedt, Scott

Abstract

A voice interface recognizes spoken utterances from multiple users. It responds to the utterances in ways such as modifying the attributes of instances of items. The voice interface computes a voice vector for each utterance and associates it with the item instance that is modified. For following utterances with a closely matching voice vector, the voice interface modifies the same instance. For following utterances with a voice vector that is not a close match to one stored for any item instance, the voice interface modifies a different item instance.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G06F 3/16 - Sound inputSound output
G06Q 50/12 - Hotels or restaurants
G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit

22. MULTI-PARTICIPANT VOICE ORDERING

Application Number	US2023085627
Publication Number	2024/138102
Status	In Force
Filing Date	2023-12-22
Publication Date	2024-06-27
Owner	SOUNDHOUND AI IP, LLC (USA)
Inventor	Macrae, Robert Grossman, Jon Halstvedt, Scott

Abstract

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G06F 3/16 - Sound inputSound output
G06Q 30/0601 - Electronic shopping [e-shopping]
G10L 17/00 - Speaker identification or verification techniques

23.

24. SYSTEM AND METHOD FOR ADAPTED INTERACTIVE EXPERIENCES

Application Number	18440935
Status	Pending
Filing Date	2024-02-13
First Publication Date	2024-06-06
Owner	SoundHound AI IP, LLC (USA)
Inventor	Mckenzie, Joel Zhang, Qindi

Abstract

Natural language grammars interpret expressions at the conversational human-machine interfaces of devices. Under conditions favoring engagement, as specified in a unit of conversational code, the device initiates a discussion using one or more of TTS, images, video, audio, and animation depending on the device capabilities of screen and audio output. Conversational code units specify conditions based on conversation state, mood, and privacy. Grammars provide intents that cause calls to system functions. Units can provide scripts for guiding the conversation. The device, or supporting server system, can provide feedback to creators of the conversational code units for analysis and machine learning.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G06Q 30/0251 - Targeted advertisements
G10L 15/18 - Speech classification or search using natural language modelling
G10L 15/19 - Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules

25. REAL-TIME NATURAL LANGUAGE PROCESSING AND FULFILLMENT

Application Number	US2023079577
Publication Number	2024/107682
Status	In Force
Filing Date	2023-11-14
Publication Date	2024-05-23
Owner	SOUNDHOUND AI IP, LLC (USA)
Inventor	Grossmann, Jon Macrae, Robert Halstvedt, Scott Mohajer, Keyvan

Abstract

A system and method of real-time feedback confirmation to solicit a virtual assistant response from an evolving semantic state of at least a portion of an utterance. A user accesses a virtual assistant on an electronic device having the system and/or method configured to capture a command, a question, and/or a fulfillment request from audio such as, the speech emitted from the speaking user. The speech may be intercepted by a speech engine configured to transcribe the speech into text that is matched with the fragment pattern's regular expression to generate a fragment and/or the speech may be processed with a machine learning model to identify fragments. The fragments are identified by a domain handler configured to update a data structure of the current semantic state of the utterance in real-time on an interface of an electronic device.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog

26. REAL-TIME NATURAL LANGUAGE PROCESSING AND FULFILLMENT

Application Number	18055821
Status	Pending
Filing Date	2022-11-15
First Publication Date	2024-05-16
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Grossmann, Jon Macrae, Robert Halstvedt, Scott Mohajer, Keyvan

Abstract

IPC Classes ?

G10L 15/18 - Speech classification or search using natural language modelling
G06F 3/16 - Sound inputSound output
G06F 40/30 - Semantic analysis
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog

27. DOMAIN SPECIFIC NEURAL SENTENCE GENERATOR FOR MULTI-DOMAIN VIRTUAL ASSISTANTS

Application Number	18050182
Status	Pending
Filing Date	2022-10-27
First Publication Date	2024-05-02
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Singh, Pranav Zhang, Yilun Na, Eunjee Bettaglio, Olivia

Abstract

Automatically generating sentences that a user can say to invoke a set of defined actions performed by a virtual assistant are disclosed. A sentence is received and keywords are extracted from the sentence. Based on the keywords, additional sentences are generated. A classifier model is applied to the generated sentences to determine a sentence that satisfies a threshold. In the situation a sentence satisfies the threshold, an intent associated with the classifier model can be invoked. In the situation the sentences fail to satisfy the classifier model, the virtual assistant can attempt to interpret the received sentence according to the most likely intent by invoking a sentence generation model fine-tuned for a particular domain, generate additional sentences with a high probability of having the same intent and fulfill the specific action defined by the intent.

IPC Classes ?

G10L 15/18 - Speech classification or search using natural language modelling
G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog

28. TEXT-TO-SPEECH SYSTEM WITH VARIABLE FRAME RATE

Application Number	18051507
Status	Pending
Filing Date	2022-10-31
First Publication Date	2024-05-02
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Pearson, Steve Grossman, Jon

Abstract

A neural TTS system is trained to generate key acoustic frames at variable rates while omitting other frames. The frame skipping depends on the acoustic features to be generated for the input text. The TTS system can interpolate frames between the key frames at a target rate for a vocoder to synthesis audio samples.

IPC Classes ?

G10L 13/047 - Architecture of speech synthesisers
G10L 13/06 - Elementary speech units used in speech synthesisersConcatenation rules

29. ADAPTING AN UTTERANCE CUT-OFF PERIOD WITH USER SPECIFIC PROFILE DATA

Application Number	18401770
Status	Pending
Filing Date	2024-01-02
First Publication Date	2024-04-25
Owner	SOUNDHOUND AI IP HOLDING, LLC (USA) SOUNDHOUND AI IP, LLC (USA)
Inventor	Aguayo, Patricia Pozon Zhang, Jennifer Hee Young Probell, Jonah

Abstract

A system detects a period of non-voice activity and compares its duration to a cutoff period. The system adapts the cutoff period based on parsing previously-recognized speech of a user that is stored on a user's device or the system, which detects the voice activity, to determine according to a model, such as a machine-learned model, the probability that the speech recognized so far is a prefix to a longer complete utterance. The cutoff period is longer when a parse of previously recognized speech, which is based on the user profile, has a high probability of being a prefix of a longer utterance.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/05 - Word boundary detection
G10L 15/18 - Speech classification or search using natural language modelling
G10L 25/78 - Detection of presence or absence of voice signals

30. Automatic Speech Recognition with Voice Personalization and Generalization

Application Number	18046137
Status	Pending
Filing Date	2022-10-12
First Publication Date	2024-04-18
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mohajer, Keyvan

Abstract

A voice morphing model can transform diverse voices to one or a small number of target voices. An acoustic model can be trained for high accuracy on the target voices. Speech recognition on diverse voices can be performed by morphing it to a target voice and then performing recognition on audio with the target voice. The morphing model and an acoustic model for speech recognition can be trained separately or jointly. A voice morphing model can transform diverse voices to one or a small number of target voices. An acoustic model can be trained for high accuracy on the target voices. Speech recognition on diverse voices can be performed by morphing it to a target voice and then performing recognition on audio with the target voice. The morphing model and an acoustic model for speech recognition can be trained separately or jointly. A source of requests for speech recognition can pass audio and a voiceprint with requests. Speech recognition can run with improved accuracy by biasing an acoustic model for the voice in the audio using the voiceprint. The audio can be used to calculate a new voiceprint, which can be used to update the voiceprint included with the audio. The updated voiceprint can be sent back to the source and then used with future speech recognition requests.

IPC Classes ?

G10L 15/18 - Speech classification or search using natural language modelling
G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice

31. MESSAGE PROCESSING METHOD, INFORMATION PROCESSING APPARATUS, AND PROGRAM

Application Number	18456219
Status	Pending
Filing Date	2023-08-25
First Publication Date	2024-02-29
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Matsuda, Yuki Tsuchida, Keisuke

Abstract

[Object] To provide a technique for more accurate interpretation of a message inputted by a user. [Object] To provide a technique for more accurate interpretation of a message inputted by a user. [Solving Means] An information processing server 300 obtains a first message from a user in a thread 001, has a context of the first message stored in a context database 500 in association with the thread 001, obtains a second message from the user in the thread 001, and provides the second message to a conversation server 400 together with the context of the first message.

IPC Classes ?

H04L 51/04 - Real-time or near real-time messaging, e.g. instant messaging [IM]
H04L 51/02 - User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages
H04L 51/216 - Handling conversation history, e.g. grouping of messages in sessions or threads

32. VIRTUAL ASSISTANT DOMAIN FUNCTIONALITY

Application Number	18493522
Status	Pending
Filing Date	2023-10-24
First Publication Date	2024-02-15
Owner	SoundHound AI IP, LLC (USA)
Inventor	Mohajer, Kamyar Mohajer, Keyvan Mont-Reynaud, Bernard Singh, Pranav

Abstract

Aspects include methods, systems, and computer-program products providing virtual assistant domain functionality. A natural language query including one or more words is received. A collection of natural language modules is accessed. The collection natural language modules are configured to process sets of natural language queries. A natural language module, from the collection of natural language modules, is identified to interpret the natural language query. An interpretation of the natural language query is computed using the identified natural language module. A response to the natural language query is returned using the computed interpretation.

IPC Classes ?

G06F 40/40 - Processing or translation of natural language
G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
G06Q 30/0283 - Price estimation or determination
G06Q 20/10 - Payment architectures specially adapted for electronic funds transfer [EFT] systemsPayment architectures specially adapted for home banking systems

33. Authorization of Action by Voice Identification

Application Number	17818628
Status	Pending
Filing Date	2022-08-09
First Publication Date	2024-02-15
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Hassan, Ahmadul Hom, James

Abstract

Actions are authorized by computing a confidence score that exceeds a threshold. The confidence score is based on a match between metadata about requests and fields in corresponding database records. The confidences score weights matches by the dependability of the metadata for authentication. The confidence score is further based on the closeness of a sample of speech audio to a stored voiceprint. Additional identification may be required for authorization. The confidence score requirement may be relaxed based on identification in a buffer of recent action requests.

IPC Classes ?

G06F 21/32 - User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
G10L 17/12 - Score normalisation
G06F 3/16 - Sound inputSound output

34. USING SEMANTIC GRAMMAR EXTENSIBILITY FOR COLLECTIVE ARTIFICIAL INTELLIGENCE

Application Number	18381593
Status	Pending
Filing Date	2023-10-18
First Publication Date	2024-02-08
Owner	SOUNDHOUND AI IP, LLC (USA)
Inventor	Mont-Reynaud, Bernard Wilson, Christopher S. Mohajer, Keyvan

Abstract

Support for natural language expressions is provided by the use of semantic grammars that describe the structure of expressions in that grammar and that construct the meaning of a corresponding natural language expression. A semantic grammar extension mechanism is provided, which allows one semantic grammar to be used in the place of another semantic grammar. This enriches the expressivity of semantic grammars in a simple, natural, and decoupled manner.

IPC Classes ?

G06F 40/30 - Semantic analysis
G06F 8/20 - Software design

35. MEANING INFERENCE FROM SPEECH AUDIO

Application Number	18474853
Status	Pending
Filing Date	2023-09-26
First Publication Date	2024-02-08
Owner	SoundHound AI IP, LLC (USA)
Inventor	Krishnaswamy, Sudharsan Wieman, Maisy Probell, Jonah

Abstract

A system and method invoke virtual assistant action, which may comprise an argument. From audio, a probability of an intent is inferred. A probability of a domain and a plurality of variable values may also be inferred. Invoking the action is in response to the intent probability exceeding a threshold. Invoking the action may also be in response to the domain probability exceeding a threshold, a variable value probability exceeding a threshold, detecting an end of utterance, and a specific amount of time having elapsed. The intent probability may increase when the audio includes speech of words with the same meaning in multiple natural languages. Invoking the action may also be conditional on the variable value exceeding its threshold within a certain period of time of the intent probability exceeding its threshold.

IPC Classes ?

G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/16 - Speech classification or search using artificial neural networks
G10L 15/18 - Speech classification or search using natural language modelling
G10L 13/02 - Methods for producing synthetic speechSpeech synthesisers
G10L 15/197 - Probabilistic grammars, e.g. word n-grams
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/187 - Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams

36. TRAINING A DEVICE SPECIFIC ACOUSTIC MODEL

Application Number	18379618
Status	Pending
Filing Date	2023-10-12
First Publication Date	2024-02-01
Owner	SoundHound AI IP, LLC (USA)
Inventor	Mohajer, Keyvan Patel, Mehul

Abstract

Custom acoustic models can be configured by developers by providing audio files with custom recordings. The custom acoustic model is trained by tuning a baseline model using the audio files. Audio files may contain custom noise to apply to clean speech for training. The custom acoustic model is provided as an alternative to a standard acoustic model. A speech recognition system can select an acoustic model for use upon receiving metadata about the device conditions or type. Speech recognition is performed on speech audio using one or more acoustic models. The result can be provided to developers through the user interface, and an error rate can be computed and also provided.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G06F 3/16 - Sound inputSound output
G10L 15/18 - Speech classification or search using natural language modelling

37. BUILDING A NATURAL LANGUAGE UNDERSTANDING APPLICATION USING A RECEIVED ELECTRONIC RECORD CONTAINING PROGRAMMING CODE INCLUDING AN INTERPRET-BLOCK, AN INTERPRET-STATEMENT, A PATTERN EXPRESSION AND AN ACTION STATEMENT

Application Number	18375906
Status	Pending
Filing Date	2023-10-02
First Publication Date	2024-01-25
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Mont-Reynaud, Bernard Emami, Seyed M. Wilson, Chris Mohajer, Keyvan

Abstract

A method of building a natural language understanding application is provided. The method includes receiving at least one electronic record containing programming code and creating executable code from the programming code. Further, the executable code, when executed by a processor, causes the processor to create a parse and an interpretation of a sequence of input tokens, the programming code includes an interpret-block and the interpret-block includes an interpret-statement. Additionally, the interpret-statement includes a pattern expression and the interpret-statement includes an action statement.

IPC Classes ?

G10L 15/18 - Speech classification or search using natural language modelling
G06F 40/205 - Parsing
G06F 8/30 - Creation or generation of source code
G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice

38. NEURAL SPEECH-TO-MEANING

Application Number	18461212
Status	Pending
Filing Date	2023-09-05
First Publication Date	2023-12-28
Owner	SoundHound AI IP, LLC (USA)
Inventor	Krishnaswamy, Sudharsan Wieman, Maisy Probell, Jonah

Abstract

A neural speech-to-meaning system is trained on speech audio expressing specific intents. The system receives speech audio and produces indications of when the speech in the audio matches the intent. Intents may include variables that can have a large range of values, such as the names of places. The neural speech-to-meaning system simultaneously recognizes enumerated values of variables and general intents. Recognized variable values can serve as arguments to API requests made in response to recognized intents. Accordingly, neural speech-to-meaning supports voice virtual assistants that serve users based on API hits.

IPC Classes ?

G10L 15/26 - Speech to text systems
G06F 3/16 - Sound inputSound output
G10L 15/18 - Speech classification or search using natural language modelling
G10L 15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

39. PRE-WAKEWORD SPEECH PROCESSING

Application Number	17804544
Status	Pending
Filing Date	2022-05-27
First Publication Date	2023-11-30
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Stahl, Karl Mont-Reynaud, Bernard

Abstract

Methods and systems for pre-wakeword speech processing are disclosed. Speech audio, comprising command speech spoken before a wakeword, may be stored in a buffer in oldest to newest order. Upon detection of the wakeword, reverse acoustic models and language models, such as reverse automatic speech recognition (R-ASR) can be applied to the buffered audio, in newest to oldest order, starting from before the wakeword. The speech is converted into a sequence of words. Natural language grammar models, such as natural language understanding (NLU), can be applied to match the sequence of words to a complete command, the complete command being associated with invoking a computer operation.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/08 - Speech classification or search
G10L 25/93 - Discriminating between voiced and unvoiced parts of speech signals

40. APPARATUS, PLATFORM, METHOD AND MEDIUM FOR INTENTION IMPORTANCE INFERENCE

Application Number	17820660
Status	Pending
Filing Date	2022-08-18
First Publication Date	2023-11-30
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Wang, Chong

Abstract

The application provides an apparatus, platform, method and medium for intention importance interference. The apparatus includes an interface configured to receive user-related information; and a processor coupled to the interface and configured to: extract data related to different aspects of a user from the user-related information; generate a plurality of intention probes based on the data related to different aspects of the user, each intention probe comprising an intention and associated data items; infer an importance of each intention probe by calculating a score of each associated data items of the intention probe based on the data related to different aspects of the user; and provide information associated with an intention probe with a highest importance.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/16 - Speech classification or search using artificial neural networks
G06F 16/9535 - Search customisation based on user profiles and personalisation

41. Using semantic grammar extensibility for collective artificial intelligence

Application Number	17377375
Grant Number	11829724
Status	In Force
Filing Date	2021-07-16
First Publication Date	2023-11-28
Grant Date	2023-11-28
Owner	SOUNDHOUND AI IP, LLC (USA)
Inventor	Mont-Reynaud, Bernard Wilson, Christopher S. Mohajer, Keyvan

Abstract

IPC Classes ?

G06F 40/30 - Semantic analysis
G06F 8/20 - Software design

42. Content filtering in media playing devices

Application Number	18348249
Grant Number	12126868
Status	In Force
Filing Date	2023-07-06
First Publication Date	2023-11-02
Grant Date	2024-10-22
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Khov, Thor S. Kong, Terry

Abstract

IPC Classes ?

H04N 21/454 - Content filtering, e.g. blocking advertisements
G06N 3/045 - Combinations of networks
G06V 20/40 - ScenesScene-specific elements in video content
H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
H04N 21/466 - Learning process for intelligent management, e.g. learning user preferences for recommending movies

43. Method and system for acoustic model conditioning on non-phoneme information features

Application Number	18348259
Grant Number	12154546
Status	In Force
Filing Date	2023-07-06
First Publication Date	2023-11-02
Grant Date	2024-11-26
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Gowayyed, Zizu Mohajer, Keyvan

Abstract

IPC Classes ?

G10L 15/00 - Speech recognition
G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit
G10L 15/04 - SegmentationWord boundary detection
G10L 15/16 - Speech classification or search using artificial neural networks
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/08 - Speech classification or search
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks

44. SYSTEMS AND METHODS FOR GENERATING AND USING SHARED NATURAL LANGUAGE LIBRARIES

Application Number	18206567
Status	Pending
Filing Date	2023-06-06
First Publication Date	2023-10-12
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mohajer, Keyvan

Abstract

Systems and methods for searching databases by sound data input are provided herein. A service provider may have a need to make their database(s) searchable through search technology. However, the service provider may not have the resources to implement such search technology. The search technology may allow for search queries using sound data input. The technology described herein provides a solution addressing the service provider’s need, by giving a search technology that furnishes search results in a fast, accurate manner. In further embodiments, systems and methods to monetize those search results are also described herein.

IPC Classes ?

G06F 16/33 - Querying
G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models
G06F 16/174 - Redundancy elimination performed by the file system

45. Ordering from a menu using natural language

Application Number	17716482
Grant Number	12124804
Status	In Force
Filing Date	2022-04-08
First Publication Date	2023-09-14
Grant Date	2024-10-22
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Aung, Joe Kyaw Soe Garcia, Vincent Ren, Junru

Abstract

A computer system ingests a catalog of a plurality of items. The catalog is specific to a particular domain and including names for individual items of the plurality of items. One or more attributes are respectively associated to the individual items of the plurality of items. A specialist grammar specific to the particular domain of the catalog is obtained and a programming language code to interpret natural language input related to the catalog is generated using the specialist grammar, and the names for the individual items of the plurality of items and their associated one or more attributes.

IPC Classes ?

G06F 17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
G06F 40/295 - Named entity recognition
G06F 40/40 - Processing or translation of natural language
G06N 5/022 - Knowledge engineeringKnowledge acquisition
G10L 15/26 - Speech to text systems

46. Multi-modal audio processing for voice-controlled devices

Application Number	18194885
Grant Number	11997448
Status	In Force
Filing Date	2023-04-03
First Publication Date	2023-08-10
Grant Date	2024-05-28
Owner	SOUNDHOUND AI IP, LLC (USA)
Inventor	Stahl, Karl

Abstract

A voice-controlled device includes a microphone to receive a set of sound waves that includes speech uttered by a user and other sound, and to output a first audio signal that includes a contribution from the speech uttered by the user and a contribution from the other sound. The device also includes a receiver to receive an electromagnetic signal and to output a second audio signal obtained from the electromagnetic signal. An audio pre-processor of the device processes the first audio signal using the second audio signal to reduce the contribution from the other sound in a processed audio signal. The voice-controlled device then provides the processed audio signal to a speech recognition module to determine a voice command issued by the user.

IPC Classes ?

H04R 25/00 - Deaf-aid sets
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 21/0316 - Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
G10L 25/06 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being correlation coefficients
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
H04R 1/08 - MouthpiecesAttachments therefor
H04R 1/10 - EarpiecesAttachments therefor
H04R 5/033 - Headphones for stereophonic communication

47. Token confidence scores for automatic speech recognition

Application Number	17649810
Grant Number	12223948
Status	In Force
Filing Date	2022-02-03
First Publication Date	2023-08-03
Grant Date	2025-02-11
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Singh, Pranav Mishra, Saraswati Na, Eunjee

Abstract

Methods and systems for correction of a likely erroneous word in a speech transcription are disclosed. By evaluating token confidence scores of individual words or phrases, the automatic speech recognition system can replace a low-confidence score word with a substitute word or phrase. Among various approaches, neural network models can be used to generate individual confidence scores. Such word substitution can enable the speech recognition system to automatically detect and correct likely errors in transcription. Furthermore, the system can indicate the token confidence scores on a graphic user interface for labeling and dictionary enhancement.

IPC Classes ?

G10L 15/18 - Speech classification or search using natural language modelling
G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit
G10L 15/26 - Speech to text systems

48. VIDEO CONFERENCE CAPTIONING

Application Number	18298282
Status	Pending
Filing Date	2023-04-10
First Publication Date	2023-08-03
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Coeytaux, Ethan

Abstract

A video conferencing system, such as one implemented with a cloud server, receives audio streams from a plurality of endpoints. The system uses automatic speech recognition to transcribe speech in the audio streams. The system multiplexes the transcriptions into individual caption streams and sends them to the endpoints, but the caption stream to each endpoint omits the transcription of audio from the endpoint. Some systems allow muting of audio through an indication to the system. The system then omits sending the muted audio to other endpoints and also omits sending a transcription of the muted audio to other endpoints.

IPC Classes ?

G10L 15/26 - Speech to text systems
G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit
G10L 19/005 - Correction of errors induced by the transmission channel, if related to the coding algorithm
G10L 15/19 - Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules
G10L 15/14 - Speech classification or search using statistical models, e.g. Hidden Markov Models [HMM]

49. METHOD AND APPARATUS FOR INTELLIGENT VOICE QUERY

Application Number	17654635
Status	Pending
Filing Date	2022-03-14
First Publication Date	2023-07-27
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Wang, Chong

Abstract

A method and an apparatus for processing an intelligent voice query. A voice query input is received from a user. Automatic speech recognition and natural language understanding generate structured query data. It is modified based on an input adaptation rule to obtain modified structured query data appropriate for a content providing server, which provides a query result output corresponding to the modified structured query data. Input adaptation rules may comprise rule sets based on behavior patterns of the user and/or business recommendations. The query result output can be used for natural language generation, which may have similar adaptation rules for output.

IPC Classes ?

G06F 16/2452 - Query translation
G06F 16/242 - Query formulation
G10L 15/18 - Speech classification or search using natural language modelling
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog

50. METHOD AND SYSTEM FOR ASSISTING A USER

Application Number	17561548
Status	Pending
Filing Date	2021-12-23
First Publication Date	2023-06-29
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mohajer, Keyvan Kam, Kaishin Pierret, Christophe

Abstract

A method of assisting a user. The method including obtaining a plurality of rules having condition components and action components, the action components specifying conversation schemas, detecting, by a sensor, a fact related to an environment of the user, identifying a rule, of the plurality of rules, having a condition component that is satisfied by the detected fact, initiating a conversation with the user according to a conversation schema of the action component of the rule of the plurality of rules, and performing an action in response to a positive statement by the user.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G01C 21/36 - Input/output arrangements for on-board computers

51. Multiple service levels for automatic speech recognition

Application Number	17447823
Grant Number	11978454
Status	In Force
Filing Date	2021-09-16
First Publication Date	2023-03-16
Grant Date	2024-05-07
Owner	SOUNDHOUND AI IP, LLC (USA)
Inventor	Stonehocker, Timothy P. Gowayyed, Zizu Eichstaedt, Matthias Emami, Seyed Majid Jiang, Evelyn Berryhill, Ryan Ramona, Mathieu Veira, Neil

Abstract

A system for performing automated speech recognition (ASR) on audio data includes a queue manager to receive a request to perform ASR on audio data, add the request to a queue of incoming requests, and determine a queue depth representing a number of requests in the queue at a given time. The system also includes a load supervisor to receive the request and the queue depth from the queue manager and assign a service level for the request based on the queue depth. In addition, the system includes a speech-to-text converter to receive the assigned service level for the request from the load supervisor, select an ASR model for the request based on the received service level, receive the audio data associated with the request, and perform ASR on the audio data using the selected ASR model.

IPC Classes ?

G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
G10L 15/16 - Speech classification or search using artificial neural networks
G10L 15/26 - Speech to text systems

52. CONTROLLING A GRAPHICAL USER INTERFACE BY TELEPHONE

Application Number	17408476
Status	Pending
Filing Date	2021-08-22
First Publication Date	2023-02-23
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mohajer, Kamyar Mohajer, Keyvan Hom, James Jiang, Evelyn

Abstract

A method and system for controlling a GUI on a user's network-connected device, the control being provided by a telephone call between the user and a speech recognition and speech synthesis system. An example of a restaurant ordering system is provided. The user calls a phone number and is guided through a verbal ordering process that includes one or more of: adding an item, deleting an item, changing quantities, changing sizes, and changing details of an item. The user's choices are added to a display so that a current status of the order is visible to the user. The GUI is updated as changes are made to the order. The GUI can also request additional information, upsell items, and show menus. The GUI aids the user in confirming that the order is correct. The system provides the final order to a restaurant for fulfillment.

IPC Classes ?

G06Q 30/06 - Buying, selling or leasing transactions
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 13/02 - Methods for producing synthetic speechSpeech synthesisers
G06F 3/16 - Sound inputSound output
G06F 16/955 - Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
G06F 9/451 - Execution arrangements for user interfaces

53. Differential spatial rendering of audio sources

Application Number	17655650
Grant Number	11589184
Status	In Force
Filing Date	2022-03-21
First Publication Date	2023-02-21
Grant Date	2023-02-21
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mont-Reynaud, Bernard

Abstract

Methods and systems for intuitive spatial audio rendering with improved intelligibility are disclosed. By establishing a virtual association between an audio source and a location in the listener's virtual audio space, a spatial audio rendering system can generate spatial audio signals that create a natural and immersive audio field for a listener. The system can receive the virtual location of the source as a parameter and map the source audio signal to a source-specific multi-channel audio signal. In addition, the spatial audio rendering system can be interactive and dynamically modify the rendering of the spatial audio in response to a user's active control or tracked movement.

IPC Classes ?

H04S 7/00 - Indicating arrangementsControl arrangements, e.g. balance control

54. Using a smartphone to control another device by voice

Application Number	17372123
Grant Number	11950300
Status	In Force
Filing Date	2021-07-09
First Publication Date	2023-01-12
Grant Date	2024-04-02
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Tsuchida, Keisuke

Abstract

A method and system for implementing a speech-enabled interface of a host device via an electronic mobile device in a network are provided. The method includes establishing a communication session between the host device and the mobile device via a session service provider. According to some embodiments, a barcode can be adopted to enable the pairing of the host device and mobile device. Furthermore, the present method and system employ the voice interface in conjunction with speech recognition systems and natural language processing to interpret voice input for the hosting device, which can be used to perform one or more actions related to the hosting device.

IPC Classes ?

H04W 76/11 - Allocation or use of connection identifiers
G10L 15/08 - Speech classification or search
H04W 4/50 - Service provisioning or reconfiguring

55. Sidebar conversations

Application Number	17353639
Grant Number	11539920
Status	In Force
Filing Date	2021-06-21
First Publication Date	2022-12-22
Grant Date	2022-12-27
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Stonehocker, Timothy P

Abstract

A system and a method are disclosed that enable sidebar conversations between two or more attendees that are participating in a primary or main meeting. The sidebar conversation occurs in conjunction or concurrently with the primary meeting. A first attendee provides commands to indicate a desire to initiate a sidebar conversation and information about a targeted attendee. The commands are analyzed to determine if a trigger phrase is included. The commands are analyzed to determine if there is an identification of a second (targeted) attendee, who is currently participating in the main meeting. If the second attendee is available, then the sidebar conversation is initiated. Additional attendees can be added to the sidebar conversation. Additional independent and simultaneous sidebar conversations can be initiated (by attendees currently participating in the active sidebar conversation), thereby allowing one attendee to conduct multiple simultaneous sidebar conversations while being able to switch between them.

IPC Classes ?

H04N 7/15 - Conference systems
H04L 65/403 - Arrangements for multi-party communication, e.g. for conferences
H04L 65/1069 - Session establishment or de-establishment
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 25/57 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for processing of video signals
G06F 3/16 - Sound inputSound output
G10L 15/08 - Speech classification or search

56. Enabling natural language interactions with user interfaces for users of a software application

Application Number	17332927
Grant Number	12008991
Status	In Force
Filing Date	2021-05-27
First Publication Date	2022-12-01
Grant Date	2024-06-11
Owner	SoundHound AI IP, LLC (USA)
Inventor	Yabas, Utku Hubert, Philipp Stahl, Karl

Abstract

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G06F 40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
G10L 15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models
G10L 15/26 - Speech to text systems

57. Method for providing information, method for generating database, and program

Application Number	17649052
Grant Number	11995143
Status	In Force
Filing Date	2022-01-26
First Publication Date	2022-12-01
Grant Date	2024-05-28
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Naito, Masaki Tsuchida, Keisuke Yoneyama, Jun Sawada, Kaku

Abstract

IPC Classes ?

G06F 16/95 - Retrieval from the web
G06F 16/33 - Querying
G06F 16/955 - Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
G06F 40/40 - Processing or translation of natural language
G10L 15/26 - Speech to text systems

58. API FOR SERVICE PROVIDER FULFILLMENT OF DATA PRIVACY REQUESTS

Application Number	17237705
Status	Pending
Filing Date	2021-04-22
First Publication Date	2022-10-27
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Qiu, Kevin Jiang, Evelyn Eichstaedt, Matthias Heit, Warren S.

Abstract

A system and method are disclosed for fulfilling GDPR and other privacy requests in a client device system as well as a downstream service provider with which the client device system partners. In examples, the downstream service provider may be a voice assistant service provider providing voice recognition and language understanding capabilities to an upstream client device system.

IPC Classes ?

G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules

59. ACTIVE ARBITRATION

Serial Number	97568867
Status	Pending
Filing Date	2022-08-29
Owner	SOUNDHOUND AI IP, LLC ()
NICE Classes ?	09 - Scientific and electric apparatus and instruments

Goods & Services

Recorded computer software for spotting wake words; Recorded computer software for recognizing speech, interpreting natural language, and providing virtual assistant functions; Downloadable computer software development kits (SDKs) for developing speech recognition, natural language understanding, and virtual assistant software; Recorded computer software for controlling speech recognition, natural language understanding, and virtual assistant cloud processing; Recorded computer software for performing text-to-speech voice audio synthesis; Downloadable electronic data files featuring neural network parameter sets for synthesizing text-to-speech voices; Downloadable electronic data files featuring neural network parameter sets for spotting wake words in audio; Recorded computer software for operating a virtual assistant device for hotels and restaurants; Recorded computer software for providing a virtual assistant using artificial intelligence technology for hotels and restaurants to make customer bookings and reservations, and answer other customer queries; Preinstalled software for operating a virtual assistant device for hotels and restaurants sold as a component of virtual assistant devices for hotels and restaurants; Recorded computer software for understanding speech for use with voice ordering kiosks, drive through ordering systems, and retail ordering systems; Recorded computer software for understanding speech for use with voice reservation kiosks; Recorded computer software for understanding speech for use with smart home devices; Recorded computer software for understanding speech for use with voice enabled robots

60. Wake suppression for audio playing and listening devices

Application Number	17736850
Grant Number	11922939
Status	In Force
Filing Date	2022-05-04
First Publication Date	2022-08-18
Grant Date	2024-03-05
Owner	SoundHound AI IP, LLC (USA)
Inventor	Yang, Hsuan Zhãng, Qindí Heit, Warren S.

Abstract

A system and method are disclosed for ignoring a wakeword received at a speech-enabled listening device when it is determined the wakeword is reproduced audio from an audio-playing device. Determination can be by detecting audio distortions, by an ignore flag sent locally between an audio-playing device and speech-enabled device, by and ignore flag sent from a server, by comparison of received audio played audio to a wakeword within an audio-playing device or a speech-enabled device, and other means.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/08 - Speech classification or search
G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

61. Wakeword selection

Application Number	17709131
Grant Number	11948571
Status	In Force
Filing Date	2022-03-30
First Publication Date	2022-07-14
Grant Date	2024-04-02
Owner	SoundHound AI IP, LLC (USA)
Inventor	Mont-Reynaud, Bernard

Abstract

A system and method are disclosed capable of parsing a spoken utterance into a natural language request and a speech audio segment, where the natural language request directs the system to use the speech audio segment as a new wakeword. In response to this wakeword assignment directive, the system and method are further capable of immediately building a new wakeword spotter to activate the device upon matching the new wakeword in the input audio. Different approaches to promptly building a new wakeword spotter are described. Variations of wakeword assignment directives can make the new wakeword public or private. They can also add the new wakeword to earlier wakewords, or replace earlier wakewords.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G06F 3/16 - Sound inputSound output
G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/08 - Speech classification or search
G10L 17/04 - Training, enrolment or model building

62. Adapting an utterance cut-off period based on parse prefix detection

Application Number	17698623
Grant Number	11862162
Status	In Force
Filing Date	2022-03-18
First Publication Date	2022-06-30
Grant Date	2024-01-02
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Aguayo, Patricia Pozon Zhang, Jennifer Hee Young Probell, Jonah

Abstract

A processing system detects a period of non-voice activity and compares its duration to a cutoff period. The system adapts the cutoff period based on parsing previously-recognized speech to determine, according to a model, such as a machine-learned model, the probability that the speech recognized so far is a prefix to a longer complete utterance. The cutoff period is longer when a parse of previously recognized speech has a high probability of being a prefix of a longer utterance.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/05 - Word boundary detection
G10L 25/78 - Detection of presence or absence of voice signals
G10L 15/18 - Speech classification or search using natural language modelling
G10L 15/08 - Speech classification or search
G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
G10L 15/19 - Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules

63. SOUNDHOUND EDGELITE

Serial Number	97479096
Status	Pending
Filing Date	2022-06-28
Owner	SOUNDHOUND AI IP, LLC ()
NICE Classes ?	09 - Scientific and electric apparatus and instruments 42 - Scientific, technological and industrial services, research and design

Goods & Services

Recorded computer software for spotting wake words; Recorded computer software for recognizing speech, interpreting natural language, and providing virtual assistant functions; Downloadable computer software development kits (SDKs) for developing speech recognition, natural language understanding, and virtual assistant software; Recorded computer software for performing text-to-speech voice audio synthesis; Downloadable electronic data files featuring neural network parameter sets for synthesizing text-to-speech voices; Downloadable electronic data files featuring neural network parameter sets for spotting wake words in audio; Recorded computer software for operating a virtual assistant device for hotels and restaurants; Recorded computer software for providing a virtual assistant using artificial intelligence technology for hotels and restaurants to make customer bookings and reservations, and answer other customer queries; Preinstalled software for operating a virtual assistant device for hotels and restaurants sold as a component of virtual assistant devices for hotels and restaurants; Recorded computer software for understanding speech for use with smart home devices; Recorded computer software for understanding speech for use with voice enabled robots; Recorded computer software for training of custom wake word spotters for virtual assistants; Recorded computer software for synthesis of text-to-speech voice audio Platform as a service (PaaS) featuring computer software platforms for configuring virtual assistants through a web interface; Platform as a service (PaaS) featuring computer software platforms for configuring domain-specific content for virtual assistants; Providing online non-downloadable computer software for training of custom wake word spotters for virtual assistants; Providing online non-downloadable computer software for synthesis of text-to-speech voice audio; Platform as a service (PaaS) featuring computer software platforms for configuring custom text-to-speech voices

64. SYSTEM AND METHOD FOR COMPUTING REGION CENTERS BY POINT CLUSTERING

Application Number	17549796
Status	Pending
Filing Date	2021-12-13
First Publication Date	2022-06-16
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Pierret, Christophe

Abstract

A system and a method are disclosed that calculate the center of a geographic region. A set of topological/geographical points is received. A set of clusters is determined. A weight for each cluster is computed. The highest weighted cluster is selected. The geographic region center is calculated using the selected cluster. The geographical points can include a key for each point and be filtered by an indicated key before calculating the center of a geographic location.

IPC Classes ?

G06K 9/62 - Methods or arrangements for recognition using electronic means

65. Meaning inference from speech audio

Application Number	17653365
Grant Number	11769488
Status	In Force
Filing Date	2022-03-03
First Publication Date	2022-06-16
Grant Date	2023-09-26
Owner	SoundHound AI IP, LLC (USA)
Inventor	Krishnaswamy, Sudharsan Wieman, Maisy Probell, Jonah

Abstract

IPC Classes ?

G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/16 - Speech classification or search using artificial neural networks
G10L 15/18 - Speech classification or search using natural language modelling
G10L 13/02 - Methods for producing synthetic speechSpeech synthesisers
G10L 15/197 - Probabilistic grammars, e.g. word n-grams
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/187 - Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams

66. System and Method For Achieving Interoperability Through The Use of Interconnected Voice Verification System

Application Number	17108724
Status	Pending
Filing Date	2020-12-01
First Publication Date	2022-06-02
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mohajer, Keyvan Heit, Warren S.

Abstract

A system and method are disclosed for achieving interoperability and access to a personal extension knowledge/preference database (PEKD) through interconnected voice verification systems. Devices from various different companies and systems can link to a voice verification system (VVS). Users can also enroll with the VSS so that the VSS can provide authentication of users by personal wake phrases. Thereafter users can access their PEKD from un-owned devices by speaking their wake phrase.

IPC Classes ?

G10L 17/24 - the user being prompted to utter a password or a predefined phrase
G10L 17/04 - Training, enrolment or model building
G06F 21/32 - User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
H04L 29/06 - Communication control; Communication processing characterised by a protocol
G06N 20/00 - Machine learning
G06F 16/25 - Integrating or interfacing systems involving database management systems

67. NEURAL SENTENCE GENERATOR FOR VIRTUAL ASSISTANTS

Application Number	17455727
Status	Pending
Filing Date	2021-11-19
First Publication Date	2022-05-26
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Singh, Pranav Mohajer, Keyvan Zhang, Yilun

Abstract

Methods and systems for automatically generating sample phrases or sentences that a user can say to invoke a set of defined actions performed by a virtual assistant are disclosed. By enabling finetuned general-purpose natural language models, the system can generate potential and accurate utterance sentences based on extracted keywords or the input utterance sentence. Furthermore, domain-specific datasets can be used to train the pre-trained, general-purpose natural language models via unsupervised learning. These generated sentences can improve the efficiency of configuring a virtual assistant. The system can further optimize the effectiveness of a virtual assistant in understanding the user, which can enhance the user experience of communicating with it.

IPC Classes ?

G10L 15/18 - Speech classification or search using natural language modelling
G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog

68. RECOMMENDATION ENGINE FOR UPSELLING IN RESTAURANT ORDERS

Application Number	17667535
Status	Pending
Filing Date	2022-02-08
First Publication Date	2022-05-26
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mohajer, Kamyar Macrae, Robert

Abstract

A computer-implemented method is provided to support a food ordering system for food items from a menu of a restaurant using natural language. Expressions made for ordering are used to recommend a food item that a user has a high probability of wanting to include in an order. The recommendation engine is trained using machine learning. Expressions are collected and parsed to identify words that might indicate food items offered by the restaurant. The words are provided to a restaurant owner to identify food items on a menu, with which the words are associated.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G06F 16/2457 - Query processing with adaptation to user needs
G10L 17/00 - Speaker identification or verification techniques
G10L 15/18 - Speech classification or search using natural language modelling
G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
G06F 16/242 - Query formulation
G06F 16/22 - IndexingData structures thereforStorage structures

69. Text-to-Speech Adapted by Machine Learning

Application Number	17580289
Status	Pending
Filing Date	2022-01-20
First Publication Date	2022-05-12
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mont-Reynaud, Bernard Almudafar-Depeyrot, Monika

Abstract

Machine learned models take in vectors representing desired behaviors and generate voice vectors that provide the parameters for text-to-speech (TTS) synthesis. Models may be trained on behavior vectors that include user profile attributes, situational attributes, or semantic attributes. Situational attributes may include age of people present, music that is playing, location, noise, and mood. Semantic attributes may include presence of proper nouns, number of modifiers, emotional charge, and domain of discourse. TTS voice parameters may apply per utterance and per word as to enable contrastive emphasis.

IPC Classes ?

G10L 13/10 - Prosody rules derived from textStress or intonation
G10L 13/04 - Details of speech synthesis systems, e.g. synthesiser structure or memory management
G10L 13/033 - Voice editing, e.g. manipulating the voice of the synthesiser

70. Server supported recognition of wake phrases

Application Number	17584780
Grant Number	12051403
Status	In Force
Filing Date	2022-01-26
First Publication Date	2022-05-12
Grant Date	2024-07-30
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Jain, Newton Zaheer, Sameer Syed

Abstract

IPC Classes ?

G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G06F 8/41 - Compilation
G10L 15/08 - Speech classification or search
G10L 15/16 - Speech classification or search using artificial neural networks

71. DRIVER INTERFACE WITH VOICE AND GESTURE CONTROL

Application Number	17547917
Status	Pending
Filing Date	2021-12-10
First Publication Date	2022-05-05
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Li, Zili Vasconcelos, Cristina

Abstract

A driver interface for use within an automobile provides responses to voice commands issued for example by a driver of the automobile. The interface includes a camera and microphone for capturing image data such as gestures and audio data from the automobile driver. The image data and audio data are processed to extract image and linguistic features from the image and audio data, which image and linguistic features are processed to interpret and infer a meaning of the voice command.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit
G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
G10L 15/18 - Speech classification or search using natural language modelling
G10L 15/187 - Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
G10L 15/24 - Speech recognition using non-acoustical features
G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G06K 9/62 - Methods or arrangements for recognition using electronic means
G10L 15/16 - Speech classification or search using artificial neural networks
G06V 10/40 - Extraction of image or video features
G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
G06V 20/40 - ScenesScene-specific elements in video content

72. Training a device specific acoustic model

Application Number	17573551
Grant Number	11830472
Status	In Force
Filing Date	2022-01-11
First Publication Date	2022-04-28
Grant Date	2023-11-28
Owner	SOUNDHOUND AI IP, LLC (USA)
Inventor	Mohajer, Keyvan Patel, Mehul

Abstract

Developers can configure custom acoustic models by providing audio files with custom recordings. The custom acoustic model is trained by tuning a baseline model using the audio files. Audio files may contain custom noise to apply to clean speech for training. The custom acoustic model is provided as an alternative to a standard acoustic model. Device developers can select an acoustic model by a user interface. Speech recognition is performed on speech audio using one or more acoustic models. The result can be provided to developers through the user interface, and an error rate can be computed and also provided.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G06F 3/16 - Sound inputSound output
G10L 15/18 - Speech classification or search using natural language modelling

73. Controlling an engagement state of an agent during a human-machine dialog

Application Number	17562891
Grant Number	12125484
Status	In Force
Filing Date	2021-12-27
First Publication Date	2022-04-21
Grant Date	2024-10-22
Owner	SoundHound AI IP, LLC (USA)
Inventor	Halstvedt, Scott Mohajer, Keyvan Mont-Reynaud, Bernard

Abstract

A method of controlling an engagement state of an agent during a human-machine dialog is provided. The method can include receiving a spoken request that is a conditional locking request, wherein the conditional locking request uses a natural language expression to explicitly specify a locking condition, which is a predicate, storing the predicate in a format that can be evaluated when needed by the agent, entering a conditionally locked state in response to the conditional locking request, in the conditionally locked state, receiving a multiplicity of requests without a need for a wakeup indicator, and for a request from the multiplicity of requests evaluating the predicate upon receiving the request, and processing the request if the predicate is true.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G06F 3/16 - Sound inputSound output
G06F 21/32 - User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
G10L 15/08 - Speech classification or search
G10L 17/00 - Speaker identification or verification techniques
G10L 17/04 - Training, enrolment or model building
G10L 17/06 - Decision making techniquesPattern matching strategies
G10L 17/22 - Interactive proceduresMan-machine interfaces

74. Method and system for conversation transcription with metadata

Application Number	17450551
Grant Number	12125487
Status	In Force
Filing Date	2021-10-11
First Publication Date	2022-04-14
Grant Date	2024-10-22
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Bradley, Kiersten L. Coeytaux, Ethan Yin, Ziming

Abstract

IPC Classes ?

G10L 15/26 - Speech to text systems
G06F 40/134 - Hyperlinking
G06F 40/166 - Editing, e.g. inserting or deleting
G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit
G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/07 - Adaptation to the speaker

75. Method and system for conversation transcription with metadata

Application Number	17450552
Grant Number	12020708
Status	In Force
Filing Date	2021-10-11
First Publication Date	2022-04-14
Grant Date	2024-06-25
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Bradley, Kiersten L. Coeytaux, Ethan Yin, Ziming

Abstract

IPC Classes ?

G10L 15/26 - Speech to text systems
G06F 40/134 - Hyperlinking
G06F 40/166 - Editing, e.g. inserting or deleting
G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit
G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/07 - Adaptation to the speaker

76. Using phonetic variants in a local context to improve natural language understanding

Application Number	16529689
Grant Number	11295730
Status	In Force
Filing Date	2019-08-01
First Publication Date	2022-04-05
Grant Date	2022-04-05
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mohajer, Keyvan Wilson, Christopher Mont-Reynaud, Bernard

Abstract

A method is described that includes processing text and speech from an input utterance using local overrides of default dictionary pronunciations. Applying this method, a word-level grammar used to process the tokens specifies at least one local word phonetic variant that applies within a specific production rule and, within a local context of the specific production rule, the local word phonetic variant overrides one or more default dictionary phonetic versions of the word. This method can be applied to parsing utterances where the pronunciation of some words depends on their syntactic or semantic context.

IPC Classes ?

G10L 15/18 - Speech classification or search using natural language modelling
G10L 15/19 - Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules

77. System and method for voice morphing in a data annotator tool

Application Number	17539182
Grant Number	12086564
Status	In Force
Filing Date	2021-11-30
First Publication Date	2022-03-24
Grant Date	2024-09-10
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Ross, Dylan H.

Abstract

IPC Classes ?

G10L 15/18 - Speech classification or search using natural language modelling
G06F 40/56 - Natural language generation
G06F 40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 19/125 - Pitch excitation, e.g. pitch synchronous innovation CELP [PSI-CELP]
G10L 19/26 - Pre-filtering or post-filtering
G10L 21/013 - Adapting to target pitch

78. System and method for providing natural language recommendations

Application Number	16447958
Grant Number	11276398
Status	In Force
Filing Date	2019-06-20
First Publication Date	2022-03-15
Grant Date	2022-03-15
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Macrae, Robert Mohajer, Kamyar

Abstract

A system that includes a stand-alone device or a server connected client device are in communication with a server and provide recommendations. The device includes an input component, a storage component, a processor and an output component. The server-connected client device includes an input component that receives the user's request, a communication component that communicates the request to the server and receives the recommendation from the server, and an output component that provides the recommendation to user.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G06F 16/242 - Query formulation
G06F 16/2457 - Query processing with adaptation to user needs
G06F 16/22 - IndexingData structures thereforStorage structures
G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
G10L 15/18 - Speech classification or search using natural language modelling
G10L 17/00 - Speaker identification or verification techniques

79. Conditional responses to application commands in a client-server system

Application Number	16791421
Grant Number	11250217
Status	In Force
Filing Date	2020-02-14
First Publication Date	2022-02-15
Grant Date	2022-02-15
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mohajer, Keyvan Wilson, Christopher S. Khov, Kheng Graves, Ian

Abstract

A client device receives a user request (e.g., in natural language form) to execute a command of an application. The client device delegates interpretation of the request to a response-processing server. Using domain knowledge previously provided by a developer of the application, the response-processing server determines the various possible responses that client devices could make in response to the request based on circumstances such as the capabilities of the client devices and the state of the application data. The response-processing server accordingly generates a response package that describes a number of different conditional responses that client devices could have to the request and provides the response package to the client device. The client device selects the appropriate response from the response package based on the circumstances as determined by the client device, executes the command (if possible), and provides the user with some representation of the response.

IPC Classes ?

G06F 40/30 - Semantic analysis
H04L 29/08 - Transmission control procedure, e.g. data link level control procedure

80. System and method for interpreting natural language commands with compound criteria

Application Number	17081996
Grant Number	11238101
Status	In Force
Filing Date	2020-10-27
First Publication Date	2022-02-01
Grant Date	2022-02-01
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mohajer, Keyvan

Abstract

A command-processing server receives a natural language command from a user. The command-processing server has a set of domain command interpreters corresponding to different domains in which commands can be expressed, such as the domain of entertainment, or the domain of travel. Some or all of the domain command interpreters recognize user commands having a verbal prefix, an optional pre-filter, an object, and an optional post-filter; the pre- and post-filters may be compounded expressions involving multiple atomic filters. Different developers may independently specify the domain command interpreters and the sub-structure interpreters on which they are based.

IPC Classes ?

G06F 16/9032 - Query formulation
G10L 15/18 - Speech classification or search using natural language modelling
G06F 16/2457 - Query processing with adaptation to user needs
G06F 3/16 - Sound inputSound output
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
H04N 21/482 - End-user interface for program selection
G10L 15/26 - Speech to text systems

81. Support for grammar inflections within a software development framework

Application Number	17474680
Grant Number	11797777
Status	In Force
Filing Date	2021-09-14
First Publication Date	2021-12-30
Grant Date	2023-10-24
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mont-Reynaud, Bernard Taron, Seth

Abstract

A natural language understanding server includes grammars specified in a modified extended Backus-Naur form (MEBNF) that includes an agglutination metasymbol not supported by conventional EBNF grammar parsers, as well as an agglutination preprocessor. The agglutination preprocessor applies one or more sets of agglutination rewrite rules to the MEBNF grammars, transforming them to EBNF grammars that can be processed by conventional EBNF grammar parsers. Permitting grammars to be specified in MEBNF form greatly simplifies the authoring and maintenance of grammars supporting inflected forms of words in the languages described by the grammars.

IPC Classes ?

G06F 40/30 - Semantic analysis
G10L 15/197 - Probabilistic grammars, e.g. word n-grams
G06F 8/41 - Compilation
G10L 15/18 - Speech classification or search using natural language modelling
G06F 40/205 - Parsing
G06F 40/253 - Grammatical analysisStyle critique

82. Machine learning system for digital assistants

Application Number	17350294
Grant Number	12067006
Status	In Force
Filing Date	2021-06-17
First Publication Date	2021-12-23
Grant Date	2024-08-20
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Singh, Pranav Zhang, Yilun Mohajer, Keyvan Fazeli, Mohammadreza

Abstract

IPC Classes ?

G06F 16/242 - Query formulation
G06N 3/045 - Combinations of networks
G06N 3/088 - Non-supervised learning, e.g. competitive learning

83. Configurable neural speech synthesis

Application Number	17341082
Grant Number	11741941
Status	In Force
Filing Date	2021-06-07
First Publication Date	2021-12-16
Grant Date	2023-08-29
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Richards, Andrew

Abstract

A discriminator trained on labeled samples of speech can compute probabilities of voice properties. A speech synthesis generative neural network that takes in text and continuous scale values of voice properties is trained to synthesize speech audio that the discriminator will infer as matching the values of the input voice properties. Voice parameters can include speaker voice parameters, accents, and attitudes, among others. Training can be done by transfer learning from an existing neural speech synthesis model or such a model can be trained with a loss function that considers speech and parameter values. A graphical user interface can allow voice designers for products to synthesize speech with a desired voice or generate a speech synthesis engine with frozen voice parameters. A vector of parameters can be used for comparison to previously registered voices in databases such as ones for trademark registration.

IPC Classes ?

G10L 13/047 - Architecture of speech synthesisers
G10L 13/08 - Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
G10L 13/033 - Voice editing, e.g. manipulating the voice of the synthesiser
G10L 15/26 - Speech to text systems
G06N 3/084 - Backpropagation, e.g. using gradient descent
G06N 3/04 - Architecture, e.g. interconnection topology
G06F 3/16 - Sound inputSound output
G06F 3/04847 - Interaction techniques to control parameter settings, e.g. interaction with sliders or dials

84. Interpreting Queries According To Preferences

Application Number	17389847
Status	Pending
Filing Date	2021-07-30
First Publication Date	2021-11-18
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mohajer, Keyvan Mont-Reynaud, Bernard Wilson, Christopher S.

Abstract

The present invention extends to methods, systems, and computer program products for interpreting queries according to preferences. Multi-domain natural language understanding systems can support a variety of different types of clients. Queries can be received and interpreted across one or more domains. Preferred query interpretations can be identified and query responses provided based on any of: domain preferences, preferences indicated by an identifier, or (e.g., weighted) scores exceeding a threshold.

IPC Classes ?

G06F 40/30 - Semantic analysis
G10L 15/18 - Speech classification or search using natural language modelling
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

85. Virtual assistant domain functionality

Application Number	17383097
Grant Number	11836453
Status	In Force
Filing Date	2021-07-22
First Publication Date	2021-11-11
Grant Date	2023-12-05
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mohajer, Kamyar Mohajer, Keyvan Mont-Reynaud, Bernard Singh, Pranav

Abstract

IPC Classes ?

G06F 40/40 - Processing or translation of natural language
G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
G06Q 30/0283 - Price estimation or determination
G06Q 20/10 - Payment architectures specially adapted for electronic funds transfer [EFT] systemsPayment architectures specially adapted for home banking systems
G06F 40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

86. Method and system for acoustic model conditioning on non-phoneme information features

Application Number	17224967
Grant Number	11741943
Status	In Force
Filing Date	2021-04-07
First Publication Date	2021-10-28
Grant Date	2023-08-29
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Gowayyed, Zizu Mohajer, Keyvan

Abstract

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit
G10L 15/04 - SegmentationWord boundary detection

87. Loudspeaker with transmitter

Application Number	17301308
Grant Number	11627405
Status	In Force
Filing Date	2021-03-31
First Publication Date	2021-10-07
Grant Date	2023-04-11
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Stahl, Karl

Abstract

A speaker device includes an electroacoustic transducer configured to convert an audio signal into a set of sound waves and a transmitter configured to transmit an electromagnetic signal that carries the audio signal for receipt at distances limited to an audibility range of the set of sound waves. The audibility range of the set of sound waves corresponds to a distance at which the set of sound waves is estimated to be below a predetermined sound level.

IPC Classes ?

H04R 25/00 - Deaf-aid sets
H04R 1/10 - EarpiecesAttachments therefor
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 21/0316 - Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
G10L 25/06 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being correlation coefficients
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
H04R 1/08 - MouthpiecesAttachments therefor
H04R 5/033 - Headphones for stereophonic communication

88. Automatic learning of entities, words, pronunciations, and parts of speech

Application Number	17146239
Grant Number	12080275
Status	In Force
Filing Date	2021-01-11
First Publication Date	2021-10-07
Grant Date	2024-09-03
Owner	SoundHound AI IP, LLC. (USA)
Inventor	Relin, Anton V.

Abstract

IPC Classes ?

G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit
G10L 15/14 - Speech classification or search using statistical models, e.g. Hidden Markov Models [HMM]
G10L 15/19 - Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules

89. Framework for identifying distinct questions in a composite natural language query

Application Number	16292190
Grant Number	11138205
Status	In Force
Filing Date	2019-03-04
First Publication Date	2021-10-05
Grant Date	2021-10-05
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mohajer, Keyvan Mont-Reynaud, Bernard Hubert, Philipp

Abstract

A query-processing server provides natural language services to applications. More specifically, the query-processing server receives and stores domain knowledge information from application developers, the domain knowledge information comprising a linguistic description of the natural language user queries that application developers wish their applications to support. A first portion of the domain knowledge information is applied to transform a natural language query received from an application to an ordered sequence of question elements. A second portion of the domain knowledge information is applied to group the ordered sequence of question elements into a plurality of distinct structured questions posed by the natural language query. The distinct structured questions may then be provided to the application, which may then execute them and obtain the corresponding data referenced by the questions.

IPC Classes ?

G06F 16/00 - Information retrievalDatabase structures thereforFile system structures therefor
G06F 16/2457 - Query processing with adaptation to user needs
G06F 16/2455 - Query execution
G06F 40/40 - Processing or translation of natural language

90. Framework for understanding complex natural language queries in a dialog context

Application Number	16363929
Grant Number	11132504
Status	In Force
Filing Date	2019-03-25
First Publication Date	2021-09-28
Grant Date	2021-09-28
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mont-Reynaud, Bernard Wilson, Christopher S Mohajer, Keyvan

Abstract

A domain-independent framework parses and interprets compound natural language queries in the context of a conversation between a human and an agent. Generic grammar rules and corresponding semantics support the understanding of compound queries in the conversation context. The sub-queries themselves are from one or more domains, and they are parsed and interpreted by a pre-existing grammar, covering one or more pre-existing domains. The pre-existing grammar, extended by the generic rules, recognizes all compound queries based on any queries recognized by the pre-existing grammar. Use of the disclosed framework requires little or no change in the domain-specific NLU handling code. The framework defines a generic approach to propagating context data between sub-queries of a compound query. The framework can be further extended to propagate intra-query context data in, out and across query components. Complex query results, and other data such as accounting data, can also be propagated simultaneously with dialog context data in a consolidated intra-query context data structure.

IPC Classes ?

G06F 40/205 - Parsing
G06F 40/30 - Semantic analysis

91. Deriving acoustic features and linguistic features from received speech audio

Application Number	17325114
Grant Number	12175964
Status	In Force
Filing Date	2021-05-19
First Publication Date	2021-09-02
Grant Date	2024-12-24
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Lokeswarappa, Kiran Garaga Gedalius, Joel Mont-Reynaud, Bernard Huang, Jun

Abstract

IPC Classes ?

G10L 15/00 - Speech recognition
G06F 40/205 - Parsing
G06F 40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
G06F 40/253 - Grammatical analysisStyle critique
G06N 20/00 - Machine learning
G06Q 30/0241 - Advertisements
G06Q 30/0251 - Targeted advertisements
G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit
G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/18 - Speech classification or search using natural language modelling
G10L 25/90 - Pitch determination of speech signals
H04L 67/306 - User profiles
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/26 - Speech to text systems
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination

92. Semantic grammar extensibility within a software development framework

Application Number	16505185
Grant Number	11100291
Status	In Force
Filing Date	2019-07-08
First Publication Date	2021-08-24
Grant Date	2021-08-24
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mohajer, Keyvan Wilson, Christopher S. Mont-Reynaud, Bernard

Abstract

A query-processing server that interprets natural language expressions supports the extension of a first semantic grammar (for a particular type of expression), which is declared extensible, by a second semantic grammar (for another type of expression). When an extension is requested, the query-processing server checks that the two semantic grammars have compatible semantic types. The developers need not have any knowledge of each other, or about their respective grammars. Performing an extension may be done by yet another party, such as the query-processing server, or another server, independently of all previous parties. The use of semantic grammar extensions provides a way to expand the coverage and functionality of natural language interpretation in a simple and flexible manner, so that new forms of expression may be supported, and seamlessly combined with pre-existing interpretations. Finally, in some implementations, this is done without loss of efficiency.

IPC Classes ?

G06F 40/30 - Semantic analysis
G06F 8/20 - Software design

93. Factored neural networks for language modeling

Application Number	16228278
Grant Number	11100288
Status	In Force
Filing Date	2018-12-20
First Publication Date	2021-08-24
Grant Date	2021-08-24
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Gowayyed, Zizu Mont-Reynaud, Bernard

Abstract

A factored neural network estimates a conditional distribution of token probabilities using two smaller models, a class model and an index model. Every token has a unique class, and a unique index in the class. The two smaller models are trained independently but cooperate at inference time. Factoring with more than two models is possible. Networks can be recurrent. Factored neural networks for statistical language modelling treat words as tokens. In that context, classes capture linguistic regularities. Partitioning of words into classes keeps the number of classes and the maximum size of a class both low. Optimization of partitioning is by iteratively splitting and assembling classes.

IPC Classes ?

G06F 40/30 - Semantic analysis
G06N 3/04 - Architecture, e.g. interconnection topology
G06N 3/08 - Learning methods
G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates

94. Neural acoustic model

Application Number	16790643
Grant Number	11392833
Status	In Force
Filing Date	2020-02-13
First Publication Date	2021-08-19
Grant Date	2022-07-19
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Wieman, Maisy Spencer, Andrew Carl L{hacek Over (i)}, Zìlì Vasconcelos, Cristina

Abstract

An audio processing system is described. The audio processing system uses a convolutional neural network architecture to process audio data, a recurrent neural network architecture to process at least data derived from an output of the convolutional neural network architecture, and a feed-forward neural network architecture to process at least data derived from an output of the recurrent neural network architecture. The feed-forward neural network architecture is configured to output classification scores for a plurality of sound units associated with speech. The classification scores indicate a presence of one or more sound units in the audio data. The convolutional neural network architecture has a plurality of convolutional groups arranged in series, where a convolutional group includes a combination of two data mappings arranged in parallel.

IPC Classes ?

G06N 3/08 - Learning methods
G10L 15/16 - Speech classification or search using artificial neural networks
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G06N 3/04 - Architecture, e.g. interconnection topology

95. Wake suppression for audio playing and listening devices

Application Number	16781214
Grant Number	11328721
Status	In Force
Filing Date	2020-02-04
First Publication Date	2021-08-05
Grant Date	2022-05-10
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Yang, Hsuan Zhang, Qìndí Heit, Warren S.

Abstract

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
G10L 15/08 - Speech classification or search

96. Providing a platform for configuring device-specific speech recognition and using a platform for configuring device-specific speech recognition

Application Number	17237003
Grant Number	11367448
Status	In Force
Filing Date	2021-04-21
First Publication Date	2021-08-05
Grant Date	2022-06-21
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mohajer, Keyvan Patel, Mehul

Abstract

A method of providing a platform for configuring device-specific speech recognition is provided. The method includes providing a user interface for developers to select a set of at least two acoustic models appropriate for a specific type of a device, receiving, from a developer, a selection of the set of the at least two acoustic models, and configuring a speech recognition system to perform device-specific speech recognition by using one acoustic model selected from the at least two acoustic models of the set.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G06F 3/16 - Sound inputSound output
G10L 15/18 - Speech classification or search using natural language modelling

97. Building a natural language understanding application using a received electronic record containing programming code including an interpret-block, an interpret-statement, a pattern expression and an action statement

Application Number	17225997
Grant Number	11776533
Status	In Force
Filing Date	2021-04-08
First Publication Date	2021-07-22
Grant Date	2023-10-03
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Mont-Reynaud, Bernard Emami, Seyed M. Wilson, Chris Mohajer, Keyvan

Abstract

IPC Classes ?

G10L 15/00 - Speech recognition
G10L 15/18 - Speech classification or search using natural language modelling
G06F 40/205 - Parsing
G06F 8/30 - Creation or generation of source code
G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
H04M 3/493 - Interactive information services, e.g. directory enquiries

98. Voice morphing apparatus having adjustable parameters

Application Number	16740440
Grant Number	11600284
Status	In Force
Filing Date	2020-01-11
First Publication Date	2021-07-15
Grant Date	2023-03-07
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Pearson, Steve

Abstract

A voice morphing apparatus having adjustable parameters is described. The disclosed system and method include a voice morphing apparatus that morphs input audio to mask a speaker's identity. Parameter adjustment uses evaluation of an objective function that is based on the input audio and output of the voice morphing apparatus. The voice morphing apparatus includes objectives that are based adversarially on speaker identification and positively on audio fidelity. Thus, the voice morphing apparatus is adjusted to reduce identifiability of speakers while maintaining fidelity of the morphed audio. The voice morphing apparatus may be used as part of an automatic speech recognition system.

IPC Classes ?

G10L 21/013 - Adapting to target pitch
G10L 21/0208 - Noise filtering
G06N 20/00 - Machine learning
G06N 3/08 - Learning methods

99. Training a voice morphing apparatus

Application Number	16740378
Grant Number	11100940
Status	In Force
Filing Date	2020-01-10
First Publication Date	2021-06-24
Grant Date	2021-08-24
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Pearson, Steve

Abstract

Systems and methods for training a voice morphing apparatus are described. The voice morphing apparatus is trained to morph input audio data to mask an identity of a speaker. Training is performed by evaluating an objective function that is a function of the input audio data and an output of the voice morphing apparatus. The objective function may have a first term that is based on speaker identification and a second term that is based on audio fidelity. By optimizing the objective function, parameters of the voice morphing apparatus may be adjusted so as to reduce a confidence of speaker identification and maintain an audio fidelity of the morphed audio data. The voice morphing apparatus, once trained, may be used as part of an automatic speech recognition system.

IPC Classes ?

G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
G10L 17/04 - Training, enrolment or model building
G10L 17/00 - Speaker identification or verification techniques
G10L 15/26 - Speech to text systems
G10L 21/013 - Adapting to target pitch
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
G10L 21/0364 - Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility

100. Neural network training from private data

Application Number	16716497
Grant Number	11551083
Status	In Force
Filing Date	2019-12-17
First Publication Date	2021-06-17
Grant Date	2023-01-10
Owner	SOUNDHOUND AI IP, LLC (USA) SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor	Li, Zili Amirguliyev, Asif Probell, Jonah

Abstract

Training and enhancement of neural network models, such as from private data, are described. A slave device receives a version of a neural network model from a master. The slave accesses a local and/or private data source and uses the data to perform optimization of the neural network model. This can be done such as by computing gradients or performing knowledge distillation to locally train an enhanced second version of the model. The slave sends the gradients or enhanced neural network model to a master. The master may use the gradient or second version of the model to improve a master model.

IPC Classes ?

G06N 3/08 - Learning methods
H04L 67/10 - Protocols in which an application is distributed across nodes in the network
H04L 41/082 - Configuration setting characterised by the conditions triggering a change of settings the condition being updates or upgrades of network functionality
G06N 3/04 - Architecture, e.g. interconnection topology

1 2 Next Page

Soundhound AI IP, LLC

1.

DERIVING ACOUSTIC FEATURES AND LINGUISTIC FEATURES FROM RECEIVED SPEECH AUDIO

2.

METHOD AND SYSTEM FOR ACOUSTIC MODEL CONDITIONING ON NON-PHONEME INFORMATION FEATURES

3.

ARTIFICIAL INTELLIGENCE SMART ANSWERING ARCHITECTURE

4.

ARTIFICIAL INTELLIGENCE SMART ANSWERING ARCHITECTURE

5.

METHOD AND SYSTEM FOR CONVERSATION TRANSCRIPTION WITH METADATA

6.

USING A SPECIALIST GRAMMAR TO ENABLE ORDERING FROM A MENU USING NATURAL LANGUAGE

7.

CONTENT FILTERING IN MEDIA PLAYING DEVICES

8.

QUERY-SPECIFIC TARGETED AD DELIVERY

9.

MACHINE LEARNING SYSTEM FOR DIGITAL ASSISTANTS

10.

AUTOMATIC LEARNING OF ENTITIES, WORDS, PRONUNCIATIONS, AND PARTS OF SPEECH

11.

SYSTEM AND METHOD FOR VOICE MORPHING IN A DATA ANNOTATOR TOOL

12.

SERVER SUPPORTED RECOGNITION OF WAKE PHRASES

13.

SPONSORED SEARCH RANKING SIMULATION FOR PATTERNS TRIGGERED BY NATURAL LANGUAGE QUERIES

14.

AUTOMATIC SYNCHRONIZATION FOR AN OFFLINE VIRTUAL ASSISTANT

15.

ENABLING NATURAL LANGUAGE INTERACTIONS WITH USER INTERFACES FOR USERS OF A SOFTWARE APPLICATION

16.

METHOD AND SYSTEM FOR CONVERSATION TRANSCRIPTION WITH METADATA

17.

DYNAMIC SERVICE LEVEL ASSIGNMENT SYSTEM FOR DATA PROCESSING MANAGER

18.

METHOD FOR PROVIDING INFORMATION, METHOD FOR GENERATING DATABASE, AND PROGRAM

19.

MULTI-MODAL AUDIO PROCESSING

20.

SEMANTICALLY CONDITIONED VOICE ACTIVITY DETECTION

21.

MULTI-PARTICIPANT VOICE ORDERING

22.

MULTI-PARTICIPANT VOICE ORDERING

23.

Sponsored search ranking simulation for patterns triggered by natural language queries

24.

SYSTEM AND METHOD FOR ADAPTED INTERACTIVE EXPERIENCES

25.

REAL-TIME NATURAL LANGUAGE PROCESSING AND FULFILLMENT

26.

REAL-TIME NATURAL LANGUAGE PROCESSING AND FULFILLMENT

27.

DOMAIN SPECIFIC NEURAL SENTENCE GENERATOR FOR MULTI-DOMAIN VIRTUAL ASSISTANTS

28.

TEXT-TO-SPEECH SYSTEM WITH VARIABLE FRAME RATE

29.

ADAPTING AN UTTERANCE CUT-OFF PERIOD WITH USER SPECIFIC PROFILE DATA

30.

Automatic Speech Recognition with Voice Personalization and Generalization

31.

MESSAGE PROCESSING METHOD, INFORMATION PROCESSING APPARATUS, AND PROGRAM

32.

VIRTUAL ASSISTANT DOMAIN FUNCTIONALITY

33.

Authorization of Action by Voice Identification

34.

USING SEMANTIC GRAMMAR EXTENSIBILITY FOR COLLECTIVE ARTIFICIAL INTELLIGENCE

35.

MEANING INFERENCE FROM SPEECH AUDIO

36.

TRAINING A DEVICE SPECIFIC ACOUSTIC MODEL

37.

BUILDING A NATURAL LANGUAGE UNDERSTANDING APPLICATION USING A RECEIVED ELECTRONIC RECORD CONTAINING PROGRAMMING CODE INCLUDING AN INTERPRET-BLOCK, AN INTERPRET-STATEMENT, A PATTERN EXPRESSION AND AN ACTION STATEMENT

38.

NEURAL SPEECH-TO-MEANING

39.

PRE-WAKEWORD SPEECH PROCESSING

40.