SoundHound AI IP, LLC

United States of America


1-100 of 190 results for SoundHound AI IP, LLC
Aggregations
IP Type
        Patent 186
        Trademark 4
Jurisdiction
        United States 187
        World 3
Date
New (last 4 weeks) 2
2025 February 2
2025 January 4
2024 December 2
2024 November 3
IPC Class
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog 75
G10L 15/18 - Speech classification or search using natural language modelling 54
G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice 32
G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications 32
G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit 29
NICE Class
09 - Scientific and electric apparatus and instruments 3
42 - Scientific, technological and industrial services, research and design 3
38 - Telecommunications services 1
Status
Pending 52
Registered / In Force 138

1.

DERIVING ACOUSTIC FEATURES AND LINGUISTIC FEATURES FROM RECEIVED SPEECH AUDIO

      
Application Number 18945442
Status Pending
Filing Date 2024-11-12
First Publication Date 2025-02-27
Owner SoundHound AI IP, LLC (USA)
Inventor
  • Lokeswarappa, Kiran Garaga
  • Gedalius, Joel
  • Mont-Reynaud, Bernard
  • Huang, Jun

Abstract

A computer-implemented method is provided. The method including receiving speech audio of dictation associated with a user ID, deriving acoustic features from the speech audio, storing the derived acoustic features in a user profile associated with the user ID, receiving a request for acoustic features through an application programming interface (API), the request including the user ID, and sending the derived acoustic features through the API.
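
As a rough, non-authoritative illustration of the flow this abstract describes (derive features from dictation audio, store them under a user ID, and serve them back on request), a minimal in-memory sketch in Python might look like the following; the class, feature computation, and method names are all hypothetical, not taken from the patent.

```python
import numpy as np


class AcousticProfileStore:
    """Hypothetical in-memory store of derived acoustic features, keyed by user ID."""

    def __init__(self):
        self._profiles = {}  # user_id -> feature vector

    def derive_features(self, speech_audio: np.ndarray) -> np.ndarray:
        # Stand-in "acoustic features": a truncated log-magnitude spectrum.
        # A real system would compute e.g. MFCCs or learned embeddings.
        return np.log1p(np.abs(np.fft.rfft(speech_audio)))[:64]

    def ingest_dictation(self, user_id: str, speech_audio: np.ndarray) -> None:
        """Derive features from dictation audio and store them in the user's profile."""
        self._profiles[user_id] = self.derive_features(speech_audio)

    def get_features(self, user_id: str) -> np.ndarray:
        """Analogue of the API request: return stored features for a user ID."""
        return self._profiles[user_id]


store = AcousticProfileStore()
store.ingest_dictation("user-42", np.random.randn(16000))
features = store.get_features("user-42")
```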

IPC Classes

  • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
  • G06F 40/205 - Parsing
  • G06F 40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
  • G06F 40/253 - Grammatical analysis; Style critique
  • G06N 20/00 - Machine learning
  • G06Q 30/0241 - Advertisements
  • G06Q 30/0251 - Targeted advertisements
  • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 15/26 - Speech to text systems
  • G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
  • G10L 25/90 - Pitch determination of speech signals
  • H04L 67/306 - User profiles

2.

METHOD AND SYSTEM FOR ACOUSTIC MODEL CONDITIONING ON NON-PHONEME INFORMATION FEATURES

      
Application Number 18928627
Status Pending
Filing Date 2024-10-28
First Publication Date 2025-02-13
Owner SoundHound AI IP, LLC. (USA)
Inventor
  • Gowayyed, Zizu
  • Mohajer, Keyvan

Abstract

A method and system for acoustic model conditioning on non-phoneme information features for optimized automatic speech recognition is provided. The method includes using an encoder model to encode sound embedding from a known key phrase of speech and conditioning an acoustic model with the sound embedding to optimize its performance in inferring the probabilities of phonemes in the speech. The sound embedding can comprise non-phoneme information related to the key phrase and the following utterance. Further, the encoder model and the acoustic model can be neural networks that are jointly trained with audio data.

IPC Classes

  • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
  • G10L 15/04 - Segmentation; Word boundary detection
  • G10L 15/08 - Speech classification or search
  • G10L 15/16 - Speech classification or search using artificial neural networks
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks

3.

ARTIFICIAL INTELLIGENCE SMART ANSWERING ARCHITECTURE

      
Application Number US2024038711
Publication Number 2025/024260
Status In Force
Filing Date 2024-07-19
Publication Date 2025-01-30
Owner SOUNDHOUND AI IP, LLC (USA)
Inventor
  • Stonehocker, Timothy P.
  • Mohajer, Kamyar

Abstract

An automated answering system and method are disclosed for use in providing automated customer service. The automated answering system uses generative artificial intelligence to aid in forming a knowledgebase of information regarding a merchant's business that is used in answering the customer queries. The automated answering system of the present technology also uses generative artificial intelligence to aid in formulating a response to queries using the formed knowledgebase.

IPC Classes

4.

ARTIFICIAL INTELLIGENCE SMART ANSWERING ARCHITECTURE

      
Application Number 18356659
Status Pending
Filing Date 2023-07-21
First Publication Date 2025-01-23
Owner SoundHound AI IP, LLC (USA)
Inventor
  • Stonehocker, Timothy P.
  • Mohajer, Kamyar

Abstract

An automated answering system and method are disclosed for use in providing automated customer service. The automated answering system uses generative artificial intelligence to aid in forming a knowledgebase of information regarding a merchant's business that is used in answering the customer queries. The automated answering system of the present technology also uses generative artificial intelligence to aid in formulating a response to queries using the formed knowledgebase.

IPC Classes

5.

METHOD AND SYSTEM FOR CONVERSATION TRANSCRIPTION WITH METADATA

      
Application Number 18889219
Status Pending
Filing Date 2024-09-18
First Publication Date 2025-01-09
Owner SoundHound AI IP, LLC. (USA)
Inventor
  • Bradley, Kiersten L.
  • Coeytaux, Ethan
  • Yin, Ziming

Abstract

Methods and systems for enabling an efficient review of meeting content via a metadata-enriched, speaker-attributed and multiuser-editable transcript are disclosed. By incorporating speaker diarization and other metadata, the system can provide a structured and effective way to review and/or edit the transcript by one or more editors. One type of metadata can be image or video data to represent the meeting content. Furthermore, the present subject matter utilizes a multimodal diarization model to identify and label different speakers. The system can synchronize various sources of data, e.g., audio channel data, voice feature vectors, acoustic beamforming, image identification, and extrinsic data, to implement speaker diarization.

IPC Classes

  • G10L 15/26 - Speech to text systems
  • G06F 40/134 - Hyperlinking
  • G06F 40/166 - Editing, e.g. inserting or deleting
  • G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
  • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
  • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
  • G10L 15/07 - Adaptation to the speaker

6.

USING A SPECIALIST GRAMMAR TO ENABLE ORDERING FROM A MENU USING NATURAL LANGUAGE

      
Application Number 18891119
Status Pending
Filing Date 2024-09-20
First Publication Date 2025-01-09
Owner SOUNDHOUND AI IP, LLC (USA)
Inventor
  • Aung, Joe Kyaw Soe
  • Garcia, Vincent
  • Ren, Junru

Abstract

A computer system ingests a catalog of a plurality of items. The catalog is specific to a particular domain and includes names for individual items of the plurality of items. One or more attributes are respectively associated to the individual items of the plurality of items. A specialist grammar specific to the particular domain of the catalog is obtained and used to interpret natural language input related to the catalog based on the names for the individual items of the plurality of items and their associated one or more attributes.

IPC Classes

7.

CONTENT FILTERING IN MEDIA PLAYING DEVICES

      
Application Number 18823308
Status Pending
Filing Date 2024-09-03
First Publication Date 2024-12-26
Owner SoundHound AI IP, LLC. (USA)
Inventor
  • Khov, Thor S.
  • Kong, Terry

Abstract

Various approaches relate to user defined content filtering in media playing devices of undesirable content represented in stored and real-time content from content providers. For example, video, image, and/or audio data can be analyzed to identify and classify content included in the data using various classification models and object and text recognition approaches. Thereafter, the identification and classification can be used to control presentation and/or access to the content and/or portions of the content. For example, based on the classification, portions of the content can be modified (e.g., replaced, removed, degraded, etc.) using one or more techniques (e.g., media replacement, media removal, media degradation, etc.) and then presented.

IPC Classes

  • H04N 21/454 - Content filtering, e.g. blocking advertisements
  • G06N 3/045 - Combinations of networks
  • G06V 20/40 - Scenes; Scene-specific elements in video content
  • H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
  • H04N 21/466 - Learning process for intelligent management, e.g. learning user preferences for recommending movies

8.

QUERY-SPECIFIC TARGETED AD DELIVERY

      
Application Number 18811530
Status Pending
Filing Date 2024-08-21
First Publication Date 2024-12-12
Owner SoundHound AI IP, LLC (USA)
Inventor
  • Master, Aaron
  • Mohajer, Keyvan

Abstract

An audio recognition system provides for delivery of promotional content to its user. A user interface device, locally or with the assistance of a network-connected server, performs recognition of audio in response to queries. Recognition can be through a method such as processing features extracted from the audio. Audio can comprise recorded music, singing or humming, instrumental music, vocal music, spoken voice, or other recognizable types of audio. Campaign managers provide promotional content for delivery in response to audio recognized in queries.

IPC Classes

  • G06Q 30/0251 - Targeted advertisements
  • G06F 16/60 - Information retrieval; Database structures therefor; File system structures therefor of audio data
  • G06Q 30/0241 - Advertisements
  • G06Q 30/0273 - Determination of fees for advertising

9.

MACHINE LEARNING SYSTEM FOR DIGITAL ASSISTANTS

      
Application Number 18780970
Status Pending
Filing Date 2024-07-23
First Publication Date 2024-11-14
Owner SoundHound AI IP, LLC. (USA)
Inventor
  • Singh, Pranav
  • Zhang, Yilun
  • Mohajer, Keyvan
  • Fazeli, Mohammadreza

Abstract

A machine learning system for a digital assistant is described, together with a method of training such a system. The machine learning system is based on an encoder-decoder sequence-to-sequence neural network architecture trained to map input sequence data to output sequence data, where the input sequence data relates to an initial query and the output sequence data represents canonical data representation for the query. The method of training involves generating a training dataset for the machine learning system. The method involves clustering vector representations of the query data samples to generate canonical-query original-query pairs in training the machine learning system.

IPC Classes

10.

AUTOMATIC LEARNING OF ENTITIES, WORDS, PRONUNCIATIONS, AND PARTS OF SPEECH

      
Application Number 18783423
Status Pending
Filing Date 2024-07-25
First Publication Date 2024-11-14
Owner SoundHound AI IP, LLC. (USA)
Inventor Relin, Anton V.

Abstract

Systems for automatic speech recognition and/or natural language understanding automatically learn new words by finding subsequences of phonemes that, if they were a new word, would enable a successful tokenization of a phoneme sequence. Systems can learn alternate pronunciations of words by finding phoneme sequences with a small edit distance to existing pronunciations. Systems can learn the part of speech of words by finding part-of-speech variations that would enable parses by syntactic grammars. Systems can learn what types of entities a word describes by finding sentences that could be parsed by a semantic grammar but for the words not being on an entity list.
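
As one hedged sketch of the alternate-pronunciation idea above (finding phoneme sequences within a small edit distance of an existing pronunciation), with an invented toy lexicon and phoneme labels:

```python
def edit_distance(a, b):
    """Standard Levenshtein distance over phoneme sequences."""
    dp = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, pb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (pa != pb))
    return dp[-1]


# Hypothetical lexicon entry: word -> list of known phoneme-sequence pronunciations.
lexicon = {"tomato": [["t", "ah", "m", "ey", "t", "ow"]]}


def maybe_learn_pronunciation(word, observed, max_dist=1):
    """Add an observed phoneme sequence as an alternate pronunciation when it is
    within a small edit distance of an existing pronunciation of the word."""
    known = lexicon.get(word, [])
    if observed not in known and any(edit_distance(observed, p) <= max_dist for p in known):
        lexicon[word].append(observed)


maybe_learn_pronunciation("tomato", ["t", "ah", "m", "aa", "t", "ow"])
```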

IPC Classes

  • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
  • G10L 15/14 - Speech classification or search using statistical models, e.g. Hidden Markov Models [HMM]
  • G10L 15/19 - Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules

11.

SYSTEM AND METHOD FOR VOICE MORPHING IN A DATA ANNOTATOR TOOL

      
Application Number 18778301
Status Pending
Filing Date 2024-07-19
First Publication Date 2024-11-07
Owner SoundHound AI IP, LLC. (USA)
Inventor Ross, Dylan H.

Abstract

A system and method for masking an identity of a speaker of natural language speech, such as speech clips to be labeled by humans in a system generating voice transcriptions for training an automatic speech recognition model. The natural language speech is morphed prior to being presented to the human for labeling. In one embodiment, morphing comprises pitch shifting the speech randomly either up or down, then frequency shifting the speech, then pitch shifting the speech in a direction opposite the first pitch shift. Labeling the morphed speech comprises at least one or more of transcribing the morphed speech, identifying a gender of the speaker, identifying an accent of the speaker, and identifying a noise type of the morphed speech.
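
The three-step morph described above (random pitch shift, frequency shift, opposite pitch shift) could be sketched roughly as below, assuming librosa and SciPy are available; the semitone and Hz values are arbitrary placeholders, not parameters from the patent.

```python
import numpy as np
import librosa
from scipy.signal import hilbert


def frequency_shift(y, sr, shift_hz):
    """Shift every frequency component by a fixed offset via the analytic signal."""
    t = np.arange(len(y)) / sr
    return np.real(hilbert(y) * np.exp(2j * np.pi * shift_hz * t))


def morph_voice(y, sr, semitones=3.0, shift_hz=150.0, rng=None):
    """Mask speaker identity: pitch shift up or down at random, apply a fixed
    frequency shift, then pitch shift in the opposite direction."""
    rng = np.random.default_rng() if rng is None else rng
    direction = rng.choice([-1, 1])
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=direction * semitones)
    y = frequency_shift(y, sr, shift_hz)
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=-direction * semitones)
```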

IPC Classes

  • G06F 40/56 - Natural language generation
  • G06F 40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
  • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 19/125 - Pitch excitation, e.g. pitch synchronous innovation CELP [PSI-CELP]
  • G10L 19/26 - Pre-filtering or post-filtering
  • G10L 21/013 - Adapting to target pitch

12.

SERVER SUPPORTED RECOGNITION OF WAKE PHRASES

      
Application Number 18771489
Status Pending
Filing Date 2024-07-12
First Publication Date 2024-10-31
Owner SoundHound AI IP, LLC. (USA)
Inventor
  • Jain, Newton
  • Zaheer, Sameer Syed

Abstract

A server supports multiple virtual assistants. It receives requests that include wake phrase audio and an identification of the source of the request, such as a virtual assistant device. Based on the identification, the server searches a database for a wake phrase detector appropriate for the identified source. The server then applies the wake phrase detector to the received wake phrase audio. If the wake phrase audio triggers the wake phrase detector, the server provides an appropriate response to the source.
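
A minimal sketch of the lookup-and-apply flow described above, with an invented detector registry standing in for the database of wake phrase detectors:

```python
from typing import Callable, Dict

# Hypothetical registry mapping a request source (e.g. a virtual assistant device ID)
# to its wake phrase detector. A detector returns True if the audio triggers it.
WakeDetector = Callable[[bytes], bool]
detector_db: Dict[str, WakeDetector] = {
    "kitchen-speaker": lambda audio: b"hey_house" in audio,  # stand-in detector
    "car-assistant": lambda audio: b"hello_car" in audio,
}


def handle_request(source_id: str, wake_phrase_audio: bytes) -> dict:
    """Look up the detector registered for the request's source and apply it."""
    detector = detector_db.get(source_id)
    if detector is None:
        return {"status": "unknown source"}
    if detector(wake_phrase_audio):
        return {"status": "wake phrase detected", "source": source_id}
    return {"status": "no wake phrase"}


print(handle_request("kitchen-speaker", b"...hey_house..."))
```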

IPC Classes

  • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
  • G06F 8/41 - Compilation
  • G10L 15/08 - Speech classification or search
  • G10L 15/16 - Speech classification or search using artificial neural networks

13.

SPONSORED SEARCH RANKING SIMULATION FOR PATTERNS TRIGGERED BY NATURAL LANGUAGE QUERIES

      
Application Number 18665264
Status Pending
Filing Date 2024-05-15
First Publication Date 2024-10-17
Owner SoundHound AI IP, LLC (USA)
Inventor
  • Mont-Reynaud, Bernard
  • Mohajer, Keyvan
  • Mohajer, Kamyar
  • Wilson, Chris

Abstract

The technology disclosed relates to natural language understanding-based search engines, ranking sponsored search results and simulated ranking of sponsored search results. Tools and methods describe how to simulate the ranking of sponsored search results. The tools further identify instances of user queries within the scope of trigger patterns, optionally providing examples both of user queries for which a sponsored search result is likely to be displayed and examples for which the sponsored search result will not rank highly enough to be displayed, at least on the first page of search results.

IPC Classes

14.

AUTOMATIC SYNCHRONIZATION FOR AN OFFLINE VIRTUAL ASSISTANT

      
Application Number 18752481
Status Pending
Filing Date 2024-06-24
First Publication Date 2024-10-17
Owner SoundHound AI IP, LLC (USA)
Inventor Stahl, Karl

Abstract

[Object] Technology is provided to enable a mobile terminal to function as a digital assistant even when the mobile terminal is in a state where it cannot communicate with a server apparatus. [Solution] When a user terminal 200 receives a query A from a user, user terminal 200 sends query A to a server 100. Server 100 interprets the meaning of query A using a grammar A. Server 100 obtains a response to query A based on the meaning of query A and sends the response to user terminal 200. Server 100 further sends grammar A to user terminal 200. That is, server 100 sends to user terminal 200 a grammar used to interpret the query received from user terminal 200.

IPC Classes

  • G10L 15/19 - Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules
  • G06F 16/242 - Query formulation
  • G06F 40/253 - Grammatical analysis; Style critique
  • G10L 15/07 - Adaptation to the speaker
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

15.

ENABLING NATURAL LANGUAGE INTERACTIONS WITH USER INTERFACES FOR USERS OF A SOFTWARE APPLICATION

      
Application Number 18739011
Status Pending
Filing Date 2024-06-10
First Publication Date 2024-10-03
Owner SoundHound AI IP, LLC (USA)
Inventor
  • Yabas, Utku
  • Hubert, Philipp
  • Stahl, Karl

Abstract

A user specifies a natural language command to a device. Software on the device generates contextual metadata about the user interface of the device, such as data about all visible elements of the user interface, and sends the contextual metadata along with the natural language command to a natural language understanding engine. The natural language understanding engine parses the natural language query using a stored grammar (e.g., a grammar provided by a maker of the device) and as a result of the parsing identifies information about the command (e.g., the user interface elements referenced by the command) and provides that information to the device. The device uses that provided information to respond to the command.
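
As an illustrative sketch only, the contextual-metadata request described above might be assembled like this; the JSON field names are hypothetical and not an actual SoundHound API.

```python
import json


def build_nlu_request(command: str, visible_elements: list) -> str:
    """Bundle a spoken command with metadata about visible UI elements so an NLU
    service can resolve references such as "the play button". Field names are invented."""
    payload = {
        "command": command,
        "ui_context": [
            {"id": e["id"], "type": e["type"], "label": e.get("label", "")}
            for e in visible_elements
        ],
    }
    return json.dumps(payload)


request = build_nlu_request(
    "turn up the volume",
    [{"id": "vol-slider", "type": "slider", "label": "Volume"},
     {"id": "play-btn", "type": "button", "label": "Play"}],
)
```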

IPC Classes

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G06F 40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
  • G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
  • G10L 15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models
  • G10L 15/26 - Speech to text systems

16.

METHOD AND SYSTEM FOR CONVERSATION TRANSCRIPTION WITH METADATA

      
Application Number 18743562
Status Pending
Filing Date 2024-06-14
First Publication Date 2024-10-03
Owner SoundHound AI IP, LLC. (USA)
Inventor
  • Bradley, Kiersten L.
  • Coeytaux, Ethan
  • Yin, Ziming

Abstract

Methods and systems for enabling an efficient review of meeting content via a metadata-enriched, speaker-attributed transcript are disclosed. By incorporating speaker diarization and other metadata, the system can provide a structured and effective way to review and/or edit the transcript. One type of metadata can be image or video data to represent the meeting content. Furthermore, the present subject matter utilizes a multimodal diarization model to identify and label different speakers. The system can synchronize various sources of data, e.g., audio channel data, voice feature vectors, acoustic beamforming, image identification, and extrinsic data, to implement speaker diarization.

IPC Classes

  • G10L 15/26 - Speech to text systems
  • G06F 40/134 - Hyperlinking
  • G06F 40/166 - Editing, e.g. inserting or deleting
  • G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
  • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
  • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
  • G10L 15/07 - Adaptation to the speaker

17.

DYNAMIC SERVICE LEVEL ASSIGNMENT SYSTEM FOR DATA PROCESSING MANAGER

      
Application Number 18637771
Status Pending
Filing Date 2024-04-17
First Publication Date 2024-09-05
Owner SOUNDHOUND AI IP, LLC (USA)
Inventor
  • Stonehocker, Tim
  • Gowayyed, Zizo
  • Emami, Mijad
  • Eichstaedt, Matthias
  • Jiang, Evelyn
  • Berryhill, Ryan
  • Ramona, Mathieu
  • Veira, Neil

Abstract

A data processing system includes a queue manager receiving data processing requests and determining a queue depth representing the number of pending requests. A load supervisor assigns a service level to each request based on the queue depth when the request is at the head of the queue. The system offers two service levels, with the second level requiring fewer computing resources than the first. This dynamic management system optimizes resource allocation by adjusting service levels based on the workload, ensuring efficient processing of data requests.
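
A toy sketch of the queue-depth-based service level assignment described above; the threshold value and level semantics are assumptions, not figures from the application.

```python
from collections import deque

QUEUE_DEPTH_THRESHOLD = 100  # hypothetical cut-over point


class QueueManager:
    """Toy queue manager: assigns a service level to each request when it reaches
    the head of the queue, based on how many requests are still pending."""

    def __init__(self):
        self.pending = deque()

    def submit(self, request):
        self.pending.append(request)

    def next_request(self):
        request = self.pending.popleft()
        depth = len(self.pending)
        # Level 1 uses the full (more expensive) pipeline; level 2 uses a cheaper
        # one when the backlog is deep, so the system keeps up under load.
        service_level = 1 if depth < QUEUE_DEPTH_THRESHOLD else 2
        return request, service_level
```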

IPC Classes

  • G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
  • G10L 15/16 - Speech classification or search using artificial neural networks
  • G10L 15/26 - Speech to text systems

18.

METHOD FOR PROVIDING INFORMATION, METHOD FOR GENERATING DATABASE, AND PROGRAM

      
Application Number 18662973
Status Pending
Filing Date 2024-05-13
First Publication Date 2024-09-05
Owner SoundHound AI IP, LLC. (USA)
Inventor
  • Naito, Masaki
  • Tsuchida, Keisuke
  • Yoneyama, Jun
  • Sawada, Kaku

Abstract

As audio (1) is input to an extension of a browser, the extension transmits the audio (1) to a language processing server. A speech recognition unit obtains a text (1) corresponding to the audio (1), and transmits the text (1) to a natural language understanding unit. In the natural language understanding unit, an information processing unit identifies a URL (1) corresponding to the text (1), and transmits the URL (1) to the browser. The extension passes the URL (1) to a browsing function. The browsing function uses the URL (1) to access a web server. The web server transmits a web page (1) corresponding to the URL (1) to the browser. The browsing function shows a screen corresponding to the web page (1) on a display.

IPC Classes

  • G06F 16/955 - Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
  • G06F 16/33 - Querying
  • G06F 40/40 - Processing or translation of natural language
  • G10L 15/26 - Speech to text systems

19.

MULTI-MODAL AUDIO PROCESSING

      
Application Number 18642492
Status Pending
Filing Date 2024-04-22
First Publication Date 2024-08-15
Owner SOUNDHOUND AI IP, LLC (USA)
Inventor Stahl, Karl

Abstract

A method for processing an audio signal involves receiving sound waves at a microphone, converting them into a first audio signal, and extracting a second audio signal from an electromagnetic signal received at a receiver. The first audio signal is correlated with the second audio signal to calculate a correlation value. If the correlation value exceeds a threshold, the first audio signal is processed using the second audio signal to reduce unwanted sound contributions, resulting in a processed audio signal. Further processing is then performed on the processed audio signal to determine a characteristic of the desired sound.
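
A simplified sketch of the correlate-then-process step described above, assuming the second signal has already been extracted from the electromagnetic signal; the least-squares gain subtraction is one possible way to "reduce unwanted sound contributions", not necessarily the patented one.

```python
import numpy as np


def process_multimodal(mic_audio: np.ndarray, radio_audio: np.ndarray,
                       threshold: float = 0.5) -> np.ndarray:
    """If the audio recovered from the electromagnetic signal correlates strongly
    with the microphone signal, treat it as unwanted sound and remove its
    estimated contribution; otherwise return the microphone signal unchanged."""
    n = min(len(mic_audio), len(radio_audio))
    mic, ref = mic_audio[:n], radio_audio[:n]
    corr = np.corrcoef(mic, ref)[0, 1]
    if abs(corr) <= threshold:
        return mic
    # Least-squares estimate of how much of the reference appears in the mic signal.
    gain = np.dot(mic, ref) / np.dot(ref, ref)
    return mic - gain * ref
```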

IPC Classes

  • H04R 1/10 - Earpieces; Attachments therefor
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 21/0316 - Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
  • G10L 25/06 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being correlation coefficients
  • G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
  • H04R 1/08 - Mouthpieces; Attachments therefor
  • H04R 5/033 - Headphones for stereophonic communication

20.

SEMANTICALLY CONDITIONED VOICE ACTIVITY DETECTION

      
Application Number 18047650
Status Pending
Filing Date 2022-10-19
First Publication Date 2024-07-11
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor Leitman, Victor

Abstract

A method includes recognizing words comprised by a first utterance; interpreting the recognized words according to a grammar comprised by a domain; from the interpreting of the recognized words, determining a timeout period for the first utterance based on the domain of the first utterance; detecting end of voice activity in the first utterance; executing an instruction following an amount of time after detecting end of voice activity of the first utterance in response to the amount of time exceeding the timeout period, the executed instruction based at least in part on interpreting the recognized words.
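
A minimal sketch of domain-conditioned end-of-utterance timing as described above; the domains and timeout values are invented for illustration.

```python
import time

# Hypothetical timeout periods (seconds) per interpretation domain: dictating a
# message tolerates longer pauses than issuing a short device command.
DOMAIN_TIMEOUTS = {"messaging": 2.0, "navigation": 1.2, "device_control": 0.7}


def should_execute(domain: str, end_of_voice_activity_at: float,
                   now: float = None) -> bool:
    """Execute the interpreted instruction once silence has lasted longer than
    the timeout chosen for the utterance's domain."""
    now = time.time() if now is None else now
    timeout = DOMAIN_TIMEOUTS.get(domain, 1.0)
    return (now - end_of_voice_activity_at) > timeout
```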

IPC Classes

  • G10L 15/197 - Probabilistic grammars, e.g. word n-grams
  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 25/78 - Detection of presence or absence of voice signals

21.

MULTI-PARTICIPANT VOICE ORDERING

      
Application Number 18391886
Status Pending
Filing Date 2023-12-21
First Publication Date 2024-06-27
Owner SoundHound AI IP, LLC (USA)
Inventor
  • Macrae, Robert
  • Grossman, Jon
  • Halstvedt, Scott

Abstract

A voice interface recognizes spoken utterances from multiple users. It responds to the utterances in ways such as modifying the attributes of instances of items. The voice interface computes a voice vector for each utterance and associates it with the item instance that is modified. For following utterances with a closely matching voice vector, the voice interface modifies the same instance. For following utterances with a voice vector that is not a close match to one stored for any item instance, the voice interface modifies a different item instance.
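
One way the voice-vector matching described above could be sketched, using cosine similarity and an arbitrary threshold (both are assumptions for illustration):

```python
import numpy as np


def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


class VoiceOrderSession:
    """Associate each order item with the voice vector of the speaker who created
    it; later utterances modify the item whose stored vector matches closely."""

    def __init__(self, match_threshold: float = 0.8):
        self.items = []  # list of dicts: {"voice": vector, "attributes": {...}}
        self.match_threshold = match_threshold

    def apply_utterance(self, voice_vector: np.ndarray, attributes: dict):
        best = max(self.items, default=None,
                   key=lambda it: cosine(it["voice"], voice_vector))
        if best is not None and cosine(best["voice"], voice_vector) >= self.match_threshold:
            best["attributes"].update(attributes)  # same speaker: modify their item
        else:
            self.items.append({"voice": voice_vector, "attributes": dict(attributes)})
```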

IPC Classes

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G06F 3/16 - Sound input; Sound output
  • G06Q 50/12 - Hotels or restaurants
  • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit

22.

MULTI-PARTICIPANT VOICE ORDERING

      
Application Number US2023085627
Publication Number 2024/138102
Status In Force
Filing Date 2023-12-22
Publication Date 2024-06-27
Owner SOUNDHOUND AI IP, LLC (USA)
Inventor
  • Macrae, Robert
  • Grossman, Jon
  • Halstvedt, Scott

Abstract

A voice interface recognizes spoken utterances from multiple users. It responds to the utterances in ways such as modifying the attributes of instances of items. The voice interface computes a voice vector for each utterance and associates it with the item instance that is modified. For following utterances with a closely matching voice vector, the voice interface modifies the same instance. For following utterances with a voice vector that is not a close match to one stored for any item instance, the voice interface modifies a different item instance.

IPC Classes

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G06F 3/16 - Sound input; Sound output
  • G06Q 30/0601 - Electronic shopping [e-shopping]
  • G10L 17/00 - Speaker identification or verification techniques

23.

Sponsored search ranking simulation for patterns triggered by natural language queries

      
Application Number 16728389
Grant Number 12013862
Status In Force
Filing Date 2019-12-27
First Publication Date 2024-06-18
Grant Date 2024-06-18
Owner SoundHound AI IP, LLC (USA)
Inventor
  • Mont-Reynaud, Bernard
  • Mohajer, Keyvan
  • Mohajer, Kamyar
  • Wilson, Chris

Abstract

The technology disclosed relates to natural language understanding-based search engines, ranking sponsored search results and simulated ranking of sponsored search results. Tools and methods describe how to simulate the ranking of sponsored search results. The tools further identify instances of user queries within the scope of trigger patterns, optionally providing examples both of user queries for which a sponsored search result is likely to be displayed and examples for which the sponsored search result will not rank highly enough to be displayed, at least on the first page of search results.

IPC Classes

24.

SYSTEM AND METHOD FOR ADAPTED INTERACTIVE EXPERIENCES

      
Application Number 18440935
Status Pending
Filing Date 2024-02-13
First Publication Date 2024-06-06
Owner SoundHound AI IP, LLC (USA)
Inventor
  • Mckenzie, Joel
  • Zhang, Qindi

Abstract

Natural language grammars interpret expressions at the conversational human-machine interfaces of devices. Under conditions favoring engagement, as specified in a unit of conversational code, the device initiates a discussion using one or more of TTS, images, video, audio, and animation depending on the device capabilities of screen and audio output. Conversational code units specify conditions based on conversation state, mood, and privacy. Grammars provide intents that cause calls to system functions. Units can provide scripts for guiding the conversation. The device, or supporting server system, can provide feedback to creators of the conversational code units for analysis and machine learning.

IPC Classes

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G06Q 30/0251 - Targeted advertisements
  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 15/19 - Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules

25.

REAL-TIME NATURAL LANGUAGE PROCESSING AND FULFILLMENT

      
Application Number US2023079577
Publication Number 2024/107682
Status In Force
Filing Date 2023-11-14
Publication Date 2024-05-23
Owner SOUNDHOUND AI IP, LLC (USA)
Inventor
  • Grossmann, Jon
  • Macrae, Robert
  • Halstvedt, Scott
  • Mohajer, Keyvan

Abstract

A system and method of real-time feedback confirmation to solicit a virtual assistant response from an evolving semantic state of at least a portion of an utterance. A user accesses a virtual assistant on an electronic device having the system and/or method configured to capture a command, a question, and/or a fulfillment request from audio, such as the speech emitted from the speaking user. The speech may be intercepted by a speech engine configured to transcribe the speech into text that is matched with the fragment pattern's regular expression to generate a fragment and/or the speech may be processed with a machine learning model to identify fragments. The fragments are identified by a domain handler configured to update a data structure of the current semantic state of the utterance in real-time on an interface of an electronic device.

IPC Classes

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog

26.

REAL-TIME NATURAL LANGUAGE PROCESSING AND FULFILLMENT

      
Application Number 18055821
Status Pending
Filing Date 2022-11-15
First Publication Date 2024-05-16
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Grossmann, Jon
  • Macrae, Robert
  • Halstvedt, Scott
  • Mohajer, Keyvan

Abstract

A system and method of real-time feedback confirmation to solicit a virtual assistant response from an evolving semantic state of at least a portion of an utterance. A user accesses a virtual assistant on an electronic device having the system and/or method configured to capture a command, a question, and/or a fulfillment request from audio, such as the speech emitted from the speaking user. The speech may be intercepted by a speech engine configured to transcribe the speech into text that is matched with the fragment pattern's regular expression to generate a fragment and/or the speech may be processed with a machine learning model to identify fragments. The fragments are identified by a domain handler configured to update a data structure of the current semantic state of the utterance in real-time on an interface of an electronic device.

IPC Classes

  • G10L 15/18 - Speech classification or search using natural language modelling
  • G06F 3/16 - Sound input; Sound output
  • G06F 40/30 - Semantic analysis
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog

27.

DOMAIN SPECIFIC NEURAL SENTENCE GENERATOR FOR MULTI-DOMAIN VIRTUAL ASSISTANTS

      
Application Number 18050182
Status Pending
Filing Date 2022-10-27
First Publication Date 2024-05-02
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Singh, Pranav
  • Zhang, Yilun
  • Na, Eunjee
  • Bettaglio, Olivia

Abstract

Automatically generating sentences that a user can say to invoke a set of defined actions performed by a virtual assistant are disclosed. A sentence is received and keywords are extracted from the sentence. Based on the keywords, additional sentences are generated. A classifier model is applied to the generated sentences to determine a sentence that satisfies a threshold. In the situation a sentence satisfies the threshold, an intent associated with the classifier model can be invoked. In the situation the sentences fail to satisfy the classifier model, the virtual assistant can attempt to interpret the received sentence according to the most likely intent by invoking a sentence generation model fine-tuned for a particular domain, generate additional sentences with a high probability of having the same intent and fulfill the specific action defined by the intent.

IPC Classes

  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog

28.

TEXT-TO-SPEECH SYSTEM WITH VARIABLE FRAME RATE

      
Application Number 18051507
Status Pending
Filing Date 2022-10-31
First Publication Date 2024-05-02
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Pearson, Steve
  • Grossman, Jon

Abstract

A neural TTS system is trained to generate key acoustic frames at variable rates while omitting other frames. The frame skipping depends on the acoustic features to be generated for the input text. The TTS system can interpolate frames between the key frames at a target rate for a vocoder to synthesize audio samples.
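
A rough sketch of the frame interpolation step mentioned above, using simple linear interpolation to fill in frames between variably spaced key frames at a fixed vocoder frame period (the interpolation method and array shapes are assumptions):

```python
import numpy as np


def interpolate_frames(key_frames: np.ndarray, key_times: np.ndarray,
                       frame_period: float) -> np.ndarray:
    """Given acoustic key frames generated at variable times, linearly interpolate
    intermediate frames at the fixed rate a vocoder expects."""
    target_times = np.arange(key_times[0], key_times[-1], frame_period)
    # Interpolate each acoustic feature dimension independently.
    return np.stack([np.interp(target_times, key_times, key_frames[:, d])
                     for d in range(key_frames.shape[1])], axis=1)


# Example: four key frames of an 8-dimensional feature at uneven times.
frames = interpolate_frames(np.random.randn(4, 8),
                            np.array([0.00, 0.03, 0.09, 0.12]), 0.01)
```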

IPC Classes

  • G10L 13/047 - Architecture of speech synthesisers
  • G10L 13/06 - Elementary speech units used in speech synthesisers; Concatenation rules

29.

ADAPTING AN UTTERANCE CUT-OFF PERIOD WITH USER SPECIFIC PROFILE DATA

      
Application Number 18401770
Status Pending
Filing Date 2024-01-02
First Publication Date 2024-04-25
Owner
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
  • SOUNDHOUND AI IP, LLC (USA)
Inventor
  • Aguayo, Patricia Pozon
  • Zhang, Jennifer Hee Young
  • Probell, Jonah

Abstract

A system detects a period of non-voice activity and compares its duration to a cutoff period. The system adapts the cutoff period based on parsing previously-recognized speech of a user that is stored on a user's device or the system, which detects the voice activity, to determine according to a model, such as a machine-learned model, the probability that the speech recognized so far is a prefix to a longer complete utterance. The cutoff period is longer when a parse of previously recognized speech, which is based on the user profile, has a high probability of being a prefix of a longer utterance.
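
A minimal sketch of the adaptive cutoff logic described above; the threshold and cutoff values are placeholders, and the user-profile-informed prefix-probability model itself is out of scope here.

```python
BASE_CUTOFF_S = 0.8       # hypothetical default silence cutoff
EXTENDED_CUTOFF_S = 2.0   # used when more speech is likely to follow


def cutoff_period(prefix_probability: float, threshold: float = 0.6) -> float:
    """Lengthen the end-of-utterance cutoff when the speech recognized so far is
    likely to be a prefix of a longer utterance; otherwise cut off quickly."""
    return EXTENDED_CUTOFF_S if prefix_probability > threshold else BASE_CUTOFF_S


def utterance_complete(silence_duration_s: float, prefix_probability: float) -> bool:
    return silence_duration_s > cutoff_period(prefix_probability)
```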

IPC Classes

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 15/05 - Word boundary detection
  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 25/78 - Detection of presence or absence of voice signals

30.

Automatic Speech Recognition with Voice Personalization and Generalization

      
Application Number 18046137
Status Pending
Filing Date 2022-10-12
First Publication Date 2024-04-18
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor Mohajer, Keyvan

Abstract

A voice morphing model can transform diverse voices to one or a small number of target voices. An acoustic model can be trained for high accuracy on the target voices. Speech recognition on diverse voices can be performed by morphing it to a target voice and then performing recognition on audio with the target voice. The morphing model and an acoustic model for speech recognition can be trained separately or jointly. A source of requests for speech recognition can pass audio and a voiceprint with requests. Speech recognition can run with improved accuracy by biasing an acoustic model for the voice in the audio using the voiceprint. The audio can be used to calculate a new voiceprint, which can be used to update the voiceprint included with the audio. The updated voiceprint can be sent back to the source and then used with future speech recognition requests.

IPC Classes

  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice

31.

MESSAGE PROCESSING METHOD, INFORMATION PROCESSING APPARATUS, AND PROGRAM

      
Application Number 18456219
Status Pending
Filing Date 2023-08-25
First Publication Date 2024-02-29
Owner SoundHound AI IP, LLC. (USA)
Inventor
  • Matsuda, Yuki
  • Tsuchida, Keisuke

Abstract

[Object] To provide a technique for more accurate interpretation of a message inputted by a user. [Solving Means] An information processing server 300 obtains a first message from a user in a thread 001, has a context of the first message stored in a context database 500 in association with the thread 001, obtains a second message from the user in the thread 001, and provides the second message to a conversation server 400 together with the context of the first message.

IPC Classes

  • H04L 51/04 - Real-time or near real-time messaging, e.g. instant messaging [IM]
  • H04L 51/02 - User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages
  • H04L 51/216 - Handling conversation history, e.g. grouping of messages in sessions or threads

32.

VIRTUAL ASSISTANT DOMAIN FUNCTIONALITY

      
Application Number 18493522
Status Pending
Filing Date 2023-10-24
First Publication Date 2024-02-15
Owner SoundHound AI IP, LLC (USA)
Inventor
  • Mohajer, Kamyar
  • Mohajer, Keyvan
  • Mont-Reynaud, Bernard
  • Singh, Pranav

Abstract

Aspects include methods, systems, and computer-program products providing virtual assistant domain functionality. A natural language query including one or more words is received. A collection of natural language modules is accessed. The collection of natural language modules is configured to process sets of natural language queries. A natural language module, from the collection of natural language modules, is identified to interpret the natural language query. An interpretation of the natural language query is computed using the identified natural language module. A response to the natural language query is returned using the computed interpretation.

IPC Classes

  • G06F 40/40 - Processing or translation of natural language
  • G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
  • G06Q 30/0283 - Price estimation or determination
  • G06Q 20/10 - Payment architectures specially adapted for electronic funds transfer [EFT] systems; Payment architectures specially adapted for home banking systems

33.

Authorization of Action by Voice Identification

      
Application Number 17818628
Status Pending
Filing Date 2022-08-09
First Publication Date 2024-02-15
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Hassan, Ahmadul
  • Hom, James

Abstract

Actions are authorized by computing a confidence score that exceeds a threshold. The confidence score is based on a match between metadata about requests and fields in corresponding database records. The confidence score weights matches by the dependability of the metadata for authentication. The confidence score is further based on the closeness of a sample of speech audio to a stored voiceprint. Additional identification may be required for authorization. The confidence score requirement may be relaxed based on identification in a buffer of recent action requests.
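
A toy sketch of the weighted confidence scoring described above; the fields, weights, and threshold are invented for illustration and not taken from the application.

```python
# Hypothetical weights reflecting how dependable each kind of metadata is for
# authentication (e.g. a registered device ID is worth more than a name match).
FIELD_WEIGHTS = {"device_id": 0.4, "phone_number": 0.3, "name": 0.1}
VOICE_WEIGHT = 0.5
AUTHORIZE_THRESHOLD = 0.75


def confidence_score(request_metadata: dict, account_record: dict,
                     voiceprint_similarity: float) -> float:
    """Weighted sum of metadata field matches plus voiceprint closeness."""
    score = sum(w for field, w in FIELD_WEIGHTS.items()
                if request_metadata.get(field) == account_record.get(field))
    return score + VOICE_WEIGHT * voiceprint_similarity


def authorize(request_metadata, account_record, voiceprint_similarity) -> bool:
    return confidence_score(request_metadata, account_record,
                            voiceprint_similarity) > AUTHORIZE_THRESHOLD
```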

IPC Classes

  • G06F 21/32 - User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
  • G10L 17/12 - Score normalisation
  • G06F 3/16 - Sound input; Sound output

34.

USING SEMANTIC GRAMMAR EXTENSIBILITY FOR COLLECTIVE ARTIFICIAL INTELLIGENCE

      
Application Number 18381593
Status Pending
Filing Date 2023-10-18
First Publication Date 2024-02-08
Owner SOUNDHOUND AI IP, LLC (USA)
Inventor
  • Mont-Reynaud, Bernard
  • Wilson, Christopher S.
  • Mohajer, Keyvan

Abstract

Support for natural language expressions is provided by the use of semantic grammars that describe the structure of expressions in that grammar and that construct the meaning of a corresponding natural language expression. A semantic grammar extension mechanism is provided, which allows one semantic grammar to be used in the place of another semantic grammar. This enriches the expressivity of semantic grammars in a simple, natural, and decoupled manner.

IPC Classes

35.

MEANING INFERENCE FROM SPEECH AUDIO

      
Application Number 18474853
Status Pending
Filing Date 2023-09-26
First Publication Date 2024-02-08
Owner SoundHound AI IP, LLC (USA)
Inventor
  • Krishnaswamy, Sudharsan
  • Wieman, Maisy
  • Probell, Jonah

Abstract

A system and method invoke virtual assistant action, which may comprise an argument. From audio, a probability of an intent is inferred. A probability of a domain and a plurality of variable values may also be inferred. Invoking the action is in response to the intent probability exceeding a threshold. Invoking the action may also be in response to the domain probability exceeding a threshold, a variable value probability exceeding a threshold, detecting an end of utterance, and a specific amount of time having elapsed. The intent probability may increase when the audio includes speech of words with the same meaning in multiple natural languages. Invoking the action may also be conditional on the variable value exceeding its threshold within a certain period of time of the intent probability exceeding its threshold.

IPC Classes

  • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
  • G10L 15/16 - Speech classification or search using artificial neural networks
  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 13/02 - Methods for producing synthetic speech; Speech synthesisers
  • G10L 15/197 - Probabilistic grammars, e.g. word n-grams
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 15/187 - Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams

36.

TRAINING A DEVICE SPECIFIC ACOUSTIC MODEL

      
Application Number 18379618
Status Pending
Filing Date 2023-10-12
First Publication Date 2024-02-01
Owner SoundHound AI IP, LLC (USA)
Inventor
  • Mohajer, Keyvan
  • Patel, Mehul

Abstract

Custom acoustic models can be configured by developers by providing audio files with custom recordings. The custom acoustic model is trained by tuning a baseline model using the audio files. Audio files may contain custom noise to apply to clean speech for training. The custom acoustic model is provided as an alternative to a standard acoustic model. A speech recognition system can select an acoustic model for use upon receiving metadata about the device conditions or type. Speech recognition is performed on speech audio using one or more acoustic models. The result can be provided to developers through the user interface, and an error rate can be computed and also provided.

IPC Classes

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G06F 3/16 - Sound input; Sound output
  • G10L 15/18 - Speech classification or search using natural language modelling

37.

BUILDING A NATURAL LANGUAGE UNDERSTANDING APPLICATION USING A RECEIVED ELECTRONIC RECORD CONTAINING PROGRAMMING CODE INCLUDING AN INTERPRET-BLOCK, AN INTERPRET-STATEMENT, A PATTERN EXPRESSION AND AN ACTION STATEMENT

      
Application Number 18375906
Status Pending
Filing Date 2023-10-02
First Publication Date 2024-01-25
Owner SoundHound AI IP, LLC. (USA)
Inventor
  • Mont-Reynaud, Bernard
  • Emami, Seyed M.
  • Wilson, Chris
  • Mohajer, Keyvan

Abstract

A method of building a natural language understanding application is provided. The method includes receiving at least one electronic record containing programming code and creating executable code from the programming code. Further, the executable code, when executed by a processor, causes the processor to create a parse and an interpretation of a sequence of input tokens, the programming code includes an interpret-block and the interpret-block includes an interpret-statement. Additionally, the interpret-statement includes a pattern expression and the interpret-statement includes an action statement.

IPC Classes

  • G10L 15/18 - Speech classification or search using natural language modelling
  • G06F 40/205 - Parsing
  • G06F 8/30 - Creation or generation of source code
  • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice

38.

NEURAL SPEECH-TO-MEANING

      
Application Number 18461212
Status Pending
Filing Date 2023-09-05
First Publication Date 2023-12-28
Owner SoundHound AI IP, LLC (USA)
Inventor
  • Krishnaswamy, Sudharsan
  • Wieman, Maisy
  • Probell, Jonah

Abstract

A neural speech-to-meaning system is trained on speech audio expressing specific intents. The system receives speech audio and produces indications of when the speech in the audio matches the intent. Intents may include variables that can have a large range of values, such as the names of places. The neural speech-to-meaning system simultaneously recognizes enumerated values of variables and general intents. Recognized variable values can serve as arguments to API requests made in response to recognized intents. Accordingly, neural speech-to-meaning supports voice virtual assistants that serve users based on API hits.

IPC Classes

  • G10L 15/26 - Speech to text systems
  • G06F 3/16 - Sound input; Sound output
  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

39.

PRE-WAKEWORD SPEECH PROCESSING

      
Application Number 17804544
Status Pending
Filing Date 2022-05-27
First Publication Date 2023-11-30
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Stahl, Karl
  • Mont-Reynaud, Bernard

Abstract

Methods and systems for pre-wakeword speech processing are disclosed. Speech audio, comprising command speech spoken before a wakeword, may be stored in a buffer in oldest to newest order. Upon detection of the wakeword, reverse acoustic models and language models, such as reverse automatic speech recognition (R-ASR) can be applied to the buffered audio, in newest to oldest order, starting from before the wakeword. The speech is converted into a sequence of words. Natural language grammar models, such as natural language understanding (NLU), can be applied to match the sequence of words to a complete command, the complete command being associated with invoking a computer operation.
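
A minimal sketch of the buffering side of the approach described above: keep recent audio frames oldest-to-newest, then hand the pre-wakeword portion back newest-to-oldest for a reverse recognizer. Frame sizes and counts are arbitrary assumptions.

```python
from collections import deque


class PreWakewordBuffer:
    """Keep the most recent audio frames (oldest to newest). When the wakeword is
    detected, return the frames that precede it in newest-to-oldest order so a
    reverse recognizer can work backwards from the wakeword."""

    def __init__(self, max_frames: int = 500):
        self.frames = deque(maxlen=max_frames)

    def push(self, frame):
        self.frames.append(frame)

    def frames_before_wakeword(self, wakeword_frame_count: int):
        # Assumes wakeword_frame_count >= 1; drop the wakeword frames themselves.
        pre = list(self.frames)[:-wakeword_frame_count]
        return list(reversed(pre))  # newest first
```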

IPC Classes

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 15/08 - Speech classification or search
  • G10L 25/93 - Discriminating between voiced and unvoiced parts of speech signals

40.

APPARATUS, PLATFORM, METHOD AND MEDIUM FOR INTENTION IMPORTANCE INFERENCE

      
Application Number 17820660
Status Pending
Filing Date 2022-08-18
First Publication Date 2023-11-30
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor Wang, Chong

Abstract

The application provides an apparatus, platform, method and medium for intention importance inference. The apparatus includes an interface configured to receive user-related information; and a processor coupled to the interface and configured to: extract data related to different aspects of a user from the user-related information; generate a plurality of intention probes based on the data related to different aspects of the user, each intention probe comprising an intention and associated data items; infer an importance of each intention probe by calculating a score of each associated data item of the intention probe based on the data related to different aspects of the user; and provide information associated with an intention probe with a highest importance.

IPC Classes

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 15/16 - Speech classification or search using artificial neural networks
  • G06F 16/9535 - Search customisation based on user profiles and personalisation

41.

Using semantic grammar extensibility for collective artificial intelligence

      
Application Number 17377375
Grant Number 11829724
Status In Force
Filing Date 2021-07-16
First Publication Date 2023-11-28
Grant Date 2023-11-28
Owner SOUNDHOUND AI IP, LLC (USA)
Inventor
  • Mont-Reynaud, Bernard
  • Wilson, Christopher S.
  • Mohajer, Keyvan

Abstract

Support for natural language expressions is provided by the use of semantic grammars that describe the structure of expressions in that grammar and that construct the meaning of a corresponding natural language expression. A semantic grammar extension mechanism is provided, which allows one semantic grammar to be used in the place of another semantic grammar. This enriches the expressivity of semantic grammars in a simple, natural, and decoupled manner.

IPC Classes

42.

Content filtering in media playing devices

      
Application Number 18348249
Grant Number 12126868
Status In Force
Filing Date 2023-07-06
First Publication Date 2023-11-02
Grant Date 2024-10-22
Owner SoundHound AI IP, LLC. (USA)
Inventor
  • Khov, Thor S.
  • Kong, Terry

Abstract

Various approaches relate to user defined content filtering in media playing devices of undesirable content represented in stored and real-time content from content providers. For example, video, image, and/or audio data can be analyzed to identify and classify content included in the data using various classification models and object and text recognition approaches. Thereafter, the identification and classification can be used to control presentation and/or access to the content and/or portions of the content. For example, based on the classification, portions of the content can be modified (e.g., replaced, removed, degraded, etc.) using one or more techniques (e.g., media replacement, media removal, media degradation, etc.) and then presented.

IPC Classes

  • H04N 21/454 - Content filtering, e.g. blocking advertisements
  • G06N 3/045 - Combinations of networks
  • G06V 20/40 - Scenes; Scene-specific elements in video content
  • H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
  • H04N 21/466 - Learning process for intelligent management, e.g. learning user preferences for recommending movies

43.

Method and system for acoustic model conditioning on non-phoneme information features

      
Application Number 18348259
Grant Number 12154546
Status In Force
Filing Date 2023-07-06
First Publication Date 2023-11-02
Grant Date 2024-11-26
Owner SoundHound AI IP, LLC. (USA)
Inventor
  • Gowayyed, Zizu
  • Mohajer, Keyvan

Abstract

A method and system for acoustic model conditioning on non-phoneme information features for optimized automatic speech recognition is provided. The method includes using an encoder model to encode sound embedding from a known key phrase of speech and conditioning an acoustic model with the sound embedding to optimize its performance in inferring the probabilities of phonemes in the speech. The sound embedding can comprise non-phoneme information related to the key phrase and the following utterance. Further, the encoder model and the acoustic model can be neural networks that are jointly trained with audio data.

IPC Classes

  • G10L 15/00 - Speech recognition
  • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
  • G10L 15/04 - Segmentation; Word boundary detection
  • G10L 15/16 - Speech classification or search using artificial neural networks
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 15/08 - Speech classification or search
  • G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks

44.

SYSTEMS AND METHODS FOR GENERATING AND USING SHARED NATURAL LANGUAGE LIBRARIES

      
Application Number 18206567
Status Pending
Filing Date 2023-06-06
First Publication Date 2023-10-12
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor Mohajer, Keyvan

Abstract

Systems and methods for searching databases by sound data input are provided herein. A service provider may have a need to make their database(s) searchable through search technology. However, the service provider may not have the resources to implement such search technology. The search technology may allow for search queries using sound data input. The technology described herein provides a solution addressing the service provider’s need, by giving a search technology that furnishes search results in a fast, accurate manner. In further embodiments, systems and methods to monetize those search results are also described herein.

IPC Classes  ?

  • G06F 16/33 - Querying
  • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
  • G10L 15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models
  • G06F 16/174 - Redundancy elimination performed by the file system

45.

Ordering from a menu using natural language

      
Application Number 17716482
Grant Number 12124804
Status In Force
Filing Date 2022-04-08
First Publication Date 2023-09-14
Grant Date 2024-10-22
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Aung, Joe Kyaw Soe
  • Garcia, Vincent
  • Ren, Junru

Abstract

A computer system ingests a catalog of a plurality of items. The catalog is specific to a particular domain and includes names for individual items of the plurality of items. One or more attributes are respectively associated with the individual items of the plurality of items. A specialist grammar specific to the particular domain of the catalog is obtained, and programming language code to interpret natural language input related to the catalog is generated using the specialist grammar, the names for the individual items of the plurality of items, and their associated one or more attributes.

IPC Classes  ?

  • G06F 17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
  • G06F 40/295 - Named entity recognition
  • G06F 40/40 - Processing or translation of natural language
  • G06N 5/022 - Knowledge engineering; Knowledge acquisition
  • G10L 15/26 - Speech to text systems

46.

Multi-modal audio processing for voice-controlled devices

      
Application Number 18194885
Grant Number 11997448
Status In Force
Filing Date 2023-04-03
First Publication Date 2023-08-10
Grant Date 2024-05-28
Owner SOUNDHOUND AI IP, LLC (USA)
Inventor Stahl, Karl

Abstract

A voice-controlled device includes a microphone to receive a set of sound waves that includes speech uttered by a user and other sound, and to output a first audio signal that includes a contribution from the speech uttered by the user and a contribution from the other sound. The device also includes a receiver to receive an electromagnetic signal and to output a second audio signal obtained from the electromagnetic signal. An audio pre-processor of the device processes the first audio signal using the second audio signal to reduce the contribution from the other sound in a processed audio signal. The voice-controlled device then provides the processed audio signal to a speech recognition module to determine a voice command issued by the user.

IPC Classes  ?

  • H04R 25/00 - Deaf-aid sets
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 21/0316 - Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
  • G10L 25/06 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being correlation coefficients
  • G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
  • H04R 1/08 - Mouthpieces; Attachments therefor
  • H04R 1/10 - Earpieces; Attachments therefor
  • H04R 5/033 - Headphones for stereophonic communication

47.

Token confidence scores for automatic speech recognition

      
Application Number 17649810
Grant Number 12223948
Status In Force
Filing Date 2022-02-03
First Publication Date 2023-08-03
Grant Date 2025-02-11
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Singh, Pranav
  • Mishra, Saraswati
  • Na, Eunjee

Abstract

Methods and systems for correction of a likely erroneous word in a speech transcription are disclosed. By evaluating token confidence scores of individual words or phrases, the automatic speech recognition system can replace a low-confidence score word with a substitute word or phrase. Among various approaches, neural network models can be used to generate individual confidence scores. Such word substitution can enable the speech recognition system to automatically detect and correct likely errors in transcription. Furthermore, the system can indicate the token confidence scores on a graphic user interface for labeling and dictionary enhancement.
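
A minimal sketch of the substitution idea described above, assuming a confidence threshold of 0.5 and a token structure invented for illustration; the patent's confidence models and substitution strategy are not reproduced here.

```python
from dataclasses import dataclass, field

@dataclass
class Token:
    text: str
    confidence: float                                   # e.g. from a neural confidence model
    alternatives: list = field(default_factory=list)    # (text, score) pairs from the recognizer

def correct_transcript(tokens, threshold=0.5):
    """Replace tokens whose confidence falls below the threshold with their
    best-scoring alternative, if one is available."""
    words = []
    for tok in tokens:
        if tok.confidence < threshold and tok.alternatives:
            words.append(max(tok.alternatives, key=lambda alt: alt[1])[0])
        else:
            words.append(tok.text)
    return " ".join(words)

hypothesis = [
    Token("play", 0.96),
    Token("sum", 0.31, [("some", 0.88), ("sum", 0.31)]),
    Token("jazz", 0.93),
]
print(correct_transcript(hypothesis))                   # -> "play some jazz"
```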

IPC Classes  ?

  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
  • G10L 15/26 - Speech to text systems

48.

VIDEO CONFERENCE CAPTIONING

      
Application Number 18298282
Status Pending
Filing Date 2023-04-10
First Publication Date 2023-08-03
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor Coeytaux, Ethan

Abstract

A video conferencing system, such as one implemented with a cloud server, receives audio streams from a plurality of endpoints. The system uses automatic speech recognition to transcribe speech in the audio streams. The system multiplexes the transcriptions into individual caption streams and sends them to the endpoints, but the caption stream to each endpoint omits the transcription of audio from the endpoint. Some systems allow muting of audio through an indication to the system. The system then omits sending the muted audio to other endpoints and also omits sending a transcription of the muted audio to other endpoints.

IPC Classes  ?

  • G10L 15/26 - Speech to text systems
  • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
  • G10L 19/005 - Correction of errors induced by the transmission channel, if related to the coding algorithm
  • G10L 15/19 - Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules
  • G10L 15/14 - Speech classification or search using statistical models, e.g. Hidden Markov Models [HMM]

49.

METHOD AND APPARATUS FOR INTELLIGENT VOICE QUERY

      
Application Number 17654635
Status Pending
Filing Date 2022-03-14
First Publication Date 2023-07-27
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor Wang, Chong

Abstract

A method and an apparatus for processing an intelligent voice query. A voice query input is received from a user. Automatic speech recognition and natural language understanding generate structured query data. It is modified based on an input adaptation rule to obtain modified structured query data appropriate for a content providing server, which provides a query result output corresponding to the modified structured query data. Input adaptation rules may comprise rule sets based on behavior patterns of the user and/or business recommendations. The query result output can be used for natural language generation, which may have similar adaptation rules for output.

IPC Classes  ?

  • G06F 16/2452 - Query translation
  • G06F 16/242 - Query formulation
  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog

50.

METHOD AND SYSTEM FOR ASSISTING A USER

      
Application Number 17561548
Status Pending
Filing Date 2021-12-23
First Publication Date 2023-06-29
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Mohajer, Keyvan
  • Kam, Kaishin
  • Pierret, Christophe

Abstract

A method of assisting a user. The method includes obtaining a plurality of rules having condition components and action components, the action components specifying conversation schemas; detecting, by a sensor, a fact related to an environment of the user; identifying a rule, of the plurality of rules, having a condition component that is satisfied by the detected fact; initiating a conversation with the user according to a conversation schema of the action component of the identified rule; and performing an action in response to a positive statement by the user.

IPC Classes  ?

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G01C 21/36 - Input/output arrangements for on-board computers

51.

Multiple service levels for automatic speech recognition

      
Application Number 17447823
Grant Number 11978454
Status In Force
Filing Date 2021-09-16
First Publication Date 2023-03-16
Grant Date 2024-05-07
Owner SOUNDHOUND AI IP, LLC (USA)
Inventor
  • Stonehocker, Timothy P.
  • Gowayyed, Zizu
  • Eichstaedt, Matthias
  • Emami, Seyed Majid
  • Jiang, Evelyn
  • Berryhill, Ryan
  • Ramona, Mathieu
  • Veira, Neil

Abstract

A system for performing automated speech recognition (ASR) on audio data includes a queue manager to receive a request to perform ASR on audio data, add the request to a queue of incoming requests, and determine a queue depth representing a number of requests in the queue at a given time. The system also includes a load supervisor to receive the request and the queue depth from the queue manager and assign a service level for the request based on the queue depth. In addition, the system includes a speech-to-text converter to receive the assigned service level for the request from the load supervisor, select an ASR model for the request based on the received service level, receive the audio data associated with the request, and perform ASR on the audio data using the selected ASR model.
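
The sketch below illustrates the queue-depth-to-service-level mapping described in the abstract. The depth thresholds, level names, and model names are hypothetical, not taken from the patent.

```python
from collections import deque

REQUEST_QUEUE = deque()                     # queue manager's queue of incoming requests

def service_level_for(queue_depth):
    """Load supervisor: deeper queues get lighter (faster, cheaper) service levels."""
    if queue_depth < 10:
        return "premium"
    if queue_depth < 50:
        return "standard"
    return "economy"

ASR_MODELS = {"premium": "asr-large", "standard": "asr-medium", "economy": "asr-small"}

def handle_request(audio):
    REQUEST_QUEUE.append(audio)
    level = service_level_for(len(REQUEST_QUEUE))    # queue depth at this moment
    model = ASR_MODELS[level]                        # speech-to-text converter's model choice
    REQUEST_QUEUE.popleft()                          # request leaves the queue when served
    return f"transcribe {len(audio)} bytes with {model} ({level} service level)"

print(handle_request(b"\x00" * 16000))
```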

IPC Classes  ?

  • G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
  • G10L 15/16 - Speech classification or search using artificial neural networks
  • G10L 15/26 - Speech to text systems

52.

CONTROLLING A GRAPHICAL USER INTERFACE BY TELEPHONE

      
Application Number 17408476
Status Pending
Filing Date 2021-08-22
First Publication Date 2023-02-23
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Mohajer, Kamyar
  • Mohajer, Keyvan
  • Hom, James
  • Jiang, Evelyn

Abstract

A method and system for controlling a GUI on a user's network-connected device, the control being provided by a telephone call between the user and a speech recognition and speech synthesis system. An example of a restaurant ordering system is provided. The user calls a phone number and is guided through a verbal ordering process that includes one or more of: adding an item, deleting an item, changing quantities, changing sizes, and changing details of an item. The user's choices are added to a display so that a current status of the order is visible to the user. The GUI is updated as changes are made to the order. The GUI can also request additional information, upsell items, and show menus. The GUI aids the user in confirming that the order is correct. The system provides the final order to a restaurant for fulfillment.

IPC Classes  ?

  • G06Q 30/06 - Buying, selling or leasing transactions
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 13/02 - Methods for producing synthetic speech; Speech synthesisers
  • G06F 3/16 - Sound input; Sound output
  • G06F 16/955 - Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
  • G06F 9/451 - Execution arrangements for user interfaces

53.

Differential spatial rendering of audio sources

      
Application Number 17655650
Grant Number 11589184
Status In Force
Filing Date 2022-03-21
First Publication Date 2023-02-21
Grant Date 2023-02-21
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor Mont-Reynaud, Bernard

Abstract

Methods and systems for intuitive spatial audio rendering with improved intelligibility are disclosed. By establishing a virtual association between an audio source and a location in the listener's virtual audio space, a spatial audio rendering system can generate spatial audio signals that create a natural and immersive audio field for a listener. The system can receive the virtual location of the source as a parameter and map the source audio signal to a source-specific multi-channel audio signal. In addition, the spatial audio rendering system can be interactive and dynamically modify the rendering of the spatial audio in response to a user's active control or tracked movement.
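
As a simplified illustration of mapping a source's virtual location to a source-specific multi-channel signal, the sketch below uses constant-power stereo panning; it is an illustrative stand-in, not the rendering method claimed in the patent.

```python
import numpy as np

def render_source(mono, azimuth_deg):
    """Constant-power pan: azimuth_deg runs from -90 (hard left) to +90 (hard right)."""
    theta = (azimuth_deg + 90.0) / 180.0 * (np.pi / 2.0)     # map to 0..pi/2
    left_gain, right_gain = np.cos(theta), np.sin(theta)
    return np.stack([left_gain * mono, right_gain * mono])   # shape (2, n_samples)

sr = 16000
tone = 0.1 * np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)  # 1 s test source
stereo = render_source(tone, azimuth_deg=-45.0)              # place the source to the left
print(stereo.shape)                                          # (2, 16000)
```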

IPC Classes  ?

  • H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control

54.

Using a smartphone to control another device by voice

      
Application Number 17372123
Grant Number 11950300
Status In Force
Filing Date 2021-07-09
First Publication Date 2023-01-12
Grant Date 2024-04-02
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor Tsuchida, Keisuke

Abstract

A method and system for implementing a speech-enabled interface of a host device via an electronic mobile device in a network are provided. The method includes establishing a communication session between the host device and the mobile device via a session service provider. According to some embodiments, a barcode can be adopted to enable the pairing of the host device and mobile device. Furthermore, the present method and system employ the voice interface in conjunction with speech recognition systems and natural language processing to interpret voice input for the host device, which can be used to perform one or more actions related to the host device.

IPC Classes  ?

  • H04W 76/11 - Allocation or use of connection identifiers
  • G10L 15/08 - Speech classification or search
  • H04W 4/50 - Service provisioning or reconfiguring

55.

Sidebar conversations

      
Application Number 17353639
Grant Number 11539920
Status In Force
Filing Date 2021-06-21
First Publication Date 2022-12-22
Grant Date 2022-12-27
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor Stonehocker, Timothy P

Abstract

A system and a method are disclosed that enable sidebar conversations between two or more attendees that are participating in a primary or main meeting. The sidebar conversation occurs in conjunction or concurrently with the primary meeting. A first attendee provides commands to indicate a desire to initiate a sidebar conversation and information about a targeted attendee. The commands are analyzed to determine if a trigger phrase is included. The commands are analyzed to determine if there is an identification of a second (targeted) attendee, who is currently participating in the main meeting. If the second attendee is available, then the sidebar conversation is initiated. Additional attendees can be added to the sidebar conversation. Additional independent and simultaneous sidebar conversations can be initiated (by attendees currently participating in the active sidebar conversation), thereby allowing one attendee to conduct multiple simultaneous sidebar conversations while being able to switch between them.

IPC Classes  ?

  • H04N 7/15 - Conference systems
  • H04L 65/403 - Arrangements for multi-party communication, e.g. for conferences
  • H04L 65/1069 - Session establishment or de-establishment
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 25/57 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for processing of video signals
  • G06F 3/16 - Sound input; Sound output
  • G10L 15/08 - Speech classification or search

56.

Enabling natural language interactions with user interfaces for users of a software application

      
Application Number 17332927
Grant Number 12008991
Status In Force
Filing Date 2021-05-27
First Publication Date 2022-12-01
Grant Date 2024-06-11
Owner SoundHound AI IP, LLC (USA)
Inventor
  • Yabas, Utku
  • Hubert, Philipp
  • Stahl, Karl

Abstract

A user specifies a natural language command to a device. Software on the device generates contextual metadata about the user interface of the device, such as data about all visible elements of the user interface, and sends the contextual metadata along with the natural language command to a natural language understanding engine. The natural language understanding engine parses the natural language query using a stored grammar (e.g., a grammar provided by a maker of the device) and as a result of the parsing identifies information about the command (e.g., the user interface elements referenced by the command) and provides that information to the device. The device uses that provided information to respond to the command.
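
To make the flow concrete, the sketch below shows the kind of request a device might assemble (spoken command plus contextual metadata about visible UI elements) and how a parsed ordinal could be resolved back to an element. The field names and payload shape are invented for illustration; they are not SoundHound's API.

```python
import json

# Request the device might send: the spoken command plus metadata about the
# currently visible UI elements (hypothetical field names).
request = {
    "command_text": "press the second button",
    "ui_context": {
        "visible_elements": [
            {"id": "btn_ok",     "type": "button", "label": "OK",     "index": 1},
            {"id": "btn_cancel", "type": "button", "label": "Cancel", "index": 2},
        ],
    },
}

def resolve_reference(parsed_index, ui_context):
    """Map an ordinal parsed from the command back to a concrete UI element id."""
    for element in ui_context["visible_elements"]:
        if element["index"] == parsed_index:
            return element["id"]
    return None

print(json.dumps(request)[:60] + "...")
print(resolve_reference(2, request["ui_context"]))   # -> "btn_cancel"
```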

IPC Classes  ?

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G06F 40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
  • G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
  • G10L 15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models
  • G10L 15/26 - Speech to text systems

57.

Method for providing information, method for generating database, and program

      
Application Number 17649052
Grant Number 11995143
Status In Force
Filing Date 2022-01-26
First Publication Date 2022-12-01
Grant Date 2024-05-28
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Naito, Masaki
  • Tsuchida, Keisuke
  • Yoneyama, Jun
  • Sawada, Kaku

Abstract

As audio (1) is input to an extension of a browser, the extension transmits the audio (1) to a language processing server. A speech recognition unit obtains a text (1) corresponding to the audio (1), and transmits the text (1) to a natural language understanding unit. In the natural language understanding unit, an information processing unit identifies a URL (1) corresponding to the text (1), and transmits the URL (1) to the browser. The extension passes the URL (1) to a browsing function. The browsing function uses the URL (1) to access a web server. The web server transmits a web page (1) corresponding to the URL (1) to the browser. The browsing function shows a screen corresponding to the web page (1) on a display.

IPC Classes  ?

  • G06F 16/95 - Retrieval from the web
  • G06F 16/33 - Querying
  • G06F 16/955 - Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
  • G06F 40/40 - Processing or translation of natural language
  • G10L 15/26 - Speech to text systems

58.

API FOR SERVICE PROVIDER FULFILLMENT OF DATA PRIVACY REQUESTS

      
Application Number 17237705
Status Pending
Filing Date 2021-04-22
First Publication Date 2022-10-27
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Qiu, Kevin
  • Jiang, Evelyn
  • Eichstaedt, Matthias
  • Heit, Warren S.

Abstract

A system and method are disclosed for fulfilling GDPR and other privacy requests in a client device system as well as a downstream service provider with which the client device system partners. In examples, the downstream service provider may be a voice assistant service provider providing voice recognition and language understanding capabilities to an upstream client device system.

IPC Classes  ?

  • G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules

59.

ACTIVE ARBITRATION

      
Serial Number 97568867
Status Pending
Filing Date 2022-08-29
Owner SOUNDHOUND AI IP, LLC
NICE Classes  ? 09 - Scientific and electric apparatus and instruments

Goods & Services

Recorded computer software for spotting wake words; Recorded computer software for recognizing speech, interpreting natural language, and providing virtual assistant functions; Downloadable computer software development kits (SDKs) for developing speech recognition, natural language understanding, and virtual assistant software; Recorded computer software for controlling speech recognition, natural language understanding, and virtual assistant cloud processing; Recorded computer software for performing text-to-speech voice audio synthesis; Downloadable electronic data files featuring neural network parameter sets for synthesizing text-to-speech voices; Downloadable electronic data files featuring neural network parameter sets for spotting wake words in audio; Recorded computer software for operating a virtual assistant device for hotels and restaurants; Recorded computer software for providing a virtual assistant using artificial intelligence technology for hotels and restaurants to make customer bookings and reservations, and answer other customer queries; Preinstalled software for operating a virtual assistant device for hotels and restaurants sold as a component of virtual assistant devices for hotels and restaurants; Recorded computer software for understanding speech for use with voice ordering kiosks, drive through ordering systems, and retail ordering systems; Recorded computer software for understanding speech for use with voice reservation kiosks; Recorded computer software for understanding speech for use with smart home devices; Recorded computer software for understanding speech for use with voice enabled robots

60.

Wake suppression for audio playing and listening devices

      
Application Number 17736850
Grant Number 11922939
Status In Force
Filing Date 2022-05-04
First Publication Date 2022-08-18
Grant Date 2024-03-05
Owner SoundHound AI IP, LLC (USA)
Inventor
  • Yang, Hsuan
  • Zhãng, Qindí
  • Heit, Warren S.

Abstract

A system and method are disclosed for ignoring a wakeword received at a speech-enabled listening device when it is determined the wakeword is reproduced audio from an audio-playing device. Determination can be by detecting audio distortions, by an ignore flag sent locally between an audio-playing device and a speech-enabled device, by an ignore flag sent from a server, by comparison of received audio to played audio or to a wakeword within an audio-playing device or a speech-enabled device, and by other means.

IPC Classes  ?

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 15/08 - Speech classification or search
  • G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

61.

Wakeword selection

      
Application Number 17709131
Grant Number 11948571
Status In Force
Filing Date 2022-03-30
First Publication Date 2022-07-14
Grant Date 2024-04-02
Owner SoundHound AI IP, LLC (USA)
Inventor Mont-Reynaud, Bernard

Abstract

A system and method are disclosed capable of parsing a spoken utterance into a natural language request and a speech audio segment, where the natural language request directs the system to use the speech audio segment as a new wakeword. In response to this wakeword assignment directive, the system and method are further capable of immediately building a new wakeword spotter to activate the device upon matching the new wakeword in the input audio. Different approaches to promptly building a new wakeword spotter are described. Variations of wakeword assignment directives can make the new wakeword public or private. They can also add the new wakeword to earlier wakewords, or replace earlier wakewords.

IPC Classes  ?

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G06F 3/16 - Sound inputSound output
  • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
  • G10L 15/08 - Speech classification or search
  • G10L 17/04 - Training, enrolment or model building

62.

Adapting an utterance cut-off period based on parse prefix detection

      
Application Number 17698623
Grant Number 11862162
Status In Force
Filing Date 2022-03-18
First Publication Date 2022-06-30
Grant Date 2024-01-02
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Aguayo, Patricia Pozon
  • Zhang, Jennifer Hee Young
  • Probell, Jonah

Abstract

A processing system detects a period of non-voice activity and compares its duration to a cutoff period. The system adapts the cutoff period based on parsing previously-recognized speech to determine, according to a model, such as a machine-learned model, the probability that the speech recognized so far is a prefix to a longer complete utterance. The cutoff period is longer when a parse of previously recognized speech has a high probability of being a prefix of a longer utterance.
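
A toy sketch of the adaptive cutoff: the end-of-utterance silence threshold grows with the estimated probability that the partial transcript is a prefix of a longer request. The hard-coded prefix list stands in for the machine-learned model mentioned in the abstract, and the millisecond values are arbitrary.

```python
def prefix_probability(partial_transcript):
    """Stand-in for a learned model estimating P(transcript is a prefix of more speech)."""
    likely_prefixes = ("call", "set a timer for", "navigate to")
    return 0.9 if partial_transcript.strip().lower().endswith(likely_prefixes) else 0.1

def cutoff_period(partial_transcript, base_ms=600, max_extra_ms=900):
    """Longer cutoff when the recognized speech looks like an incomplete request."""
    return base_ms + max_extra_ms * prefix_probability(partial_transcript)

def utterance_ended(silence_ms, partial_transcript):
    return silence_ms >= cutoff_period(partial_transcript)

print(utterance_ended(800, "set a timer for"))   # False: keep listening a bit longer
print(utterance_ended(800, "stop the music"))    # True: cut off promptly
```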

IPC Classes  ?

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 15/05 - Word boundary detection
  • G10L 25/78 - Detection of presence or absence of voice signals
  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 15/08 - Speech classification or search
  • G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
  • G10L 15/19 - Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules

63.

SOUNDHOUND EDGELITE

      
Serial Number 97479096
Status Pending
Filing Date 2022-06-28
Owner SOUNDHOUND AI IP, LLC
NICE Classes  ?
  • 09 - Scientific and electric apparatus and instruments
  • 42 - Scientific, technological and industrial services, research and design

Goods & Services

Recorded computer software for spotting wake words; Recorded computer software for recognizing speech, interpreting natural language, and providing virtual assistant functions; Downloadable computer software development kits (SDKs) for developing speech recognition, natural language understanding, and virtual assistant software; Recorded computer software for performing text-to-speech voice audio synthesis; Downloadable electronic data files featuring neural network parameter sets for synthesizing text-to-speech voices; Downloadable electronic data files featuring neural network parameter sets for spotting wake words in audio; Recorded computer software for operating a virtual assistant device for hotels and restaurants; Recorded computer software for providing a virtual assistant using artificial intelligence technology for hotels and restaurants to make customer bookings and reservations, and answer other customer queries; Preinstalled software for operating a virtual assistant device for hotels and restaurants sold as a component of virtual assistant devices for hotels and restaurants; Recorded computer software for understanding speech for use with smart home devices; Recorded computer software for understanding speech for use with voice enabled robots; Recorded computer software for training of custom wake word spotters for virtual assistants; Recorded computer software for synthesis of text-to-speech voice audio Platform as a service (PaaS) featuring computer software platforms for configuring virtual assistants through a web interface; Platform as a service (PaaS) featuring computer software platforms for configuring domain-specific content for virtual assistants; Providing online non-downloadable computer software for training of custom wake word spotters for virtual assistants; Providing online non-downloadable computer software for synthesis of text-to-speech voice audio; Platform as a service (PaaS) featuring computer software platforms for configuring custom text-to-speech voices

64.

SYSTEM AND METHOD FOR COMPUTING REGION CENTERS BY POINT CLUSTERING

      
Application Number 17549796
Status Pending
Filing Date 2021-12-13
First Publication Date 2022-06-16
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor Pierret, Christophe

Abstract

A system and a method are disclosed that calculate the center of a geographic region. A set of topological/geographical points is received. A set of clusters is determined. A weight for each cluster is computed. The highest weighted cluster is selected. The geographic region center is calculated using the selected cluster. The geographical points can include a key for each point and be filtered by an indicated key before calculating the center of a geographic location.
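
The sketch below walks through the stated pipeline: filter points by key, cluster them, weight the clusters, and return the centroid of the heaviest cluster. The greedy distance-threshold clustering and count-based weighting are illustrative stand-ins, not the patented method.

```python
def region_center(points, key=None, radius=0.01):
    """points: list of (lat, lon, key) tuples. Returns (lat, lon) of the densest cluster."""
    pts = [(la, lo) for la, lo, k in points if key is None or k == key]
    clusters = []                                    # each cluster is a list of points
    for p in pts:
        for c in clusters:
            cx = sum(q[0] for q in c) / len(c)       # current cluster centroid
            cy = sum(q[1] for q in c) / len(c)
            if (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= radius ** 2:
                c.append(p)
                break
        else:
            clusters.append([p])                     # start a new cluster
    best = max(clusters, key=len)                    # weight = point count
    return (sum(p[0] for p in best) / len(best),
            sum(p[1] for p in best) / len(best))

pts = [(37.400, -122.030, "cafe"), (37.401, -122.031, "cafe"), (37.800, -122.270, "cafe")]
print(region_center(pts, key="cafe"))                # centroid of the two nearby points
```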

IPC Classes  ?

  • G06K 9/62 - Methods or arrangements for recognition using electronic means

65.

Meaning inference from speech audio

      
Application Number 17653365
Grant Number 11769488
Status In Force
Filing Date 2022-03-03
First Publication Date 2022-06-16
Grant Date 2023-09-26
Owner SoundHound AI IP, LLC (USA)
Inventor
  • Krishnaswamy, Sudharsan
  • Wieman, Maisy
  • Probell, Jonah

Abstract

A system and method invoke virtual assistant action, which may comprise an argument. From audio, a probability of an intent is inferred. A probability of a domain and a plurality of variable values may also be inferred. Invoking the action is in response to the intent probability exceeding a threshold. Invoking the action may also be in response to the domain probability exceeding a threshold, a variable value probability exceeding a threshold, detecting an end of utterance, and a specific amount of time having elapsed. The intent probability may increase when the audio includes speech of words with the same meaning in multiple natural languages. Invoking the action may also be conditional on the variable value exceeding its threshold within a certain period of time of the intent probability exceeding its threshold.
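
A small sketch of the threshold-and-timing logic outlined in the abstract: the action fires only when the intent probability and a variable-value probability both exceed thresholds within a time window. All thresholds, names, and the window length are invented for illustration.

```python
import time

INTENT_THRESHOLD = 0.8
VALUE_THRESHOLD = 0.7
WINDOW_SECONDS = 2.0

def maybe_invoke(intent_prob, value_prob, t_intent, t_value, action):
    """Invoke the action only if both probabilities clear their thresholds and the
    variable value arrived within the time window of the intent."""
    if (intent_prob >= INTENT_THRESHOLD
            and value_prob >= VALUE_THRESHOLD
            and abs(t_value - t_intent) <= WINDOW_SECONDS):
        action()
        return True
    return False

now = time.monotonic()
invoked = maybe_invoke(0.92, 0.75, now, now + 0.4, lambda: print("action invoked"))
print(invoked)   # True: both thresholds met within the window
```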

IPC Classes  ?

  • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
  • G10L 15/16 - Speech classification or search using artificial neural networks
  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 13/02 - Methods for producing synthetic speech; Speech synthesisers
  • G10L 15/197 - Probabilistic grammars, e.g. word n-grams
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 15/187 - Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams

66.

System and Method For Achieving Interoperability Through The Use of Interconnected Voice Verification System

      
Application Number 17108724
Status Pending
Filing Date 2020-12-01
First Publication Date 2022-06-02
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Mohajer, Keyvan
  • Heit, Warren S.

Abstract

A system and method are disclosed for achieving interoperability and access to a personal extension knowledge/preference database (PEKD) through interconnected voice verification systems. Devices from various different companies and systems can link to a voice verification system (VVS). Users can also enroll with the VVS so that the VVS can provide authentication of users by personal wake phrases. Thereafter, users can access their PEKD from un-owned devices by speaking their wake phrase.

IPC Classes  ?

  • G10L 17/24 - the user being prompted to utter a password or a predefined phrase
  • G10L 17/04 - Training, enrolment or model building
  • G06F 21/32 - User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
  • H04L 29/06 - Communication control; Communication processing characterised by a protocol
  • G06N 20/00 - Machine learning
  • G06F 16/25 - Integrating or interfacing systems involving database management systems

67.

NEURAL SENTENCE GENERATOR FOR VIRTUAL ASSISTANTS

      
Application Number 17455727
Status Pending
Filing Date 2021-11-19
First Publication Date 2022-05-26
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Singh, Pranav
  • Mohajer, Keyvan
  • Zhang, Yilun

Abstract

Methods and systems for automatically generating sample phrases or sentences that a user can say to invoke a set of defined actions performed by a virtual assistant are disclosed. By leveraging fine-tuned, general-purpose natural language models, the system can generate accurate candidate utterance sentences based on extracted keywords or an input utterance sentence. Furthermore, domain-specific datasets can be used to train the pre-trained, general-purpose natural language models via unsupervised learning. These generated sentences can improve the efficiency of configuring a virtual assistant. The system can further optimize the effectiveness of a virtual assistant in understanding the user, which can enhance the user experience of communicating with it.

IPC Classes  ?

  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
  • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog

68.

RECOMMENDATION ENGINE FOR UPSELLING IN RESTAURANT ORDERS

      
Application Number 17667535
Status Pending
Filing Date 2022-02-08
First Publication Date 2022-05-26
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Mohajer, Kamyar
  • Macrae, Robert

Abstract

A computer-implemented method is provided to support a food ordering system for food items from a menu of a restaurant using natural language. Expressions made for ordering are used to recommend a food item that a user has a high probability of wanting to include in an order. The recommendation engine is trained using machine learning. Expressions are collected and parsed to identify words that might indicate food items offered by the restaurant. The words are provided to a restaurant owner to identify food items on a menu, with which the words are associated.

IPC Classes  ?

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G06F 16/2457 - Query processing with adaptation to user needs
  • G10L 17/00 - Speaker identification or verification techniques
  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
  • G06F 16/242 - Query formulation
  • G06F 16/22 - Indexing; Data structures therefor; Storage structures

69.

Text-to-Speech Adapted by Machine Learning

      
Application Number 17580289
Status Pending
Filing Date 2022-01-20
First Publication Date 2022-05-12
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Mont-Reynaud, Bernard
  • Almudafar-Depeyrot, Monika

Abstract

Machine learned models take in vectors representing desired behaviors and generate voice vectors that provide the parameters for text-to-speech (TTS) synthesis. Models may be trained on behavior vectors that include user profile attributes, situational attributes, or semantic attributes. Situational attributes may include the age of people present, music that is playing, location, noise, and mood. Semantic attributes may include the presence of proper nouns, number of modifiers, emotional charge, and domain of discourse. TTS voice parameters may apply per utterance and per word so as to enable contrastive emphasis.

IPC Classes  ?

  • G10L 13/10 - Prosody rules derived from text; Stress or intonation
  • G10L 13/04 - Details of speech synthesis systems, e.g. synthesiser structure or memory management
  • G10L 13/033 - Voice editing, e.g. manipulating the voice of the synthesiser

70.

Server supported recognition of wake phrases

      
Application Number 17584780
Grant Number 12051403
Status In Force
Filing Date 2022-01-26
First Publication Date 2022-05-12
Grant Date 2024-07-30
Owner SoundHound AI IP, LLC. (USA)
Inventor
  • Jain, Newton
  • Zaheer, Sameer Syed

Abstract

A server supports multiple virtual assistants. It receives requests that include wake phrase audio and an identification of the source of the request, such as a virtual assistant device. Based on the identification, the server searches a database for a wake phrase detector appropriate for the identified source. The server then applies the wake phrase detector to the received wake phrase audio. If the wake phrase audio triggers the wake phrase detector, the server provides an appropriate response to the source.
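
A minimal sketch of the per-source dispatch described above: look up a wake phrase detector by source identification and apply it to the incoming request. Real detectors operate on wake-phrase audio; the substring checks and registry below are placeholders invented for illustration.

```python
# Registry of per-source wake phrase detectors (stand-in for the server's database
# of trained spotters). The lambdas check transcribed text; real detectors score audio.
DETECTORS = {
    "device-family-a": lambda text: "hey car" in text.lower(),
    "device-family-b": lambda text: "ok speaker" in text.lower(),
}

def handle_wake_request(source_id, wake_phrase_text):
    detector = DETECTORS.get(source_id)              # look up by source identification
    if detector is None:
        return {"status": "unknown_source"}
    triggered = detector(wake_phrase_text)
    return {"status": "triggered" if triggered else "ignored"}

print(handle_wake_request("device-family-a", "Hey car, what's the weather?"))
print(handle_wake_request("device-family-b", "Hey car, what's the weather?"))
```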

IPC Classes  ?

  • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
  • G06F 8/41 - Compilation
  • G10L 15/08 - Speech classification or search
  • G10L 15/16 - Speech classification or search using artificial neural networks

71.

DRIVER INTERFACE WITH VOICE AND GESTURE CONTROL

      
Application Number 17547917
Status Pending
Filing Date 2021-12-10
First Publication Date 2022-05-05
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Li, Zili
  • Vasconcelos, Cristina

Abstract

A driver interface for use within an automobile provides responses to voice commands issued for example by a driver of the automobile. The interface includes a camera and microphone for capturing image data such as gestures and audio data from the automobile driver. The image data and audio data are processed to extract image and linguistic features from the image and audio data, which image and linguistic features are processed to interpret and infer a meaning of the voice command.

IPC Classes  ?

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
  • G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 15/187 - Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
  • G10L 15/24 - Speech recognition using non-acoustical features
  • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
  • G06K 9/62 - Methods or arrangements for recognition using electronic means
  • G10L 15/16 - Speech classification or search using artificial neural networks
  • G06V 10/40 - Extraction of image or video features
  • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
  • G06V 20/40 - Scenes; Scene-specific elements in video content

72.

Training a device specific acoustic model

      
Application Number 17573551
Grant Number 11830472
Status In Force
Filing Date 2022-01-11
First Publication Date 2022-04-28
Grant Date 2023-11-28
Owner SOUNDHOUND AI IP, LLC (USA)
Inventor
  • Mohajer, Keyvan
  • Patel, Mehul

Abstract

Developers can configure custom acoustic models by providing audio files with custom recordings. The custom acoustic model is trained by tuning a baseline model using the audio files. Audio files may contain custom noise to apply to clean speech for training. The custom acoustic model is provided as an alternative to a standard acoustic model. Device developers can select an acoustic model by a user interface. Speech recognition is performed on speech audio using one or more acoustic models. The result can be provided to developers through the user interface, and an error rate can be computed and also provided.

IPC Classes  ?

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G06F 3/16 - Sound input; Sound output
  • G10L 15/18 - Speech classification or search using natural language modelling

73.

Controlling an engagement state of an agent during a human-machine dialog

      
Application Number 17562891
Grant Number 12125484
Status In Force
Filing Date 2021-12-27
First Publication Date 2022-04-21
Grant Date 2024-10-22
Owner SoundHound AI IP, LLC (USA)
Inventor
  • Halstvedt, Scott
  • Mohajer, Keyvan
  • Mont-Reynaud, Bernard

Abstract

A method of controlling an engagement state of an agent during a human-machine dialog is provided. The method can include receiving a spoken request that is a conditional locking request, wherein the conditional locking request uses a natural language expression to explicitly specify a locking condition, which is a predicate, storing the predicate in a format that can be evaluated when needed by the agent, entering a conditionally locked state in response to the conditional locking request, in the conditionally locked state, receiving a multiplicity of requests without a need for a wakeup indicator, and for a request from the multiplicity of requests evaluating the predicate upon receiving the request, and processing the request if the predicate is true.

IPC Classes  ?

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G06F 3/16 - Sound input; Sound output
  • G06F 21/32 - User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
  • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
  • G10L 15/08 - Speech classification or search
  • G10L 17/00 - Speaker identification or verification techniques
  • G10L 17/04 - Training, enrolment or model building
  • G10L 17/06 - Decision making techniques; Pattern matching strategies
  • G10L 17/22 - Interactive procedures; Man-machine interfaces

74.

Method and system for conversation transcription with metadata

      
Application Number 17450551
Grant Number 12125487
Status In Force
Filing Date 2021-10-11
First Publication Date 2022-04-14
Grant Date 2024-10-22
Owner SoundHound AI IP, LLC. (USA)
Inventor
  • Bradley, Kiersten L.
  • Coeytaux, Ethan
  • Yin, Ziming

Abstract

Methods and systems for enabling an efficient review of meeting content via a metadata-enriched, speaker-attributed and multiuser-editable transcript are disclosed. By incorporating speaker diarization and other metadata, the system can provide a structured and effective way to review and/or edit the transcript by one or more editors. One type of metadata can be image or video data to represent the meeting content. Furthermore, the present subject matter utilizes a multimodal diarization model to identify and label different speakers. The system can synchronize various sources of data, e.g., audio channel data, voice feature vectors, acoustic beamforming, image identification, and extrinsic data, to implement speaker diarization.

IPC Classes  ?

  • G10L 15/26 - Speech to text systems
  • G06F 40/134 - Hyperlinking
  • G06F 40/166 - Editing, e.g. inserting or deleting
  • G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
  • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
  • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
  • G10L 15/07 - Adaptation to the speaker

75.

Method and system for conversation transcription with metadata

      
Application Number 17450552
Grant Number 12020708
Status In Force
Filing Date 2021-10-11
First Publication Date 2022-04-14
Grant Date 2024-06-25
Owner SoundHound AI IP, LLC. (USA)
Inventor
  • Bradley, Kiersten L.
  • Coeytaux, Ethan
  • Yin, Ziming

Abstract

Methods and systems for enabling an efficient review of meeting content via a metadata-enriched, speaker-attributed transcript are disclosed. By incorporating speaker diarization and other metadata, the system can provide a structured and effective way to review and/or edit the transcript. One type of metadata can be image or video data to represent the meeting content. Furthermore, the present subject matter utilizes a multimodal diarization model to identify and label different speakers. The system can synchronize various sources of data, e.g., audio channel data, voice feature vectors, acoustic beamforming, image identification, and extrinsic data, to implement speaker diarization.

IPC Classes  ?

  • G10L 15/26 - Speech to text systems
  • G06F 40/134 - Hyperlinking
  • G06F 40/166 - Editing, e.g. inserting or deleting
  • G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
  • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
  • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
  • G10L 15/07 - Adaptation to the speaker

76.

Using phonetic variants in a local context to improve natural language understanding

      
Application Number 16529689
Grant Number 11295730
Status In Force
Filing Date 2019-08-01
First Publication Date 2022-04-05
Grant Date 2022-04-05
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Mohajer, Keyvan
  • Wilson, Christopher
  • Mont-Reynaud, Bernard

Abstract

A method is described that includes processing text and speech from an input utterance using local overrides of default dictionary pronunciations. Applying this method, a word-level grammar used to process the tokens specifies at least one local word phonetic variant that applies within a specific production rule and, within a local context of the specific production rule, the local word phonetic variant overrides one or more default dictionary phonetic versions of the word. This method can be applied to parsing utterances where the pronunciation of some words depends on their syntactic or semantic context.

IPC Classes  ?

  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 15/19 - Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules

77.

System and method for voice morphing in a data annotator tool

      
Application Number 17539182
Grant Number 12086564
Status In Force
Filing Date 2021-11-30
First Publication Date 2022-03-24
Grant Date 2024-09-10
Owner SoundHound AI IP, LLC. (USA)
Inventor Ross, Dylan H.

Abstract

A system and method for masking an identity of a speaker of natural language speech, such as speech clips to be labeled by humans in a system generating voice transcriptions for training an automatic speech recognition model. The natural language speech is morphed prior to being presented to the human for labeling. In one embodiment, morphing comprises pitch shifting the speech randomly either up or down, then frequency shifting the speech, then pitch shifting the speech in a direction opposite the first pitch shift. Labeling the morphed speech comprises at least one or more of transcribing the morphed speech, identifying a gender of the speaker, identifying an accent of the speaker, and identifying a noise type of the morphed speech.
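
An illustrative take on the three-step morphing recipe in the abstract (random pitch shift, frequency shift, opposite pitch shift), assuming librosa, NumPy, and SciPy are available. The shift amounts and the analytic-signal frequency-shift implementation are arbitrary choices for the sketch, not the patented parameters.

```python
import numpy as np
import librosa
from scipy.signal import hilbert

def frequency_shift(y, sr, shift_hz):
    """Shift all frequencies by a constant offset via the analytic signal."""
    analytic = hilbert(y)
    t = np.arange(len(y)) / sr
    return np.real(analytic * np.exp(2j * np.pi * shift_hz * t))

def morph_voice(y, sr, rng=None):
    rng = rng or np.random.default_rng()
    steps = int(rng.choice([-3, 3]))                              # random pitch-shift direction
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=steps)      # first pitch shift
    y = frequency_shift(y, sr, shift_hz=150.0)                    # frequency shift
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=-steps)  # opposite pitch shift

sr = 16000
speech = 0.1 * np.sin(2 * np.pi * 220.0 * np.arange(2 * sr) / sr).astype(np.float32)
masked = morph_voice(speech, sr)
print(masked.shape)
```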

IPC Classes  ?

  • G10L 15/18 - Speech classification or search using natural language modelling
  • G06F 40/56 - Natural language generation
  • G06F 40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
  • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
  • G10L 19/125 - Pitch excitation, e.g. pitch synchronous innovation CELP [PSI-CELP]
  • G10L 19/26 - Pre-filtering or post-filtering
  • G10L 21/013 - Adapting to target pitch

78.

System and method for providing natural language recommendations

      
Application Number 16447958
Grant Number 11276398
Status In Force
Filing Date 2019-06-20
First Publication Date 2022-03-15
Grant Date 2022-03-15
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Macrae, Robert
  • Mohajer, Kamyar

Abstract

A system that includes a stand-alone device, or a server-connected client device in communication with a server, provides recommendations. The stand-alone device includes an input component, a storage component, a processor, and an output component. The server-connected client device includes an input component that receives the user's request, a communication component that communicates the request to the server and receives the recommendation from the server, and an output component that provides the recommendation to the user.

IPC Classes  ?

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G06F 16/242 - Query formulation
  • G06F 16/2457 - Query processing with adaptation to user needs
  • G06F 16/22 - Indexing; Data structures therefor; Storage structures
  • G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 17/00 - Speaker identification or verification techniques

79.

Conditional responses to application commands in a client-server system

      
Application Number 16791421
Grant Number 11250217
Status In Force
Filing Date 2020-02-14
First Publication Date 2022-02-15
Grant Date 2022-02-15
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Mohajer, Keyvan
  • Wilson, Christopher S.
  • Khov, Kheng
  • Graves, Ian

Abstract

A client device receives a user request (e.g., in natural language form) to execute a command of an application. The client device delegates interpretation of the request to a response-processing server. Using domain knowledge previously provided by a developer of the application, the response-processing server determines the various possible responses that client devices could make in response to the request based on circumstances such as the capabilities of the client devices and the state of the application data. The response-processing server accordingly generates a response package that describes a number of different conditional responses that client devices could have to the request and provides the response package to the client device. The client device selects the appropriate response from the response package based on the circumstances as determined by the client device, executes the command (if possible), and provides the user with some representation of the response.
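
A compact sketch of a response package carrying several conditional responses, with client-side selection based on local circumstances. The structure and field names are invented for illustration and are not taken from the patent.

```python
# A response package describing several conditional responses to one command;
# the client evaluates local circumstances and picks the first match.
RESPONSE_PACKAGE = {
    "command": "play_song",
    "responses": [
        {"condition": {"has_speaker": True,  "song_found": True},
         "action": "play_audio", "say": "Playing it now."},
        {"condition": {"has_speaker": False, "song_found": True},
         "action": "none", "say": "I found it, but this device cannot play audio."},
        {"condition": {"song_found": False},
         "action": "none", "say": "Sorry, I could not find that song."},
    ],
}

def select_response(package, circumstances):
    for response in package["responses"]:
        if all(circumstances.get(k) == v for k, v in response["condition"].items()):
            return response
    return None

client_state = {"has_speaker": False, "song_found": True}
print(select_response(RESPONSE_PACKAGE, client_state)["say"])
```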

IPC Classes  ?

  • G06F 40/30 - Semantic analysis
  • H04L 29/08 - Transmission control procedure, e.g. data link level control procedure

80.

System and method for interpreting natural language commands with compound criteria

      
Application Number 17081996
Grant Number 11238101
Status In Force
Filing Date 2020-10-27
First Publication Date 2022-02-01
Grant Date 2022-02-01
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor Mohajer, Keyvan

Abstract

A command-processing server receives a natural language command from a user. The command-processing server has a set of domain command interpreters corresponding to different domains in which commands can be expressed, such as the domain of entertainment, or the domain of travel. Some or all of the domain command interpreters recognize user commands having a verbal prefix, an optional pre-filter, an object, and an optional post-filter; the pre- and post-filters may be compounded expressions involving multiple atomic filters. Different developers may independently specify the domain command interpreters and the sub-structure interpreters on which they are based.

IPC Classes  ?

  • G06F 16/9032 - Query formulation
  • G10L 15/18 - Speech classification or search using natural language modelling
  • G06F 16/2457 - Query processing with adaptation to user needs
  • G06F 3/16 - Sound input; Sound output
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • H04N 21/482 - End-user interface for program selection
  • G10L 15/26 - Speech to text systems

81.

Support for grammar inflections within a software development framework

      
Application Number 17474680
Grant Number 11797777
Status In Force
Filing Date 2021-09-14
First Publication Date 2021-12-30
Grant Date 2023-10-24
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Mont-Reynaud, Bernard
  • Taron, Seth

Abstract

A natural language understanding server includes grammars specified in a modified extended Backus-Naur form (MEBNF) that includes an agglutination metasymbol not supported by conventional EBNF grammar parsers, as well as an agglutination preprocessor. The agglutination preprocessor applies one or more sets of agglutination rewrite rules to the MEBNF grammars, transforming them to EBNF grammars that can be processed by conventional EBNF grammar parsers. Permitting grammars to be specified in MEBNF form greatly simplifies the authoring and maintenance of grammars supporting inflected forms of words in the languages described by the grammars.

IPC Classes  ?

82.

Machine learning system for digital assistants

      
Application Number 17350294
Grant Number 12067006
Status In Force
Filing Date 2021-06-17
First Publication Date 2021-12-23
Grant Date 2024-08-20
Owner SoundHound AI IP, LLC. (USA)
Inventor
  • Singh, Pranav
  • Zhang, Yilun
  • Mohajer, Keyvan
  • Fazeli, Mohammadreza

Abstract

A machine learning system for a digital assistant is described, together with a method of training such a system. The machine learning system is based on an encoder-decoder sequence-to-sequence neural network architecture trained to map input sequence data to output sequence data, where the input sequence data relates to an initial query and the output sequence data represents a canonical data representation for the query. The method of training involves generating a training dataset for the machine learning system. The method involves clustering vector representations of the query data samples to generate canonical-query/original-query pairs for training the machine learning system.

IPC Classes  ?

83.

Configurable neural speech synthesis

      
Application Number 17341082
Grant Number 11741941
Status In Force
Filing Date 2021-06-07
First Publication Date 2021-12-16
Grant Date 2023-08-29
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor Richards, Andrew

Abstract

A discriminator trained on labeled samples of speech can compute probabilities of voice properties. A speech synthesis generative neural network that takes in text and continuous scale values of voice properties is trained to synthesize speech audio that the discriminator will infer as matching the values of the input voice properties. Voice parameters can include speaker voice parameters, accents, and attitudes, among others. Training can be done by transfer learning from an existing neural speech synthesis model or such a model can be trained with a loss function that considers speech and parameter values. A graphical user interface can allow voice designers for products to synthesize speech with a desired voice or generate a speech synthesis engine with frozen voice parameters. A vector of parameters can be used for comparison to previously registered voices in databases such as ones for trademark registration.

IPC Classes  ?

  • G10L 13/047 - Architecture of speech synthesisers
  • G10L 13/08 - Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
  • G10L 13/033 - Voice editing, e.g. manipulating the voice of the synthesiser
  • G10L 15/26 - Speech to text systems
  • G06N 3/084 - Backpropagation, e.g. using gradient descent
  • G06N 3/04 - Architecture, e.g. interconnection topology
  • G06F 3/16 - Sound input; Sound output
  • G06F 3/04847 - Interaction techniques to control parameter settings, e.g. interaction with sliders or dials

84.

Interpreting Queries According To Preferences

      
Application Number 17389847
Status Pending
Filing Date 2021-07-30
First Publication Date 2021-11-18
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Mohajer, Keyvan
  • Mont-Reynaud, Bernard
  • Wilson, Christopher S.

Abstract

The present invention extends to methods, systems, and computer program products for interpreting queries according to preferences. Multi-domain natural language understanding systems can support a variety of different types of clients. Queries can be received and interpreted across one or more domains. Preferred query interpretations can be identified and query responses provided based on any of: domain preferences, preferences indicated by an identifier, or (e.g., weighted) scores exceeding a threshold.

IPC Classes  ?

  • G06F 40/30 - Semantic analysis
  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

85.

Virtual assistant domain functionality

      
Application Number 17383097
Grant Number 11836453
Status In Force
Filing Date 2021-07-22
First Publication Date 2021-11-11
Grant Date 2023-12-05
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Mohajer, Kamyar
  • Mohajer, Keyvan
  • Mont-Reynaud, Bernard
  • Singh, Pranav

Abstract

Aspects include methods, systems, and computer-program products providing virtual assistant domain functionality. A natural language query including one or more words is received. A collection of natural language modules is accessed. The natural language modules in the collection are configured to process sets of natural language queries. A natural language module, from the collection of natural language modules, is identified to interpret the natural language query. An interpretation of the natural language query is computed using the identified natural language module. A response to the natural language query is returned using the computed interpretation.
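
The routing described above can be pictured with a small hypothetical Python sketch in which each natural language module advertises whether it can handle a query and supplies interpret and respond callables; the module interface shown is an assumption, not the patented design.

    class NaturalLanguageModule:
        # One entry in the collection of natural language modules.
        def __init__(self, name, can_handle, interpret, respond):
            self.name, self.can_handle = name, can_handle
            self.interpret, self.respond = interpret, respond

    def answer(query, modules):
        # Identify a module able to interpret the query, compute the
        # interpretation, and return a response built from it.
        for module in modules:
            if module.can_handle(query):
                return module.respond(module.interpret(query))
        return "Sorry, I can't help with that."

    weather = NaturalLanguageModule(
        "weather",
        can_handle=lambda q: "weather" in q.lower(),
        interpret=lambda q: {"intent": "get_weather", "location": "here"},
        respond=lambda i: "It's sunny " + i["location"] + ".")

    print(answer("What's the weather?", [weather]))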

IPC Classes  ?

  • G06F 40/40 - Processing or translation of natural language
  • G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
  • G06Q 30/0283 - Price estimation or determination
  • G06Q 20/10 - Payment architectures specially adapted for electronic funds transfer [EFT] systemsPayment architectures specially adapted for home banking systems
  • G06F 40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

86.

Method and system for acoustic model conditioning on non-phoneme information features

      
Application Number 17224967
Grant Number 11741943
Status In Force
Filing Date 2021-04-07
First Publication Date 2021-10-28
Grant Date 2023-08-29
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Gowayyed, Zizu
  • Mohajer, Keyvan

Abstract

A method and system for acoustic model conditioning on non-phoneme information features for optimized automatic speech recognition is provided. The method includes using an encoder model to encode sound embedding from a known key phrase of speech and conditioning an acoustic model with the sound embedding to optimize its performance in inferring the probabilities of phonemes in the speech. The sound embedding can comprise non-phoneme information related to the key phrase and the following utterance. Further, the encoder model and the acoustic model can be neural networks that are jointly trained with audio data.
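
A minimal PyTorch sketch of the conditioning idea, assuming a GRU encoder over the key-phrase frames and an acoustic model that consumes each frame concatenated with the resulting sound embedding; the layer choices and dimensions are hypothetical.

    import torch
    import torch.nn as nn

    class KeyPhraseEncoder(nn.Module):
        # Encodes acoustic frames of a known key phrase into a fixed embedding.
        def __init__(self, n_mels=80, emb_dim=64):
            super().__init__()
            self.rnn = nn.GRU(n_mels, emb_dim, batch_first=True)

        def forward(self, frames):  # frames: (batch, time, n_mels)
            _, h = self.rnn(frames)
            return h[-1]            # (batch, emb_dim) sound embedding

    class ConditionedAcousticModel(nn.Module):
        # Infers per-frame phoneme probabilities, conditioned on the embedding.
        def __init__(self, n_mels=80, emb_dim=64, n_phonemes=40):
            super().__init__()
            self.proj = nn.Linear(n_mels + emb_dim, 256)
            self.out = nn.Linear(256, n_phonemes)

        def forward(self, frames, embedding):
            emb = embedding.unsqueeze(1).expand(-1, frames.size(1), -1)
            x = torch.relu(self.proj(torch.cat([frames, emb], dim=-1)))
            return self.out(x).log_softmax(dim=-1)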

IPC Classes  ?

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit
  • G10L 15/04 - SegmentationWord boundary detection

87.

Loudspeaker with transmitter

      
Application Number 17301308
Grant Number 11627405
Status In Force
Filing Date 2021-03-31
First Publication Date 2021-10-07
Grant Date 2023-04-11
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor Stahl, Karl

Abstract

A speaker device includes an electroacoustic transducer configured to convert an audio signal into a set of sound waves and a transmitter configured to transmit an electromagnetic signal that carries the audio signal for receipt at distances limited to an audibility range of the set of sound waves. The audibility range of the set of sound waves corresponds to a distance at which the set of sound waves is estimated to be below a predetermined sound level.

IPC Classes  ?

  • H04R 25/00 - Deaf-aid sets
  • H04R 1/10 - EarpiecesAttachments therefor
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 21/0316 - Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
  • G10L 25/06 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being correlation coefficients
  • G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
  • H04R 1/08 - MouthpiecesAttachments therefor
  • H04R 5/033 - Headphones for stereophonic communication

88.

Automatic learning of entities, words, pronunciations, and parts of speech

      
Application Number 17146239
Grant Number 12080275
Status In Force
Filing Date 2021-01-11
First Publication Date 2021-10-07
Grant Date 2024-09-03
Owner SoundHound AI IP, LLC (USA)
Inventor Relin, Anton V.

Abstract

Systems for automatic speech recognition and/or natural language understanding automatically learn new words by finding subsequences of phonemes that, if they were a new word, would enable a successful tokenization of a phoneme sequence. Systems can learn alternate pronunciations of words by finding phoneme sequences with a small edit distance to existing pronunciations. Systems can learn the part of speech of words by finding part-of-speech variations that would enable parses by syntactic grammars. Systems can learn what types of entities a word describes by finding sentences that could be parsed by a semantic grammar but for the words not being on an entity list.
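
The alternate-pronunciation case can be illustrated with a short Python sketch that accepts an observed phoneme sequence as a new pronunciation when its edit distance to an existing pronunciation is small; the lexicon, phoneme symbols, and distance threshold are invented for the example.

    def edit_distance(a, b):
        # Levenshtein distance over phoneme sequences (single-row DP).
        dp = list(range(len(b) + 1))
        for i, pa in enumerate(a, 1):
            prev, dp[0] = dp[0], i
            for j, pb in enumerate(b, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (pa != pb))
        return dp[-1]

    lexicon = {"tomato": [["t", "ah", "m", "ey", "t", "ow"]]}
    observed = ["t", "ah", "m", "aa", "t", "ow"]  # phonemes heard in the audio

    # Accept the observed sequence as an alternate pronunciation if it is
    # within a small edit distance of an existing one.
    if min(edit_distance(observed, p) for p in lexicon["tomato"]) <= 2:
        lexicon["tomato"].append(observed)
    print(lexicon)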

IPC Classes  ?

  • G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit
  • G10L 15/14 - Speech classification or search using statistical models, e.g. Hidden Markov Models [HMM]
  • G10L 15/19 - Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules

89.

Framework for identifying distinct questions in a composite natural language query

      
Application Number 16292190
Grant Number 11138205
Status In Force
Filing Date 2019-03-04
First Publication Date 2021-10-05
Grant Date 2021-10-05
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Mohajer, Keyvan
  • Mont-Reynaud, Bernard
  • Hubert, Philipp

Abstract

A query-processing server provides natural language services to applications. More specifically, the query-processing server receives and stores domain knowledge information from application developers, the domain knowledge information comprising a linguistic description of the natural language user queries that application developers wish their applications to support. A first portion of the domain knowledge information is applied to transform a natural language query received from an application to an ordered sequence of question elements. A second portion of the domain knowledge information is applied to group the ordered sequence of question elements into a plurality of distinct structured questions posed by the natural language query. The distinct structured questions may then be provided to the application, which may then execute them and obtain the corresponding data referenced by the questions.

IPC Classes  ?

  • G06F 16/00 - Information retrievalDatabase structures thereforFile system structures therefor
  • G06F 16/2457 - Query processing with adaptation to user needs
  • G06F 16/2455 - Query execution
  • G06F 40/40 - Processing or translation of natural language

90.

Framework for understanding complex natural language queries in a dialog context

      
Application Number 16363929
Grant Number 11132504
Status In Force
Filing Date 2019-03-25
First Publication Date 2021-09-28
Grant Date 2021-09-28
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Mont-Reynaud, Bernard
  • Wilson, Christopher S
  • Mohajer, Keyvan

Abstract

A domain-independent framework parses and interprets compound natural language queries in the context of a conversation between a human and an agent. Generic grammar rules and corresponding semantics support the understanding of compound queries in the conversation context. The sub-queries themselves are from one or more domains, and they are parsed and interpreted by a pre-existing grammar, covering one or more pre-existing domains. The pre-existing grammar, extended by the generic rules, recognizes all compound queries based on any queries recognized by the pre-existing grammar. Use of the disclosed framework requires little or no change in the domain-specific NLU handling code. The framework defines a generic approach to propagating context data between sub-queries of a compound query. The framework can be further extended to propagate intra-query context data in, out and across query components. Complex query results, and other data such as accounting data, can also be propagated simultaneously with dialog context data in a consolidated intra-query context data structure.
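
As a toy illustration of propagating dialog context between sub-queries of a compound query, the Python sketch below threads a context dictionary through an interpret function that stands in for the pre-existing per-domain grammars; the names and the context representation are assumptions.

    def run_compound(sub_queries, interpret):
        # Interpret each sub-query in order, threading dialog context between
        # them; interpret(query, context) -> (result, updated_context).
        context, results = {}, []
        for q in sub_queries:
            result, context = interpret(q, context)
            results.append(result)
        return results, context

    def toy_interpret(query, context):
        if query.startswith("weather in"):
            city = query.rsplit(" ", 1)[-1]
            return "Sunny in " + city, {**context, "city": city}
        if query == "and tomorrow":
            return "Rain tomorrow in " + context.get("city", "there"), context
        return "unknown", context

    print(run_compound(["weather in Paris", "and tomorrow"], toy_interpret))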

IPC Classes  ?

91.

Deriving acoustic features and linguistic features from received speech audio

      
Application Number 17325114
Grant Number 12175964
Status In Force
Filing Date 2021-05-19
First Publication Date 2021-09-02
Grant Date 2024-12-24
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Lokeswarappa, Kiran Garaga
  • Gedalius, Joel
  • Mont-Reynaud, Bernard
  • Huang, Jun

Abstract

A computer-implemented method is provided. The method including receiving speech audio of dictation associated with a user ID, deriving acoustic features from the speech audio, storing the derived acoustic features in a user profile associated with the user ID, receiving a request for acoustic features through an application programming interface (API), the request including the user ID, and sending the derived acoustic features through the API.
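
A bare-bones Python sketch of the profile-and-API flow: dictation audio updates a per-user store of derived acoustic features, and an API handler returns those features for a requested user ID. The specific features shown (mean pitch, speaking rate) and the in-memory store are placeholders, not the patented implementation.

    profiles = {}  # user_id -> derived acoustic features

    def receive_dictation(user_id, speech_audio):
        # Derive acoustic features from the speech audio and store them in the
        # user profile; the values here are placeholders for real analysis.
        features = {"mean_pitch_hz": 182.0, "speaking_rate_wps": 2.4}
        profiles.setdefault(user_id, {}).update(features)

    def get_acoustic_features(request):
        # API handler: the request carries a user ID; respond with that
        # user's derived acoustic features.
        user_id = request["user_id"]
        return {"user_id": user_id, "acoustic_features": profiles.get(user_id, {})}

    receive_dictation("u-123", speech_audio=b"\x00\x01")
    print(get_acoustic_features({"user_id": "u-123"}))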

IPC Classes  ?

  • G10L 15/00 - Speech recognition
  • G06F 40/205 - Parsing
  • G06F 40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
  • G06F 40/253 - Grammatical analysisStyle critique
  • G06N 20/00 - Machine learning
  • G06Q 30/0241 - Advertisements
  • G06Q 30/0251 - Targeted advertisements
  • G10L 15/02 - Feature extraction for speech recognitionSelection of recognition unit
  • G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
  • G10L 15/18 - Speech classification or search using natural language modelling
  • G10L 25/90 - Pitch determination of speech signals
  • H04L 67/306 - User profiles
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 15/26 - Speech to text systems
  • G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination

92.

Semantic grammar extensibility within a software development framework

      
Application Number 16505185
Grant Number 11100291
Status In Force
Filing Date 2019-07-08
First Publication Date 2021-08-24
Grant Date 2021-08-24
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Mohajer, Keyvan
  • Wilson, Christopher S.
  • Mont-Reynaud, Bernard

Abstract

A query-processing server that interprets natural language expressions supports the extension of a first semantic grammar (for a particular type of expression), which is declared extensible, by a second semantic grammar (for another type of expression). When an extension is requested, the query-processing server checks that the two semantic grammars have compatible semantic types. The developers need not have any knowledge of each other, or about their respective grammars. Performing an extension may be done by yet another party, such as the query-processing server, or another server, independently of all previous parties. The use of semantic grammar extensions provides a way to expand the coverage and functionality of natural language interpretation in a simple and flexible manner, so that new forms of expression may be supported, and seamlessly combined with pre-existing interpretations. Finally, in some implementations, this is done without loss of efficiency.
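
A hypothetical Python sketch of the compatibility check described above: a base grammar declared extensible accepts an extension only when the two grammars' semantic result types match. The class and attribute names are invented for illustration.

    class SemanticGrammar:
        def __init__(self, name, result_type, extensible=False):
            self.name, self.result_type = name, result_type
            self.extensible, self.extensions = extensible, []

    def extend(base, extension):
        # Accept the extension only if the base grammar is declared extensible
        # and the two grammars have compatible semantic types.
        if not base.extensible:
            raise ValueError(base.name + " is not declared extensible")
        if extension.result_type != base.result_type:
            raise TypeError("incompatible semantic types")
        base.extensions.append(extension)

    date_expr = SemanticGrammar("DATE_EXPRESSION", result_type="date", extensible=True)
    holiday = SemanticGrammar("HOLIDAY_NAME", result_type="date")
    extend(date_expr, holiday)  # holiday names now parse wherever a date can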

IPC Classes  ?

93.

Factored neural networks for language modeling

      
Application Number 16228278
Grant Number 11100288
Status In Force
Filing Date 2018-12-20
First Publication Date 2021-08-24
Grant Date 2021-08-24
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Gowayyed, Zizu
  • Mont-Reynaud, Bernard

Abstract

A factored neural network estimates a conditional distribution of token probabilities using two smaller models, a class model and an index model. Every token has a unique class, and a unique index in the class. The two smaller models are trained independently but cooperate at inference time. Factoring with more than two models is possible. Networks can be recurrent. Factored neural networks for statistical language modelling treat words as tokens. In that context, classes capture linguistic regularities. Partitioning of words into classes keeps the number of classes and the maximum size of a class both low. Optimization of partitioning is by iteratively splitting and assembling classes.
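
The factoring can be written as P(token) = P(class(token)) · P(index(token) | class(token)). A small numpy sketch of combining the two models' outputs at inference time follows; the vocabulary, class assignment, and probability values are made up.

    import numpy as np

    # Every token has a unique class and a unique index within that class.
    classes = {"the": (0, 0), "a": (0, 1), "cat": (1, 0), "dog": (1, 1)}

    def token_probability(token, class_probs, index_probs_per_class):
        # P(token) = P(class of token) * P(index of token | class of token)
        c, i = classes[token]
        return class_probs[c] * index_probs_per_class[c][i]

    class_probs = np.array([0.7, 0.3])              # output of the class model
    index_probs_per_class = [np.array([0.6, 0.4]),  # output of the index model,
                             np.array([0.5, 0.5])]  # conditioned on each class

    print(token_probability("cat", class_probs, index_probs_per_class))  # 0.3 * 0.5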

IPC Classes  ?

94.

Neural acoustic model

      
Application Number 16790643
Grant Number 11392833
Status In Force
Filing Date 2020-02-13
First Publication Date 2021-08-19
Grant Date 2022-07-19
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Wieman, Maisy
  • Spencer, Andrew Carl
  • Lǐ, Zìlì
  • Vasconcelos, Cristina

Abstract

An audio processing system is described. The audio processing system uses a convolutional neural network architecture to process audio data, a recurrent neural network architecture to process at least data derived from an output of the convolutional neural network architecture, and a feed-forward neural network architecture to process at least data derived from an output of the recurrent neural network architecture. The feed-forward neural network architecture is configured to output classification scores for a plurality of sound units associated with speech. The classification scores indicate a presence of one or more sound units in the audio data. The convolutional neural network architecture has a plurality of convolutional groups arranged in series, where a convolutional group includes a combination of two data mappings arranged in parallel.
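
A compact PyTorch sketch of the described pipeline, with convolutional groups that combine two parallel data mappings, a recurrent layer, and a feed-forward head emitting per-frame classification scores; the layer types, group count, and sizes are assumptions.

    import torch
    import torch.nn as nn

    class ConvGroup(nn.Module):
        # Two data mappings arranged in parallel, combined by addition.
        def __init__(self, channels):
            super().__init__()
            self.branch = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
            self.skip = nn.Conv1d(channels, channels, kernel_size=1)

        def forward(self, x):
            return torch.relu(self.branch(x) + self.skip(x))

    class AcousticModel(nn.Module):
        def __init__(self, n_mels=80, channels=128, hidden=256, n_units=40):
            super().__init__()
            self.stem = nn.Conv1d(n_mels, channels, kernel_size=3, padding=1)
            self.groups = nn.Sequential(*[ConvGroup(channels) for _ in range(3)])
            self.rnn = nn.GRU(channels, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_units)  # scores per sound unit

        def forward(self, mels):                # mels: (batch, n_mels, time)
            x = self.groups(self.stem(mels))    # convolutional groups in series
            x, _ = self.rnn(x.transpose(1, 2))  # recurrent stage
            return self.head(x)                 # (batch, time, n_units)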

IPC Classes  ?

  • G06N 3/08 - Learning methods
  • G10L 15/16 - Speech classification or search using artificial neural networks
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G06N 3/04 - Architecture, e.g. interconnection topology

95.

Wake suppression for audio playing and listening devices

      
Application Number 16781214
Grant Number 11328721
Status In Force
Filing Date 2020-02-04
First Publication Date 2021-08-05
Grant Date 2022-05-10
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Yang, Hsuan
  • Zhang, Qìndí
  • Heit, Warren S.

Abstract

A system and method are disclosed for ignoring a wakeword received at a speech-enabled listening device when it is determined that the wakeword is reproduced audio from an audio-playing device. Determination can be by detecting audio distortions, by an ignore flag sent locally between an audio-playing device and a speech-enabled device, by an ignore flag sent from a server, by comparison of received or played audio to a wakeword within an audio-playing device or a speech-enabled device, or by other means.
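
One of the listed determination means, comparing received audio against recently played audio, can be sketched in Python as a normalized correlation check combined with an optional ignore flag; the threshold and the raw-sample representation are illustrative only.

    import numpy as np

    def should_ignore_wakeword(mic_audio, playback_buffer, ignore_flag=False,
                               similarity_threshold=0.8):
        # Ignore the wakeword if an ignore flag was received from the
        # audio-playing device or a server, or if the microphone audio closely
        # matches audio the device itself recently played.
        if ignore_flag:
            return True
        a = mic_audio - mic_audio.mean()
        b = playback_buffer - playback_buffer.mean()
        similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        return similarity > similarity_threshold

    mic = np.random.randn(16000)
    print(should_ignore_wakeword(mic, playback_buffer=mic.copy()))  # True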

IPC Classes  ?

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
  • G10L 15/08 - Speech classification or search

96.

Providing a platform for configuring device-specific speech recognition and using a platform for configuring device-specific speech recognition

      
Application Number 17237003
Grant Number 11367448
Status In Force
Filing Date 2021-04-21
First Publication Date 2021-08-05
Grant Date 2022-06-21
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Mohajer, Keyvan
  • Patel, Mehul

Abstract

A method of providing a platform for configuring device-specific speech recognition is provided. The method includes providing a user interface for developers to select a set of at least two acoustic models appropriate for a specific type of a device, receiving, from a developer, a selection of the set of the at least two acoustic models, and configuring a speech recognition system to perform device-specific speech recognition by using one acoustic model selected from the at least two acoustic models of the set.

IPC Classes  ?

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G06F 3/16 - Sound inputSound output
  • G10L 15/18 - Speech classification or search using natural language modelling

97.

Building a natural language understanding application using a received electronic record containing programming code including an interpret-block, an interpret-statement, a pattern expression and an action statement

      
Application Number 17225997
Grant Number 11776533
Status In Force
Filing Date 2021-04-08
First Publication Date 2021-07-22
Grant Date 2023-10-03
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Mont-Reynaud, Bernard
  • Emami, Seyed M.
  • Wilson, Chris
  • Mohajer, Keyvan

Abstract

A method of building a natural language understanding application is provided. The method includes receiving at least one electronic record containing programming code and creating executable code from the programming code. The executable code, when executed by a processor, causes the processor to create a parse and an interpretation of a sequence of input tokens. The programming code includes an interpret-block, the interpret-block includes an interpret-statement, and the interpret-statement includes both a pattern expression and an action statement.
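
Purely as an analogy (the patent concerns its own programming-code constructs, not Python or regular expressions), an interpret-block can be pictured as a set of interpret-statements, each pairing a pattern expression with an action statement that builds the interpretation:

    import re

    class InterpretStatement:
        # One interpret-statement: a pattern expression plus an action statement.
        def __init__(self, pattern, action):
            self.pattern, self.action = re.compile(pattern), action

    class InterpretBlock:
        # An interpret-block groups interpret-statements for one kind of query.
        def __init__(self, statements):
            self.statements = statements

        def interpret(self, tokens):
            text = " ".join(tokens)
            for stmt in self.statements:
                match = stmt.pattern.fullmatch(text)
                if match:
                    return stmt.action(match)  # parse plus interpretation
            return None

    block = InterpretBlock([InterpretStatement(
        r"turn (on|off) the (\w+)",
        lambda m: {"intent": "switch", "state": m.group(1), "device": m.group(2)})])

    print(block.interpret(["turn", "on", "the", "lights"]))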

IPC Classes  ?

  • G10L 15/00 - Speech recognition
  • G10L 15/18 - Speech classification or search using natural language modelling
  • G06F 40/205 - Parsing
  • G06F 8/30 - Creation or generation of source code
  • G10L 15/06 - Creation of reference templatesTraining of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • H04M 3/493 - Interactive information services, e.g. directory enquiries

98.

Voice morphing apparatus having adjustable parameters

      
Application Number 16740440
Grant Number 11600284
Status In Force
Filing Date 2020-01-11
First Publication Date 2021-07-15
Grant Date 2023-03-07
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor Pearson, Steve

Abstract

A voice morphing apparatus having adjustable parameters is described. The disclosed system and method include a voice morphing apparatus that morphs input audio to mask a speaker's identity. Parameter adjustment uses evaluation of an objective function that is based on the input audio and the output of the voice morphing apparatus. The objective function includes a term that is adversarial with respect to speaker identification and a term that rewards audio fidelity. Thus, the voice morphing apparatus is adjusted to reduce identifiability of speakers while maintaining fidelity of the morphed audio. The voice morphing apparatus may be used as part of an automatic speech recognition system.

IPC Classes  ?

99.

Training a voice morphing apparatus

      
Application Number 16740378
Grant Number 11100940
Status In Force
Filing Date 2020-01-10
First Publication Date 2021-06-24
Grant Date 2021-08-24
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor Pearson, Steve

Abstract

Systems and methods for training a voice morphing apparatus are described. The voice morphing apparatus is trained to morph input audio data to mask an identity of a speaker. Training is performed by evaluating an objective function that is a function of the input audio data and an output of the voice morphing apparatus. The objective function may have a first term that is based on speaker identification and a second term that is based on audio fidelity. By optimizing the objective function, parameters of the voice morphing apparatus may be adjusted so as to reduce a confidence of speaker identification and maintain an audio fidelity of the morphed audio data. The voice morphing apparatus, once trained, may be used as part of an automatic speech recognition system.
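
A schematic PyTorch sketch of an objective with the two described terms, using the speaker-identification confidence on the morphed audio as the first term and a simple reconstruction error as a stand-in for the audio-fidelity term; the real fidelity measure and any term weighting are not specified here and are assumptions.

    import torch

    def morphing_objective(input_audio, morphed_audio, speaker_id_confidence):
        # First term: confidence that a speaker-ID model still recognizes the
        # original speaker in the morphed audio (to be driven down).
        # Second term: keeps the morphed audio close to the input (fidelity).
        fidelity_loss = torch.mean((morphed_audio - input_audio) ** 2)
        return speaker_id_confidence + fidelity_loss

    x = torch.randn(16000)
    morphed = x + 0.01 * torch.randn(16000)
    print(morphing_objective(x, morphed, speaker_id_confidence=torch.tensor(0.2)))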

IPC Classes  ?

  • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
  • G10L 17/04 - Training, enrolment or model building
  • G10L 17/00 - Speaker identification or verification techniques
  • G10L 15/26 - Speech to text systems
  • G10L 21/013 - Adapting to target pitch
  • G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
  • G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
  • G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
  • G10L 21/0364 - Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility

100.

Neural network training from private data

      
Application Number 16716497
Grant Number 11551083
Status In Force
Filing Date 2019-12-17
First Publication Date 2021-06-17
Grant Date 2023-01-10
Owner
  • SOUNDHOUND AI IP, LLC (USA)
  • SOUNDHOUND AI IP HOLDING, LLC (USA)
Inventor
  • Li, Zili
  • Amirguliyev, Asif
  • Probell, Jonah

Abstract

Training and enhancement of neural network models, such as from private data, are described. A slave device receives a version of a neural network model from a master. The slave accesses a local and/or private data source and uses the data to optimize the neural network model, for example by computing gradients or by performing knowledge distillation to locally train an enhanced second version of the model. The slave sends the gradients or the enhanced neural network model to the master, which may use the gradients or the second version of the model to improve a master model.
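
The gradient-exchange variant can be sketched in PyTorch as follows: the slave computes gradients on a local batch and ships only the gradients, which the master applies to its copy of the model. The model, batch, and learning rate are placeholders, and the knowledge-distillation variant is not shown.

    import torch
    import torch.nn as nn

    def slave_compute_gradients(model, local_batch):
        # Runs on the slave device: gradients are computed on local/private
        # data; the data itself never leaves the device, only the gradients do.
        inputs, targets = local_batch
        loss = nn.functional.cross_entropy(model(inputs), targets)
        model.zero_grad()
        loss.backward()
        return [p.grad.clone() for p in model.parameters()]

    def master_apply_gradients(model, gradients, lr=0.01):
        # Runs on the master: improve the master model with received gradients.
        with torch.no_grad():
            for p, g in zip(model.parameters(), gradients):
                p -= lr * g

    model = nn.Linear(10, 2)
    batch = (torch.randn(8, 10), torch.randint(0, 2, (8,)))
    master_apply_gradients(model, slave_compute_gradients(model, batch))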

IPC Classes  ?

  • G06N 3/08 - Learning methods
  • H04L 67/10 - Protocols in which an application is distributed across nodes in the network
  • H04L 41/082 - Configuration setting characterised by the conditions triggering a change of settings the condition being updates or upgrades of network functionality
  • G06N 3/04 - Architecture, e.g. interconnection topology