Apparatuses, systems, and techniques to obtain one or more captions for a video using machine learning. In at least one embodiment, at least one machine learning process is used to infer at least one output caption using at least one image caption and at least one video caption. In at least one embodiment, the at least one image caption is based at least in part on a plurality of image captions generated by one or more first machine learning processes using a plurality of images obtained from the video. In at least one embodiment, the at least one video caption is generated by one or more second machine learning processes using the video.
H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
Apparatuses, systems, and techniques to identify wireless cells. In at least one embodiment, encoded information from a first wireless cell identified by a first wireless cell identifier is decoded based, at least in part, on a second wireless cell identifier corresponding to the first wireless cell.
H04W 8/02 - Processing of mobility data, e.g. registration information at HLR [Home Location Register] or VLR [Visitor Location Register]; Transfer of mobility data, e.g. between HLR, VLR or external networks
3.
TECHNIQUES FOR ROBOTIC ASSEMBLY USING SPECIALIST AND GENERALIST POLICIES
The disclosed method for training a machine learning model to control a robot includes performing, based on demonstration data associated with one or more assembly tasks, one or more first training operations to generate one or more first trained machine learning models, wherein each first trained machine learning model included in the one or more first trained machine learning models is trained to control a robot to perform a different assembly task, and performing, based on the one or more first trained machine learning models and one or more geometries associated with one or more parts, one or more second training operations to generate a second trained machine learning model, wherein the second trained machine learning model is trained to control the robot to perform a plurality of assembly tasks.
In various examples, a control stack may include a sequence of machine learning models (MLMs) respectively predicting a sequence of differentiable outputs to determine one or more control sequences. Disclosed approaches may be used to implement an AV stack that is differentiable and modular end-to-end, allowing for interpretability of the outputs and propagation of gradients backwards so that upstream predictions are learned with respect to downstream decision making. The disclosure provides various approaches for interfacing perception with motion prediction in a differentiable manner, as well as for interfacing motion prediction with motion planning and motion control in a differentiable manner.
The disclosed method for training a robot control model includes generating, using one or more simulations, a plurality of disassembly trajectories along which a first part is disassembled from a second part; reversing the plurality of disassembly trajectories to generate a plurality of reversed disassembly trajectories; and performing, based on the plurality of reversed disassembly trajectories, one or more operations to train an untrained machine learning model to generate a trained machine learning model, wherein the trained machine learning model is trained to control a robot to assemble the first part and the second part.
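The data-generation step above, reversing simulated disassembly rollouts into assembly demonstrations, can be sketched in a few lines. The trajectory representation here (waypoints as position/velocity tuples) is an assumption for illustration, not the disclosed format:

```python
def reverse_disassembly(trajectory):
    """Turn a disassembly trajectory into a synthetic assembly demonstration.

    `trajectory` is a list of (position, velocity) tuples recorded while a
    simulated robot pulls a part out; reversing the waypoint order and
    negating the velocities yields a path that pushes the part back in.
    """
    return [(pos, tuple(-v for v in vel)) for pos, vel in reversed(trajectory)]
```

The reversed trajectories can then serve as demonstrations for imitation-style training of the assembly policy.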
Embodiments of the present disclosure relate to dynamic bit inversion for reduced overhead DC balanced coding. A computed polarity bit controls whether or not bits are inverted and is inserted in the data stream whenever DC balancing is triggered. Fixed overhead codes are commonly used to provide DC balance. However, one can do much better than a fixed overhead code by observing that many data streams are inherently DC balanced, and a random data stream does not need a correction to its disparity every byte. For example, a string of alternating 1s and 0s is naturally DC balanced. The dynamic bit inversion code has the property that overhead is inserted only when needed. For a string of alternating 1s and 0s there is no overhead. In an example, the overhead is about 4%, about ⅓ that of a fixed overhead 8 bit/9 bit code with the same maximum output disparity.
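A minimal sketch of this style of code is shown below: a polarity bit is inserted only when the running disparity has drifted past a threshold, and that bit tells the decoder whether the following word is inverted. The specific trigger rule, word size, and threshold are illustrative assumptions, not the disclosed scheme:

```python
def disparity(bits):
    # Running-disparity contribution: +1 per 1-bit, -1 per 0-bit.
    return sum(1 if b else -1 for b in bits)

def encode(words, threshold=4):
    """Encode 8-bit words (lists of 0/1), inserting a polarity bit only
    when the running disparity exceeds `threshold` in magnitude."""
    out, run = [], 0
    for word in words:
        if abs(run) >= threshold:
            # Balancing triggered: emit a polarity bit, inverting the word
            # when that pulls the running disparity back toward zero.
            invert = run * disparity(word) > 0
            out.append(1 if invert else 0)
            run += 1 if invert else -1
            if invert:
                word = [1 - b for b in word]
        out.extend(word)
        run += disparity(word)
    return out

def decode(stream, threshold=4):
    words, run, i = [], 0, 0
    while i < len(stream):
        invert = False
        if abs(run) >= threshold:
            # The decoder reproduces the trigger test, so it knows a
            # polarity bit precedes this word.
            invert = stream[i] == 1
            run += 1 if invert else -1
            i += 1
        word = stream[i:i + 8]
        i += 8
        run += disparity(word)
        words.append([1 - b for b in word] if invert else word)
    return words
```

For an alternating 1010… stream every word has zero disparity, the trigger never fires, and the output carries zero overhead, matching the property described above.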
H04B 1/38 - Transceivers, i.e. devices in which transmitter and receiver form a structural unit and in which at least one part is used for functions of transmitting and receiving
Apparatuses, systems, and techniques to identify wireless cells. In at least one embodiment, encoded information from a first wireless cell identified by a first wireless cell identifier is decoded based, at least in part, on a second wireless cell identifier corresponding to the first wireless cell.
Apparatuses, systems, and techniques to cancel pending GPU thread work to allow said work to be assumed by running thread clusters. In at least one embodiment, a processor comprises one or more circuits to perform an application programming interface (API) to indicate one or more software threads that have been prevented from being performed by one or more processors.
According to one or more embodiments of the present disclosure, an ellipse model may be applied to detection results included in ultrasonic sensor data (USS data) to identify one or more objects that may be indicated by the detection results. The identification of the objects may include determining the locations, shapes, and/or classifications of at least portions of the objects. For example, in some embodiments, the ellipse model may be applied to detection arcs indicated in USS data and corresponding to detection of an object using multiple ultrasonic sensors. In some embodiments, the application of the ellipse model may include fitting an ellipse to the detection arcs. The resulting ellipse and/or its corresponding ellipsoidal parameters may indicate one or more properties about the object.
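As a heavily simplified illustration of fitting an ellipse to detection points: assuming an axis-aligned, origin-centered ellipse, the model A·x² + B·y² = 1 is linear in (A, B) and can be fit by solving the 2×2 normal equations directly. A general conic fit to USS detection arcs would be more involved; this sketch only shows the least-squares idea:

```python
import math

def fit_axis_aligned_ellipse(points):
    """Least-squares fit of A*x^2 + B*y^2 = 1 to 2D points.

    Solves the 2x2 normal equations for the design matrix [[x^2, y^2]]
    and unit target, then returns semi-axes (a, b) with a = 1/sqrt(A),
    b = 1/sqrt(B).
    """
    s_xx = sum(x ** 4 for x, y in points)
    s_yy = sum(y ** 4 for x, y in points)
    s_xy = sum(x ** 2 * y ** 2 for x, y in points)
    t_x = sum(x ** 2 for x, y in points)
    t_y = sum(y ** 2 for x, y in points)
    det = s_xx * s_yy - s_xy * s_xy
    A = (t_x * s_yy - t_y * s_xy) / det
    B = (t_y * s_xx - t_x * s_xy) / det
    return 1 / math.sqrt(A), 1 / math.sqrt(B)
```

The recovered semi-axes (and, in a full conic fit, the center and orientation) are the ellipsoidal parameters that would indicate the object's location and extent.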
G01S 15/931 - Sonar systems specially adapted for specific applications for anti-collision purposes of land vehicles
G01S 7/539 - Details of systems according to groups , , of systems according to group using analysis of echo signal for target characterisation; Target signature; Target cross-section
10.
APPLICATION PROGRAMMING INTERFACE TO IDENTIFY THREAD PREVENTION
Apparatuses, systems, and techniques to cancel pending GPU thread work to allow said work to be assumed by running thread clusters. In at least one embodiment, a processor comprises one or more circuits to perform an application programming interface (API) to cause one or more processors to indicate whether one or more software threads have been prevented from being performed.
Apparatuses, systems, and techniques to cancel pending GPU thread work to allow said work to be assumed by running thread clusters. In at least one embodiment, a processor comprises one or more circuits to perform an application programming interface (API) to cause one or more software threads identified by the API to be prevented from being performed by one or more processors.
One or more new coding instructions are generated using a language model (LM) prompted to perform one or more genetic operations on one or more seed coding instructions of an initial set of coding instruction-snippet pairs. One or more respective coding snippets are generated to implement the one or more new coding instructions using a LM prompted to generate coding snippets for the one or more new coding instructions. A generational set of coding instruction-snippet pairs comprising the initial set of coding instruction-snippet pairs and a new set of coding instruction-snippet pairs comprising the one or more new coding instructions and the one or more respective coding snippets is created.
Apparatuses, systems, and techniques including APIs, subscription services, and controllers to enable one or more fifth generation new radio (5G-NR) networks to share information. For example, a processor comprising one or more circuits can perform an API or subscription service to cause a device in a radio access network (RAN) to share its analytic data with a device in a transport network, and said device in said transport network can use said analytic data to adjust its network settings to improve performance.
H04L 67/12 - Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
14.
OPTIMIZING COMPUTATIONAL GRAPHS FOR VISUAL SCRIPTING AND DISTRIBUTED CONTENT CREATION
In various examples, data required by some nodes of a visual scripting computational graph may be defined by the attributes of the target prims it is operating on, and that computational graph may be organized in memory by querying this target prim data to identify a count of matching prims and locations of the target prim data, allocating memory for the graph based on the matching prim count, reading the prim data from the identified locations instead of copying it into memory, and writing the results of node operations into the allocated memory. The results of the last node operation may be written directly back to the storage locations of the target prim data. As such, the present techniques may be used to avoid copying prim data to and/or from allocated memory for the graph.
Network switches are devices that connect multiple devices together on a computer network, using packet switching to receive, process, and forward data to the destination device. Each switch typically contains multiple ports, which are the points of connection for network cables. These ports can be in an active state, where they are ready to transmit data, or in an idle state, where they consume less power. Power consumption in datacenters has been a topic of concern due to the increasing demand for data processing and storage. One approach to reducing power consumption involves managing the power state of the switch ports. However, current power saving policies focus on making decisions for one type of traffic pattern or for a single port at a time, and therefore cannot intelligently or dynamically adapt to a multitude of network parameters affecting traffic flows. The present disclosure uses artificial intelligence to more intelligently transition ports between different modes of operation.
In various examples, causal ordering of concurrent updates for map resources may be enforced using geometric-based locks such that disparate systems may update the map resources concurrently. For instance, the disclosed systems and methods may lock first map resources corresponding to a first area of an environment so a first client may exclusively update the first map resources. While the first map resources are locked for updating by the first client, a request may be received from a second client to update second map resources. In some instances, the second map resources may correspond to a second area that overlaps the first area, and a timeout period—which may be extended by the first client—may be established. If the timeout period is met, the first map resources may be unlocked and the second client may resubmit the request to lock the second map resources for exclusive updating.
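The locking protocol described above can be sketched with axis-aligned rectangles standing in for map areas. The class and method names below are illustrative assumptions; a production system would use real geometry and persistent state:

```python
import time

class MapLockManager:
    """Grant exclusive update locks over rectangular map areas.

    Rectangles are (xmin, ymin, xmax, ymax). A request is denied while an
    unexpired lock on an overlapping area is held by another client; once
    the holder's timeout period is met, the stale lock is released and the
    request can be resubmitted.
    """

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.locks = {}  # client -> (rect, expiry)

    @staticmethod
    def _overlaps(a, b):
        return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

    def acquire(self, client, rect):
        now = self.clock()
        # Drop locks whose timeout period has been met.
        self.locks = {c: (r, e) for c, (r, e) in self.locks.items() if e > now}
        for holder, (held, _) in self.locks.items():
            if holder != client and self._overlaps(rect, held):
                return False
        self.locks[client] = (rect, now + self.timeout)
        return True

    def extend(self, client):
        # The holding client may extend its own timeout period.
        if client in self.locks:
            rect, _ = self.locks[client]
            self.locks[client] = (rect, self.clock() + self.timeout)

    def release(self, client):
        self.locks.pop(client, None)
```

Disjoint areas can be locked concurrently by different clients, which is what lets disparate systems update the map at the same time.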
A semiconductor device includes first and second portions of a redistribution layer with arrays of through-via elements carrying electrical ground connections, including one or more contact pads that have one or more extensions projecting therefrom into a null space between through-via elements carrying electrical signals. A method of manufacturing the semiconductor device is also disclosed.
Systems and methods for sharing security capabilities of network devices are disclosed. A system for a first network device includes a memory. The system also includes one or more processors, coupled to the memory, to receive, at the first network device, security capabilities of a second network device of a network. The network includes the first network device and the second network device. The one or more processors are further to modify a routing table of the first network device based on the security capabilities of the second network device and transmit a data packet based on the modified routing table.
A sparse dot product and/or matrix multiply is computed by subdividing each vector and simultaneously performing operations to generate output matrix elements. In an embodiment, a bit mask is computed that includes one bit for each element of an input matrix, each bit indicating whether the element is non-zero or zero. In an embodiment, the element values are stored in a packed format, where all of the non-zero values are packed together and the remaining storage for the matrix contains zeros (or any other values). The bit mask can then be used to determine the location of each non-zero element in the packed storage. Rather than reading all of the elements, only the non-zero elements that will be multiplied by a non-zero element from the other input vector should be read. Any multiplication by a zero element from either input vector or matrix is unnecessary.
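The bit-mask indexing described above can be illustrated in a few lines: the packed-storage index of a non-zero element is the popcount of mask bits below its position, and only positions set in both masks are ever read. The following is a toy Python sketch; hardware implementations operate on fixed-size tiles in parallel:

```python
def packed_index(mask, pos):
    # Non-zero values are packed in order, so the index of the value for
    # bit `pos` is the count of set mask bits below `pos` (a popcount).
    return bin(mask & ((1 << pos) - 1)).count("1")

def sparse_dot(mask_a, packed_a, mask_b, packed_b):
    """Dot product of two sparse vectors stored as (bit mask, packed values).

    Only positions where BOTH masks are set are read; every product with a
    zero operand is skipped entirely, as described above.
    """
    both = mask_a & mask_b
    total = 0
    while both:
        pos = (both & -both).bit_length() - 1  # lowest common non-zero position
        total += (packed_a[packed_index(mask_a, pos)]
                  * packed_b[packed_index(mask_b, pos)])
        both &= both - 1  # clear that position and continue
    return total
```

For example, a = [0, 3, 0, 2] packs to mask 0b1010 with values [3, 2], and b = [5, 0, 0, 4] packs to mask 0b1001 with values [5, 4]; only position 3 is common, so a single multiply (2 × 4) produces the result.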
Apparatuses, systems, and techniques to determine a trajectory (e.g., to be used to control a device). In at least one embodiment, an autonomous or semi-autonomous machine (e.g., a vehicle) is controlled based, at least in part on, for example, one or more machine learning processes, such as one or more neural networks. In at least one embodiment, a trajectory is predicted using one or more first machine learning processes trained to imitate real-world observations, and one or more second machine learning processes trained to imitate results obtained by performing at least one simulation. In at least one embodiment, a computing system causes at least one device to move in accordance with the predicted trajectory.
Embodiments of the present disclosure relate to surface estimation using stereo imaging and surface disparities. For example, a surface disparity field representing a surface in the environment (e.g., the ground) may be estimated from stereo image data and used for various downstream tasks. For example, the difference between a stereo disparity field and a ground disparity field may be used to detect objects, a representation of a navigable space may be generated by radially casting 2D rays in the ground disparity field, the ground disparity field may be used to compensate ego-motion for high dynamic attitude changes, and/or the ground disparity field may be lifted to 3D and used to fit a surface profile to points sampled from the lifted point cloud.
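The first downstream task mentioned, detecting objects from the difference between the stereo and ground disparity fields, reduces to a per-pixel comparison: a pixel whose stereo disparity exceeds the expected ground disparity by a margin sticks up out of the ground surface. A minimal sketch under that assumption (dense disparity maps as nested lists, a fixed margin):

```python
def detect_obstacles(stereo_disp, ground_disp, threshold=2.0):
    """Flag pixels whose stereo disparity exceeds the ground disparity.

    `stereo_disp` and `ground_disp` are same-shaped 2D grids; a 1 in the
    output marks a pixel likely belonging to an object above the ground.
    """
    return [[1 if s - g > threshold else 0 for s, g in zip(srow, grow)]
            for srow, grow in zip(stereo_disp, ground_disp)]
```

A real pipeline would use a learned or robustly fitted ground disparity field and post-process the mask (e.g. connected components) rather than a fixed threshold.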
B60W 60/00 - Drive control systems specially adapted for autonomous road vehicles
G01S 17/89 - Lidar systems, specially adapted for specific applications for mapping or imaging
G01S 17/931 - Lidar systems, specially adapted for specific applications for anti-collision purposes of land vehicles
G06V 10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
In various embodiments, a computer-implemented method for controlling a vehicle includes performing a visual-language alignment operation based on a set of multi-view image features and a three-dimensional position encoding to generate a set of aligned image features, causing a language model to generate a driving plan for operating the vehicle based on the set of aligned image features, wherein the driving plan includes a description of a three-dimensional trajectory for the vehicle; and controlling the vehicle to move based on the driving plan.
The disclosed method for training multimodal models includes performing one or more operations to train a plurality of vision language models to generate a plurality of trained vision language models, where each trained vision language model included in the plurality of trained vision language models comprises a different vision encoder and a first language model, and performing one or more operations to train a multimodal model to generate a trained multimodal model, where the trained multimodal model comprises the different vision encoders and a second language model.
The disclosed method for training a robot control model includes performing, based on a plurality of multi-view images that have been masked, one or more operations to train a first untrained machine learning model to generate a first trained machine learning model that comprises a trained encoder, where the first trained machine learning model is trained to generate a plurality of reconstructions of the plurality of multi-view images prior to being masked; and performing, based on robot demonstration data, one or more operations to train a second untrained machine learning model that comprises the trained encoder to generate a second trained machine learning model, where the second trained machine learning model is trained to control a robot to perform at least part of a task.
Video compression systems based on a variational autoencoder, the variational autoencoder including an encoder and a decoder coupled via a latent space embedding component, the encoder configured to transform an input video into feature maps of the input video at different feature resolution scales, the latent space embedding component configured to transform the feature maps into a latent space parameter distribution, and the decoder configured to sample the latent space parameter distribution to generate a compressed version of the input video.
A device includes receiver circuitry to receive incoming signals on a clock lane and data lanes and detection circuitry. The detection circuitry is to monitor the incoming signals on the clock lane, and determine that an incoming pattern of the incoming signals on the clock lane does not correspond to a clock pattern associated with communicating data on the data lanes. The detection circuitry is to initiate a power-down sequence in response to determining that the incoming pattern does not correspond to the clock pattern.
An over-voltage detection circuit for use with an IC is disclosed. The over-voltage detection circuit comprises first and second portions. The first portion includes a plurality of MOSFET transistors connected in series to ground, a first resistor connected between a virtual supply voltage (VVDD) and the plurality of MOSFET transistors, and a first inverter with an input connected between the first resistor and the plurality of transistors, where VVDD is a scaled down version of voltage applied to the IC (VDD). The second portion includes a second transistor connected to VDD applied to the IC, a pair of MOSFET transistors connected in series between the second transistor and ground, and a second inverter with an input connected between the second resistor and the pair of MOSFET transistors. An output of the second inverter indicates VDD is above an over-voltage level represented by VVDD.
G06F 21/81 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer by operating on the power supply, e.g. enabling or disabling power-on, sleep or resume operations
G06F 21/55 - Detecting local intrusion or implementing counter-measures
G06F 21/75 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information by inhibiting the analysis of circuitry or operation, e.g. to counteract reverse engineering
G06F 21/78 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure storage of data
H02H 1/00 - Details of emergency protective circuit arrangements
H02H 3/22 - Emergency protective circuit arrangements for automatic disconnection directly responsive to an undesired change from normal electric working condition, with or without subsequent reconnection responsive to excess voltage of short duration, e.g. lightning
28.
SCALARIZATION OF INSTRUCTIONS FOR SIMT ARCHITECTURES
Apparatuses, systems, and techniques to adapt instructions in a SIMT architecture for execution on serial execution units. In at least one embodiment, a predicate mask is initialized to identify a group of active threads associated with an instruction. The predicate mask is initialized with an inherited predicate of the instruction. The instruction is executed for a set of one or more threads selected from the group of active threads using a serial execution unit.
A ray tracing method forms a first accumulation of importance values of non-clamped pixels in an image and forms a second accumulation of waste importance of clamped pixels in the image. The first accumulation and the second accumulation are applied to set an updated average sample count for pixels in the image, and the ray tracer generates a number of sampling rays for particular pixels by applying the updated average sample count to a per-pixel importance setting.
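The budget redistribution described above can be sketched as a fixed-point iteration: clamped pixels take the per-pixel maximum, their excess ("waste") importance is accumulated, and the average sample count is raised so non-clamped pixels absorb the spare budget. The function below is an illustrative sketch, not the disclosed ray tracer's exact update rule:

```python
def allocate_rays(importance, budget, max_rays):
    """Set per-pixel ray counts from an importance map and a total budget.

    Pixels whose importance-scaled count reaches `max_rays` are clamped;
    the remaining budget is spread over the accumulated importance of the
    non-clamped pixels by updating the average sample count.
    """
    avg = budget / sum(importance)
    for _ in range(8):  # a few fixed-point iterations suffice here
        free = [imp for imp in importance if imp * avg < max_rays]
        clamped = [imp for imp in importance if imp * avg >= max_rays]
        if not free:
            break
        # Budget left after clamped pixels take max_rays each is spread
        # over the first accumulation (importance of non-clamped pixels).
        new_avg = (budget - max_rays * len(clamped)) / sum(free)
        if abs(new_avg - avg) < 1e-9:
            break
        avg = new_avg
    return [min(max_rays, round(imp * avg)) for imp in importance]
```

With uniform importance no pixel clamps and the budget divides evenly; with one dominant pixel, its waste is redistributed to the others until the budget is consumed.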
In various examples, systems and methods are disclosed relating to jointly pruning channels, layers, and/or blocks of neural networks according to target latency constraints. One or more circuits can determine a plurality of importance scores for a plurality of layers of a neural network and can generate a latency cost data structure for the neural network. The one or more circuits can prune the neural network based at least on the plurality of importance scores, the latency cost data structure, and a target latency value.
Systems and methods for sharing security capabilities of network devices are disclosed. A system for a first network device includes a memory. The system also includes one or more processors, coupled to the memory, to determine, at the first network device, security capabilities of the first network device, transmit the security capabilities of the first network device to a network controller, and receive, from the network controller, a first routing table reflecting the security capabilities of the first network device.
In various examples, controlling dialogue using contextual information for conversational artificial intelligence (AI) systems and applications is described herein. Systems and methods are disclosed that use various sources of contextual information, along with textual inputs (e.g., queries), to generate textual outputs (e.g., responses) associated with a dialogue between a user (e.g., a user's character) and another character (e.g., a non-playable character) of an application. For instance, the contextual information may be stored in one or more databases, such as one or more vector databases, and/or in a specific form, such as embeddings that represent the contextual information. One or more language models may then process a textual input and/or at least a portion of the stored contextual information in order to generate a textual output. This textual output may then be used to generate speech that is output by the other character.
G10L 13/08 - Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
G06F 40/40 - Processing or translation of natural language
Technology related to calibrating synchronization of a data signal to a forwarded clock signal is described. A circuit includes a receiver circuit and a clock-to-data synchronization circuit. The receiver circuit receives a data signal and includes two inverters and two resistors. The clock-to-data synchronization circuit receives the data signal and the clock signal and determines an indication of synchronization between the data and clock signals. The clock-to-data synchronization circuit adjusts, using the indication, a parameter of the receiver circuit to synchronize the data signal and the clock signal.
In various examples, a system can perform multimodal selection of data to generate and/or enrich efficient datasets. The system can retrieve clusters of image frames generated according to semantic characteristics, such as semantic embeddings, of the image frames. The system can selectively filter out image frames from the clusters that are visually similar to other image frames in the clusters, which can reduce the size of the resulting dataset while maintaining target amounts of semantic information in the dataset. The system can selectively add new image frames to the dataset, such as new image frames that have semantic differences from the images of the dataset. The system can update any of various AI models, such as to fine-tune a neural network-based model, using the dataset.
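The similarity-based filtering step can be sketched as a greedy deduplication over embeddings: a frame is kept only if its embedding is sufficiently dissimilar from every frame already kept. The cosine-similarity measure and threshold below are illustrative assumptions:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def filter_similar(embeddings, threshold=0.95):
    """Greedily keep an embedding only when it is not too similar to any
    already-kept embedding, shrinking the dataset while preserving its
    semantic spread."""
    kept = []
    for emb in embeddings:
        if all(cosine(emb, k) < threshold for k in kept):
            kept.append(emb)
    return kept
```

The same similarity test, inverted, can drive the enrichment step: a candidate frame is added only when it is semantically different from everything already in the dataset.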
G06V 10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
35.
GROUNDED PROMPTING AND ADAPTATION FOR REFERRING VIDEO OBJECT SEGMENTATION
Referring Video Object Segmentation (RVOS) aims to segment an object referred to by a sentence query throughout an entire video. In contrast to Referring Image Segmentation (RIS), RVOS particularly faces dynamic visual challenges, such as position and size variation, pose deformation, object occlusion or exit, and scene variation. Moreover, the referring sentence may contain long-term motions or actions, which may not be easily recognized from a single frame. Existing works that address this challenging task generally require end-to-end training for vision-language models, which can be computationally expensive and time-consuming, while the requirement of dense mask annotations for training impedes the scalability of those approaches. The present disclosure uses grounded prompting to adapt image-based segmentation models to video object segmentation tasks, which can be achieved while relying only on weak supervision.
G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
G06T 7/70 - Determining position or orientation of objects or cameras
G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
G06V 10/40 - Extraction of image or video features
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
Reinforcement learning, which is a machine learning technique where a model learns to make decisions that maximize a reward, has shown great promise in various domains that involve sequential decision making, including for many real-world tasks, such as inventory management, traffic signal optimization, network optimization, resource allocation, and robotics. However, current neural network (NN) based solutions for reinforcement learning struggle with interpretability, handling categorical data, and supporting light implementations suitable for low-compute devices. The present disclosure provides a gradient boosting trees (GBT) framework that is tailored for reinforcement learning, which may enable interpretability, may be well suited for real-world tasks with structured data, and may be capable of deployment on low-compute devices.
Apparatuses, systems, and techniques to identify orientations of objects within images. In at least one embodiment, one or more neural networks are trained to identify orientations of one or more objects based, at least in part, on one or more characteristics of the objects other than their orientation.
G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/778 - Active pattern-learning, e.g. online learning of image or video features
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
In various embodiments, a computer-implemented method for training vision language models includes generating, based on a set of key frames that include sensor data captured during operation of a vehicle, a subset of key frames that meets a diversity criterion, generating, based on the set of key frames, a set of prompts that describe the operation of the vehicle, generating, based on the subset of key frames and the set of prompts, a set of conversations that include one or more questions and one or more corresponding answers associated with operation of the vehicle, generating training data that includes the subset of key frames, the set of prompts, and the set of conversations, and performing, based on the training data, one or more operations to train a vision language model to generate a trained vision language model.
G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
39.
TECHNIQUES FOR IMPLEMENTING MULTIMODAL LARGE LANGUAGE MODELS WITH MIXTURES OF VISION ENCODERS
The disclosed method for training multimodal models includes performing one or more operations to train a plurality of vision language models to generate a plurality of trained vision language models, where each trained vision language model included in the plurality of trained vision language models comprises a different vision encoder and a first language model, and performing one or more operations to train a multimodal model to generate a trained multimodal model, where the trained multimodal model comprises the different vision encoders and a second language model.
Apparatuses, systems, and techniques including APIs, subscription services, and controllers to enable one or more fifth generation new radio (5G-NR) networks to share information. For example, a processor comprising one or more circuits can perform an API or subscription service to cause a device in a radio access network (RAN) to share its analytic data with a device in a transport network, and said device in said transport network can use said analytic data to adjust its network settings to improve performance.
H04L 67/12 - Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
Apparatuses, systems, and techniques to select fifth-generation (5G) new radio data. In at least one embodiment, a processor includes one or more circuits to select 5G new radio signal information in parallel.
H04L 1/00 - Arrangements for detecting or preventing errors in the information received
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
H03M 13/00 - Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
H03M 13/11 - Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits
42.
CUSTOMIZING TEXT-TO-SPEECH LANGUAGE MODELS USING ADAPTERS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS
In various examples, one or more text-to-speech machine learning models may be customized or adapted to accommodate new or additional speakers or speaker voices without requiring a full re-training of the models. For example, a base model may be trained on a set of one or more speakers and, after training or deployment, the model may be adapted to support one or more other speakers. To do this, one or more additional layers (e.g., adapter layers) may be added to the model, and the model may be re-trained or updated—e.g., by freezing parameters of the base model while updating parameters of the adapter layers—to generate an adapted model that can support the one or more original speakers of the base model in addition to the one or more additional speakers corresponding to the adapter layers.
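The freeze-and-adapt idea above can be sketched in a few lines. This is a minimal, hypothetical illustration (all parameter names and the gradient values are invented, and real systems would use a deep-learning framework): only the newly added adapter parameters receive updates, while the base model's parameters stay frozen.

```python
# Minimal sketch (hypothetical names): adapting a trained base model to a new
# speaker by updating only the added adapter parameters while the base
# parameters stay frozen. Gradients here are stand-in per-parameter values.

def adapt(base_params, adapter_params, grads, lr=0.1):
    """Apply one update step to adapter parameters only; base is frozen."""
    updated = dict(adapter_params)
    for name, grad in grads.items():
        if name in updated:                  # adapter layer: train it
            updated[name] = updated[name] - lr * grad
        # parameters belonging to the frozen base model are left untouched
    return dict(base_params), updated

base = {"encoder.w": 1.0, "decoder.w": 2.0}
adapters = {"adapter.speaker2.w": 0.0}
grads = {"encoder.w": 5.0, "decoder.w": 5.0, "adapter.speaker2.w": 1.0}

new_base, new_adapters = adapt(base, adapters, grads)
```

After the step, the base parameters are unchanged and only the adapter weight has moved, which is what lets the adapted model keep supporting the original speakers.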
G10L 13/00 - Speech synthesis; Text to speech systems
G10L 17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
Apparatuses, systems, and techniques to update a machine learning model associated with an object. In at least one embodiment, the machine learning model is updated based at least in part on, for example, one or more distributions associated with the machine learning model.
Embodiments of the present disclosure relate to ground surface estimation using localized surface fitting. A three-dimensional (3D) surface structure (e.g., a road surface profile) may be estimated using a nonlinear optimization to fit height values to (e.g., accumulated, bias-corrected) LiDAR detections (e.g., sampled in localized regions along one or more predicted trajectories). For example, LiDAR data (e.g., detected 3D point clouds) may be ego-motion compensated, corrected for measurement bias, accumulated, and sampled along one or more predicted trajectories, and the height of each trajectory point may be fitted to the heights of the corresponding sampled points using a nonlinear optimization. As such, the resulting road surface profile (e.g., modeled along the wheel track(s)) may be provided to an adaptive suspension control system to modulate the damping characteristic of the suspension system to counteract indentations (e.g., potholes) or protrusions (e.g., speed bumps) represented in the road surface profile.
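As a toy stand-in for the nonlinear optimization described above (not the patented method), one can fit the heights of accumulated, bias-corrected LiDAR samples along a predicted wheel track with a simple least-squares line; all sample values below are hypothetical.

```python
# Illustrative stand-in: fit road-surface height as a function of distance
# along a predicted trajectory using a closed-form least-squares line over
# accumulated, bias-corrected LiDAR height samples.

def fit_profile(samples):
    """samples: list of (distance_along_trajectory, height) pairs.
    Returns (slope, intercept) of the best-fit line."""
    n = len(samples)
    sx = sum(x for x, _ in samples)
    sy = sum(y for _, y in samples)
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

def height_at(profile, x):
    slope, intercept = profile
    return intercept + slope * x

# Perfectly linear toy samples: height rises 0.1 m per meter traveled.
profile = fit_profile([(0.0, 0.0), (1.0, 0.1), (2.0, 0.2), (3.0, 0.3)])
```

A real system would fit a richer surface model, but the output plays the same role: a queryable height profile that a suspension controller can evaluate ahead of each wheel.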
Embodiments of the present disclosure relate to correction of LiDAR measurement bias. In some embodiments, a LiDAR measurement bias such as a range-dependent height offset and/or a reflectivity-dependent height offset may be estimated in an offline process, the estimated biases may be stored in any suitable way (e.g., in one or more look up tables, indexed by range and/or reflectivity), and LiDAR points measured during an online process may be compensated by looking up and subtracting a range-dependent height bias corresponding to the measured range, and/or by looking up and subtracting a reflectivity-dependent height bias corresponding to the measured reflectivity.
Embodiments of the present disclosure relate to surface estimation using stereo imaging and surface disparities. For example, a three-dimensional (3D) surface structure may be modeled as a disparity field, and a surface disparity field representing a surface in the environment (e.g., the ground) may be generated using a constrained nonlinear hierarchical optimization to process stereo image data and iteratively refine estimated surface disparity values based on weights that guide the optimization to expected surface values (e.g., ground, road). The resulting surface (e.g., ground) disparity field may be used for a variety of downstream tasks, such as obstacle detection, segmentation of a navigable space, ego-motion refinement, and/or generation of an estimated surface profile.
Apparatuses, systems, and techniques to control a device. In at least one embodiment, an autonomous or semi-autonomous machine (e.g., a vehicle) is controlled based, at least in part, on one or more machine learning processes, such as one or more neural networks. In at least one embodiment, the one or more machine learning processes are trained using inference results obtained based at least in part on real-world observations and rule-based results obtained from having performed one or more simulations. After training, in at least one embodiment, the machine learning process(es) are used to infer trajectories (e.g., to control an autonomous or semi-autonomous machine).
Systems and methods for automatically adjusting the control settings of one or more adaptive systems in an adaptive space are disclosed. For example, the embodiments provide systems and methods for automatically controlling lighting, climate, and furniture layout in an adaptive space. The systems and methods utilize real-time sensed information from the adaptive space (as well as occupants within the adaptive space) to adaptively modify the living space by adjusting, for example, lighting, climate, and furniture layout. Adaptation is enabled using a hierarchy of machine learning models that can synthesize different kinds of sensory information and predict human behaviors in the adaptive space. In response to predicted behaviors, the systems and methods can be used to automatically change aspects of the adaptive space so as to satisfy the behavioral needs of the occupants.
G05B 13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
G05B 13/04 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
H05B 47/105 - Controlling the light source in response to determined parameters
49.
Address size conversion via application programming interface
Apparatuses, systems, and techniques to cause one or more first storage address sizes to be converted into one or more second storage address sizes. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause one or more first storage address sizes to be converted to one or more second storage address sizes based, at least in part, on one or more identifiers of one or more physical storage locations corresponding to either of the one or more first storage address sizes or the one or more second storage address sizes.
Apparatuses, systems, and techniques are presented to determine visual properties of an object. In at least one embodiment, shape and visual properties of an object can be determined using differently illuminated images fed to a pipeline of neural networks.
Approaches of the disclosure are directed towards the intelligent management of notifications. Notifications can be intelligently managed across digital devices by contextualizing and prioritizing notifications to, for example, match a current state or situation of a user. Incoming notifications may be contextualized by analyzing their content and sources, such as by using large language models. Such an approach may further take into account the user's current status, including factors such as location, activity, and personal preferences, as may be obtained from various sources or learned over time. Preferences or appropriate delivery methods can be learned by observing and/or analyzing user interactions associated with previously-presented notifications and adjusting the delivery methods for subsequent notifications, which may involve suppressing the notification, presenting immediately, changing an alert type, or altering content for presentation.
H04M 1/72454 - User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions
H04M 1/72457 - User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to geographic location
H04M 1/72484 - User interfaces specially adapted for cordless or mobile telephones wherein functions are triggered by incoming communication events
52.
ITERATIVE AUTOMATIC LABELING OF MEDIA DATA FOR ARTIFICIAL INTELLIGENCE APPLICATIONS
Disclosed are apparatuses, systems, and techniques for automated iterative content detection and annotation of objects in media items. The techniques include performing a plurality of iterations to identify objects represented in a media item and referenced in a plurality of object descriptions of a prompt. An individual iteration includes identifying, using a content detection model, a subset of the objects represented in the media item and referenced in the plurality of object descriptions, or no objects represented in the media item and referenced in the plurality of object descriptions. Using the content detection model includes applying the content detection model to the media item and to the prompt or to an iteration prompt obtained from the prompt by eliminating descriptions of the subsets of the objects identified during previous iterations. The techniques further include generating, using the identified objects, a characterization of the media item.
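The iteration loop described above can be sketched with a toy detector. The `detect` callable stands in for the content detection model (hypothetical interface): each pass gets the media item plus an iteration prompt that omits descriptions of objects already found, and the loop stops once a pass identifies nothing new.

```python
# Sketch of the iterative labeling loop: each iteration applies a detector to
# the media item and a shrinking prompt, accumulating identified objects.

def iterative_label(media, object_descriptions, detect):
    """detect(media, prompt) -> subset of prompt entries found in media."""
    identified, prompt = [], list(object_descriptions)
    while prompt:
        found = detect(media, prompt)
        if not found:
            break                      # no new objects: stop iterating
        identified.extend(found)
        # iteration prompt: drop descriptions of already-identified objects
        prompt = [d for d in prompt if d not in found]
    return identified

# Toy detector: reports at most one listed object per pass.
def toy_detect(media, prompt):
    return [d for d in prompt if d in media][:1]

labels = iterative_label({"car", "dog"}, ["car", "dog", "cat"], toy_detect)
```

With the toy detector, two passes find "car" then "dog", a third pass finds nothing (no "cat" in the media), and the loop terminates.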
In various examples, updating visual characteristics of frames for content streaming systems and applications is described herein. Systems and methods are disclosed that use one or more minimum and/or maximum color values associated with one or more color channels to update color values associated with points (e.g., pixels) of frames in order to improve one or more visual characteristics (e.g., increased contrast) associated with the frames. For instance, and for at least a portion of a frame, in some examples, a minimum color value and/or a maximum color value associated with the color channels may be used to “stretch” the color values associated with the points, such as by decreasing at least a portion of the color values and/or increasing another portion of the color values. Additionally, in some examples, average color values and/or factors may be used to further update the color values.
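The per-channel "stretch" described above is a linear remap from the observed minimum/maximum to a target output range. A minimal sketch, assuming 8-bit channel values (the 0..255 range is an assumption, not stated in the abstract):

```python
# Minimal per-channel contrast stretch: color values in a frame region are
# linearly remapped so the observed min and max span the full output range.

def stretch_channel(values, out_min=0, out_max=255):
    lo, hi = min(values), max(values)
    if hi == lo:                       # flat region: nothing to stretch
        return list(values)
    scale = (out_max - out_min) / (hi - lo)
    return [round(out_min + (v - lo) * scale) for v in values]

# Values occupying [50, 150] are stretched to span [0, 255].
stretched = stretch_channel([50, 100, 150])
```

Low values are pushed down and high values pushed up, which is exactly the "decreasing at least a portion of the color values and/or increasing another portion" behavior the abstract describes.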
Various examples, systems, and methods are disclosed relating to a computer system that can be designed for software development. The computer system can identify or access written details about the requirements for a software product. Using these requirements, the computer system can generate prompts that guide the operation of the software. The computer system can use the prompts and the initial requirements to produce feedback through a neural network, such as a large language model. The neural network can be trained with examples of software requirements and corresponding feedback. The feedback can suggest changes or confirm the requirements. Additionally, the computer system can provide the feedback for refining and improving the software requirements.
Systems and methods are disclosed related to generative AI with logical reasoning. For example, an LLM may be used to convert statements such as natural language statements or lines of software code into logical statements in a logic specification language, a logical reasoning engine may be used to evaluate the logical statements, and an LLM may be used to explain the output of the logical reasoning engine in natural language (e.g., using a system that uses retrieval augmented generation (RAG) and/or a fine-tuned LLM). As such, the present techniques may be utilized to provide a generative AI system that can logically reason about, deduce new facts from, compute logical consequences of, and/or check the consistency of a set of statements such as natural language statements or software code, and provide a logical explanation of how the computation(s) were done.
In various examples, systems and methods are disclosed relating to accurately extracting requested portions of video data by concatenating video data with selective transcoding. The systems can receive a request indicating a start position and an end position and select a video data element including the start position. The systems can decode a portion of the video data element including the start position and encode a subset of a plurality of first frames of the video data element to provide a first video output. The systems can combine the first video output with a second video output that includes one or more second frames of the video data up until the end position.
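A simplified sketch of the selective-transcoding split described above, using frame indices in place of real video data (keyframe positions and indices are hypothetical): frames from the requested start position up to the next keyframe must be decoded and re-encoded, while the remainder up to the end position can be copied without transcoding.

```python
# Sketch of selective transcoding for clip extraction: only the leading
# sub-GOP (start position up to the next keyframe) is re-encoded; the rest
# of the requested range is stream-copied as-is.

def extract(keyframes, start, end):
    """keyframes: sorted keyframe indices. Returns (transcoded, copied) frame index lists."""
    next_key = next((k for k in keyframes if k > start), end)
    boundary = min(next_key, end)
    transcoded = list(range(start, boundary))   # decoded and re-encoded
    copied = list(range(boundary, end))         # copied without transcoding
    return transcoded, copied

# Request frames [50, 250) from a stream with keyframes at 0, 100, 200.
transcoded, copied = extract(keyframes=[0, 100, 200], start=50, end=250)
```

Only 50 frames are re-encoded here; the other 150 are passed through, which is the efficiency gain of combining concatenation with selective transcoding.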
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 19/132 - Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
H04N 19/177 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
H04N 21/234 - Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
57.
GENERATIVE AI MODELS FOR IMAGE RENDERING AND INVERSE RENDERING
Embodiments of the present disclosure relate to rendering and inverse rendering using one or more generative models. “Rendering” refers to the process of generating a final visual image, video frame, or animation from a 2D or 3D model. “Inverse rendering” is a process that involves deducing or estimating the properties (e.g., material maps or other properties such as geometry, lighting, and textures) of a scene from observed images or visual data. Essentially, it aims to reverse the traditional rendering process. Various aspects of the present disclosure introduce editable light and material controls into generative models to allow for artistic creation. Various embodiments integrate generative models as a renderer for classic rendering pipelines to upcycle and enhance the style of rendered content.
A game-agnostic event detector can be used to automatically identify game events. Game-specific configuration data can be used to specify types of pre-processing to be performed on media for a game session, as well as types of detectors to be used to detect events for the game. Event data for detected events can be written to an event log in a form that is both human- and process-readable. The event data can be used for various purposes, such as to generate highlight videos or provide player performance feedback. The event data may be determined based upon output from detectors such as optical character recognition (OCR) engines, and regions of interest may be upscaled and binarized before OCR processing.
G06V 20/62 - Text, e.g. of license plates, overlay texts or captions on TV images
A63F 13/53 - Controlling the output signals based on the game progress involving additional visual information provided to the game scene, e.g. by overlay to simulate a head-up display [HUD] or displaying a laser sight in a shooting game
G06T 3/4007 - Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
G06V 20/40 - Scenes; Scene-specific elements in video content
In various examples, sensor data recorded in the real-world may be leveraged to generate transformed, additional, sensor data to test one or more functions of a vehicle—such as a function of an AEB, CMW, LDW, ALC, or ACC system. Sensor data recorded by the sensors may be augmented, transformed, or otherwise updated to represent sensor data corresponding to state information defined by a simulation test profile for testing the vehicle function(s). Once a set of test data has been generated, the test data may be processed by a system of the vehicle to determine the efficacy of the system with respect to any number of test criteria. As a result, a test set including additional or alternative instances of sensor data may be generated from real-world recorded sensor data to test a vehicle in a variety of test scenarios.
G06V 10/774 - Generating sets of training patternsBootstrap methods, e.g. bagging or boosting
G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
G07C 5/08 - Registering or indicating performance data other than driving, working, idle, or waiting time, with or without registering driving, working, idle, or waiting time
In various examples, systems and methods are disclosed relating to detecting objects on parallel processing systems. The systems can generate a graph of a cost matrix associating a plurality of first object elements with a plurality of second object elements. The graph can include a plurality of first nodes representing rows of the cost matrix and a plurality of second nodes representing columns of the cost matrix. The systems can determine a matching between the plurality of first nodes and the plurality of second nodes based at least on the cost matrix. The systems can then update the matching by generating an alternating tree and performing one or more matrix multiplications to update the matching based on the alternating tree detecting an unmatched second node of the graph.
Testing chip or board packages for thermal failure can be conducted at various stages of the package's life cycle. Conventional testing can detect a thermal failure of the package, though conventional testing does not detect at which specific thermal layer a failure has occurred. By applying an R deviation percentage to the measured thermal parameters received from a testing unit, a specific breakdown of each thermal interface layer can be analyzed. Each thermal interface layer has an associated gold value which is a known good value of thermal energy at a specific time interval. The gold value can be compared to the timing results calculated from the measured thermal parameters. This comparison can then identify which thermal interface layers, if any, are causing the thermal failure of the package. The thermal interface layer can then be repaired or the manufacturing process can be modified to eliminate the failure.
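The per-layer comparison described above can be illustrated with a toy check (all layer names, values, and the 5% threshold below are hypothetical): each thermal interface layer's measured timing result is compared against its known-good gold value, and layers whose deviation exceeds the R deviation percentage are flagged as candidate causes of the failure.

```python
# Hypothetical illustration: flag thermal interface layers whose measured
# timing results deviate from their known-good "gold" values by more than
# an R deviation percentage.

def failing_layers(measured, gold, r_deviation_pct):
    flagged = []
    for layer, gold_value in gold.items():
        deviation = abs(measured[layer] - gold_value) / gold_value * 100.0
        if deviation > r_deviation_pct:
            flagged.append(layer)
    return flagged

flags = failing_layers(
    measured={"die_attach": 1.30, "tim1": 1.02, "lid": 0.99},
    gold={"die_attach": 1.00, "tim1": 1.00, "lid": 1.00},
    r_deviation_pct=5.0,
)
```

Here only the die-attach layer deviates beyond the threshold, so the analysis localizes the failure to that layer rather than merely reporting that the package failed.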
In various examples, techniques for improving image retrieval precision for machine learning systems and applications are described herein. Systems and methods described herein may segment images into various portions (e.g., patches, tiles, areas, regions, etc.) and then use data associated with the portions to perform a search. For instance, after segmenting the images into the portions, one or more models may process the images in order to generate the data for the portions, such as data representing embeddings, identifiers, locations, and/or any other information. This data may then be used to identify at least a set of images when performing a search for a query. Additionally, systems and methods described herein may perform improved searches using compositable queries and/or user feedback.
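One way to use per-portion data at query time (an illustrative sketch, not the disclosed system; all embeddings are toy 2-D vectors) is to score each image by its best-matching patch embedding under cosine similarity:

```python
# Sketch of patch-based retrieval: each image is indexed by embeddings of its
# segmented portions; a query embedding scores an image by its best patch.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def search(index, query, top_k=1):
    """index: {image_id: [patch_embedding, ...]}. Returns top_k image ids."""
    scored = [(max(cosine(query, p) for p in patches), img)
              for img, patches in index.items()]
    scored.sort(reverse=True)          # highest best-patch similarity first
    return [img for _, img in scored[:top_k]]

hits = search({"img1": [[1.0, 0.0]],
               "img2": [[0.0, 1.0], [0.7, 0.7]]},
              query=[1.0, 0.0])
```

Scoring by the best patch rather than a whole-image embedding is what lets a query match an object that occupies only a small region of an image.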
G06V 20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
Apparatuses, systems, and techniques to identify a location in which to place objects within a graphically rendered scene. In at least one embodiment, a location in which to place objects is identified using one or more neural networks, based, at least in part, on text or speech input to the one or more neural networks.
In various examples, systems and methods are disclosed relating to detecting objects on parallel processing systems. The systems can generate a graph of a cost matrix associating a plurality of first object elements with a plurality of second object elements. The graph can include a plurality of first nodes representing rows of the cost matrix and a plurality of second nodes representing columns of the cost matrix. The systems can determine a matching between the plurality of first nodes and the plurality of second nodes based at least on the cost matrix. The systems can then update the matching by generating an alternating tree and performing one or more matrix multiplications to update the matching based on the alternating tree detecting an unmatched second node of the graph.
G06V 20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
Techniques for training a vision-based robot control model include generating, based on scene data, a plurality of scenes, generating, based on the plurality of scenes, one or more goal specifications, determining, based on the one or more goal specifications and a robot model, one or more robot plans, generating, based on the one or more robot plans and the plurality of scenes, simulated sensor data, and performing one or more training operations to generate a trained vision-based robot control model based on the one or more goal specifications, the one or more robot plans, and the simulated sensor data.
Techniques for controlling a robot include receiving sensor data and one or more goal specifications, processing the sensor data, a robot size, and the one or more goal specifications using one or more trained encoders to generate a plurality of context tokens, processing the plurality of context tokens using one or more trained decoders to generate a robot plan, and controlling a robot based on the robot plan.
The disclosed method for generating molecules includes selecting, based on one or more molecule properties, one or more hard molecule fragments and one or more soft molecule fragments; and processing, using a trained machine learning model, the one or more hard molecule fragments and the one or more soft molecule fragments to generate a molecule, where the molecule includes the one or more hard molecule fragments, and the trained machine learning model generates the molecule based on the one or more soft molecule fragments.
Systems and methods described relate to the synthesis of content using generative models. In at least one embodiment, a score-based generative model can use a stochastic differential equation with critically-damped Langevin diffusion to learn to synthesize content. During a forward diffusion process, noise can be introduced into a set of auxiliary (e.g., “velocity”) values for an input image to learn a score function. This score function can be used with the stochastic differential equation during a reverse diffusion denoising process to remove noise from the image to generate a reconstructed version of the input image. A score matching objective for the critically-damped Langevin diffusion process can require only the conditional distribution learned from the velocity data. A stochastic differential equation based integrator can then allow for efficient sampling from these critically-damped Langevin diffusion models.
Approaches in accordance with various embodiments can perform spatial hash map updates while ensuring the atomicity of the updates for arbitrary data structures. A hash map can be generated for a dataset where entries in the hash map may correspond to multiple independent values, such as pixels of an image to be rendered. Update requests for independent values may be received on multiple concurrent threads, but change requests for independent values corresponding to a hash map entry can be aggregated from a buffer and processed iteratively in a single thread for a given hash map entry. In the case of multi-resolution spatial hashing where data can be stored at various discretization levels, this operation can be repeated to propagate changes from one level to another.
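The aggregation step described above can be sketched as a two-phase update (a simplified single-threaded model of the concurrent design; keys and deltas are hypothetical): buffered change requests from many threads are grouped by hash-map entry, and all changes for one entry are then applied iteratively by a single logical owner, which preserves atomicity per entry.

```python
# Sketch of per-entry update aggregation for a spatial hash map: buffered
# requests are grouped by entry key, then each entry's changes are applied
# iteratively by one owner, modeling single-thread-per-entry atomicity.

from collections import defaultdict

def aggregate_updates(buffer):
    """buffer: list of (cell_key, delta). Groups deltas per hash entry."""
    per_entry = defaultdict(list)
    for key, delta in buffer:
        per_entry[key].append(delta)
    return per_entry

def apply_updates(hash_map, per_entry):
    for key, deltas in per_entry.items():   # one owner per entry
        value = hash_map.get(key, 0.0)
        for delta in deltas:                # iterate that entry's changes
            value += delta
        hash_map[key] = value
    return hash_map

hmap = apply_updates({}, aggregate_updates([("a", 1.0), ("b", 2.0), ("a", 0.5)]))
```

In a multi-resolution variant, the same aggregate-then-apply pass would be repeated per discretization level to propagate changes between levels.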
In various examples, systems and methods are disclosed relating to historical reset. One method includes determining at least one history buffer for a frame, determining, in a spatial domain, a spatial component of the accumulated pixel value at the pixel location based on a first spatial moment and a second spatial moment, determining, in a temporal domain, a temporal component of the accumulated pixel value at the pixel location based on a first temporal moment and a second temporal moment. The method further includes determining a pixel value range based at least on the spatial component and the temporal component, determining an amount of historical reset to apply, and updating the accumulated pixel value based at least on the amount of historical reset.
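A simplified illustration of the range-and-reset idea (not the exact disclosed method; the mean ± k·std range construction is an assumption): first and second moments over spatial neighbors and over temporal history define an expected value range, and the accumulated pixel value is reset toward that range when it falls outside.

```python
# Simplified history-reset sketch: spatial and temporal first/second moments
# define an expected pixel-value range; accumulated values outside the range
# are clamped, with a reset amount indicating whether history was discarded.

def mean_std(samples):
    m1 = sum(samples) / len(samples)
    m2 = sum(s * s for s in samples) / len(samples)
    return m1, max(m2 - m1 * m1, 0.0) ** 0.5

def reset_history(accumulated, spatial_samples, temporal_samples, k=1.0):
    s_mean, s_std = mean_std(spatial_samples)   # spatial-domain moments
    t_mean, t_std = mean_std(temporal_samples)  # temporal-domain moments
    lo = min(s_mean - k * s_std, t_mean - k * t_std)
    hi = max(s_mean + k * s_std, t_mean + k * t_std)
    clamped = min(max(accumulated, lo), hi)
    reset_amount = 0.0 if clamped == accumulated else 1.0
    return clamped, reset_amount
```

An accumulated value consistent with its neighbors and history passes through unchanged; a stale value (for example, after a disocclusion) is pulled back into the expected range.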
Techniques for training a machine learning model to control a robot include performing, based on a first set of robot data, one or more training operations to generate one or more first trained machine learning models for performing one or more robotic tasks, expert demonstration data, and one or more trained evaluation models; and performing, based on the expert demonstration data, a set of sensor data, and first feedback generated by the one or more trained evaluation models, one or more training operations to generate a second trained machine learning model to control a robot for a plurality of robotic tasks.
G05B 13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
Apparatuses, systems, and techniques to deterministically classify data. In at least one embodiment, inference classes with weights within a threshold range are treated as equivalent and one representative inference class is selected.
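The equivalence-and-representative rule above has a compact sketch (the lexicographic tie-break below is an illustrative choice, not specified by the abstract): classes whose weights fall within a threshold of the top weight form an equivalence set, from which one representative is chosen deterministically.

```python
# Sketch of deterministic classification: inference classes with weights
# within a threshold of the top weight are treated as equivalent, and one
# representative is selected deterministically (here: smallest label).

def select_class(weights, threshold):
    top = max(weights.values())
    equivalent = [c for c, w in weights.items() if top - w <= threshold]
    return min(equivalent)             # deterministic representative

label = select_class({"cat": 0.90, "dog": 0.89, "fox": 0.10}, threshold=0.02)
```

Because the representative depends only on the equivalence set and a fixed tie-break, runs that produce slightly different weights within the threshold still classify identically.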
Apparatuses, systems, and techniques to identify a first one or more fifth generation new radio (5G NR) signal pairs having one or more quality characteristics based, at least in part, on neural network weight information received from a plurality of different sources. In at least one embodiment, neural network weight information can be aggregated to generate one or more neural networks to identify the first one or more 5G NR signal pairs.
A first device includes first circuitry to communicate with a second device over a first die-to-die (D2D) link, second circuitry to communicate with the second device over a second D2D link, and a link controller comprising logic to send first configuration data to the second device over the first D2D link. Responsive to determining that the first configuration data failed to configure the second device, the link controller comprises logic to send second configuration data to the second device over the second D2D link.
An integrated circuit includes an analog-to-digital converter (ADC), associated with a thermal sensor, to determine a present power supply value of a power supply voltage for the thermal sensor. E-fuse registers store a set of calibrated temperature values, from the thermal sensor, for each power supply value of a plurality of power supply values. Control logic is coupled to the ADC and the e-fuse registers. The control logic reads the present power supply value from the ADC and generates, based on the present power supply value and the plurality of power supply values, a calibration equation that relates calibrated temperature values to thermal sensor values for the present power supply value.
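A hypothetical sketch of the calibration-equation step (the e-fuse layout, per-corner slope/offset pairs, and all numbers below are invented for illustration): calibrated points stored for two supply-voltage corners are linearly interpolated at the present supply value read from the ADC, producing a function that maps raw thermal-sensor codes to temperature.

```python
# Hypothetical sketch: build a temperature calibration equation for the
# present supply voltage by interpolating per-corner (slope, offset) pairs
# stored in e-fuses.

def calibration_equation(efuse, present_vdd):
    """efuse: {vdd: (slope, offset)} per calibrated corner (assumed layout)."""
    (v0, (s0, o0)), (v1, (s1, o1)) = sorted(efuse.items())[:2]
    t = (present_vdd - v0) / (v1 - v0)          # interpolation factor
    slope = s0 + t * (s1 - s0)
    offset = o0 + t * (o1 - o0)
    return lambda raw_code: slope * raw_code + offset

# Present VDD of 0.9 V sits halfway between the 0.8 V and 1.0 V corners.
to_celsius = calibration_equation({0.8: (0.5, -10.0), 1.0: (0.6, -12.0)}, 0.9)
```

The returned closure is the "calibration equation that relates calibrated temperature values to thermal sensor values" for the measured supply point.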
Disclosed are systems and techniques for a cloud function controller for executing code using graphics processing units (GPUs) in a serverless architecture. The techniques include maintaining, at a cloud function controller, a plurality of cloud function queues for a plurality of workers in a plurality of cluster environments. Each cluster environment hosts an agent that communicates with the cloud function controller and has GPU resources accessible to a subset of the plurality of workers. The techniques include storing a first cloud function execution request of an entity in a first queue of the plurality of cloud function queues, receiving a first cloud function execution result corresponding to the first cloud function execution request of the entity from a first worker of the plurality of workers in a first cluster environment of the plurality of cluster environments, and causing the first cloud function execution result to be provided to the entity.
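A toy model of the controller's queueing structure (class and method names are invented; real deployments would use durable queues and RPC): one FIFO queue per cluster environment, with requests enqueued by the controller, dequeued by a worker in the matching cluster, and results routed back to the requesting entity.

```python
# Toy model of a cloud function controller: per-cluster FIFO request queues
# plus a result map keyed by requesting entity.

from collections import deque

class CloudFunctionController:
    def __init__(self, clusters):
        self.queues = {c: deque() for c in clusters}   # one queue per cluster
        self.results = {}                              # entity -> result

    def submit(self, cluster, entity, payload):
        self.queues[cluster].append((entity, payload))

    def next_request(self, cluster):
        return self.queues[cluster].popleft()          # worker pulls FIFO

    def report(self, entity, result):
        self.results[entity] = result                  # route back to entity

ctrl = CloudFunctionController(["cluster-a", "cluster-b"])
ctrl.submit("cluster-a", "entity-1", {"fn": "resize", "arg": 42})
entity, payload = ctrl.next_request("cluster-a")       # a worker's view
ctrl.report(entity, payload["arg"] * 2)                # stand-in GPU result
```

The per-cluster queues are what let GPU capacity in each cluster environment be consumed only by the workers that can reach it.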
Disclosed are systems and techniques for a cloud function worker for executing code using graphics processing units (GPUs) in a serverless architecture. The techniques include receiving, at a cloud function worker, a cloud function execution request from a cloud function queue of a cloud function controller. The techniques include identifying, based on the cloud function execution request, a first artificial intelligence (AI) model of a plurality of AI models of the cloud function controller. The techniques include generating a cloud function execution result of the cloud function execution request using the first AI model and at least a graphics processing unit (GPU) of a cluster environment hosting the cloud function worker. The techniques include causing the cloud function execution result to be transmitted to the cloud function controller.
An on-chip static RAM (SRAM) is disclosed. In one embodiment, the on-chip SRAM includes an array of memory cells arranged in columns and rows, a read bit line for each column of the array of memory cells, and an output latch to store a bit for one of the array of memory cells when the on-chip SRAM is in a retention mode. In one embodiment, each memory cell in the column is connected to the read bit line for that column and the output latch includes a transistor that does not allow the read bit line for the column corresponding to the one of the array of memory cells to pre-charge when a data latch in the output latch is in a high state.
Testing a semiconductor can be time-consuming as the chip architecture becomes more complex. Testing the possible scenarios becomes increasingly difficult. Chip quality characteristics relating to the chips on a wafer can be used to estimate a probability or rating relating to bypassing system-level testing (SLT). A chip can bypass SLT if there is a high likelihood of passing SLT. Thousands of chip characteristics can be received from wafer testing, chip probe testing, environmental parameters, factory parameters, and other parameters. A chip quality model can use chip quality characteristics as input to generate chip group and SLT parameters. The chip quality model can be a machine learning model or other types of machine learning systems. The chip group parameter or the SLT parameter can be used to direct the testing path of a chip where some chips can bypass SLT thereby saving production time.
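The routing decision at the end of that pipeline reduces to a threshold gate on the model's output. A minimal sketch (the threshold value and function names are hypothetical, and a production system would feed thousands of characteristics into the quality model rather than a single probability):

```python
# Illustrative gate: a chip-quality model's predicted probability of passing
# system-level test (SLT) decides whether a chip may bypass SLT.

def route_chip(pass_probability, bypass_threshold=0.98):
    return "bypass_slt" if pass_probability >= bypass_threshold else "run_slt"

# High-confidence chips skip SLT; marginal chips still go through it.
routes = [route_chip(p) for p in (0.999, 0.95)]
```

Only chips the model is highly confident about skip the expensive test, which is where the production-time savings come from.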
Apparatuses, systems, and methods for low-latency audio-to-face animation with emotion detection are disclosed herein. The system may receive a first audio stream associated with a first device and a second audio stream associated with a second device, and provide, concurrently, a first segment of the first audio stream and a second segment of the second audio stream as inputs to an emotion detection artificial intelligence (AI) model to obtain first emotion data and second emotion data. The system may then provide, concurrently, a third segment of the first audio stream with the first emotion data and a fourth segment of the second audio stream with the second emotion data as inputs to a face animation AI model to obtain first face pose data and second face pose data, and provide the first face pose data to the first device and the second face pose data to the second device.
G10L 25/63 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for estimating an emotional state
81.
SPEAKER IDENTIFICATION, VERIFICATION, AND DIARIZATION USING NEURAL NETWORKS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS
Disclosed are apparatuses, systems, and techniques that may use machine learning for implementing speaker recognition, verification, and/or diarization. The techniques include applying a neural network (NN) to a speech data to obtain a speaker embedding representative of an association between the speech data and a speaker that produced the speech. The speech data includes a plurality of frames and a plurality of channels representative of spectral content of the speech data. The NN has one or more blocks of neurons that include a first branch performing convolutions of the speech data across the plurality of channels and across the plurality of frames and a second branch performing convolutions of the speech data across the plurality of channels. Obtained speaker embeddings may be used for various tasks of speaker identification, verification, and/or diarization.
Apparatuses, systems, and techniques to generate trajectory predictions. In at least one embodiment, trajectory predictions are generated based on, for example, one or more neural networks.
B60W 60/00 - Drive control systems specially adapted for autonomous road vehicles
G06N 3/0442 - Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
Neural network architectures for feature extraction from visual input. In at least one embodiment, a neural network architecture for a vision backbone includes hybrid stages with at least one state space model (SSM)-based block preceding at least one transformer block. In at least one embodiment, an SSM-based block includes parallel branches, one including an SSM and one without an SSM, and a concatenation layer for concatenating the output of each branch. In at least one embodiment, the SSM performs a parallel selective scan operation to efficiently map tokens of an input sequence to tokens of an output sequence via GPU acceleration.
In various examples, an estimated curvature associated with a driving surface may be updated or improved based on additional sources of information, such as map data and/or perception data. For instance, systems and methods are disclosed that may predict curvature (e.g., magnitudes of curvature) for one or more portions and/or points along a driving surface traversed by a machine. The predicted curvature may be determined based on one or more previous curvature predictions for the driving surface and based on a trajectory and/or a distance traveled by the machine subsequent to making those previous curvature predictions. In some instances, the predicted curvature may be updated based on one or more measured curvatures associated with the driving surface. These measured curvatures may be determined using map data associated with the driving surface and/or perception data generated from sensor data obtained using one or more sensors of the machine.
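The propagate-then-correct loop above can be sketched as follows. This is a minimal sketch, not the disclosed estimator: the sampled-profile representation, step size, and blend weight are illustrative assumptions.

```python
# Hedged sketch: a previous curvature prediction is advanced along the
# distance the machine has traveled, then blended with measured curvature
# derived from map and/or perception data.

def predict_curvature(prev_profile, distance_traveled, step=1.0):
    """Shift a sampled curvature profile (one value per `step` meters)
    forward by the distance traveled since the prediction was made."""
    shift = int(distance_traveled / step)
    return prev_profile[shift:]

def update_curvature(predicted, measured, weight=0.5):
    """Blend predicted and measured curvature sample-by-sample; `weight`
    controls how much the prior prediction is trusted."""
    return [weight * p + (1.0 - weight) * m
            for p, m in zip(predicted, measured)]
```

A production system would replace the fixed blend with something uncertainty-aware (e.g., a filter), but the structure — propagate the old prediction along the trajectory, then correct it with measurements — is the same.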
A method for identifying potential wheel balance and alignment issues in autonomous machines is described. A computing device of an autonomous machine obtains a rotation value of a steering wheel of the autonomous machine. The rotation value can be estimated by analyzing images from an in-cabin camera or obtained from a sensor of the autonomous machine. The path or trajectory of the autonomous machine is associated with an expected rotation value, which is compared to the obtained rotation value. In response to determining that the difference between the obtained rotation value and the expected rotation value is greater than a threshold value, a notification is generated to perform an alignment check of the autonomous machine.
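The comparison at the heart of the method is a simple threshold check; the sketch below is illustrative, with a hypothetical notification string and threshold value.

```python
# Hedged sketch of the threshold comparison: the obtained steering-wheel
# rotation is compared against the rotation expected for the planned
# trajectory. Units (degrees) and the threshold are assumptions.

def check_alignment(obtained_rotation, expected_rotation, threshold=5.0):
    """Return a notification when the obtained rotation deviates from the
    expected rotation by more than `threshold`; otherwise return None."""
    if abs(obtained_rotation - expected_rotation) > threshold:
        return "perform alignment check"
    return None
```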
In various examples, to improve path perception in machine learning implementations, a temporal model includes a backbone model trained to predict one or more path perception outputs, such as path geometry, path class, path uncertainty, and/or other path attributes, for a current input frame. To create temporal context, the temporal model enables the backbone model to separately operate (in parallel or otherwise) on a set of frames that are temporally related to the current input frame. The outputs of the separate executions of the backbone model are then concatenated and processed via one or more convolution operations to generate a set of features that is fed to the final output layer of the pipeline, which encapsulates one or more path perception outputs generated based on temporal context.
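The run-separately, concatenate, then convolve structure can be sketched with stand-ins. The backbone and convolution kernel below are placeholders, not the disclosed model; only the data flow matches the description.

```python
# Illustrative sketch of the temporal pipeline: a (stub) backbone runs
# separately on each temporally related frame, the per-frame features are
# concatenated, and a small 1-D convolution mixes them into temporally
# aware features for the final output layer.

def backbone(frame):                 # stub backbone: per-frame features
    return [sum(frame), max(frame)]

def conv1d(features, kernel):
    """Valid-mode 1-D convolution over the concatenated feature vector."""
    k = len(kernel)
    return [sum(kernel[j] * features[i + j] for j in range(k))
            for i in range(len(features) - k + 1)]

def temporal_features(frames, kernel=(0.5, 0.5)):
    """Separate backbone executions -> concatenation -> convolution."""
    concatenated = [f for frame in frames for f in backbone(frame)]
    return conv1d(concatenated, kernel)
```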
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
B60W 60/00 - Drive control systems specially adapted for autonomous road vehicles
G05D 1/243 - Means capturing signals occurring naturally from the environment, e.g. ambient optical, acoustic, gravitational or magnetic signals
G05D 1/246 - Arrangements for determining position or orientation using environment maps, e.g. simultaneous localisation and mapping [SLAM]
G05D 101/15 - Details of software or hardware architectures used for the control of position using artificial intelligence [AI] techniques using machine learning, e.g. neural networks
G06V 10/77 - Processing image or video features in feature spaces; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
87.
HYBRID SELF-ATTENTION FOR OPTIMIZATION OF DECODER AI MODELS
Disclosed are apparatuses, systems, and techniques for deploying hybrid self-attention for efficient artificial intelligence (AI) processing, including using sparse attention to obtain hidden states and using full or intermediate attention to predict new tokens. The techniques include predicting, using a set of N hidden states, a token, where an individual hidden state of the set of N hidden states is generated, by an attention-based neural network, using M other previously predicted tokens, such that M is smaller than N.
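The sparse-for-hidden-states, full-for-prediction split can be sketched with plain scaled dot-product attention. This is a toy illustration of the hybrid idea under simplifying assumptions (tiny embeddings, a sliding window as the sparsity pattern); it is not the disclosed model.

```python
# Minimal sketch of hybrid self-attention: hidden states are produced with
# *sparse* attention over a window of M previous positions, while the token
# prediction step attends to the full set of N hidden states.
import math

def attention(query, keys, values):
    """Plain scaled dot-product attention for small dense embeddings."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(dim)]

def hidden_states(embeddings, window):
    """Sparse step: each hidden state attends only to the last `window`
    positions (M), which is smaller than the sequence length (N)."""
    states = []
    for i, emb in enumerate(embeddings):
        ctx = embeddings[max(0, i - window + 1): i + 1]
        states.append(attention(emb, ctx, ctx))
    return states

def predict_token(query, states):
    """Full step: the next-token query attends to all N hidden states."""
    return attention(query, states, states)
```

The efficiency argument is that the O(N·M) sparse pass does the bulk of the work, while the O(N) full pass is paid only at the prediction step.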
In various examples, embodiments are directed to angle bias error identification and correction for autonomous and semi-autonomous systems and applications. Systems and methods are disclosed that identify angle bias error(s) associated with detected sensor data and correct for such angle bias error(s) for use in localization, navigation, and/or other uses by autonomous vehicles, semi-autonomous vehicles, robots, and/or other object or machine types. In embodiments, angle bias error identification is performed by detecting angle error in association with various points detected via a sensor during normal driving operation of an ego-machine. The detected angle errors may be used to generate a representation of angle bias error for various angles of the sensor, which may be used to apply a correction to raw angle measurements. Using techniques described herein, for example, corrected azimuth angle measurements may be generated for use by downstream modules to perform more efficient and effective navigation.
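One simple representation of per-angle bias is a binned lookup table, sketched below. The bin size and the table-based correction are illustrative assumptions; the disclosure's representation of angle bias error may differ.

```python
# Hedged sketch: angle errors observed during normal driving are binned by
# raw angle into a bias table, which is then used to correct raw azimuth
# measurements for downstream modules.

def build_bias_table(observations, bin_size=10.0):
    """observations: (raw_angle_deg, angle_error_deg) pairs.
    Returns the mean observed bias per angular bin."""
    sums, counts = {}, {}
    for angle, error in observations:
        b = int(angle // bin_size)
        sums[b] = sums.get(b, 0.0) + error
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sums}

def correct_azimuth(raw_angle, bias_table, bin_size=10.0):
    """Subtract the learned bias for the bin containing the raw angle;
    angles in unobserved bins pass through uncorrected."""
    return raw_angle - bias_table.get(int(raw_angle // bin_size), 0.0)
```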
In various examples, embodiments are directed to identifying map data (e.g., relevant to a route) using a quad-tree spatial index. In this regard, spatial map data that indicates various map features is represented in a quad-tree spatial index for use in identifying map data. To identify map data, bounding shapes may be generated in association with various segments of a route. An indication of an object-oriented bounding shape may be used to query the quad-tree spatial index to identify map data related to the object-oriented bounding shape. In embodiments, an object-oriented spatial index may be generated that indexes the object-oriented bounding shapes associated with the route. The object-oriented spatial index may be used to query the quad-tree spatial index to identify map data related to the corresponding object-oriented bounding shapes. Alternatively, the quad-tree spatial index may be used to query the object-oriented spatial index to identify map data.
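A minimal quad-tree supporting the insert-and-query pattern above can be sketched as follows. This is an illustrative point-based index, not the disclosed one; the disclosed object-oriented bounding shapes would be reduced to an axis-aligned query box (or handled by a second index) before querying.

```python
# Minimal quad-tree sketch: map features are inserted by point location and
# an axis-aligned query box returns the features it contains. Capacity and
# the point-feature representation are illustrative assumptions.

class QuadTree:
    def __init__(self, x0, y0, x1, y1, capacity=4):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.items = []          # (x, y, feature) tuples in this node
        self.children = None

    def _subdivide(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [QuadTree(x0, y0, mx, my, self.capacity),
                         QuadTree(mx, y0, x1, my, self.capacity),
                         QuadTree(x0, my, mx, y1, self.capacity),
                         QuadTree(mx, my, x1, y1, self.capacity)]

    def insert(self, x, y, feature):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            return False
        if self.children is None:
            if len(self.items) < self.capacity:
                self.items.append((x, y, feature))
                return True
            self._subdivide()            # node full: push items into children
            for px, py, f in self.items:
                for child in self.children:
                    if child.insert(px, py, f):
                        break
            self.items = []
        for child in self.children:
            if child.insert(x, y, feature):
                return True
        return False

    def query(self, qx0, qy0, qx1, qy1):
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or x1 < qx0 or qy1 < y0 or y1 < qy0:
            return []                    # query box misses this node entirely
        found = [f for (x, y, f) in self.items
                 if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        if self.children:
            for child in self.children:
                found.extend(child.query(qx0, qy0, qx1, qy1))
        return found
```

The payoff is that a query for one route segment's bounding shape only descends into quadrants that overlap it, instead of scanning all map features.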
Disclosed are systems and techniques for a cloud function controller for executing code using graphics processing units (GPUs) in a serverless architecture. The techniques include generating, at a cloud function controller, a first worker deployment request for a first worker and receiving a first registration of the first worker based on the first worker deployment request. The first worker is deployed in a first cluster environment having graphics processing unit (GPU) resources accessible to the first worker to process cloud function execution requests. The techniques include receiving an indication that the first worker is unavailable, generating a second worker deployment request for a second worker, and receiving a second registration of the second worker based on the second worker deployment request. The second worker is deployed in a second cluster environment having GPU resources accessible to the second worker to continue processing the cloud function execution requests.
Disclosed are systems and techniques for a cloud function worker for executing code using graphics processing units (GPUs) in a serverless architecture. The techniques include receiving, at a cloud function worker, a first cloud function execution request from a cloud function queue of a cloud function controller, executing a first instance of a code based on the first cloud function execution request using at least a graphics processing unit (GPU) of a cluster environment hosting the cloud function worker, determining first metrics of the GPU of the cluster environment, responsive to the first metrics of the GPU satisfying a bandwidth criterion, requesting a second cloud function execution request from the cloud function queue of the cloud function controller, and executing, concurrent with execution of the first instance of the code, a second instance of the code based on the second cloud function execution request.
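The worker's pull loop can be sketched as follows. The queue, the metrics reader, and the executor are illustrative stand-ins for the disclosed components; a real worker would launch each code instance concurrently on the GPU rather than calling it inline.

```python
# Hedged sketch of the worker loop: execute a request, check (stub) GPU
# metrics, and keep pulling requests for concurrent execution while the
# bandwidth criterion holds.
from collections import deque

def run_worker(queue, execute, gpu_utilization, max_utilization=0.8):
    """Pull requests from `queue` while GPU headroom remains.
    `execute(request)` runs one instance of the code; `gpu_utilization()`
    is a stub for reading GPU metrics. Returns the requests executed."""
    executed = []
    while queue:
        request = queue.popleft()
        execute(request)          # in practice: launched concurrently on the GPU
        executed.append(request)
        if gpu_utilization() > max_utilization:
            break                 # bandwidth criterion no longer satisfied
    return executed
```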
Apparatuses, systems, and techniques for virtual asset rendering based on articulation data are provided. Characteristic data indicating physical characteristic(s) of an object is collected and provided as input to an artificial intelligence (AI) model. One or more outputs of the AI model are obtained, which include articulation data indicating a stationary state and/or a dynamic state of the object based on the physical characteristic(s) of the object. A model file associated with a virtual asset corresponding to the object is updated based on the articulation data. The model file, when executed, creates a rendering of the virtual asset in a virtual scene associated with a three-dimensional (3D) graphics platform according to at least one of the stationary state or the dynamic state of the object.
Disclosed are apparatuses, systems, and techniques for real-time streaming and playback of synchronized audio and animation data in a web browser, which include: responsive to determining that audio data in an audio data queue satisfies a first criterion, generating a delay indicator; receiving updates to the audio data queue; and responsive to determining that the audio data in the audio data queue satisfies a second criterion, causing the audio data in the audio data queue and animation data in an animation data queue to play in accordance with the delay indicator to maintain synchronization between the audio data and the animation data.
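The buffering logic can be sketched with watermark-style criteria. The thresholds, the use of a timestamp as the delay indicator, and the frame representation are all illustrative assumptions, not the disclosed criteria.

```python
# Illustrative sketch: when the audio queue runs low (first criterion) a
# delay indicator is recorded; once the queue refills past a second
# threshold, both queues play together using that delay, keeping audio and
# animation aligned.

class SyncPlayer:
    def __init__(self, low_watermark=2, high_watermark=5):
        self.audio, self.animation = [], []
        self.low, self.high = low_watermark, high_watermark
        self.delay = 0

    def push(self, audio_frame, animation_frame, now):
        self.audio.append(audio_frame)
        self.animation.append(animation_frame)
        if len(self.audio) <= self.low:
            self.delay = now        # first criterion: record delay indicator

    def try_play(self):
        if len(self.audio) >= self.high:   # second criterion satisfied
            frames = list(zip(self.audio, self.animation))
            self.audio, self.animation = [], []
            return self.delay, frames
        return None                 # keep buffering
```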
H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronizing decoder's clock; Client middleware
A datacenter cooling system is disclosed. The system includes a first cooling loop with a heat exchanger to exchange heat with a second cooling loop. The second cooling loop includes a cooling distribution unit (CDU) to exchange heat between the second cooling loop and a primary cooling loop.
Apparatuses, systems, and techniques are presented to generate image or video content representing at least one point of view. In at least one embodiment, one or more neural networks are used to generate one or more images of one or more objects from a first point of view based at least in part upon one or more images of the one or more objects from a second point of view.
Apparatuses, systems, and techniques to identify objects within an image. In at least one embodiment, objects are identified in an image using one or more neural networks based, at least in part, on neural network outputs ranked according to uncertainty values.
Approaches of the disclosure are directed towards dynamic and automatic user interface adjustment that accounts for drift in the finger position of a user over time while providing touch input without direct tactile feedback. Due to a lack of tactile response, the tap position of a finger may drift over time. To compensate for this drift, the touch positions of a user can be monitored over time and compared to regions of the touch interface that are associated with specific inputs. For at least certain types of inputs, it can be determined when there is a pattern or direction of drift that may lead to problems with missed input. Based on the detected drift, the location or screen region associated with that input can be shifted by an appropriate magnitude, as may be based in part upon the magnitude of drift or screen real estate, among other such factors.
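The monitor-and-shift loop can be sketched as follows. The tolerance, the gain, and the mean-offset drift estimate are illustrative assumptions; the disclosure contemplates other factors such as available screen real estate.

```python
# Hedged sketch of drift compensation: tap positions for a given on-screen
# input are compared against the input region's center, and when the mean
# offset exceeds a tolerance, the region is shifted toward the drifted
# position.

def mean_offset(taps, center):
    """Mean (dx, dy) of observed taps relative to the region center."""
    n = len(taps)
    dx = sum(x - center[0] for x, y in taps) / n
    dy = sum(y - center[1] for x, y in taps) / n
    return dx, dy

def adjust_region(center, taps, tolerance=3.0, gain=0.5):
    """Shift the input region's center partway toward the observed drift
    once it exceeds `tolerance` pixels; otherwise leave the region alone."""
    dx, dy = mean_offset(taps, center)
    if (dx * dx + dy * dy) ** 0.5 <= tolerance:
        return center
    return (center[0] + gain * dx, center[1] + gain * dy)
```

Shifting by a fraction of the drift (rather than all of it) keeps the region stable against noisy individual taps while still tracking a sustained pattern.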
G06F 3/04886 - Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures by partitioning the display area of the touch-screen or the surface of the digitising tablet into independently controllable areas, e.g. virtual keyboards or menus
A63F 13/2145 - Input arrangements for video game devices characterised by their sensors, purposes or types for locating contacts on a surface, e.g. floor mats or touch pads the surface being also a display device, e.g. touch screens
A63F 13/22 - Setup operations, e.g. calibration, key configuration or button assignment
G06F 3/04842 - Selection of displayed objects or displayed text elements
98.
USING ATOMIC OPERATIONS TO IMPLEMENT A READ-WRITE LOCK
A processing device maintains locking data including a first portion indicating whether a write operation is updating a data item or awaiting to update the data item, and a second portion indicating a number of read operations accessing the data item. A first read operation to read the data item is received. A first locking condition is determined to be satisfied using the locking data, representing that no write operations are updating the data item or awaiting to update the data item. Responsive to determining that the first locking condition is satisfied, the second portion of the locking data is atomically incremented to reflect the number of read operations reading the data, and the first read operation to read the data is executed.
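The two-portion locking word can be modeled as follows. This is an illustrative model only: the compare-and-swap is simulated with a mutex, whereas a real implementation would use the platform's atomic instructions, and the bit layout is an assumption.

```python
# Illustrative model of the locking data: the high bit of a single integer
# marks a pending/active write (first portion), the low bits count active
# readers (second portion). The CAS below is simulated with a mutex as a
# stand-in for hardware atomicity.
import threading

WRITER_BIT = 1 << 31           # first portion: write updating/awaiting
READER_MASK = WRITER_BIT - 1   # second portion: reader count

class RWLockWord:
    def __init__(self):
        self._word = 0
        self._mutex = threading.Lock()  # stand-in for an atomic CAS

    def _cas(self, expected, new):
        with self._mutex:
            if self._word == expected:
                self._word = new
                return True
            return False

    def try_read_acquire(self):
        """Atomically bump the reader count if no writer is updating or
        awaiting (the first locking condition)."""
        word = self._word
        if word & WRITER_BIT:
            return False
        return self._cas(word, word + 1)

    def read_release(self):
        with self._mutex:
            self._word -= 1     # atomically decrement the reader count

    def try_write_acquire(self):
        """Set the writer bit only when no readers or writer are active."""
        return self._cas(0, WRITER_BIT)
```

Because both portions live in one word, a single atomic operation can test the writer bit and adjust the reader count together, with a retry on CAS failure in a real implementation.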
In various examples, a technique for routing generic HTTP traffic over a reversed UDP stream includes receiving, from a client device via a first connection, a client request to perform a function with a server that is not addressable by the client device; determining that a second connection with the server has been established; receiving first data from and transmitting second data to the client device via the first connection; and transmitting the first data to and receiving the second data from the server via the second connection.
Apparatuses, systems, and techniques for generating a clothed three-dimensional (3D) avatar character from a text prompt and enabling smooth animation through physics or neural simulators. In at least one embodiment, a clothed 3D avatar is generated through body layer modeling and garment layer modeling based on text descriptions. The outputs from the body layer and garment layer modeling are combined to generate an animation-ready, clothed 3D avatar.