Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network. One of the methods includes, for a weight tensor that includes weights of the neural network: performing, using a plurality of training examples, a training step to obtain respective gradients of a loss function with respect to the weights in the weight tensor; applying an optimizer to the respective gradients to generate respective gradient-based updates to the weights in the weight tensor; applying the respective gradient-based updates to the weights in the weight tensor to generate initial updated values of the weights in the weight tensor; scaling the initial updated values of the weights in the weight tensor to generate scaled updated values that have a predetermined target norm; and setting current values of the weights in the weight tensor for a next training step to be equal to the scaled updated values.
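A minimal numpy sketch of the norm-constrained update described above; the plain SGD update, the learning rate, and the Frobenius norm are illustrative assumptions, not the patented optimizer.

```python
# Hedged sketch: apply an optimizer update to a weight tensor, then rescale the
# updated tensor so it has a predetermined target norm.
import numpy as np

def norm_constrained_sgd_step(weights, grads, lr=0.1, target_norm=1.0):
    """Apply a plain SGD update, then rescale the weight tensor to `target_norm`."""
    # Gradient-based update from the optimizer (plain SGD here for illustration).
    updated = weights - lr * grads
    # Scale the initial updated values so the tensor has the predetermined target norm.
    scale = target_norm / (np.linalg.norm(updated) + 1e-12)
    return updated * scale

weights = np.random.randn(64, 32)
grads = np.random.randn(64, 32)
weights = norm_constrained_sgd_step(weights, grads)
print(np.linalg.norm(weights))  # ~1.0, the target norm
```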
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating controllable videos using generative neural networks.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing inputs using an epistemic machine learning model that improves the quality of outputs generated by a base machine learning model.
Methods, systems, and apparatus for selecting actions to be performed by an agent interacting with an environment. One system includes a high-level controller neural network, a low-level controller neural network, and a subsystem. The high-level controller neural network receives an input observation and processes the input observation to generate a high-level output defining a control signal for the low-level controller. The low-level controller neural network receives a designated component of an input observation and processes the designated component and an input control signal to generate a low-level output that defines an action to be performed by the agent in response to the input observation. The subsystem receives a current observation characterizing a current state of the environment, determines whether criteria are satisfied for generating a new control signal, and based on the determination, provides appropriate inputs to the high-level and low-level controllers for selecting an action to be performed by the agent.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
G06N 3/044 - Recurrent networks, e.g. Hopfield networks
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for designing a protein by jointly generating an amino acid sequence and a structure of the protein. In one aspect, a method comprises: generating data defining the amino acid sequence and the structure of the protein using a protein design neural network, comprising, for a plurality of positions in the amino acid sequence: receiving the current representation of the protein as of the current position; processing the current representation of the protein using the protein design neural network to generate design data for the current position that comprises: (i) data identifying an amino acid at the current position, and (ii) a set of structure parameters for the current position; and updating the current representation of the protein using the design data for the current position.
G16B 40/00 - ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data using an adaptive visual speech recognition model. One of the methods includes receiving a video that includes a plurality of video frames that depict a first speaker; obtaining a first embedding characterizing the first speaker; and processing a first input comprising (i) the video and (ii) the first embedding using a visual speech recognition neural network having a plurality of parameters, wherein the visual speech recognition neural network is configured to process the video and the first embedding in accordance with trained values of the parameters to generate a speech recognition output that defines a sequence of one or more words being spoken by the first speaker in the video.
G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
8. LEARNED COMPUTER CONTROL USING POINTING DEVICE AND KEYBOARD ACTIONS
A computer-implemented method for controlling a particular computer to execute a task is described. The method includes receiving a control input comprising a visual input, the visual input including one or more screen frames of a computer display that represent at least a current state of the particular computer; processing the control input using a neural network to generate one or more control outputs that are used to control the particular computer to execute the task, in which the one or more control outputs include an action type output that specifies at least one of a pointing device action or a keyboard action to be performed to control the particular computer; determining one or more actions from the one or more control outputs; and executing the one or more actions to control the particular computer.
G06F 3/038 - Control and interface arrangements therefor, e.g. drivers or device-embedded control circuitry
G06F 3/023 - Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
G06N 3/0442 - Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
A method performed by one or more data processing apparatus. The method comprises receiving an image item; obtaining a mask for selecting portions of the image item; and generating, from the image item, one or more video item comprising a respective one or more sequences of image frames. Each image frame comprises a respective portion of the image item selected using the mask. For each image sequence, the mask is translated incrementally over the image item to select the respective portions of the image item for successive image frames in the sequence. The method further comprises performing a machine learning task by processing the one or more video items using a machine learning model.
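A small numpy sketch of the masking idea above, assuming a rectangular crop mask swept diagonally across the image; the mask shape and trajectory are illustrative choices.

```python
# Hedged sketch: turn a single image into a short "video" by translating a crop mask
# over the image, one increment per frame.
import numpy as np

def image_to_video(image, crop_h, crop_w, num_frames):
    """Translate a rectangular mask over `image` to produce a sequence of frames."""
    H, W = image.shape[:2]
    frames = []
    for t in range(num_frames):
        # Move the mask incrementally from the top-left towards the bottom-right.
        frac = t / max(num_frames - 1, 1)
        y = int(frac * (H - crop_h))
        x = int(frac * (W - crop_w))
        frames.append(image[y:y + crop_h, x:x + crop_w])
    return np.stack(frames)  # shape: (num_frames, crop_h, crop_w, channels)

video = image_to_video(np.random.rand(224, 224, 3), 128, 128, num_frames=8)
print(video.shape)
```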
Methods and systems for one or more computers, in which a method includes maintaining software tool use data that includes a software tool selection embedding and a respective software tool embedding for each software tool in a set of software tools. The method includes receiving a query input, generating a software tool selection input sequence, processing the software tool selection input sequence to generate a software tool selection output that identifies a particular software tool, and generating a software tool call input sequence that includes the respective software tool embedding for the particular software tool and an embedded characterization of the query input.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for predicting weather using diffusion neural networks.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for predicting a 3D structure of a molecule complex. In one aspect, there is provided a method comprising: obtaining a network input that characterizes a molecule complex; processing the network input characterizing the molecule complex using an embedding neural network to generate molecule embedding data; and generating, using a generative model and while the generative model is conditioned on the molecule embedding data, a predicted three-dimensional (3D) structure of the molecule complex that defines a respective predicted 3D spatial location of each atom in the molecule complex.
G16B 15/00 - ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
Implementations are provided for an intermediary machine learning model that enables conditioning between different pretrained machine learning models to perform non-native task(s). In various implementations, a first set of token(s) may be applied as inputs across initial layer(s) of a first pretrained machine learning model to generate a first set of raw activations. A second set of token(s) may be applied as inputs across initial layer(s) of a second pretrained machine learning model to generate a second set of raw activations. The raw activations may be processed using the intermediary machine learning model to generate first and second sets of steered activations. The first set of steered activations may be applied across subsequent layer(s) of the first pretrained machine learning model to generate first steered output(s). The second set of steered activations may be applied across subsequent layer(s) of the second pretrained machine learning model to generate second steered output(s).
Methods and systems for processing sequences using spectral state space models. One of the methods includes, for successive time steps: processing an initial item embedding using an analysis network comprising processing layers arranged in a sequence, a first processing layer being configured to receive the initial item embedding, and to output a modified item embedding, and each other processing layer being configured to receive the item embedding output by the preceding layer and output a modified item embedding; wherein at least one of the processing layers is a spectral transform layer which, for each time step: generates a plurality of feature vectors by processing a sequence embedding using a plurality of spectral filters; multiplies the feature vectors by weight matrices, to form respective weighted feature vectors; and generates the modified item embedding for the time step including a term based on the weighted feature vectors.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing input sequences using a neural network that uses a partial position encoding scheme. The neural network generally includes both global and local attention layers. In the partial position encoding scheme, while the local attention layers do use position encoding, (i) a subset of the global attention layers can apply an attention mechanism that does not use position encoding, or (ii) the subset of global attention layers can apply an attention mechanism that does not apply position encoding to one or more of the dimensions of the input to the attention mechanism.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating computer code using neural networks. One of the methods includes receiving description data describing a computer programming task; generating a plurality of candidate computer programs by sampling a plurality of output sequences from a set of one or more generative neural networks; clustering the plurality of candidate computer programs to generate a plurality of clusters; for each cluster in a set of one or more of the clusters: processing each of the respective plurality of candidate computer programs in the cluster using a correctness estimation neural network to generate a correctness score for the candidate computer program that estimates a likelihood that the candidate computer program accurately performs the computer programming task; and selecting a representative computer program for the cluster using the correctness scores for the respective plurality of candidate computer programs in the cluster; and selecting one or more of the representative computer programs for the clusters as synthesized computer programs for performing the computer programming task.
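An illustrative sketch of the cluster-then-select step above; the `cluster_id` and `correctness_score` callables stand in for the clustering procedure and the correctness estimation neural network, which are not specified here.

```python
# Hedged sketch: group sampled candidate programs into clusters, pick the
# highest-scoring member of each cluster, then return the best representatives.
from collections import defaultdict

def select_programs(candidates, cluster_id, correctness_score, num_programs=1):
    """Cluster candidates, choose a representative per cluster by correctness score,
    and return the representatives of the top-scoring clusters."""
    clusters = defaultdict(list)
    for program in candidates:
        clusters[cluster_id(program)].append(program)
    representatives = []
    for members in clusters.values():
        # Representative = member with the highest estimated correctness.
        representatives.append(max(members, key=correctness_score))
    representatives.sort(key=correctness_score, reverse=True)
    return representatives[:num_programs]

# Toy usage: cluster by program length, score by count of "return" statements.
progs = ["def f(x): return x", "def f(x): return x + 0", "def f(x): pass"]
print(select_programs(progs, cluster_id=len, correctness_score=lambda p: p.count("return")))
```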
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for partitioning the training of a neural network across multiple devices. In particular, the training is partitioned using a schedule that includes multiple partitioning tactics.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a photomosaic of an input image or an input video using a text and image conditioned generative neural network. One of the methods includes receiving an input including a source image and, for each of a plurality of patches of the source image, one or more respective text descriptions; generating a photomosaic of the source image, where the photomosaic of the source image is an image that replaces each of the plurality of patches of the source image with a respective tile image that has one or more properties that are similar to one or more corresponding properties of the patch, and where the generating includes processing an image input that includes the source image and the respective text descriptions for the plurality of patches using a text and image conditioned image generation neural network to generate the photomosaic.
Implementations are provided for leveraging video conference tools to streamline machine learning model training relating to robot (and non-robot) planning and/or control, and/or for subsequent robot management. In various implementations, video conference client(s) of a video conference session may render output that includes sensor feed(s) capturing an environment in which a robot operates. A robotic planner process may be communicatively coupled to the video conference session. A natural language (NL) request for the robot to perform a high-level task may be received. The robotic planner process may process the NL request using a first machine learning model to generate NL responses, each expressing a mid-level action to be performed by the robot to carry out a respective portion of the high-level task. In some implementations, the NL responses may be processed to generate robot control data that may be used to operate a robot.
A neural network system is trained to generate textual reports (that is, medical reports, such as radiology reports) from one or more medical images, by fine-tuning a pre-trained neural network system (a visual language model, "VLM") operative, upon receiving an input comprising at least one image and a textual input, to generate a value indicative of a predicted likelihood of one or more candidate text continuations of the textual input. The fine-tuning of the neural network system is performed to reduce the value of a cost function which includes a first prediction cost term based on a first training database including first training datasets of at least one medical image and an associated text report, the first training datasets corresponding to first individuals. The first prediction cost term includes, for each individual, a cost value that is inversely dependent on a likelihood value, generated by the neural network system, for the associated text report conditioned on the at least one medical image.
G16H 10/60 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating audio and, optionally, a corresponding image using generative neural networks. For example, a spectrogram of the audio can be generated using a hierarchy of diffusion neural networks.
G10H 1/00 - Details of electrophonic musical instruments
G10H 7/10 - Instruments in which the tones are synthesised from a data store, e.g. computer organs by calculating functions or polynomial approximations to evaluate amplitudes at successive sample points of a tone waveform using coefficients or parameters stored in a memory, e.g. Fourier coefficients
G10H 7/12 - Instruments in which the tones are synthesised from a data store, e.g. computer organs by calculating functions or polynomial approximations to evaluate amplitudes at successive sample points of a tone waveform by means of a recursive algorithm using one or more sets of parameters stored in a memory and the calculated amplitudes of one or more preceding sample points
Systems, methods, and program code for training an encoder neural network and de-noising decoder neural network for generating an output data item such as an image or audio. Training source and target data items are obtained, representing views of an object or scene, and used to train the encoder neural network and the de-noising decoder neural network. The trained encoder neural network generates representations usable for many downstream tasks. The trained encoder neural network and de-noising decoder neural network can be used together to generate new views of objects or scenes, such as a new 3D view, given just one or a few source views.
Systems and methods for using a distributed computing system to train a large neural network to perform a machine learning task. A shared set of trainable parameters is maintained in a shared data store, and each of a geographically distributed set of workers updates its trainable parameters using a shard of the training dataset. There are two optimization processes: an outer optimization process, and an inner optimization loop that is executed by each worker independently and in parallel tens, hundreds, or thousands of times. The workers can have different computing capabilities and can be geographically distant from one another, and the communications bandwidth used by the system can be two or three orders of magnitude less than that of other systems.
A caption for a media data item, such as an image or video, is chosen from a plurality of candidate captions. The choice is made using both a respective posterior probability of the media data item given the candidate caption, and a respective prior probability for the candidate caption.
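A minimal sketch of the scoring rule above; `log_p_image_given_caption` and `log_p_caption` are hypothetical stand-ins for the posterior and prior models, and the additive log-score combination is an assumption.

```python
# Hedged sketch: choose a caption by combining the posterior probability of the media
# item given the caption with the prior probability of the caption.
import math

def choose_caption(candidates, log_p_image_given_caption, log_p_caption):
    """Score each candidate with log p(item | caption) + log p(caption) and return the best."""
    def score(caption):
        return log_p_image_given_caption(caption) + log_p_caption(caption)
    return max(candidates, key=score)

# Toy usage with hard-coded scores standing in for model outputs.
captions = ["a dog on grass", "a cat on grass"]
posterior = {"a dog on grass": math.log(0.7), "a cat on grass": math.log(0.3)}
prior = {"a dog on grass": math.log(0.4), "a cat on grass": math.log(0.6)}
print(choose_caption(captions, posterior.get, prior.get))
```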
Methods, systems, and apparatuses, including computer programs encoded on computer storage media, for generating training data for training a vision language model. In particular, generating training data such that, once a vision language model has been trained on the training data, the vision language model ("vision language neural network") can accurately encode information about spatial properties of objects depicted in images that are provided as input to the vision language model. Because of how the described techniques generate training data, vision language models trained using the generated training data demonstrate significant improvements in performance on visual question answering tasks compared to conventionally trained vision language models.
Systems and methods, implemented as computer programs on one or more computers in one or more locations, for generating a sequence of data elements using a neural network comprising a sequence of attention neural network layers. The sequence comprises a respective continuous valued data element at each position in a sequence of positions. Implementations of the described techniques remove the need for discrete tokens and fixed, finite vocabularies.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing a video that includes a plurality of video segments using neural networks to generate, for each video segment, an output that characterizes the video segment. The neural networks include a video encoder neural network, a decoder neural network, and, optionally, in some implementations, a dimensionality reduction neural network and an autoregressive transformer neural network. Optionally, the neural networks have access to a retrieval dataset that stores a plurality of embeddings.
A machine learning model is trained using a subset of training examples from a store of training data. Rather than randomly selecting the subset, the training examples in the subset are selected based on a score obtained using an online model. The online model is also trained using the subset of training examples, before performing another selection. As such, each successive selection of a subset of the training data, which are then provided to the machine learning model for further training, contains training examples which are better suited for efficiently training the machine learning model.
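A hedged sketch of the selection loop above; the scoring function, pool size, and training callbacks are placeholders, and the ranking-by-score selection is an assumed reading of the abstract.

```python
# Hedged sketch: select each training batch with an online scoring model instead of
# randomly, then train both the main model and the online model on that batch.
import random

def train_with_online_selection(examples, score_fn, train_main, train_online,
                                batch_size=32, num_rounds=10, pool_size=256):
    """Repeatedly: score a candidate pool with the online model, keep the top-scoring
    subset, and train both models on that subset before the next selection."""
    for _ in range(num_rounds):
        pool = random.sample(examples, min(pool_size, len(examples)))
        # Keep the examples the online model currently scores as most useful.
        batch = sorted(pool, key=score_fn, reverse=True)[:batch_size]
        train_main(batch)    # further train the main machine learning model
        train_online(batch)  # update the online model before the next selection

# Toy usage: "examples" are numbers, the score prefers larger values, training is a no-op.
train_with_online_selection(list(range(1000)), score_fn=lambda x: x,
                            train_main=lambda b: None, train_online=lambda b: None)
```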
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network. In one aspect, a method includes obtaining a temporal sequence of video frames and training the neural network on one or more superimposed initial video frames or one or more masked initial video frames that are generated based on video frames included in the temporal sequence. The temporal sequence includes one or more initial video frames at one or more initial time steps followed by one or more subsequent video frames at one or more subsequent time steps.
An image generation model generates a visual data item by an auto-regressive process in which a set of initial data tokens is iteratively refined in multiple steps based on corresponding textual prompt items. The visual data item is based on an output vector generated in one or more of the steps of the auto-regressive process. The textual prompt items are modified during the auto-regressive process by referencing additional ones of a set of concept definition datasets which define content to be included in the visual data item, so that the visual data item depicts content defined by the referenced concept definition datasets.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for encoding input data comprising input data values corresponding to respective input data grid points of an input data grid, such as image, video or audio data.
A computer-implemented method of training a neural network system comprising a visual encoder neural network and a text encoder neural network is provided. The method comprises obtaining a plurality of training data items (each training data item comprising an image and associated text defining a sequence of text tokens) and, at each of a plurality of training steps, processing at least one of the training data items by: processing pixels of the image in the training data item using the visual encoder neural network to generate a set of patch embeddings for the image; processing the sequence of text tokens using the text encoder neural network to generate a sequence of token embeddings; processing the set of patch embeddings and the sequence of token embeddings to generate a set of language-aware patch embeddings (based on similarities between patch embeddings and token embeddings); and training at least the visual encoder neural network by backpropagating gradients of a contrastive objective function evaluated over the language-aware patch embeddings and the sequence of token embeddings.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a multimodal machine learning model to perform a machine learning task. The multimodal machine learning model is trained on noisy batches of training data from one or more noisy training datasets, and on task-specific batches of training data from one or more further, task-specific training datasets. The multimodal machine learning model is disproportionately trained on noisy batches of training data, for example by training the model using a proportion of noisy training batches in the training data that is greater than a proportion of a number of the noisy training datasets in a total number of training datasets.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining a size of a training data set for training a machine learning model. In one aspect, the size is determined using a shrinking estimate that estimates how much training data is needed to train a smaller machine learning model to achieve the same performance as a larger machine learning model.
Methods, systems, and apparatuses, including computer programs encoded on computer storage media, for determining a final computer program for performing a task by searching through a space of candidate computer programs. That is, by starting with an initial computer program and using an evolutionary search procedure that uses a pre-trained language model to generate new candidate computer programs in conjunction with an evaluation function to verify the quality of the new candidate computer programs, the resulting final computer program can be determined in an automatic fashion and can perform the task (often using novel steps and processes) more effectively than the initial computer program for the task.
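A toy sketch of the evolutionary loop above; `mutate_with_lm` stands in for sampling variants from the pre-trained language model, and `evaluate` for the evaluation function, both hypothetical here.

```python
# Hedged sketch: evolve candidate programs by repeatedly proposing variants of the best
# ones and keeping those the evaluation function scores highest.
import random

def evolve_program(initial_program, mutate_with_lm, evaluate, generations=20, population=8):
    best = [initial_program]
    for _ in range(generations):
        # Ask the (stand-in) language model to propose variants of promising programs.
        candidates = best + [mutate_with_lm(random.choice(best)) for _ in range(population)]
        best = sorted(candidates, key=evaluate, reverse=True)[:population]
    return best[0]

# Toy usage: "programs" are strings, mutation appends a character, score favours length.
print(evolve_program("seed", lambda p: p + random.choice("ab"), evaluate=len))
```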
A method is proposed for generating a multi-modal language model (MMLM) neural network trained to perform a multi-modal task on an input data element comprising at least one media input (an image or sound signal) to generate a token output which is a text response to the input data element. The method employs a decoder network trained to use the token output to generate reconstructed media tokens. Repeated modifications are made to the MMLM to reduce a discrepancy between the reconstructed media tokens and media tokens generated from the media input(s) by a media encoder neural network.
A method is proposed for generating a visual language model (VLM) neural network trained to perform a multi-modal task on an input dataset comprising an input image to generate a token output which is a text response to the input dataset. The VLM is trained using a training database comprising tuples of sample input datasets and corresponding sample token outputs. The sample input dataset of some of the tuples comprises an image generated from a text description by a text-to-image model, and the corresponding sample token output comprises at least part of the text description.
Methods, systems, and apparatuses, including computer programs encoded on computer storage media, for generating an imputation for a multi-agent interaction. Generating an imputation can include generating the imputation for the least core imputation problem using multiple levels of parallelism that solve a Lagrangian formulation of the least core imputation problem. That is, intra-hardware device parallelism, i.e., parallel processing hardware accelerators (e.g., multiple arithmetic logic units, multiple central processors, graphics processing units, tensor processing units, and other application-specific integrated circuits), and inter-hardware device parallelism, e.g., using more than one hardware device, can both be leveraged to achieve solutions to the least core imputation problem in times that are orders of magnitude faster than using conventional linear programming techniques.
G06F 18/2115 - Selection of the most significant subset of features by evaluating different subsets according to an optimisation criterion, e.g. class separability, forward selection or backward elimination
Systems, methods, and computer program code for controlling a robot that is interacting with an environment to perform a particular task. The technique involves generating a 2D trajectory image representing a 2D trajectory sketch. The 2D trajectory sketch indicates a desired trajectory for a part of the robot, e.g. an end effector, when performing the task. A neural network system uses the 2D trajectory image as deliberately underspecified guidance for how to perform the task. Techniques for training the neural network system are also described.
G05B 19/42 - Recording and playback systems, i.e. in which the programme is recorded from a cycle of operations, e.g. the cycle of operations being manually controlled, after which this record is played back on the same machine
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using a Transformer-based neural network to generate output sequences. To generate the output sequences, the Transformer-based neural network is configured to perform quantized inference.
G06F 18/2413 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
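A small numpy sketch of the kind of quantized inference referred to in the abstract above, assuming a symmetric int8 scheme for a single matrix multiplication; the actual quantization scheme is not specified in the abstract.

```python
# Hedged sketch: quantize activations and weights to int8, accumulate in integers,
# then rescale back to floating point.
import numpy as np

def quantize(x):
    """Symmetric int8 quantization: scale so the largest magnitude maps to 127."""
    scale = np.abs(x).max() / 127.0 + 1e-12
    return np.round(x / scale).astype(np.int8), scale

def quantized_matmul(x, w):
    xq, sx = quantize(x)
    wq, sw = quantize(w)
    # Integer accumulation, then rescale the result back to floating point.
    return (xq.astype(np.int32) @ wq.astype(np.int32)) * (sx * sw)

x = np.random.randn(4, 64)
w = np.random.randn(64, 64)
print(np.max(np.abs(quantized_matmul(x, w) - x @ w)))  # small quantization error
```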
A system that uses a graph neural network to determine a representation of a physical environment at a new time step. The new time step can be before or after a current time step, and the representation is determined based on representations of the physical environment at the current time step and one or more other time steps, e.g. one or more time steps before and/or after the current time step. The representation of the physical environment at the new time step may, for example, be used to generate an image of the physical environment at the new time step. The system can be used for controlling a robot interacting with the physical environment. Some examples of the techniques are specifically adapted for implementation using hardware accelerator units.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a diffusion neural network using a differentiable reward function.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training an object detection model. In particular, a system performs detection-oriented pre-training of the object detection model by pre-training at least a set of detection heads that output level-specific detection embeddings on image-text pairs.
G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
G06V 10/52 - Scale-space analysis, e.g. wavelet analysis
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
A computer-implemented method is provided for processing a training database for training a neural network to perform a computational task, the training database comprising training items, to obtain a weight value for each training item. The method comprises: for each of one or more attributes, determining a corresponding item attribute vector for each training item which is a vector indicative of a likelihood of the training item exhibiting the attribute; and for each training item determining a corresponding weight value by: defining a loss function of the weight values and the item attribute vectors; and updating the weight values to reduce the loss function. A corresponding computer system and computer program product are also provided.
G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/772 - Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 20/40 - Scenes; Scene-specific elements in video content
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining a measure of bias for a trained neural network. One of the methods includes obtaining a plurality of initial network inputs, wherein each of the initial network inputs has been classified as belonging to a respective ground truth class, and wherein each of the initial network inputs is associated with a corresponding feature value of a particular feature; processing each of the plurality of initial network inputs using a trained target neural network to generate a respective predicted network output for each initial network input; determining, for each class of the respective ground truth classes for the initial network inputs, a respective effect size; and determining a measure of bias of the trained target neural network with respect to the particular feature by aggregating the respective effect sizes over each of the respective ground truth classes.
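An illustrative sketch of the bias measure above, assuming a binary feature, scalar predicted outputs, and Cohen's d as the per-class effect size; these specifics are assumptions, not the patented definitions.

```python
# Hedged sketch: compute a per-class effect size between two feature groups, then
# aggregate over the ground-truth classes into a single bias measure.
import numpy as np

def bias_measure(outputs, labels, feature_values):
    """For each class, compare predicted outputs between the two feature groups with a
    Cohen's d style effect size, then average over classes."""
    effect_sizes = []
    for cls in np.unique(labels):
        scores = outputs[labels == cls]
        groups = feature_values[labels == cls]
        a, b = scores[groups == 0], scores[groups == 1]
        pooled_std = np.sqrt((a.var() + b.var()) / 2) + 1e-12
        effect_sizes.append(abs(a.mean() - b.mean()) / pooled_std)
    return float(np.mean(effect_sizes))  # aggregate effect size over classes

rng = np.random.default_rng(0)
print(bias_measure(rng.normal(size=200), rng.integers(0, 3, 200), rng.integers(0, 2, 200)))
```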
A computer-implemented method is provided for obtaining, for a multi-participant interaction in which each of a plurality of participants is able to perform one of a respective set of actions and each participant receives a reward defined by a respective reward function of the corresponding actions performed by the plurality of participants, corresponding action embeddings for one or more of the actions which one or more of the participants can perform.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for calibrating correlated noise specific to a machine learning task. In one aspect, there is provided a method for, at each of a plurality of training iterations, receiving a set of one or more inputs and generating a respective output for each input, determining a gradient with respect to the network parameters of an objective function for the machine learning task, calibrating a noise correlation matrix specific to the machine learning task and an index of the training iteration by generating a correlation weight matrix parameterized as a function of a tuneable noise correlation parameter and the index of the training iteration, generating a correlated-noise gradient using the calibrated noise correlation matrix, and using the correlated-noise gradient to update the values of the plurality of network parameters.
Systems, methods, and computer program code for generating a task-prompt for inclusion in a model input for controlling a model to perform a task. An evolutionary process is used to evolve prompts that increasingly improve the extraction of knowledge from a machine learned model. Some machine learned models, such as Large Language Models, can consume significant computing resources, and implementations of the described techniques are configured to use parallel processing in a way that makes efficient use of these resources.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a high-level controller neural network for controlling an agent. In particular, the high-level controller neural network generates natural language commands that can be provided as input to a low-level controller neural network, which generates control outputs that can be used to control the agent.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
G06N 3/0442 - Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
Methods, systems, and apparatuses, including computer programs encoded on computer storage media, for receiving a data item of a first modality that is not text and generating a compressed data item. Methods, systems, and apparatuses, including computer programs encoded on computer storage media, for receiving a compressed data item and generating a lossless reconstruction of the data item of a first modality that is not text. Compressing data items and decompressing compressed data items both include the use of language model neural networks to uncover the complex structure within data items. For compressing data items, the use of the language model neural network allows compression systems to achieve better compression rates than traditional methods across a range of data modalities that are not text.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling a robot using language model programs. A language model program is a computer program generated from an output of a code generation neural network, e.g., one that has been trained on a language modeling objective on computer code data.
Systems and methods for controlling agents using tracked points in images. For example, controlling a mechanical agent that is interacting in a real-world environment by selecting actions to be performed by the agent to perform instances of a task using images captured while the agent performs each instance of the task.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a data item using a diffusion neural network or other generative neural network.
Systems and methods for sub-additive action planning using multiple action selection policies. For example, sub-additive action planning can be performed using statistics of different tree searches that are guided by different action selection policies. As another example, multiple different action selection policies can be learned using intrinsic rewards that encourage diversity.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a generative neural network. One of the methods includes training a generative neural network by performing a sequence of a plurality of training stages, each generating an expanded training data set. The method also involves performing a sequence of improve steps, each comprising training the generative neural network on the training examples in a corresponding subset of the expanded training data set.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for conformal prediction using ambiguous calibration examples. In particular, how to perform conformal prediction using calibration examples that include plausibility distributions is described.
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06N 3/042 - Knowledge-based neural networks; Logical representations of neural networks
Methods and systems for controlling agents, e.g., robots, using tokenized goal images. One of the methods includes receiving a goal image; tokenizing the goal image to generate a plurality of visual tokens; at each of a plurality of time steps: obtaining one or more observation images characterizing a state of the environment at the time step; tokenizing each of the one or more observation images to generate a plurality of visual tokens; generating a sequence of input tokens that comprises the plurality of visual tokens that represent the goal image and the plurality of visual tokens that represent the one or more observation images; processing the sequence of input tokens to generate an output sequence of output tokens from a discrete vocabulary of tokens that represents an action to be performed by the agent in response to the observation images; and causing the agent to perform the action.
Systems and methods for designing a logic circuit. For example, the logic circuit can be designed by training a circuit neural network that represents the circuit.
G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for designing proteins. In one aspect, a method comprises: generating noisy molecular structure data sampled from a noise distribution that defines, for each position in an amino acid sequence of the protein, a corresponding initial spatial position for each atom in a predefined set of possible atoms; and processing the noisy molecular structure data using a diffusion model that comprises a denoising neural network to generate denoised molecular structure data that defines a denoised version of the noisy molecular structure data.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for obtaining one or more final policies for controlling an agent in an environment. In one aspect, one of the methods include: obtaining a candidate policy set that includes a plurality of candidate policies for controlling an agent in an environment; obtaining an offline dataset that stores a plurality of history trajectories, wherein each history trajectory comprises a plurality of history observations that each characterize a respective history state of the environment; and generating a behavioral representation for each candidate policy.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Computer-implemented methods, systems, and software for image processing. A particular image processing task to be performed is defined by a set of task examples that demonstrate the task. A memory stores keys and values based on the task examples, and a task image on which the particular image processing task is to be performed is processed using an image encoder to obtain a task image feature vector for each of a plurality of spatial locations in the task image. The task image feature vectors for the spatial locations are used to obtain query vectors that are applied to the memory using a query-key-value attention mechanism, to obtain predicted local label values that, in turn, provide a result for the image processing task.
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
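A minimal numpy sketch of the memory read-out described in the abstract above; the feature dimensions and random "memory" contents are placeholders for encoder outputs and stored label values.

```python
# Hedged sketch: predict local labels for a task image by attending from its feature
# vectors (queries) over a memory of keys and values built from the task examples.
import numpy as np

def attention_readout(queries, keys, values):
    """Scaled dot-product attention: each query reads a weighted mix of values."""
    logits = queries @ keys.T / np.sqrt(keys.shape[-1])
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

# Memory built from task examples: one key (feature) and value (local label) per location.
memory_keys = np.random.randn(500, 64)         # encoder features of example-image locations
memory_values = np.random.randn(500, 8)        # corresponding local label values
task_image_queries = np.random.randn(196, 64)  # features of the task image's locations

predicted_local_labels = attention_readout(task_image_queries, memory_keys, memory_values)
print(predicted_local_labels.shape)  # (196, 8): one predicted label vector per location
```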
Systems and methods, implemented as computer programs on one or more computers in one or more locations, for learning to control an agent to perform a task. The method involves training a policy neural network on demonstration actions that perform the task to obtain an initial, cloned action selection policy, determining a shaped reward using the cloned policy, then using the shaped reward to fine-tune the policy neural network. The system can transition smoothly between learning to copy actions of a task demonstrated by an agent such as a human expert, and refining the learned actions. The system can also learn to recover gracefully when outside a distribution of actions of the demonstrating agent.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents using sequence-processing neural networks. In particular, the sequence-processing neural network is used as a dynamics model of the environment in order to perform planning when selecting actions to be performed by an agent.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for verifying the provenance of a digital object generated by a neural network, such as an image or audio object. Also methods, systems, and apparatus, including computer programs, for training a watermarking neural network and a watermark decoding neural network. The described techniques make efficient use of computing resources and are robust to attack.
G06F 16/907 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
G06F 21/16 - Program or content traceability, e.g. by watermarking
G06N 3/084 - Backpropagation, e.g. using gradient descent
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
69. MULTI-STAGE WATERMARKING OF A DIGITAL OBJECT GENERATED BY A MACHINE LEARNING MODEL
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for watermarking a digital object generated by a machine learning model. The digital object is defined by a sequence of tokens. The watermarking involves modifying a probability distribution of the tokens by applying a succession of watermarking stages.
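A hedged toy sketch in the spirit of the abstract above; the "boost a pseudo-random subset of tokens" rule is an assumed stand-in for the actual per-stage modification of the probability distribution, which is not specified here.

```python
# Hedged sketch: apply a succession of watermarking stages, each reweighting the token
# probability distribution with a seeded pseudo-random choice, then sample a token.
import numpy as np

def watermark_stage(probs, rng, boost=1.5):
    """One stage: pseudo-randomly pick a subset of tokens and boost their probability."""
    favored = rng.random(len(probs)) < 0.5
    reweighted = np.where(favored, probs * boost, probs)
    return reweighted / reweighted.sum()

def sample_with_watermark(probs, seeds):
    rng_final = np.random.default_rng(0)
    for seed in seeds:  # succession of watermarking stages
        probs = watermark_stage(probs, np.random.default_rng(seed))
    return rng_final.choice(len(probs), p=probs)

token_probs = np.full(50, 1.0 / 50)
print(sample_with_watermark(token_probs, seeds=[101, 202, 303]))
```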
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for applying classifiers to messages between a user and a machine learning model.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for optimizing a memory allocation of a target program using a state representation neural network.
An adaptive system includes an adaptive unit and a previously trained language model. A data input item to the adaptive system includes a language input for processing by the language model, and a second input of a different modality. The second input is processed, such as by a second trained neural network, to generate an input for the adaptive unit. A training database of training examples, each including a data input and a desired output of the adaptive system, is used to generate a second training database of additional training examples including an output of a variant of the language model upon receiving the language input, and the second training database is used to train the adaptive unit of the adaptive system. The adaptive unit includes a number of gates which, during an initial part of the training of the adaptive system, cause data to flow through the adaptive unit, so as to preserve alignment of the output of the adaptive unit with the language model.
G06N 3/0442 - Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a predictive machine learning model, wherein the predictive machine learning model is configured to process a model input that comprises an image to generate a predicted image label characterizing the image. In one aspect, a method comprises: obtaining a set of real training examples; generating a set of synthetic training examples for training the predictive machine learning model, comprising: determining an image sampling policy for generating synthetic images based on a distribution of the set of real training examples; generating a plurality of synthetic images, using a generative machine learning model, in accordance with the image sampling policy; and generating a respective synthetic training example based on each of the synthetic images; and training the predictive machine learning model using the set of real training examples and the set of synthetic training examples.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents using a language model neural network and a vision-language model (VLM) neural network.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for differentially private diffusion neural network fine-tuning. In one aspect, a method includes, while training the neural network on a set of fine-tuning data items, for each fine-tuning data item in the set: sampling a set of one or more time steps by sampling from a time step distribution over time steps between a lower bound and an upper bound of the time step distribution, wherein the time step distribution is a non-uniform distribution over the time steps between the lower bound and the upper bound.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing sequence data using neural networks with linear recurrence and feedforward units. In one aspect, a system for performing a machine learning task on a network input to generate a network output is provided. The system includes one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to implement a neural network configured to perform the machine learning task. The neural network includes a number of layer blocks each including: (i) a linear recurrent layer, and (ii) one or more feedforward layers. Each layer block is configured to perform operations including: receiving an input sequence for the layer block; and generating an output sequence for the layer block.
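A compact numpy sketch of one layer block as described above, assuming an elementwise (diagonal) linear recurrence, a ReLU feedforward layer, and residual connections; these specifics are illustrative assumptions.

```python
# Hedged sketch: a layer block combining a linear recurrent layer with a feedforward layer.
import numpy as np

def linear_recurrent_layer(x, a, b):
    """h_t = a * h_{t-1} + b * x_t, applied elementwise along the sequence."""
    h = np.zeros_like(x[0])
    outputs = []
    for x_t in x:             # x has shape (seq_len, dim)
        h = a * h + b * x_t   # linear (no nonlinearity) recurrence
        outputs.append(h)
    return np.stack(outputs)

def layer_block(x, a, b, w1, w2):
    """Linear recurrence followed by a feedforward layer, each with a residual connection."""
    h = x + linear_recurrent_layer(x, a, b)
    return h + np.maximum(h @ w1, 0.0) @ w2  # ReLU feedforward

dim = 16
x = np.random.randn(20, dim)
out = layer_block(x, a=np.full(dim, 0.9), b=np.full(dim, 0.3),
                  w1=np.random.randn(dim, 4 * dim) * 0.1,
                  w2=np.random.randn(4 * dim, dim) * 0.1)
print(out.shape)  # (20, 16)
```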
A system and method of training neural networks that can be used to mitigate loss of plasticity is provided. The method comprises: obtaining an original neural network including an output subnetwork, training the original neural network during a first training phase and then creating first and second versions of the output subnetwork. An updated neural network is formed comprising the original neural network and the first and second versions of the output subnetwork. Trainable parameters of the first and second versions of the output subnetwork are initialized and the trainable parameters of the second version of the output subnetwork are then frozen after the initialization. The updated neural network is then trained during a second training phase by processing second training data items using the updated neural network and using the resulting updated network outputs to update the trainable parameters of the updated neural network that are not frozen.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
79. LEARNING REINFORCEMENT LEARNING POLICIES WITH LOCAL PLANNING
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a policy neural network used to select actions performed by an agent interacting with an environment by performing actions that cause the environment to transition states. In one aspect, one of the methods include: maintaining a history buffer; executing a local planning process using the history buffer to determine an initial observation of the environment; executing a data collection process that begins from the determined initial observation of the environment; and training, using an online reinforcement learning technique, the policy neural network on the history buffer to update the current parameter values of the policy neural network.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Systems and methods for processing inputs using neural networks with intention layers. Each intention layer includes one or more intention sub-layers that are each configured to: obtain a query matrix, a key matrix, and a value matrix for the intention sub-layer, wherein at least one of the query matrix, the key matrix, and the value matrix are derived from the layer input to the intention layer; determine a key covariance matrix that estimates a covariance of the key matrix; determine an inverse matrix that represents an inverse of the key covariance matrix; and determine a sub-layer output for the intention sub-layer from the inverse matrix, the query matrix, and the value matrix.
G06N 3/008 - Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
G06N 3/0442 - Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
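A hedged numpy sketch of an intention sub-layer following the steps in the abstract above; the ridge term and the exact way the inverse key covariance is combined with the queries and values are assumptions.

```python
# Hedged sketch: estimate the key covariance, invert it, and combine the inverse with
# the query and value matrices to form the sub-layer output.
import numpy as np

def intention_sublayer(q, k, v, ridge=1e-3):
    n, d = k.shape
    key_cov = k.T @ k / n + ridge * np.eye(d)  # covariance estimate of the keys
    inv_cov = np.linalg.inv(key_cov)           # inverse of the key covariance matrix
    # One plausible readout: whitened query-key similarities weighting the values.
    return (q @ inv_cov @ k.T / n) @ v

q = np.random.randn(10, 32)
k = np.random.randn(50, 32)
v = np.random.randn(50, 32)
print(intention_sublayer(q, k, v).shape)  # (10, 32)
```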
A computer-implemented method that comprises obtaining an input sequence of network inputs, processing each network input in the input sequence using a recurrent neural network to generate a sequence of recurrent outputs that includes a respective recurrent output for each network input in the input sequence, generating a sub-sampled sequence that includes a proper subset of the respective recurrent outputs, and processing the sub-sampled sequence using a self-attention neural network to generate a network output for the input sequence. The self-attention neural network comprises a self-attention subnetwork configured to apply self-attention over the sub-sampled sequence to generate a respective updated output for each recurrent output in the sub-sampled sequence and an output neural network configured to process one or more of the updated outputs to generate the network output for the input sequence.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network used to select an action to be performed by an agent interacting with an environment. In one aspect, a method includes: receiving a latent representation that characterizes a current state of the environment; generating an imagination trajectory of latent representations; for each latent representation in the imagination trajectory: determining a predicted reward; and generating a predicted state value; determining a target state value for each latent representation; determining an update to the current values of the policy network parameters; applying a symmetric logarithmic transformation to each target state value; encoding each transformed target state value to generate an encoded transformed target state value; and determining an update to the current values of the value network parameters by optimizing a critic objective function.
G06N 3/008 - Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
G06N 3/0442 - Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
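A small Python sketch of the target-value processing described above. The "symmetric logarithmic transformation" is instantiated here as symlog(x) = sign(x)·log(1 + |x|), and the encoding as a two-hot encoding over fixed bins; both specific choices are assumptions beyond the abstract.

    import numpy as np

    def symlog(x):
        # Symmetric logarithmic transform (assumed form): sign(x) * log(1 + |x|).
        return np.sign(x) * np.log1p(np.abs(x))

    def two_hot(value, bins):
        # Assumed encoding: distribute unit mass over the two bins bracketing the value.
        enc = np.zeros(len(bins))
        idx = np.clip(np.searchsorted(bins, value), 1, len(bins) - 1)
        lo, hi = bins[idx - 1], bins[idx]
        w = (value - lo) / (hi - lo)
        enc[idx - 1], enc[idx] = 1.0 - w, w
        return enc

    bins = np.linspace(-5.0, 5.0, 41)
    target_state_value = 37.2
    encoded = two_hot(symlog(target_state_value), bins)   # encoded transformed target state value
    print(encoded.sum())   # 1.0

The value network can then be trained against such encodings with, e.g., a categorical cross-entropy critic objective.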
Systems and methods that can be used for controlling an agent to act in an environment to perform a task, for training or evaluating an action selection system used by such an agent, and for obtaining training data for training such an action selection system. The task can be one of a plurality of different tasks that the agent can be trained, or instructed, to perform. The systems and methods use a multi-modal language model that jointly processes language and data of another modality, such as visual data or sound data. The multi-modal language model is used as a "success detector", to determine whether or not a task performed by the agent has been achieved.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for predicting weather using graph neural networks. One of the methods includes obtaining current weather data for a current time step; generating, using the current weather data, current graph data representing a current state of a graph of the surface; and processing the current graph data using a graph neural network to generate a first prediction output that defines first future weather data.
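A toy NumPy sketch of the steps described above: current weather data is attached to nodes of a graph over surface grid points, and one round of message passing stands in for the graph neural network that produces the first prediction output. The grid, the neighborhood structure, and the single message-passing step are all assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    num_nodes, num_feats = 6, 4                         # toy surface grid with per-node weather features
    current_weather = rng.normal(size=(num_nodes, num_feats))
    # Toy graph over the surface: each node connected to its two ring neighbours, both directions.
    edges = [(i, (i + 1) % num_nodes) for i in range(num_nodes)]
    edges += [(j, i) for i, j in edges]

    def gnn_step(node_feats, edges, w_msg, w_upd):
        # One message-passing round as a stand-in for the graph neural network.
        agg = np.zeros_like(node_feats)
        for src, dst in edges:
            agg[dst] += np.tanh(node_feats[src] @ w_msg)
        return np.tanh(np.concatenate([node_feats, agg], axis=1) @ w_upd)

    w_msg = rng.normal(0, 0.1, (num_feats, num_feats))
    w_upd = rng.normal(0, 0.1, (2 * num_feats, num_feats))
    future_weather = gnn_step(current_weather, edges, w_msg, w_upd)   # first prediction output
    print(future_weather.shape)   # (6, 4)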
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an output sequence of discrete tokens using a diffusion model. In one aspect, a method includes initializing the output sequence by assigning a respective embedding to each of the plurality of output positions; repeatedly performing the following at each of multiple reverse diffusion steps: obtaining a current continuous representation of the output sequence; processing a diffusion model input that comprises the current continuous representation using the diffusion model to generate a diffusion model output comprising respective initial scores; processing the respective initial scores using a softmax function to generate, for each of the plurality of output positions, a probability distribution over the plurality of embeddings in the vocabulary of embeddings; and updating the continuous representation of the output sequence using the probability distributions and the vocabulary of embeddings.
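An illustrative NumPy sketch of the reverse-diffusion loop described above. The diffusion model is replaced by a simple dot-product scorer, and the continuous representation is updated to the probability-weighted average of the vocabulary embeddings; both are assumptions standing in for the actual model and update rule.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, embed_dim, seq_len = 50, 8, 12
    vocab = rng.normal(size=(vocab_size, embed_dim))         # vocabulary of embeddings

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    # Initialize the output sequence with a respective embedding per output position.
    x = vocab[rng.integers(vocab_size, size=seq_len)]

    for step in range(10):                                    # reverse diffusion steps
        # Stand-in for the diffusion model: score each position against each vocabulary embedding.
        initial_scores = x @ vocab.T                          # (seq_len, vocab_size)
        probs = softmax(initial_scores)                       # per-position distribution over embeddings
        # Assumption: the updated continuous representation is the probability-weighted
        # average of the vocabulary embeddings.
        x = probs @ vocab

    tokens = (x @ vocab.T).argmax(axis=-1)                    # final discrete tokens
    print(tokens.shape)   # (12,)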
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents. In particular, an interactive agent can be controlled by a neural network trained with reward values using reinforcement learning.
G06N 3/0442 - Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
Methods and systems for generating positional encodings for nodes in a directed graph. In particular, the positional encodings are directionally-aware and are used to update node features for the nodes of the directed graph before the node features are processed by a neural network to generate a task prediction for the directed graph.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a collection of policy neural networks to select actions to be performed by an agent interacting with an environment to accomplish a task. In one aspect, a method comprises training the collection of policy neural networks by, for each episode of a plurality of episodes: designating, from the collection of policy networks, (i) a target network and (ii) differentiated policy neural networks; controlling the agent using the target network; receiving task rewards that define a metric of performance on the task by the agent as controlled by the target network; training the target network using the task rewards; and training each differentiated network using modified rewards that encourage an increase in a measure of differentiation between the differentiated network and the target network.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
G05B 13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
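A small Python sketch of one possible modified reward of the kind described above. The abstract does not specify the differentiation measure; the KL divergence between the differentiated and target networks' action distributions, added as a scaled bonus, is an assumption.

    import numpy as np

    def kl(p, q, eps=1e-8):
        return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

    def modified_reward(task_reward, diff_policy_probs, target_policy_probs, beta=0.1):
        # Assumption: differentiation is measured by the KL divergence between the two
        # policies' action distributions at the current state, added as a reward bonus.
        return task_reward + beta * kl(diff_policy_probs, target_policy_probs)

    target_probs = np.array([0.7, 0.2, 0.1])    # target network's action distribution
    diff_probs = np.array([0.1, 0.3, 0.6])      # a differentiated network's action distribution
    print(modified_reward(task_reward=1.0,
                          diff_policy_probs=diff_probs,
                          target_policy_probs=target_probs))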
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction characterizing an environment. In one aspect, a method includes obtaining a respective observation characterizing a state of an environment for each time step in a sequence of multiple time steps, comprising, for each time step after a first time step in the sequence of time steps: processing a network input that comprises observations obtained for one or more preceding time steps to generate a plurality of acquisition decisions; obtaining an observation for the time step, wherein the observation includes data corresponding to modalities that are selected for acquisition at the time step, and does not include data corresponding to modalities that are not selected for acquisition at the time step; and processing a model input that includes the observation for each time step in the sequence of time steps to generate the prediction.
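A small Python sketch of assembling per-time-step observations that contain only the modalities selected for acquisition. The modality names and the acquisition-decision network (replaced here by a random stand-in) are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    modalities = ["camera", "lidar", "audio"]

    def acquisition_decisions(previous_observations):
        # Stand-in for the network that maps preceding observations to per-modality decisions.
        return {m: bool(rng.integers(2)) for m in modalities}

    def acquire(raw_sensors, decisions):
        # The observation includes data only for the modalities selected for acquisition.
        return {m: raw_sensors[m] for m in modalities if decisions[m]}

    observations = [{m: rng.normal(size=4) for m in modalities}]    # first time step: acquire everything
    for t in range(1, 5):
        decisions = acquisition_decisions(observations)
        raw = {m: rng.normal(size=4) for m in modalities}           # what the sensors could provide
        observations.append(acquire(raw, decisions))

    # A prediction model would then process the whole sequence of (partial) observations.
    print([sorted(obs.keys()) for obs in observations])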
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a pathogenicity score characterizing a likelihood that a mutation to a protein is a pathogenic mutation, wherein the mutation modifies an amino acid sequence of the protein by replacing an original amino acid by a substitute amino acid at a mutation position in the amino acid sequence of the protein. In one aspect, a method comprises: generating a network input to a pathogenicity prediction neural network, wherein the network input comprises a multiple sequence alignment (MSA) representation that represents an MSA for the protein; processing the network input using the pathogenicity prediction neural network to generate a score distribution over a set of amino acids; and generating the pathogenicity score using the score distribution over the set of amino acids.
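A toy Python sketch of the final scoring step described above. The pathogenicity prediction neural network is replaced by a random distribution over amino acids, and the score is derived as a log-likelihood ratio of the original versus the substitute amino acid; that specific read-out is an assumption.

    import numpy as np

    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

    def pathogenicity_score(score_distribution, original_aa, substitute_aa, eps=1e-9):
        # Assumption: the score is higher when the substitute amino acid is much less likely
        # than the original under the predicted distribution at the mutation position.
        p = dict(zip(AMINO_ACIDS, score_distribution))
        return float(np.log(p[original_aa] + eps) - np.log(p[substitute_aa] + eps))

    rng = np.random.default_rng(0)
    toy_distribution = rng.dirichlet(np.ones(len(AMINO_ACIDS)))   # stand-in for the network output
    print(pathogenicity_score(toy_distribution, original_aa="L", substitute_aa="P"))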
A method performed by one or more computers for obtaining an optimized algorithm that (i) is functionally equivalent to a target algorithm and (ii) optimizes one or more target properties when executed on a target set of one or more hardware devices. The method includes: initializing a target tensor representing the target algorithm; generating, using a neural network having a plurality of network parameters, a tensor decomposition of the target tensor that parametrizes a candidate algorithm; generating target property values for each of the target properties when executing the candidate algorithm on the target set of hardware devices; determining a benchmarking score for the tensor decomposition based on the target property values of the candidate algorithm; generating a training example from the tensor decomposition and the benchmarking score; and storing, in a training data store, the training example for use in updating the network parameters of the neural network.
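A toy Python sketch of the data-generating loop described above. The neural-network proposal step is replaced by random perturbations of a known decomposition, and the measured target property values are replaced by the number of rank-one terms as a proxy benchmarking score; both stand-ins are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def reconstruct(factors):
        # A rank-R decomposition: sum of outer products u_r (x) v_r (x) w_r.
        return sum(np.einsum("i,j,k->ijk", u, v, w) for u, v, w in factors)

    # Toy target tensor built from a known decomposition so functional equivalence is attainable.
    true_factors = [tuple(rng.integers(-1, 2, 3) for _ in range(3)) for _ in range(4)]
    target = reconstruct(true_factors)

    training_data_store = []
    for _ in range(20):
        # Stand-in for the neural network's proposal: perturb the known decomposition.
        candidate = [tuple(f + rng.integers(-1, 2, 3) * (rng.random() < 0.2) for f in fac)
                     for fac in true_factors]
        equivalent = np.array_equal(reconstruct(candidate), target)
        # Assumption: the benchmarking score rewards functional equivalence and fewer rank-one terms
        # (a proxy for properties such as runtime on the target hardware).
        score = -len(candidate) if equivalent else -1e6
        training_data_store.append((candidate, score))      # training example for the proposal network

    print(max(score for _, score in training_data_store))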
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents using reporter neural networks.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for using simulation-based inference to infer a set of parameters, such as measurements, from observations, e.g., real-world observations. The method uses a score generation neural network to determine scores for individual observations or for groups of observations; the scores are combined and used to iteratively adjust values of the parameters.
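A toy Python sketch of the combine-and-adjust loop described above. The learned score generation neural network is replaced by the analytic score of a Gaussian simulator, and the per-observation scores are combined by summation with a fixed step size; these specifics are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    true_params = np.array([2.0, -1.0])
    observations = true_params + rng.normal(0, 0.5, size=(100, 2))   # toy "real world" observations

    def score(params, obs, sigma=0.5):
        # Stand-in for the score generation neural network: the analytic score of a Gaussian
        # simulator, i.e. the gradient of log p(obs | params) with respect to params.
        return (obs - params) / sigma**2

    params = np.zeros(2)
    for _ in range(200):
        combined = score(params, observations).sum(axis=0)   # combine per-observation scores
        params += 1e-3 * combined                            # iteratively adjust parameter values
    print(params.round(2))                                   # approaches the true parameters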
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for simulating a state of an environment over a sequence of time steps. In one aspect, a method comprises, at each of one or more time steps: obtaining an environment mesh representing the state of the environment at the time step; generating a graph representing the state of the environment at the time step, comprising: determining that a first face of a first object mesh is within a collision distance of a second face of a second object mesh; and in response, instantiating a face-face edge in the graph that connects: (i) a first set of graph nodes in the graph that represent the first face in the first object mesh, and (ii) a second set of graph nodes in the graph that represent the second face in the second object mesh.
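A simplified Python sketch of the face-face edge construction described above. The face-to-face distance is approximated by the minimum vertex-to-vertex distance, which is an assumption; a real implementation would use an exact triangle-triangle distance.

    import numpy as np
    from itertools import product

    def face_distance(verts_a, verts_b):
        # Approximation: minimum vertex-to-vertex distance between two triangular faces.
        return min(np.linalg.norm(p - q) for p, q in product(verts_a, verts_b))

    def build_face_face_edges(mesh_a, mesh_b, collision_distance=0.1):
        # mesh = (vertex_positions, faces), where each face is a triple of vertex (node) indices.
        verts_a, faces_a = mesh_a
        verts_b, faces_b = mesh_b
        edges = []
        for fa in faces_a:
            for fb in faces_b:
                if face_distance(verts_a[list(fa)], verts_b[list(fb)]) < collision_distance:
                    # Connect the graph nodes representing face fa to those representing face fb.
                    edges.append((("A", fa), ("B", fb)))
        return edges

    mesh_a = (np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0.0]]), [(0, 1, 2)])
    mesh_b = (np.array([[0, 0, 0.05], [1, 0, 0.05], [0, 1, 0.05]]), [(0, 1, 2)])
    print(build_face_face_edges(mesh_a, mesh_b))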
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling an agent that is interacting with an environment. Implementations of the system use previously learned skills to explore states of the environment to collect and store training data, which is then used to train an action selection system. The system includes a set of skill action selection subsystems, each configured to select actions for the agent to perform for a respective skill. The set of skill action selection subsystems is used to explore states of the environment to collect the training data, keeping their individual action selection policies unchanged. A scheduler neural network selects the skill neural networks to use. The action selection system is trained on the stored training data.
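A structural Python sketch of the exploration loop described above, with toy stand-ins. The environment, the frozen skill policies, the scheduler's selection rule, and how often the scheduler switches skills are all placeholders and assumptions.

    import random

    class ToyEnv:
        def reset(self):
            self.t = 0
            return 0.0
        def step(self, action):
            self.t += 1
            return float(action), 0.0, self.t >= 10   # observation, reward, done

    # Frozen skill action selection subsystems: their policies are kept unchanged during exploration.
    skills = [lambda obs: -1, lambda obs: +1, lambda obs: 0]

    def scheduler(obs):
        # Stand-in for the scheduler neural network that selects which skill to use next.
        return random.randrange(len(skills))

    env, training_data = ToyEnv(), []
    for episode in range(3):
        obs, done = env.reset(), False
        while not done:
            skill = skills[scheduler(obs)]
            action = skill(obs)
            next_obs, reward, done = env.step(action)
            training_data.append((obs, action, reward, next_obs))   # stored for later training
            obs = next_obs

    # The action selection system is then trained on `training_data` (training loop omitted).
    print(len(training_data))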
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions to be performed by an agent interacting with an environment. Implementations of the described techniques can learn to explore the environment efficiently by storing and updating state embedding cluster centers based on observations characterizing states of the environment.
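A toy Python sketch of storing and updating state embedding cluster centers as described above. The embedding function (an identity-like stand-in here), the fixed radius for creating new centers, and the use of the nearest-center distance as an exploration signal are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    centers, radius, lr = [], 1.0, 0.1

    def update_clusters(embedding):
        # Assumption: nearest-center update, with a new center created for far-away embeddings.
        if not centers:
            centers.append(embedding.copy())
            return 0.0
        dists = [np.linalg.norm(embedding - c) for c in centers]
        i = int(np.argmin(dists))
        if dists[i] > radius:
            centers.append(embedding.copy())        # a novel region of state space
        else:
            centers[i] += lr * (embedding - centers[i])
        return dists[i]                             # distance can drive an exploration bonus

    for _ in range(200):
        state_embedding = rng.normal(size=2) * 3.0  # stand-in for an embedded observation
        update_clusters(state_embedding)
    print(len(centers))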
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for controlling agents. In particular, an agent can be controlled using an action selection neural network that performs in-context reinforcement learning when controlling an agent on a new task.
G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
G06N 3/0442 - Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]