In a system with control logic and a processing element array, two modes of operation may be provided. In the first mode of operation, the control logic may configure the system to perform matrix multiplication or 1×1 convolution. In the second mode of operation, the control logic may configure the system to perform 3×3 convolution. The processing element array may include an array of processing elements. Each of the processing elements may be configured to compute the dot product of two vectors in a single clock cycle, and further may accumulate the dot products that are sequentially computed over time.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using unspecified devices for evaluating functions by calculation
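As an illustration of the abstract above, the following is a minimal software sketch of a processing element that computes one dot product per step and accumulates the results over time. The names `ProcessingElement`, `step`, `matmul`, and the `width` parameter are hypothetical, and the control logic and the 3×3 convolution mode are not modeled.

```python
from typing import List, Sequence


class ProcessingElement:
    """Toy model of one processing element (hypothetical name)."""

    def __init__(self) -> None:
        self.acc = 0  # running sum of the dot products seen so far

    def step(self, a: Sequence[int], b: Sequence[int]) -> None:
        # One modeled "clock cycle": dot product of two equal-length vectors,
        # added to the running total.
        if len(a) != len(b):
            raise ValueError("vectors must have the same length")
        self.acc += sum(x * y for x, y in zip(a, b))


def matmul(A: List[List[int]], B: List[List[int]], width: int) -> List[List[int]]:
    """Multiply A (n x k) by B (k x m), feeding `width` elements per step."""
    n, k, m = len(A), len(B), len(B[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            pe = ProcessingElement()
            column = [B[r][j] for r in range(k)]
            # A long dot product is split into chunks that are computed
            # sequentially and accumulated by the same element.
            for s in range(0, k, width):
                pe.step(A[i][s:s + width], column[s:s + width])
            out[i][j] = pe.acc
    return out
```

In this sketch each output value gets its own element object; a real array would map many outputs onto elements working in parallel.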
2.
Multiply accumulate (MAC) unit with split accumulator
In a multiply accumulate (MAC) unit, an accumulator may be implemented in two or more stages. For example, a first accumulator may accumulate products from the multiplier of the MAC unit, and a second accumulator may periodically accumulate the running total of the first accumulator. Each time the first accumulator's running total is accumulated by the second accumulator, the first accumulator may be initialized to begin a new accumulation period. In one embodiment, the number of values accumulated by the first accumulator within an accumulation period may be a user-adjustable parameter. In one embodiment, the bit width of the input of the second accumulator may be greater than the bit width of the output of the first accumulator. In another embodiment, an adder may be shared between the first and second accumulators, and a multiplexor may switch the accumulation operations between the first and second accumulators.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using unspecified devices for evaluating functions by calculation
G06F 7/509 - Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination, for multiple operands, e.g. digital integrators
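The two-stage accumulation described in this entry can be sketched in software as follows. This is only a behavioral model under stated assumptions: the class name `SplitAccumulatorMAC` and its methods are hypothetical, `period` stands in for the user-adjustable number of values per accumulation period, and bit widths and the shared-adder variant are not modeled because Python integers are unbounded.

```python
class SplitAccumulatorMAC:
    """Behavioral sketch of a MAC unit with a two-stage accumulator."""

    def __init__(self, period: int) -> None:
        if period < 1:
            raise ValueError("period must be at least 1")
        self.period = period  # values accumulated per period (assumed parameter)
        self.first = 0        # first accumulator: sums products from the multiplier
        self.second = 0       # second accumulator: sums the first stage's totals
        self.count = 0        # products seen in the current period

    def mac(self, a: int, b: int) -> None:
        # Multiply and accumulate into the first stage.
        self.first += a * b
        self.count += 1
        if self.count == self.period:
            self.flush()

    def flush(self) -> None:
        # The second stage absorbs the running total, and the first stage
        # is re-initialized to begin a new accumulation period.
        self.second += self.first
        self.first = 0
        self.count = 0

    def total(self) -> int:
        # Full sum, including a partially completed period.
        return self.second + self.first
```

With `period=4`, four products are summed in the first stage before each transfer, so in hardware the first stage would only need enough bits for four products.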
3.
Multiply accumulate (MAC) unit with split accumulator
In a multiply accumulate (MAC) unit, an accumulator may be implemented in two or more stages. For example, a first accumulator may accumulate products from the multiplier of the MAC unit, and a second accumulator may periodically accumulate the running total of the first accumulator. Each time the first accumulator's running total is accumulated by the second accumulator, the first accumulator may be initialized to begin a new accumulation period. In one embodiment, the number of values accumulated by the first accumulator within an accumulation period may be a user-adjustable parameter. In one embodiment, the bit width of the input of the second accumulator may be greater than the bit width of the output of the first accumulator. In another embodiment, an adder may be shared between the first and second accumulators, and a multiplexor may switch the accumulation operations between the first and second accumulators.
In a system with control logic and a processing element array, two modes of operation may be provided. In the first mode of operation, the control logic may configure the system to perform matrix multiplication or 1×1 convolution. In the second mode of operation, the control logic may configure the system to perform 3×3 convolution. The processing element array may include an array of processing elements. Each of the processing elements may be configured to compute the dot product of two vectors in a single clock cycle, and further may accumulate the dot products that are sequentially computed over time.
G06F 15/80 - Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
In a system with control logic and a processing element array, two modes of operation may be provided. In the first mode of operation, the control logic may configure the system to perform matrix multiplication or 1×1 convolution. In the second mode of operation, the control logic may configure the system to perform 3×3 convolution. The processing element array may include an array of processing elements. Each of the processing elements may be configured to compute the dot product of two vectors in a single clock cycle, and further may accumulate the dot products that are sequentially computed over time.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using unspecified devices for evaluating functions by calculation
6.
Low power hardware architecture for a convolutional neural network
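Dynamic data quantization may be applied to minimize the power consumption of a system that implements a convolutional neural network (CNN). Under such a quantization scheme, a quantized representation of a 3×3 array of m-bit activation values may include 9 n-bit mantissa values and one exponent shared between the n-bit mantissa values (n < m); and a quantized representation of a 3×3 kernel with p-bit parameter values may include 9 q-bit mantissa values and one exponent shared between the q-bit mantissa values (q < p). Convolution of the kernel with the activation data may include computing a dot product of the 9 n-bit mantissa values with the 9 q-bit mantissa values, and summing the shared exponents. In a CNN with multiple kernels, multiple computing units (each corresponding to one of the kernels) may receive the quantized representation of the 3×3 array of m-bit activation values from the same quantization-alignment module.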
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using unspecified devices for evaluating functions by calculation
G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
7.
Methods and systems for processing read-modify-write requests
A memory system comprises a plurality of memory sub-systems, each with a memory bank and other circuit components. For each of the memory sub-systems, a first buffer receives and stores a read-modify-write request (with a read address, a write address and a first operand), a second operand is read from the memory bank at the location specified by the read address, a combiner circuit combines the first operand with the second operand, an activation circuit transforms the output of the combiner circuit, and the output of the activation circuit is stored in the memory bank at the location specified by the write address. The first operand and the write address may be stored in a second buffer while the second operand is read from the memory bank. Further, the output of the activation circuit may be first stored in the first buffer before being stored in the memory bank.
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06F 3/06 - Digital input from, or digital output to, record carriers
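The read-modify-write flow in this entry can be sketched for a single memory sub-system as follows. All names (`RmwRequest`, `MemorySubSystem`, `submit`, `process_all`) are hypothetical. The abstract does not say what the combiner and activation circuits compute, so addition and ReLU are used here purely as assumed placeholders, and the optional write-back through the first buffer is not modeled.

```python
from collections import deque
from typing import Callable, Deque, List, NamedTuple


class RmwRequest(NamedTuple):
    read_addr: int
    write_addr: int
    operand: float  # the "first operand" carried by the request


class MemorySubSystem:
    """Sketch of one memory sub-system: bank, buffers, combiner, activation."""

    def __init__(
        self,
        size: int,
        combine: Callable[[float, float], float] = lambda a, b: a + b,  # assumed
        activate: Callable[[float], float] = lambda x: max(0.0, x),     # assumed
    ) -> None:
        self.bank: List[float] = [0.0] * size
        self.first_buffer: Deque[RmwRequest] = deque()
        self.combine = combine
        self.activate = activate

    def submit(self, request: RmwRequest) -> None:
        # The first buffer receives and stores incoming requests.
        self.first_buffer.append(request)

    def process_all(self) -> None:
        while self.first_buffer:
            request = self.first_buffer.popleft()
            # Stand-in for the second buffer: hold the first operand and the
            # write address while the second operand is read from the bank.
            held_operand, held_write_addr = request.operand, request.write_addr
            second_operand = self.bank[request.read_addr]
            result = self.activate(self.combine(held_operand, second_operand))
            self.bank[held_write_addr] = result
```

A system with several banks would hold one such object per bank and route each request by address.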
8.
Methods and systems for processing read-modify-write requests
A memory system comprises a plurality of memory sub-systems, each with a memory bank and other circuit components. For each of the memory sub-systems, a first buffer receives and stores a read-modify-write request (with a read address, a write address and a first operand), a second operand is read from the memory bank at the location specified by the read address, a combiner circuit combines the first operand with the second operand, an activation circuit transforms the output of the combiner circuit, and the output of the activation circuit is stored in the memory bank at the location specified by the write address. The first operand and the write address may be stored in a second buffer while the second operand is read from the memory bank. Further, the output of the activation circuit may be first stored in the first buffer before being stored in the memory bank.
G11C 11/4093 - Input/output [I/O] data interface arrangements, e.g. data buffers
G11C 11/4096 - Input/output [I/O] data management or control circuits, e.g. reading or writing circuits, I/O drivers or bit-line switches
H03K 19/173 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using elementary logic circuits as components
9.
Low power hardware architecture for handling accumulation overflows in a convolution operation
In a low power hardware architecture for handling accumulation overflows in a convolver unit, an accumulator of the convolver unit computes a running total by successively summing dot products from a dot product computation module during an accumulation cycle. In response to the running total overflowing the maximum or minimum value of a data storage element, the accumulator transmits an overflow indicator to a controller and sets its output equal to a positive or negative overflow value. In turn, the controller disables the dot product computation module by clock gating, clamping one of its inputs to zero and/or holding its inputs to constant values. At the end of the accumulation cycle, the output of the accumulator is sampled. In response to a clear signal being asserted, the dot product computation module is enabled, and the running total is set to zero for the start of the next accumulation cycle.
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
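The overflow handling described in this entry can be sketched as a small behavioral model. The class name `SaturatingAccumulator`, the `bits` parameter, and the `dot_products_computed` counter are hypothetical; the counter only illustrates that no further dot products are computed once the module is disabled, standing in for clock gating or input clamping.

```python
from typing import Sequence


class SaturatingAccumulator:
    """Sketch of an accumulator that saturates and disables its dot product source."""

    def __init__(self, bits: int = 16) -> None:
        self.max_value = (1 << (bits - 1)) - 1  # positive overflow value
        self.min_value = -(1 << (bits - 1))     # negative overflow value
        self.total = 0
        self.overflowed = False         # overflow indicator sent to the controller
        self.dot_products_computed = 0  # illustrative proxy for switching activity

    def accumulate(self, data: Sequence[int], weights: Sequence[int]) -> None:
        if self.overflowed:
            # The controller has disabled the dot product module, so no work
            # is done for the rest of the accumulation cycle.
            return
        self.dot_products_computed += 1
        self.total += sum(d * w for d, w in zip(data, weights))
        if self.total > self.max_value:
            self.total, self.overflowed = self.max_value, True
        elif self.total < self.min_value:
            self.total, self.overflowed = self.min_value, True

    def sample(self) -> int:
        # Output read at the end of the accumulation cycle.
        return self.total

    def clear(self) -> None:
        # Clear signal: re-enable the module and reset the running total.
        self.total = 0
        self.overflowed = False
```

Once the running total saturates, later inputs cannot change the sampled output, which is why skipping the remaining dot products is safe in this model.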
Convolution with a 5×5 kernel involves computing the dot product of a 5×5 data block with a 5×5 kernel. Instead of computing this dot product as a single sum of 25 products, the dot product is computed as a sum of four partial sums, where each partial sum is computed as a dot product of a 3×3 data block with a 3×3 kernel. The four partial sums may be computed by a single 3×3 convolver unit over four time periods. During each time period, at least some of the weights received by the 3×3 convolver unit may correspond to a quadrant of weights from the 5×5 kernel. A shifter circuit provides shifted columns (left or right shifted) of the input data to the 3×3 convolver unit, allowing the 3×3 convolver unit access to the 3×3 data block that spatially corresponds to a particular quadrant of weights from the 5×5 kernel.
G06F 5/01 - Methods or arrangements for data conversion without changing the order or content of the data handled, for shifting, e.g. justifying, scaling, normalising
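The decomposition of a 5×5 dot product into four 3×3 partial sums can be checked with a short sketch. The helper names (`dot3x3`, `split_kernel`, `conv5x5_via_3x3`) are hypothetical, and the way quadrants are zero-padded to 3×3 and aligned to window offsets 0 and 2 is one assumed layout, not necessarily the one used in the patent. Plain list slicing stands in for the shifter circuit.

```python
from typing import List, Tuple

Grid = List[List[int]]


def dot3x3(block: Grid, weights: Grid) -> int:
    """Dot product of a 3x3 data block with a 3x3 kernel (one convolver pass)."""
    return sum(block[r][c] * weights[r][c] for r in range(3) for c in range(3))


def split_kernel(kernel: Grid) -> List[Tuple[int, int, Grid]]:
    """Split a 5x5 kernel into four zero-padded 3x3 weight sets with offsets."""
    parts = []
    for r0 in (0, 2):
        for c0 in (0, 2):
            w = [[0] * 3 for _ in range(3)]
            for r in range(3):
                for c in range(3):
                    rr, cc = r0 + r, c0 + c
                    # Row 2 and column 2 fall inside two windows; only the
                    # window at offset 0 keeps them, so no weight is counted twice.
                    owns_row = rr <= 2 if r0 == 0 else rr >= 3
                    owns_col = cc <= 2 if c0 == 0 else cc >= 3
                    if owns_row and owns_col:
                        w[r][c] = kernel[rr][cc]
            parts.append((r0, c0, w))
    return parts


def conv5x5_via_3x3(data: Grid, kernel: Grid) -> int:
    """Sum of four 3x3 partial sums, one per modeled time period."""
    total = 0
    for r0, c0, w in split_kernel(kernel):
        block = [row[c0:c0 + 3] for row in data[r0:r0 + 3]]
        total += dot3x3(block, w)
    return total
```

Under this layout the four weight sets carry 9, 6, 6, and 4 nonzero positions, which together cover all 25 kernel weights exactly once.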
A system for evaluating a piecewise linear function includes a first look-up table with N entries, and a second look-up table with M entries, with M being less than N. Each of the N entries contains parameters that define a corresponding linear segment of the piecewise linear function. The system further includes a controller configured to store a subset of the N entries from the first look-up table in the second look-up table. The system further includes a classifier for receiving an input value and classifying the input value in one of a plurality of segments of a number line. A total number of the segments is equal to M, and the segments are non-overlapping and contiguous. The system further includes a multiplexor for selecting one of the M entries of the second look-up table based on the classification of the input value into one of the plurality of segments.
G06F 17/12 - Complex mathematical operations for solving equations; simultaneous equations
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using unspecified devices for evaluating functions by calculation
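The two-table evaluation scheme in this entry can be sketched as follows. The class `PiecewiseLinearUnit` and its methods are hypothetical, each entry is assumed to hold a (slope, intercept) pair, and the classifier is assumed to compare the input against M-1 ascending boundaries; the abstract does not specify either detail.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Entry = Tuple[float, float]  # (slope, intercept) of one linear segment (assumed)


class PiecewiseLinearUnit:
    """Sketch: an N-entry table, an M-entry working table, classifier, and mux."""

    def __init__(self, full_table: Sequence[Entry]) -> None:
        self.full_table: List[Entry] = list(full_table)  # first look-up table
        self.active: List[Entry] = []                    # second look-up table
        self.boundaries: List[float] = []                # M-1 segment cut points

    def load(self, indices: Sequence[int], boundaries: Sequence[float]) -> None:
        """Controller step: copy a subset of the N entries into the M-entry table."""
        if len(indices) >= len(self.full_table):
            raise ValueError("M must be less than N")
        if len(boundaries) != len(indices) - 1:
            raise ValueError("M segments need M-1 boundaries")
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be ascending")
        self.active = [self.full_table[i] for i in indices]
        self.boundaries = list(boundaries)

    def classify(self, x: float) -> int:
        # Segments are contiguous and non-overlapping, so a binary search
        # over the cut points yields exactly one segment index.
        return bisect_right(self.boundaries, x)

    def evaluate(self, x: float) -> float:
        slope, intercept = self.active[self.classify(x)]  # multiplexor selection
        return slope * x + intercept
```

For example, loading entries for slope 0, slope 1, and the constant 6 with boundaries 0 and 6 gives a function that is 0 below zero, the identity between 0 and 6, and 6 above.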
12.
Methods and systems for processing read-modify-write requests
A memory system comprises a plurality of memory sub-systems, each with a memory bank and other circuit components. For each of the memory sub-systems, a first buffer receives and stores a read-modify-write request (with a read address, a write address and a first operand), a second operand is read from the memory bank at the location specified by the read address, a combiner circuit combines the first operand with the second operand, an activation circuit transforms the output of the combiner circuit, and the output of the activation circuit is stored in the memory bank at the location specified by the write address. The first operand and the write address may be stored in a second buffer while the second operand is read from the memory bank. Further, the output of the activation circuit may be first stored in the first buffer before being stored in the memory bank.
Contiguous columns of a convolutional engine are partitioned into two or more groups. Each group of columns may be used to process input data. Filter weights assigned to one group may be distinct from filter weights assigned to another group.
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06F 5/01 - Methods or arrangements for data conversion without changing the order or content of the data handled, for shifting, e.g. justifying, scaling, normalising
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06F 5/01 - Methods or arrangements for data conversion without changing the order or content of the data handled, for shifting, e.g. justifying, scaling, normalising
In a hardware architecture for implementing a convolutional neural network, a controller may set certain convolver units to be active and others to be inactive in order to perform convolution with a stride greater than or equal to two.
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06F 5/01 - Methods or arrangements for data conversion without changing the order or content of the data handled, for shifting, e.g. justifying, scaling, normalising
A convolutional engine is configured to process input data that is organized into horizontal stripes. The number of accumulators present in each convolver unit of the convolutional engine may equal a total number of rows of data in each of the horizontal stripes.
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06F 5/01 - Methods or arrangements for data conversion without changing the order or content of the data handled, for shifting, e.g. justifying, scaling, normalising
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06F 5/01 - Methods or arrangements for data conversion without changing the order or content of the data handled, for shifting, e.g. justifying, scaling, normalising
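Dynamic data quantization may be applied to minimize the power consumption of a system that implements a convolutional neural network (CNN). Under such a quantization scheme, a quantized representation of a 3×3 array of m-bit activation values may include 9 n-bit mantissa values and one exponent shared between the n-bit mantissa values (n < m); and a quantized representation of a 3×3 kernel with p-bit parameter values may include 9 q-bit mantissa values and one exponent shared between the q-bit mantissa values (q < p). Convolution of the kernel with the activation data may include computing a dot product of the 9 n-bit mantissa values with the 9 q-bit mantissa values, and summing the shared exponents. In a CNN with multiple kernels, multiple computing units (each corresponding to one of the kernels) may receive the quantized representation of the 3×3 array of m-bit activation values from the same quantization-alignment module.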
G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using unspecified devices for evaluating functions by calculation
Dynamic data quantization may be applied to minimize the power consumption of a system that implements a convolutional neural network (CNN). Under such a quantization scheme, a quantized representation of a 3×3 array of m-bit activation values may include 9 n-bit mantissa values and one exponent shared between the n-bit mantissa values (n < m); and a quantized representation of a 3×3 kernel with p-bit parameter values may include 9 q-bit mantissa values and one exponent shared between the q-bit mantissa values (q < p). Convolution of the kernel with the activation data may include computing a dot product of the 9 n-bit mantissa values with the 9 q-bit mantissa values, and summing the shared exponents. In a CNN with multiple kernels, multiple computing units (each corresponding to one of the kernels) may receive the quantized representation of the 3×3 array of m-bit activation values from the same quantization-alignment module.
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
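The shared-exponent scheme in this entry can be illustrated with a small integer sketch. The function names `quantize_block` and `quantized_dot` are hypothetical, and the rule used to pick the shared exponent (the smallest right shift that makes the largest magnitude fit in a signed mantissa) is an assumption; the abstract does not state how the exponent is chosen or how values are rounded.

```python
from typing import List, Sequence, Tuple


def quantize_block(values: Sequence[int], mantissa_bits: int) -> Tuple[List[int], int]:
    """Return (mantissas, shared_exponent) so that value ~ mantissa * 2**exponent."""
    limit = (1 << (mantissa_bits - 1)) - 1  # largest positive signed mantissa
    peak = max(abs(v) for v in values)
    exponent = 0
    # Assumed rule: smallest shift at which the largest magnitude fits.
    while (peak >> exponent) > limit:
        exponent += 1
    # Arithmetic right shift; low-order bits are dropped (rounds toward -inf).
    mantissas = [v >> exponent for v in values]
    return mantissas, exponent


def quantized_dot(
    a: Sequence[int], a_exp: int, w: Sequence[int], w_exp: int
) -> Tuple[int, int]:
    """Dot product of mantissas, with the two shared exponents summed."""
    return sum(x * y for x, y in zip(a, w)), a_exp + w_exp


# Illustrative 3x3 activation block and kernel, flattened to 9 values each.
activations = [16, 32, 48, 64, 80, 96, 112, 8, 0]
kernel = [4, -4, 8, 0, 12, -8, 4, 0, 8]
act_m, act_e = quantize_block(activations, mantissa_bits=4)
ker_m, ker_e = quantize_block(kernel, mantissa_bits=3)
dot_m, dot_e = quantized_dot(act_m, act_e, ker_m, ker_e)
approx = dot_m << dot_e  # reconstructed result: mantissa dot product * 2**(exponent sum)
```

Only the narrow mantissas enter the multipliers; the exponents are added once per block, which is the property the abstract relies on for reducing power.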
09 - Scientific and electric apparatus and instruments
Goods and services
Semiconductors; semiconductor chips; computer hardware; recorded computer software featuring artificial intelligence for the operation of computer chips; recorded computer software featuring artificial intelligence for the autonomous driving of vehicles, autonomous navigation, autonomous control of vehicles and assisted driving; recorded computer software featuring artificial intelligence for the collection, compilation, processing, transmission and dissemination of positioning data featuring roadway, geographic, map, route planning, crowd source information, travel information enabling structuring, maintaining and using computerized models of an environment of the vehicle by processing signals from sensors, recognition of landmarks including traffic signs, road profile and lampposts and correcting ego motion estimation; interactive recorded computer software that provides roadway, navigation, geographic, map and travel information; Interactive recorded computer software featuring artificial intelligence for enabling creation or updating of computerized data models of an environment.
09 - Scientific and electric apparatus and instruments
Goods and services
Semiconductors; semiconductor chips; computer hardware; recorded computer software featuring artificial intelligence for the operation of computer chips; recorded computer software featuring artificial intelligence for the autonomous driving of vehicles, autonomous navigation, autonomous control of vehicles and assisted driving; recorded computer software featuring artificial intelligence for the collection, compilation, processing, transmission and dissemination of positioning data featuring roadway, geographic, map, route planning, crowd source information, travel information enabling structuring, maintaining and using computerized models of an environment of the vehicle by processing signals from sensors, recognition of landmarks including traffic signs, road profile and lampposts and correcting ego motion estimation; interactive recorded computer software that provides roadway, navigation, geographic, map and travel information; Interactive recorded computer software featuring artificial intelligence for enabling creation or updating of computerized data models of an environment.
09 - Scientific and electric apparatus and instruments
Goods and services
Semiconductors; semiconductor chips; computer hardware; recorded computer software featuring artificial intelligence for the operation of computer chips; recorded computer software featuring artificial intelligence for the autonomous driving of vehicles, autonomous navigation, autonomous control of vehicles and assisted driving; recorded computer software featuring artificial intelligence for the collection, compilation, processing, transmission and dissemination of positioning data featuring roadway, geographic, map, route planning, crowd source information, travel information enabling structuring, maintaining and using computerized models of an environment of the vehicle by processing signals from sensors, recognition of landmarks including traffic signs, road profile and lampposts and correcting ego motion estimation; interactive recorded computer software that provides roadway, navigation, geographic, map and travel information; Interactive recorded computer software featuring artificial intelligence for enabling creation or updating of computerized data models of an environment; all of the aforesaid goods only for use in the field of object recognition for self-driving cars and autonomous and assisted driving of vehicles.
09 - Scientific and electric apparatus and instruments
Goods and services
Semiconductors; semiconductor chips; computer hardware; recorded computer software featuring artificial intelligence for the operation of computer chips; recorded computer software featuring artificial intelligence for the autonomous driving of vehicles, autonomous navigation, autonomous control of vehicles and assisted driving; recorded computer software featuring artificial intelligence for the collection, compilation, processing, transmission and dissemination of positioning data featuring roadway, geographic, map, route planning, crowd source information, travel information enabling structuring, maintaining and using computerized models of an environment of the vehicle by processing signals from sensors, recognition of landmarks including traffic signs, road profile and lampposts and correcting ego motion estimation; interactive recorded computer software that provides roadway, navigation, geographic, map and travel information; Interactive recorded computer software featuring artificial intelligence for enabling creation or updating of computerized data models of an environment.
09 - Scientific and electric apparatus and instruments
Goods and services
Semiconductors; semiconductor chips; computer hardware; recorded computer software featuring artificial intelligence for the operation of computer chips; recorded computer software featuring artificial intelligence for the autonomous driving of vehicles, autonomous navigation, autonomous control of vehicles and assisted driving; recorded computer software featuring artificial intelligence for the collection, compilation, processing, transmission and dissemination of positioning data featuring roadway, geographic, map, route planning, crowd source information, travel information enabling structuring, maintaining and using computerized models of an environment of the vehicle by processing signals from sensors, recognition of landmarks including traffic signs, road profile and lampposts and correcting ego motion estimation; interactive recorded computer software that provides roadway, navigation, geographic, map and travel information; Interactive recorded computer software featuring artificial intelligence for enabling creation or updating of computerized data models of an environment.
09 - Scientific and electric apparatus and instruments
Goods and services
Semiconductors; semiconductor chips; computer hardware; recorded computer software featuring artificial intelligence for the operation of computer chips; recorded computer software featuring artificial intelligence for the autonomous driving of vehicles, autonomous navigation, autonomous control of vehicles and assisted driving; recorded computer software featuring artificial intelligence for the collection, compilation, processing, transmission and dissemination of positioning data featuring roadway, geographic, map, route planning, crowd source information, travel information enabling structuring, maintaining and using computerized models of an environment of the vehicle by processing signals from sensors, recognition of landmarks including traffic signs, road profile and lampposts and correcting ego motion estimation; interactive recorded computer software that provides roadway, navigation, geographic, map and travel information; Interactive recorded computer software featuring artificial intelligence for enabling creation or updating of computerized data models of an environment.
09 - Scientific and electric apparatus and instruments
Goods and services
semiconductors; semiconductor chips; computer hardware; recorded computer software featuring artificial intelligence for the operation of computer chips; recorded computer software featuring artificial intelligence for the autonomous driving of vehicles, autonomous navigation, autonomous control of vehicles and assisted driving; recorded computer software featuring artificial intelligence for the collection, compilation, processing, transmission and dissemination of positioning data featuring roadway, geographic, map, route planning, crowd source information, travel information enabling structuring, maintaining and using computerized models of an environment of the vehicle by processing signals from sensors, recognition of landmarks including traffic signs, road profile and lampposts and correcting ego motion estimation; interactive recorded computer software that provides roadway, navigation, geographic, map and travel information; Interactive recorded computer software featuring artificial intelligence for enabling creation or updating of computerized data models of an environment
27.
Cluster compression for compressing weights in neural networks
A method for instantiating a convolutional neural network on a computing system. The convolutional neural network includes a plurality of layers, and instantiating the convolutional neural network includes training the convolutional neural network using a first loss function until a first classification accuracy is reached, clustering a set of F×K kernels of the first layer into a set of C clusters, training the convolutional neural network using a second loss function until a second classification accuracy is reached, creating a dictionary which maps each of a number of centroids to a corresponding centroid identifier, quantizing and compressing F filters of the first layer, storing F quantized and compressed filters of the first layer in a memory of the computing system, storing F biases of the first layer in the memory, and classifying data received by the convolutional neural network.
Labeled data is deterministically generated for training or validating machine learning models for image analysis. Approaches are described that allow this training data to be generated, for example, in real-time, and in response to the conditions at the location where images are generated by image sensors.
G06K 9/00 - Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
Systems, methods, and machine-readable media for deterministically generating labeled data for training or validating machine learning models for image analysis are described. Approaches described herein allow this training data to be generated, for example, in real time, and in response to the conditions at the location where images are generated by image sensors.
Systems, methods, and machine-readable media for using a convolutional neural network to generate hash strings corresponding to object instances, and thereby use the characteristic hash strings to recognize the same object instance depicted in images generated at different times and by different camera devices.
G06K 9/00 - Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
32.
Methods for inter-camera recognition of individuals and their properties
A convolutional neural network is used to generate hash strings corresponding to object instances. The characteristic hash strings are used to recognize the same object instance depicted in images generated at different times and by different camera devices.
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06F 5/01 - Methods or arrangements for data conversion without changing the order or content of the data handled, for shifting, e.g. justifying, scaling, normalising
A three-dimensional model of the environment of one or more camera devices is determined, in which image processing for inferring the model may be performed at the one or more camera devices.
Systems, methods, and machine-readable media for deterministically generating labeled data for training or validating machine learning models for image analysis, and for using such machine learning models to determine the contents of real-domain images by using a domain transfer to synthetic-appearing images are described.
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Systems, methods, and machine-readable media for determining a three-dimensional environment model of the environment of one or more camera devices, in which image processing for inferring the model may be performed at the camera devices, are described.
09 - Scientific and electric apparatus and instruments
Goods and services
Semiconductors; semiconductor chips; computer hardware; recorded computer software featuring artificial intelligence for the operation of computer chips; recorded computer software featuring artificial intelligence for the autonomous driving of vehicles, autonomous navigation, autonomous control of vehicles and assisted driving; recorded computer software featuring artificial intelligence for the collection, compilation, processing, transmission and dissemination of positioning data featuring roadway, geographic, map, route planning, crowd source information, travel information enabling structuring, maintaining and using computerized models of an environment of the vehicle by processing signals from sensors, recognition of landmarks including traffic signs, road profile and lampposts and correcting ego motion estimation; interactive computer software that provides roadway, navigation, geographic, map and travel information; Interactive recorded computer software featuring artificial intelligence for enabling creation or updating of computerized data models of an environment.
09 - Scientific and electric apparatus and instruments
Goods and services
semiconductors; semiconductor chips; computer hardware; recorded computer software featuring artificial intelligence for the operation of computer chips; recorded computer software featuring artificial intelligence for the autonomous driving of vehicles, autonomous navigation, autonomous control of vehicles and assisted driving; recorded computer software featuring artificial intelligence for the collection, compilation, processing, transmission and dissemination of positioning data featuring roadway, geographic, map, route planning, crowd source information, travel information enabling structuring, maintaining and using computerized models of an environment of the vehicle by processing signals from sensors, recognition of landmarks including traffic signs, road profile and lampposts and correcting ego motion estimation; interactive recorded computer software that provides roadway, navigation, geographic, map and travel information; Interactive recorded computer software featuring artificial intelligence for enabling creation or updating of computerized data models of an environment