A module (100) includes a package substrate (170) for receiving a flip chip-attached semiconductor chip. A first flip chip-attached semiconductor chip (140) is attached to the package substrate (170) and a first ball grid array-attached packaged semiconductor chip (110) is attached to the package substrate (170). The first flip chip-attached semiconductor chip (140) and the first ball grid array-attached semiconductor chip (110) are in electrical communication with each other. The module (100) includes a connection component (160) attached to the package substrate (170). The connection component (160) includes an electrical coupling to couple the package substrate (170) to a corresponding connection component (160) on a motherboard (400). The package substrate (170) includes multiple conductive lines (177) to couple the first flip chip-attached semiconductor chip (140) to the first ball grid array-attached semiconductor chip (110) and to the connection component (160) attached to the package substrate (170).
H01L 23/538 - Arrangements for conducting electric current within the device in operation from one component to another, the interconnection structure between a plurality of semiconductor chips being formed on, or in, insulating substrates
H01L 21/48 - Manufacture or treatment of parts, e.g. containers, prior to assembly of the devices, using processes not provided for in a single one of the groups or
H01L 23/498 - Leads on insulating substrates
H01L 25/18 - Assemblies consisting of a plurality of semiconductor or other solid state devices, the devices being of types provided for in two or more different main groups of the same subclass , , , , or
H10B 80/00 - Assemblies of multiple devices comprising at least one memory device covered by this subclass
H10D 80/30 - Assemblies of multiple devices comprising at least one device covered by this subclass, the at least one device being covered by groups , e.g. assemblies comprising integrated circuit processor chips
2.
EXECUTION UNIT, PROCESSING DEVICE AND METHOD OF GENERATING RANDOM SAMPLES
An execution unit, the execution unit having access to a local memory storing a lookup table with a plurality of entries, each entry comprising an x value and corresponding y value representative of a point on a curve of a cumulative distribution function, CDF, consecutive entries of the plurality of entries forming an interval of the CDF, the execution unit being configured to: receive one or more computer program instructions, and in response: generate a random number using random number generation hardware associated with the execution unit, determine, based on the lookup table, the interval of the CDF in which the generated random number falls, and interpolate between y values of entries forming the interval based on the generated random number to generate a random sample of the CDF.
An execution unit configured to: receive a first computer program instruction to populate a lookup table with a plurality of entries, each entry comprising an x value and corresponding y value representative of a point on a curve of a function, consecutive entries of the plurality of entries forming an interval of the function, populate a lookup table stored in a local memory associated with the execution unit with the plurality of entries, receive a second computer program instruction, the second computer program instruction indicating an input value, determine, based on the lookup table, the interval of the function in which the input value falls, and interpolate between y values of entries forming the interval to generate an output value corresponding to the input value.
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using non-contact-making devices, e.g. tube, solid state device; Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using unspecified devices for evaluating functions by calculation
G06F 17/17 - Function evaluation by approximation methods, e.g. interpolation or extrapolation, smoothing or least mean square method
G06F 1/03 - Digital function generators working, at least partly, by table look-up
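The lookup-and-interpolate step described in this entry can be sketched in software as follows. This is only an illustration under stated assumptions: the table contents are invented, the x values are assumed to be cumulative probabilities and the y values the corresponding sample values, and Python's `random` module stands in for the hardware random number generator.

```python
import bisect
import random

# Hypothetical table (assumed layout): x = cumulative probability, y = sample value.
TABLE = [(0.0, 0.0), (0.5, 1.0), (0.9, 2.0), (1.0, 4.0)]
XS = [x for x, _ in TABLE]

def interpolate(u):
    """Find the interval of consecutive entries holding u, then interpolate y linearly."""
    # Index of the right-hand entry of the interval, clamped to the table bounds.
    hi = min(max(bisect.bisect_right(XS, u), 1), len(TABLE) - 1)
    (x0, y0), (x1, y1) = TABLE[hi - 1], TABLE[hi]
    return y0 + (y1 - y0) * (u - x0) / (x1 - x0)

def sample(rng=random):
    """Draw a uniform number in [0, 1) and map it through the table."""
    return interpolate(rng.random())
```

The same `interpolate` function also covers the second abstract's general case, where the input value comes from an instruction operand rather than from a random number generator.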
A processing unit is provided with circuitry enabling quick evaluation of an exponential function. A multiplier circuit is used to multiply the input operand by log2(e), such that a result for the exponential function may be determined by evaluating 2^(i+f), where i is an integer part of a fixed-point number and f is a fractional part of the fixed-point number. A lookup table is used for providing an estimate for 2^f based on the l most significant bits (MSBs) of f. The lookup entries are provided according to a function such that the estimates for 2^f are provided without bias towards either zero or infinity in the result. In other words, the maximum multiplicative error for each entry of the lookup table is the same in both negative and positive directions. In this way, statistical errors in the evaluation of a large number of exponential functions may be avoided.
G06F 7/556 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using non-contact-making devices, e.g. tube, solid state device; Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using unspecified devices for evaluating functions by calculation of logarithmic or exponential functions
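A minimal software sketch of this scheme is given below. The abstract does not state the function used to fill the table, so the entries here are an assumption: each entry is the geometric midpoint of the range of 2^f it covers, which makes the worst-case multiplicative error equal in both directions. The table width `L_BITS` and the function name `exp_lut` are hypothetical, and floating point stands in for the fixed-point datapath.

```python
import math

L_BITS = 6  # assumed number of MSBs of f used to index the table

# Each entry covers f in [k / 2**l, (k + 1) / 2**l); the geometric midpoint
# 2**((k + 0.5) / 2**l) balances the multiplicative error in both directions.
TABLE = [2.0 ** ((k + 0.5) / (1 << L_BITS)) for k in range(1 << L_BITS)]

def exp_lut(x):
    """Approximate e**x as 2**(i + f) with a table lookup for 2**f."""
    t = x * math.log2(math.e)      # multiply the operand by log2(e)
    i = math.floor(t)              # integer part
    f = t - i                      # fractional part, nominally in [0, 1)
    k = min(int(f * (1 << L_BITS)), (1 << L_BITS) - 1)  # l MSBs of f, clamped
    return math.ldexp(TABLE[k], i)  # TABLE[k] * 2**i
```

With these entries the ratio between the estimate and the true value stays within a factor of 2^(1/2^(l+1)) either way, so errors over many evaluations tend to cancel rather than accumulate in one direction.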
7.
A MACHINE LEARNING SYSTEM ENABLING EFFECTIVE TRAINING
A machine learning system implements a machine learning model. The system includes at least one layer of processing nodes, each processing node comprising a processor that executes computer readable instructions to perform at least one operation based on one or more inputs received at the processing node. The operation is scaled by a first scaling factor which has been calculated to cause a variance of an output of the at least one operation to have a target variance, for example unit variance or a variance that matches the variance of the input.
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
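As an illustration of a scaling factor chosen to hit a target variance, consider a dot product of n independent, zero-mean, unit-variance inputs and weights: its variance is n, so multiplying by 1/sqrt(n) restores unit variance. The sketch below assumes that setting; the operation, the function names and the statistical assumptions are illustrative and not taken from the abstract.

```python
import math
import random

def scaled_dot(inputs, weights):
    """Dot product scaled by 1/sqrt(n), assuming independent unit-variance operands."""
    n = len(inputs)
    scale = 1.0 / math.sqrt(n)  # assumed scaling rule giving unit output variance
    return scale * sum(a * b for a, b in zip(inputs, weights))

def output_variance(n=64, trials=2000, seed=0):
    """Empirically estimate the variance of the scaled operation's output."""
    rng = random.Random(seed)
    outs = []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ws = [rng.gauss(0.0, 1.0) for _ in range(n)]
        outs.append(scaled_dot(xs, ws))
    mean = sum(outs) / trials
    return sum((o - mean) ** 2 for o in outs) / trials
```

Calling `output_variance()` should return a value close to 1, whereas the unscaled dot product would have a variance close to n.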
An execution unit performs a byte-wise rotation operation on an input data block. An input data array receives the input data block. Two first layer multiplexer arrays each receive a first layer data block comprising a respective subset of bytes of the input data block and a first layer control signal, and rotate the first layer data block by an amount indicated by the first layer control signal. A second layer multiplexer array receives a second layer control signal and selects between a corresponding byte of the first and second rotated first layer data blocks based on the second layer control signal. The execution unit also includes a control signal generator, configured to generate the first layer control signal and second layer control signal based on a received computer program instruction. Results of smaller block rotations are thus used as partial results for larger block rotation, avoiding large multiplexer arrays with complex wiring.
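The two-layer structure can be modelled in software as below. This sketch makes assumptions the abstract does not state: the two first layer blocks are the lower and upper halves of the input, both are rotated left by the same reduced amount, and the second layer control signal is derived per output byte. The function names are hypothetical.

```python
def rotate_half(block, amount):
    """First-layer array: rotate a small block left by `amount` bytes."""
    n = len(block)
    return [block[(j + amount) % n] for j in range(n)]

def rotate_bytes(data, amount):
    """Rotate `data` (even length) left by `amount` bytes using two half-size rotations."""
    n = len(data)
    half = n // 2
    amount %= n
    # First layer: both halves are rotated by the same reduced amount.
    lo = rotate_half(data[:half], amount % half)
    hi = rotate_half(data[half:], amount % half)
    out = []
    for j in range(n):
        # Second layer: per-byte select between the two rotated halves,
        # depending on which half the source byte came from.
        pick_hi = (j + amount) % n >= half
        out.append(hi[j % half] if pick_hi else lo[j % half])
    return bytes(out)
```

For example, `rotate_bytes(bytes(range(8)), 3)` yields the bytes 3, 4, 5, 6, 7, 0, 1, 2, using only 4-byte rotations plus a two-way select per output byte.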
A bypass path is provided in the node for reducing the latency and power consumption associated with writing to and reading from the VC buffer, and is enabled when certain conditions are met. Bypass is enabled for a received packet when there is no other data that is ready to be sent from the VC buffer, which is the case when all VCs either have zero credits or an empty partition in the buffer. In this way, data arriving at the node is prevented from using the bypass path to take priority over data already held in the VC buffer and ready for transmission.
H04L 49/109 - Packet switching elements characterised by the switching fabric construction, integrated on microchip, e.g. switch-on-chip
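The bypass condition described above reduces to a simple predicate over the virtual channels (VCs). The sketch below is a hedged model: the `VirtualChannel` class, its fields and the function name are hypothetical, and each VC's buffer partition is represented by a queue.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class VirtualChannel:
    credits: int = 0                              # credits available for sending
    queue: deque = field(default_factory=deque)   # this VC's partition of the buffer

def bypass_allowed(vcs):
    """Bypass only when no VC has data that is ready to send.

    A VC has nothing ready if it has zero credits or its partition is empty.
    """
    return all(vc.credits == 0 or not vc.queue for vc in vcs)
```

A VC that holds data and has credits makes the predicate false, so newly arrived data cannot overtake data already waiting in the buffer.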
A processing device comprises a register configured to store a count value indicating a number of times overflow events have resulted from arithmetic operations performed by the processing device. An execution unit of the device, in response to performing an arithmetic operation having a result which extends beyond one of the predefined limit values for the floating-point format, stores a result value that is within the predefined limit values, and causes the count value to be incremented. The count value provides a performant way of determining the number of overflow events that have occurred during the arithmetic processing performed by the execution unit. The count value also provides a measure of the inaccuracy imparted into the results of the application processing by overflow events.
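A software model of this behaviour might look like the following. It assumes that the stored in-range result is the nearest limit value (saturation) and uses the IEEE half-precision maximum of 65504 as the limit; both choices, and the class and method names, are illustrative assumptions.

```python
FP16_MAX = 65504.0  # largest finite IEEE half-precision value, used as an example limit

class OverflowCountingUnit:
    """Clamps out-of-range results and counts how often that happens."""

    def __init__(self, limit=FP16_MAX):
        self.limit = limit
        self.overflow_count = 0  # models the count register

    def _clamp(self, value):
        if value > self.limit:
            self.overflow_count += 1
            return self.limit
        if value < -self.limit:
            self.overflow_count += 1
            return -self.limit
        return value

    def multiply(self, a, b):
        """Multiply, storing a saturated result if the product overflows."""
        return self._clamp(a * b)
```

After a run, reading `overflow_count` gives the number of saturated results without inspecting every output value.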
A read and notify request is issued by a first processing unit to a lock manager on a different chip. The lock manager determines whether a condition specified by the request in relation to a variable for controlling access to a memory buffer is met. If the condition is not met, a notification request is registered until the variable changes. A second processing unit accesses the memory buffer and, when it has finished, updates the variable. If the variable then satisfies the condition specified by the read and notify request, the first processing unit is notified by the lock manager and accesses the memory buffer. In this way, the first processing unit does not need to continually poll to determine when the variable has changed, but is notified when it is its turn to access the memory buffer.
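The read and notify flow can be sketched with callbacks standing in for the notification messages. The class below is a single-threaded model under assumptions: the condition is passed as a predicate on the variable, and the method names are hypothetical.

```python
class LockManager:
    """Holds the access-control variable and notifies waiters when their condition is met."""

    def __init__(self, value=0):
        self.value = value
        self.pending = []  # registered (condition, callback) notification requests

    def read_and_notify(self, condition, callback):
        """Notify immediately if the condition holds, otherwise register the request."""
        if condition(self.value):
            callback(self.value)
        else:
            self.pending.append((condition, callback))

    def update(self, new_value):
        """Called by the unit that has finished with the buffer; wakes satisfied waiters."""
        self.value = new_value
        waiting, self.pending = self.pending, []
        for condition, callback in waiting:
            if condition(new_value):
                callback(new_value)
            else:
                self.pending.append((condition, callback))
```

The waiting unit issues one request and is called back once, instead of repeatedly reading the variable across the chip boundary.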
A data processing device comprising: a plurality of processors, each of which has an associated sync request wire and an associated sync acknowledgment wire, both of which are used for co-ordinating barrier synchronisations. Each of the processors receives a signal representing a state of its sync acknowledgment wire, and asserts a sync request by setting a state of its sync request wire to be opposite to the state of its sync acknowledgment wire. The data processing device further comprises aggregation circuitry, which aggregates the state of the sync request wires to output an aggregate sync request to a sync controller. In response, the sync controller returns to each of the processors an acknowledgment of the sync requests by causing the state of the sync acknowledgment wires to be set to be the same as the state of the sync request wires.
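The toggling handshake can be simulated with boolean wire states. In this sketch the aggregation is assumed to be a logical AND over "request differs from acknowledgment" for every processor; the class and method names are hypothetical.

```python
class SyncController:
    """Models per-processor sync request/acknowledgment wires as booleans."""

    def __init__(self, n):
        self.request = [False] * n
        self.ack = [False] * n

    def assert_request(self, p):
        """Processor p requests a sync by driving its request wire opposite to its ack wire."""
        self.request[p] = not self.ack[p]

    def aggregate(self):
        """Aggregate request: true once every processor has an outstanding request."""
        return all(r != a for r, a in zip(self.request, self.ack))

    def step(self):
        """Acknowledge the barrier by making each ack wire equal to its request wire."""
        if self.aggregate():
            self.ack = list(self.request)
            return True
        return False
```

Because a request is encoded as a difference between the two wires, neither wire has to be reset between barriers: the next barrier simply toggles the wires the other way.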
There is disclosed a method of controlling the frequency of a clock signal in a processor. The method selects a first clock generator to provide a processor clock signal for executing an application. If a threshold event is detected, a second clock generator is selected. The method reduces the frequency of a clock signal generated by the first clock generator while a processor clock signal is being provided for execution of an application from the second clock generator. The second clock generator generates a clock at a lower speed than the first clock generator. After a predetermined time, the first clock generator is reselected to provide the processor clock signal. The threshold detection is repeated until an optimum clock frequency is discovered.
A computer comprising a plurality of interconnected processing nodes arranged in a configuration in which multiple layers of interconnected nodes are arranged along an axis, each layer comprising at least four processing nodes connected in a non-axial ring by at least one respective intralayer link between each pair of neighbouring processing nodes, wherein each of the at least four processing nodes in each layer is connected to a respective corresponding node in one or more adjacent layers by a respective interlayer link, the computer being programmed to provide in the configuration two embedded one dimensional paths and to transmit data around each of the two embedded one dimensional paths, each embedded one dimensional path using all processing nodes of the computer in such a manner that the two embedded one dimensional paths operate simultaneously without sharing links.
G06F 15/173 - Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star or snowflake
Each of the nodes stores a number, referred to herein as a generation number, which is updated whenever the respective node undergoes a reset and restart from checkpoint. Since the nodes of the system participate in the same reset event, at most times, each generation number held by a node will be the same across the system. However, in some cases, when one node resets before another node, the generation numbers between those two nodes will differ. The data frames sent between the nodes each comprise a generation number of the sending node, which is checked by the recipient and only accepted if the generation number in the frames matches the generation number of the recipient node.
A processing device comprising a plurality of operand registers, wherein a first subset of the operand registers are configured to store state information for a plurality of bins, comprising a range of values and a bin count associated with each respective bin, wherein a second subset of the operand registers is configured to store a vector of floating-point values; and an execution unit configured to execute a first instruction taking the state information for the plurality of bins and the vector of floating-point values as operands, and in response to execution of the first instruction, for each of the floating-point values: identify based on an exponent of the respective floating-point value, each one of the plurality of bins for which the respective floating-point value falls within the associated range of values; and increment the bin count associated with the identified bins.
Logic circuitry for multiplying floating point numbers is disclosed, comprising multiplication and addition logic. The multiplication logic includes first and second mantissa multiplying circuitry. The logic circuitry is configured to: in a first mode, determine a product of two values having a first number format, using sub-units of the first mantissa multiplying circuitry to calculate partial products of the mantissas, and using the addition logic to combine the partial products; in a second mode, determine a respective product of each of four pairs of values having a second number format, using the sub-units of the first mantissa multiplying circuitry to multiply the mantissas of the pairs; and in a third mode, determine products of each of a plurality of pairs of values having a third number format, using the second mantissa multiplying circuitry to generate a product for each pair.
A memory and routing module (100) includes a substrate (170) and a connection component (160). The connection component (160) is attached to the substrate (170) and includes multiple pins (161) that connect the module (100) to a corresponding connection component (160) on a motherboard (400). The substrate (170) is connected to a dynamic random-access memory, DRAM, chip (110), and a routing chip (140). The routing chip (140) includes a memory controller (142), multiple connections, and routing logic (46). The multiple connections include a first group between the memory controller (142) and the DRAM chip (110) and a second group of connections with the pins (161) of the connection component (160). The routing logic (46) routes data between the second group of connections and the first group of connections.
G11C 5/04 - Supports for storage elements; Mounting or fixing of storage elements on such supports
G11C 11/4093 - Input/output [I/O] data interface arrangements, e.g. data buffers
G11C 11/4096 - Input/output [I/O] data management or control circuits, e.g. reading or writing circuits, I/O drivers or bit-line switches
G11C 5/02 - Disposition of storage elements, e.g. in the form of a matrix array
G11C 5/06 - Arrangements for interconnecting storage elements electrically
G11C 7/10 - Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
Each of the nodes stores a number, referred to herein as a generation number, which is updated whenever the respective node undergoes a reset and restart from checkpoint. Since the nodes of the system participate in the same reset event, at most times, each generation number held by a node will be the same across the system. However, in some cases, when one node resets before another node, the generation numbers between those two nodes will differ. The data frames sent between the nodes each comprise a generation number of the sending node, which is checked by the recipient and only accepted if the generation number in the frames matches the generation number of the recipient node. If the recipient node resets prior to the reset of the sending node, the generation number in the frames will not match the generation number of the recipient node, and the frames will not be accepted. Therefore, the node that has reset and restarted is protected against packets relating to an earlier generation of the application that are dispatched by a node that has not yet reset.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 15/173 - Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star or snowflake
G06F 11/00 - Error detection; Error correction; Monitoring
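The generation-number check in this entry can be illustrated with a small model. It assumes the generation number is a counter incremented on each reset and that frames are plain dictionaries; the class and field names are hypothetical.

```python
class Node:
    """A node that tags outgoing frames and filters incoming ones by generation number."""

    def __init__(self):
        self.generation = 0
        self.accepted = []

    def reset(self):
        """Reset and restart from checkpoint: move to the next generation (assumed increment)."""
        self.generation += 1

    def make_frame(self, payload):
        return {"generation": self.generation, "payload": payload}

    def receive(self, frame):
        """Accept the frame only if it comes from the recipient's current generation."""
        if frame["generation"] != self.generation:
            return False
        self.accepted.append(frame["payload"])
        return True
```

A node that has already reset therefore drops frames from a peer that is still running the earlier generation, and starts accepting them again once the peer has reset too.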
A processing device comprising a plurality of operand registers, wherein a first subset of the operand registers are configured to store state information for a plurality of bins, comprising a range of values and a bin count associated with each respective bin, wherein a second subset of the operand registers is configured to store a vector of floating-point values; and an execution unit configured to execute a first instruction taking the state information for the plurality of bins and the vector of floating-point values as operands, and in response to execution of the first instruction, for each of the floating-point values: identify based on an exponent of the respective floating-point value, each one of the plurality of bins for which the respective floating-point value falls within the associated range of values; and increment the bin count associated with the identified bins.
G06F 7/24 - Sorting, i.e. extracting data from one or more carriers, re-arranging the data in numerical or other ordered sequence, and re-recording the sorted data on the original carrier or on a different carrier or set of carriers
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
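The exponent-based binning in this entry can be sketched as follows. The sketch assumes each bin's range is expressed as a half-open range of binary exponents and that zero, infinity and NaN are skipped; the dictionary layout and function names are hypothetical. Bins may overlap, so one value can increment several counts.

```python
import math

def exponent_of(value):
    """Unbiased binary exponent e such that 2**e <= |value| < 2**(e + 1)."""
    _, e = math.frexp(value)  # frexp gives value = m * 2**e with 0.5 <= |m| < 1
    return e - 1

def update_bins(bins, values):
    """Increment every bin whose exponent range [lo, hi) contains the value's exponent."""
    for v in values:
        if v == 0.0 or math.isinf(v) or math.isnan(v):
            continue  # assumed handling: special values are not binned
        e = exponent_of(v)
        for b in bins:
            if b["lo"] <= e < b["hi"]:
                b["count"] += 1
    return bins
```

Because only the exponent field is inspected, the hardware equivalent needs a few integer comparisons per value rather than full floating-point comparisons.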
A computer structure comprises a first silicon substrate in which is formed computer circuitry and analogue circuitry for supporting communications. A second silicon substrate comprises a plurality of distributed capacitance units, and is connected to the first substrate via a set of connectors arranged extending depth-wise of the structure. The second substrate has an outer surface on which are arranged a supply voltage connector terminal and a ground connector terminal for connecting the computer structure to a supply voltage for the analogue circuitry and to ground respectively. One or more of the distributed capacitance units of the second silicon substrate is connected between the supply voltage connector and the ground connector terminal via one or more of the set of connectors to provide a decoupling capacitor for the analogue circuitry.
H01L 25/18 - Assemblies consisting of a plurality of semiconductor or other solid state devices, the devices being of types provided for in two or more different main groups of the same subclass , , , , or
H01L 23/00 - Details of semiconductor or other solid state devices
H01L 21/768 - Applying interconnections to be used for carrying current between separate components within a device
24.
Memory and Routing Module for Use in a Computer System
A memory and routing module includes a substrate and a connection component. The connection component is attached to the substrate and includes multiple pins that connect the module to a corresponding connection component on a motherboard. The substrate is connected to a dynamic random-access memory, DRAM, chip, and a routing chip. The routing chip includes a memory controller, multiple connections, and routing logic. The multiple connections include a first group between the memory controller and the DRAM chip and a second group of connections with the pins of the connection component. The routing logic routes data between the second group of connections and the first group of connections.
A processing device has a plurality of interfaces and a plurality of processors. During different phases of execution of a computer program, different processors are associated with different interfaces, such that the connectivity between processors and interfaces for the sending of egress data and the receiving of ingress data may change during execution of that computer program. The change in this connectivity is directed by the compiled code running on the processors. The compiled code selects which buses, each associated with a respective interface, given processors are to connect to for receipt of ingress data. Furthermore, the compiled code causes control messages to be sent to circuitry associated with the interfaces, so as to control which buses, each associated with a respective processor, given interfaces are to connect to.
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 9/52 - Program synchronisation; Mutual exclusion, e.g. by means of semaphores
26.
Computer System Having Multiple Computer Devices Each with Routing Logic and Memory Controller and Multiple Computer Devices Each with Processing Circuitry
A computer includes first and second computer devices of a first class. Each computer device of the first class includes first and second external ports, at least one memory controller to attach to external memory, and routing logic to route data from the first external port to one of the memory controller and the second external port. The computer further includes first and second computer devices of a second class. The first computer device of the second class is connected to the first external ports via respective first and second links. The second computer device of the second class is connected to the second external ports via respective third and fourth links. The first and second computer devices of the second class include processing circuitry to execute a computer program and are connected to the first and second links, or third and fourth links, respectively to transmit and receive messages.
A memory attachment and routing chip includes a single die having a set of external ports; at least one memory attachment interface comprising a memory controller to attach to external memory, and a fabric core in which routing logic is implemented. The routing logic can (i) receive a first packet of a first type from a first port of the set of ports, the first type of packet being a memory access packet with a memory address which lies in a range of memory addresses associated with the memory attachment and routing chip, detect the memory address and route the packet of the first type to the memory attachment interface. The routing logic can (ii) receive a second packet of a second type, the second type of packet being an inter-processor packet comprising a destination identifier identifying a processing chip external to the memory attachment.
An error event vector is defined for the device, where each element of that error event vector is used to indicate whether or not an event of the associated event class has occurred for any of the components of the device. If so, a control node causes the respective element of each of the copies of the error event vector to be set to indicate that an error of the event class has occurred. A component, i.e. the second one of the components, performs a responsive action for the event class in response to the update to its own copy of the error event vector.
A heatsink is provided for a memory and routing module with a lower and an upper side, both sides having multiple semiconductor chips attached. The lower side of the module has a connection component attached for connection to a motherboard. The heatsink includes a lower heatsink component with a module receiving region configured to receive the lower side of the module, including a first thermally conductive portion arranged to face the semiconductor chips, an aperture through the lower heatsink component and a thermally conductive peripheral region disposed around the module receiving region. The heatsink includes an upper heatsink component which is configured to connect to the lower heatsink component at the peripheral region to retain the module. The upper heatsink component includes a lower side. The lower side includes a second thermally conductive portion arranged to face the semiconductor chips disposed on the upper side of the module and multiple second heat dissipating elements.
H01L 23/46 - Arrangements for cooling, heating, ventilating or temperature compensation involving the transfer of heat by flowing fluids
H01L 25/065 - Assemblies consisting of a plurality of semiconductor or other solid state devices, the devices being all of a type provided for in a single one of the subclasses , , , , or , e.g. assemblies of rectifier diodes, the devices not having separate containers, the devices being of a type provided for in group
H01L 23/40 - Mountings or securing means for detachable cooling or heating arrangements
A processing device comprising: a control register configured to store a scaling factor; at least one execution unit configured to execute instructions to perform arithmetic operations on input floating-point numbers provided according to a first floating-point format, wherein each of the input floating-point numbers provided according to the first floating-point format comprises a predetermined number of bits, wherein the at least one execution unit is configured to, in response to execution of an instance of a first of the instructions: perform processing of a first set of the input floating-point numbers to generate a result value, the result value provided in a further format and comprising more than the predetermined number of bits, enabling representation of a greater range of values than is representable in the first floating-point format; and apply the scaling factor specified in the control register to increase or decrease an exponent of the result value.
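Adjusting the exponent of a wide result by a stored scaling factor corresponds to multiplying by a power of two. The sketch below assumes the processing is a summation, that Python floats stand in for the wider result format, and that the scaling factor is a signed exponent offset; the function name is hypothetical.

```python
import math

def scaled_accumulate(values, scale_exponent):
    """Sum the inputs in a wider format, then add scale_exponent to the result's exponent."""
    total = math.fsum(values)                 # stand-in for the wider accumulator format
    return math.ldexp(total, scale_exponent)  # total * 2**scale_exponent
```

For instance, two inputs of 60000 sum to 120000, which exceeds the half-precision maximum of 65504; with a scaling factor of -4 the result becomes 7500, which fits back into the narrower format.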
Each of the processing devices stores an event vector, which is updated when certain events (e.g. memory errors, overtemperature events) occur on the device. Different elements of the vector correspond to different types of events. When an event of a given type occurs on one device, the update to the event vector on that device is propagated to other devices in the system. Those other devices, in response, update the corresponding element in their own event vector to indicate that an event of that given type has occurred in the system. In this way, events are aggregated between the different devices using the event vector. The event vector is considered to be a global event vector, since its elements indicate whether certain events have occurred across the entire system, and the vector is consistent across the system.
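The aggregation of events across devices can be modelled as below. The sketch assumes direct peer-to-peer propagation and a boolean element per event class; the class name, the `peers` list and the example class indices are hypothetical.

```python
class Device:
    """Holds a copy of the global event vector and forwards updates to its peers."""

    def __init__(self, n_event_classes):
        self.events = [False] * n_event_classes
        self.peers = []

    def raise_event(self, event_class):
        """Set the element locally, then propagate it; already-set elements stop the flood."""
        if self.events[event_class]:
            return
        self.events[event_class] = True
        for peer in self.peers:
            peer.raise_event(event_class)
```

Once propagation finishes, every device holds the same vector, so any one of them can be read to learn whether an event of a given class has occurred anywhere in the system.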
A module (100) includes a package substrate (170) for receiving a flip chip-attached semiconductor chip. A first flip chip-attached semiconductor chip (140) is attached to the package substrate (170) and a first ball grid array-attached packaged semiconductor chip (110) is attached to the package substrate (170). The first flip chip-attached semiconductor chip (140) and the first ball grid array-attached semiconductor chip (110) are in electrical communication with each other. The module (100) includes a connection component (160) attached to the package substrate (170). The connection component (160) includes an electrical coupling to couple the package substrate (170) to a corresponding connection component (160) on a motherboard (400). The package substrate (170) includes multiple conductive lines (177) to couple the first flip chip-attached semiconductor chip (140) to the first ball grid array-attached semiconductor chip (110) and to the connection component (160) attached to the package substrate (170).
H01L 23/538 - Arrangements for conducting electric current within the device in operation from one component to another, the interconnection structure between a plurality of semiconductor chips being formed on, or in, insulating substrates
A memory attachment and routing chip includes a single die having a set of external ports; at least one memory attachment interface comprising a memory controller to attach to external memory, and a fabric core in which routing logic is implemented. The routing logic can (i) receive a first packet of a first type from a first port of the set of ports, the first type of packet being a memory access packet with a memory address which lies in a range of memory addresses associated with the memory attachment and routing chip, detect the memory address and route the packet of the first type to the memory attachment interface. The routing logic can (ii) receive a second packet of a second type, the second type of packet being an inter-processor packet comprising a destination identifier identifying a processing chip external to the memory attachment and routing chip and route the second packet to a second one of the external ports, the second one of the external ports being selected based on the destination identifier.
G06F 12/0813 - Multiuser, multiprocessor or multiprocessing cache systems with a network or matrix configuration
G06F 12/06 - Addressing a physical block of locations, e.g. base addressing, module addressing, address space extension, memory dedication
H04L 49/109 - Packet switching elements characterised by the switching fabric construction, integrated on microchip, e.g. switch-on-chip
H04L 49/25 - Routing or path finding in a switch fabric
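The two routing cases described in the memory attachment and routing chip abstract above can be sketched as a single decision function. This is an illustrative model only: the names `Packet`, `FabricCore` and `port_for_dest`, and the use of a dictionary as the routing table, are assumptions and do not come from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    kind: str                       # "memory" or "inter_processor"
    address: Optional[int] = None   # set for memory access packets
    dest_id: Optional[int] = None   # set for inter-processor packets


class FabricCore:
    def __init__(self, mem_base, mem_size, port_for_dest):
        self.mem_base = mem_base
        self.mem_end = mem_base + mem_size
        self.port_for_dest = port_for_dest  # hypothetical table: destination id -> port

    def route(self, packet):
        # Case (i): a memory access whose address lies in this chip's range
        # goes to the memory attachment interface.
        if packet.kind == "memory" and self.mem_base <= packet.address < self.mem_end:
            return ("memory_interface", None)
        # Case (ii): an inter-processor packet leaves through the external
        # port selected by its destination identifier.
        if packet.kind == "inter_processor":
            return ("port", self.port_for_dest[packet.dest_id])
        raise ValueError("packet cannot be routed by this chip")
```

For example, with a core built as `FabricCore(0x1000, 0x1000, {7: 2})`, a memory packet addressed to `0x1800` is routed to the memory interface, while an inter-processor packet for destination 7 is routed to port 2.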
A hardware module is provided in an execution unit and is responsive to execution of multiple instances of a new type of instruction to perform a plurality of reductions in parallel. The hardware module comprises: a first accumulator storing first state associated with a first of the reductions; and a second accumulator storing second state associated with a second of the reductions. Upon execution of each of the multiple instances of the new type of instruction: an input value for the respective instance is provided to a first processing circuit of the hardware module such that the first processing circuit performs a first type of operation to update the first state; and the same input value is provided to a second processing circuit of the hardware module such that the second processing circuit performs a second type of operation to update the second state.
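A software analogue of the two accumulators might look like the sketch below. The abstract does not say which operations the two circuits perform; a running sum and a running maximum are chosen here purely as an assumption, and the class name `DualReducer` is hypothetical.

```python
class DualReducer:
    """Two reductions updated from the same input on every step."""

    def __init__(self):
        self.total = 0.0               # first accumulator (assumed: running sum)
        self.maximum = float("-inf")   # second accumulator (assumed: running max)

    def step(self, value):
        # One call models one executed instance of the instruction: the same
        # input value feeds both update operations.
        self.total += value
        self.maximum = max(self.maximum, value)
```

Feeding the values 3.0, -1.0 and 5.0 through `step` leaves `total` at 7.0 and `maximum` at 5.0, with each input read only once.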
By providing a mode indication, an execution unit is operable in two separate modes, each of which causes the execution unit to perform calculations by interpreting the same bit string (the first of the bit strings) as representing one of two different values. When operating in the first mode, the first of the bit strings represents an undefined value, in other words a NaN. When operating in the second mode, the first of the bit strings represents a negative zero. Hence, the same string of bits can represent either a NaN or a negative zero depending upon the mode of operation of the processor. Since it is not necessary to reserve more than one bit string to represent these two special values, the remaining combinations of bits are available to represent other values.
A new type of instruction and a control register for the new type of instruction are provided to handle data that may be misaligned in memory. A first part of data (which may be misaligned in memory) is loaded into a first set of registers by loading a first atom containing the first part of data into registers. The pack instruction is executed by an execution unit to place part of data (whose length and starting position are indicated by second and third values in a control register) from one set of registers into an identified location (identified by a first value in the control register) in another set of registers.
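The effect of the pack instruction can be sketched on byte strings. This is a minimal illustration, assuming register sets are modelled as `bytes` objects and the three control-register values are passed as the hypothetical parameters `dest_pos`, `length` and `src_start`.

```python
def pack(dest, src, dest_pos, length, src_start):
    """Copy `length` bytes from `src[src_start:]` into `dest` at `dest_pos`.

    The three integers stand in for the first, second and third values held
    in the control register; the names are illustrative only.
    """
    if src_start + length > len(src) or dest_pos + length > len(dest):
        raise ValueError("selected field does not fit in the register set")
    out = bytearray(dest)
    out[dest_pos:dest_pos + length] = src[src_start:src_start + length]
    return bytes(out)
```

A four-byte value that straddles two eight-byte atoms could then be assembled with two calls, one taking the tail of the first atom and one taking the head of the second.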
A processing device comprising: at least one execution unit configured to interleave execution of a plurality of worker threads, wherein each of the worker threads is configured to execute a same set of code to perform operations on a different set of data held in an input buffer of a memory of the processing device and output the results data to an output buffer. An instruction is executed so as to cause a plurality of operand registers, each of which is associated with one of the worker threads, to be populated with one or more variables enabling each worker to determine where in the input buffer is located its set of input data and where to store its results data.
A new type of instruction and a control register for the new type of instruction are provided to handle data that may be misaligned in memory. A first part of data (which may be misaligned in memory) is loaded into a first set of registers by loading a first atom containing the first part of data into registers. The pack instruction is executed by an execution unit to place part of data (whose length and starting position are indicated by second and third values in a control register) from one set of registers into an identified location (identified by a first value in the control register) in another set of registers.
A processing device comprising: at least one execution unit configured to interleave execution of a plurality of worker threads, wherein each of the worker threads is configured to execute a same set of code to perform operations on a different set of data held in an input buffer of a memory of the processing device and output the results data to an output buffer. An instruction is executed so as to cause a plurality of operand registers, each of which is associated with one of the worker threads, to be populated with one or more variables enabling each worker to determine where in the input buffer is located its set of input data and where to store its results data.
A computer-implemented method of training a multi-layer neural network comprising a set of network weights, comprising: processing the training data in respective forward and backward passes through multiple layers, the forward pass comprising computing a set of activations in dependence on the network weights and training data, and the backward pass comprising: computing gradients of a pre-determined loss function with respect to the network weights and/or activations, wherein an adjustment parameter is applied to at least a subset of values in the neural network, the values comprising at least one of: the network weights, the activations, the gradients with respect to activations and the gradients with respect to weights; updating the network weights in dependence on the computed gradients; computing a proportion of the subset of values falling above a predefined threshold; and updating the adjustment parameter in dependence on the computed proportion.
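The feedback loop on the adjustment parameter can be sketched as follows. The abstract only states that the parameter is updated in dependence on the computed proportion; the specific rule used here (halve the scale when too many scaled values exceed the threshold, otherwise double it) and the names `update_scale`, `target_fraction` and `factor` are assumptions for illustration.

```python
def update_scale(values, scale, threshold, target_fraction, factor=2.0):
    """Return a new scale based on how many scaled values exceed `threshold`."""
    if not values:
        return scale
    above = sum(1 for v in values if abs(v * scale) > threshold)
    proportion = above / len(values)
    # Assumed policy: back off when too many values are large, otherwise grow.
    if proportion > target_fraction:
        return scale / factor
    return scale * factor
```

With a threshold of 50.0 and a target fraction of 0.25, the values 1.0, 2.0, 100.0 and 200.0 at a scale of 1.0 give a proportion of 0.5, so the scale is reduced to 0.5.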
A computer-implemented method comprising: processing data in a neural network to compute a network tensor comprising a plurality of tensor elements represented in an initial numerical format; computing a histogram of tensor elements; selecting a target numerical format, the target numerical format having a lower precision than the initial numerical format; evaluating a metric based on the histogram of tensor elements and the target numerical format, the metric indicating a degree of accuracy of a representation of the network tensor in the target numerical format; and based on the evaluated metric, converting the plurality of tensor elements from the initial numerical format to the target numerical format.
G06F 18/211 - Selection of the most significant subset of features
G06F 18/21 - Design or setup of systems or techniques; Extraction of features in feature space; Blind source separation
G06F 18/22 - Matching criteria, e.g. proximity measures
G06N 3/04 - Architecture, e.g. interconnection topology
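The histogram-based format selection in the abstract above can be illustrated with a simple metric. This sketch assumes the histogram is taken over binary exponents and that the metric is the fraction of non-zero elements whose exponent falls inside the target format's range; both choices, and the function names, are assumptions rather than the method of the source.

```python
import math
from collections import Counter


def exponent_histogram(values):
    """Count non-zero elements by floor(log2(|v|))."""
    # frexp returns (m, e) with 0.5 <= |m| < 1, so floor(log2(|v|)) is e - 1.
    return Counter(math.frexp(v)[1] - 1 for v in values if v != 0.0)


def in_range_fraction(hist, min_exp, max_exp):
    """Fraction of counted elements representable in [min_exp, max_exp]."""
    total = sum(hist.values())
    if total == 0:
        return 1.0
    kept = sum(count for exp, count in hist.items() if min_exp <= exp <= max_exp)
    return kept / total
```

A caller could compare the returned fraction against a chosen cutoff before converting the tensor. For instance, with an exponent range of -14 to 15 (the normal range of IEEE half precision), the values 1.0, 1024.0, 2**-20 and 3.0 give a fraction of 0.75.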
42.
Communication in a computer having multiple processors
A computer comprising a plurality of processors, each of which is configured to perform operations on data during a compute phase for the computer and, following a pre-compiled synchronisation barrier, exchange data with at least one other of the processors during an exchange phase for the computer, wherein each of the processors in the computer is indexed and the data exchange operations carried out by each processor in the exchange phase depend upon its index value.
G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
G06F 9/52 - Program synchronisation; Mutual exclusion, e.g. by means of semaphores
G06F 15/80 - Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
G06N 3/084 - Backpropagation, e.g. using gradient descent
H04L 45/00 - Routing or path finding of packets in data switching networks
In order to provide for the extension of either the MAC address or the VLAN identifier as required, a sliding cursor functionality between the MAC address and the VLAN identifier is provided. The MAC address may be extended by borrowing bits conventionally used for representing part of the VLAN identifier. Similarly, the VLAN identifier may be extended by borrowing bits conventionally used for representing part of the MAC address.
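The sliding cursor can be pictured as a movable boundary inside a fixed pool of bits. The sketch below assumes, purely for illustration, that the 48 MAC bits and 12 VLAN identifier bits are treated as one contiguous 60-bit value with the MAC in the upper part; the function name `split_fields` and this layout are hypothetical and do not describe any actual frame format.

```python
TOTAL_BITS = 60  # 48-bit MAC address field plus 12-bit VLAN identifier field


def split_fields(combined, mac_bits):
    """Split a 60-bit value into (mac, vlan) at a movable cursor.

    `mac_bits` is the cursor position: 48 gives the conventional split,
    larger values lend VLAN bits to the MAC, smaller values do the reverse.
    """
    if not 0 <= mac_bits <= TOTAL_BITS:
        raise ValueError("cursor outside the combined field")
    vlan_bits = TOTAL_BITS - mac_bits
    mac = combined >> vlan_bits
    vlan = combined & ((1 << vlan_bits) - 1)
    return mac, vlan
```

Moving the cursor from 48 to 44 on the same combined value shortens the MAC part by four bits and lengthens the VLAN part by the same amount.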
A hardware module comprises at least a first ingress buffer and a second ingress buffer, where the second ingress buffer holds data packets from multiple source components. The module first determines that the packet at the head of the first ingress buffer is targeting the first destination. To ensure fairness between one or more sources providing data to the first ingress buffer and the plurality of sources providing data to the second ingress buffer, processing circuitry examines source identifiers in packets held in the second ingress buffer and selects between the buffers so as to arbitrate between the sources. In some embodiments, the examination of the source identifiers provides statistics for a weighted round robin between the ingress buffers. In other embodiments, the source identifier of whichever packet is currently at the head of the second ingress buffer is used to perform a simple round robin between the sources.
H04L 49/253 - Routing or path finding in a switch fabric using establishment or release of connections between ports
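The weighted round robin mentioned in the arbitration abstract above can be sketched as follows. This is one possible reading, not the source's design: the weight given to the second ingress buffer is assumed to be the number of distinct source identifiers currently held in it, packets are modelled as dictionaries with a hypothetical `src` key, and the first buffer is assumed to serve a single source.

```python
from collections import deque


def arbitrate(first, second):
    """Drain two ingress buffers, weighting the second by its source count."""
    first, second = deque(first), deque(second)
    out = []
    while first or second:
        # Examine source identifiers in the second buffer to pick its weight.
        weight = len({packet["src"] for packet in second})
        if first:
            out.append(first.popleft())   # one slot for the single-source buffer
        for _ in range(weight):
            if second:
                out.append(second.popleft())
    return out
```

With two packets from source A in the first buffer and alternating packets from sources B and C in the second, the output order by source is A, B, C, A, B, C, so each source receives an equal share.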
45.
Fair arbitration between multiple sources targeting a destination
A hardware module comprises at least a first ingress buffer and a second ingress buffer, where the second ingress buffer holds data packets from a plurality of source components. To ensure fairness between one or more sources providing data to the first ingress buffer and the plurality of sources providing data to the second ingress buffer, processing circuitry examines source identifiers in packets held in the second ingress buffer and selects between the buffers so as to arbitrate between the sources. In some embodiments, the examination of the source identifiers provides statistics for a weighted round robin between the ingress buffers. In other embodiments, the source identifier of whichever packet is currently at the head of the second ingress buffer is used to perform a simple round robin between the sources.
H04L 12/28 - Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
H04L 49/9047 - Buffering arrangements including multiple buffers, e.g. buffer pools
One or more bits of the destination MAC address indicate a number of times a reset event has occurred. These bits may be referred to as a generation number. The generation number in a destination MAC address is updated when a reset event occurs. In this way, frames issued by the sender prior to the reset may be distinguished from frames issued after the reset, since the destination MAC addresses in those frames will not match. In this way, the recipient device is protected from stale packets.
H04L 69/22 - Parsing or analysis of headers
H04L 41/069 - Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
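The generation number scheme in the reset-event abstract above can be sketched with integer MAC addresses. The width of the generation field (`GEN_BITS`), its position in the low-order bits, and the names `with_generation` and `Receiver` are assumptions made for this example.

```python
GEN_BITS = 4                       # assumed width of the generation field
GEN_MASK = (1 << GEN_BITS) - 1


def with_generation(base_mac, generation):
    """Embed the generation number in the low bits of a MAC address."""
    return (base_mac & ~GEN_MASK) | (generation & GEN_MASK)


class Receiver:
    def __init__(self, base_mac):
        self.base_mac = base_mac
        self.generation = 0

    @property
    def mac(self):
        return with_generation(self.base_mac, self.generation)

    def reset(self):
        # Each reset event advances the generation, wrapping at the field width.
        self.generation = (self.generation + 1) & GEN_MASK

    def accept(self, dest_mac):
        # Frames addressed to an earlier generation no longer match.
        return dest_mac == self.mac
```

A frame addressed before `reset()` carries the old generation in its destination address and is rejected afterwards, which is how stale packets are filtered in this model.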
A data processing system having an address resolution function for deriving MAC addresses. The set of MACs defined for the devices on the network encode physical position or logical identifier information of those devices. Therefore, each of these MACs is derivable using a mapping function that maps the physical position or logical identifier information supplied by an application to the MAC addresses of the devices on the network. When the protocol processing entity has to send data over the network, it can obtain the MAC address for the destination determined on the basis of the physical position or logical identifier supplied by the application. In this way, since the MACs are derivable on the basis of the physical positions or logical identifiers, the broadcasting of ARP request messages, which would otherwise be required when the protocol processing entity requires the MAC for the destination, may be avoided.
H04L 45/00 - Routing or path finding of packets in data switching networks
H04L 61/103 - Mapping addresses of different types across network layers, e.g. resolution of network layer into physical layer addresses or address resolution protocol [ARP]
A method for testing a stacked integrated circuit device comprising a first die and a second die, the method comprising: sending from testing logic of the first die, first testing control signals to first testing apparatus on the first die; in response to the first testing control signals, the first testing apparatus running a first one or more tests for testing functional logic or memory of the first die; sending from the testing logic of the first die, second testing control signals to the second die via through silicon vias formed in a substrate of the first die; and in dependence upon the second testing control signals from the first die, running a second one or more tests for testing the stacked integrated circuit device.
The first logic wafer is attached to a supporting wafer, which adds sufficient depth to this bonded structure such that the first logic wafer may be thinned during the manufacturing process. The first logic wafer is thinned such that the through silicon vias may be etched in the substrate of the first logic wafer so as to provide adequate connectivity to a second logic wafer, which is bonded to the first logic wafer. The second logic wafer adds sufficient depth to this bonded structure to allow the supporting wafer to then be thinned to enable through silicon vias to be added to the supporting wafer so as to provide appropriate connectivity for the entire stacked structure. The thinned supporting wafer is retained in the finished stacked wafer structure and may comprise additional components (e.g. capacitors) supporting the operation of the processing circuitry in the logic wafers.
H01L 25/065 - Assemblies consisting of a plurality of semiconductor or other solid state devices, the devices all being of a type provided for in a single one of the subclasses , , , , or , e.g. assemblies of rectifier diodes, the devices not having separate containers, the devices being of a type provided for in group
H01L 23/00 - Details of semiconductor or other solid state devices
H01L 25/00 - Assemblies consisting of a plurality of semiconductor or other solid state devices
A computer-implemented method of training a deep neural network, comprising, for each of one or more batches of training examples: processing the data in a forward pass through the layers of the network, by: applying a set of network weights to the input data to obtain a set of weighted inputs, normalising the weighted inputs based on statistics computed for each training example, transforming the normalised inputs by affine transformation parameters, applying an activation function to the transformed normalised inputs to obtain post-activation values, and normalizing the post-activation values based on one or more proxy variables sampled from a distribution defined by proxy distribution parameters, the normalization applied independently of training example; processing the data in a backward pass through the network to determine updates to learnable parameters comprising network weights, affine transformation parameters, and proxy distribution parameters, and updating the learnable parameters to optimise a predefined loss function.
G06N 3/084 - Backpropagation, e.g. using gradient descent
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
A computer-implemented method of training a deep neural network, comprising, for each of one or more batches of training examples: processing the data in a forward pass through the layers of the network, by: applying a set of network weights to the input data to obtain a set of weighted inputs, normalising the weighted inputs based on statistics computed for each training example, transforming the normalised inputs by affine transformation parameters, applying an activation function to the transformed normalised inputs to obtain post-activation values, and normalizing the post-activation values based on one or more proxy variables sampled from a distribution defined by proxy distribution parameters, the normalization applied independently of training example; processing the data in a backward pass through the network to determine updates to learnable parameters comprising network weights, affine transformation parameters, and proxy distribution parameters, and updating the learnable parameters to optimise a predefined loss function.
The same test data frame is dispatched from a network interface device a plurality of times so as to test a network. Since the same test data frame is used, it may be unnecessary for a new test data frame to be provided and protocol processed each time one is required to be sent. The protocol processing resources of the network interface device are then available for sending further traffic in parallel with the dispatch of the test data frames. On the receive side, the network interface device collects statistics regarding the reliable receipt of test frames, without requiring the test frames to be further processed and provided to a driver of the network interface device. In this way, the processing and buffering capacity in the network interface device is available for handling further traffic in parallel with the test traffic.
For certain applications, parts of the application data held in memory of a processing device (e.g. that are produced as a result of operations performed by the execution unit) are arranged in regular repeating patterns in the memory, and therefore, the execution unit may set up a suitable striding pattern for use by a send engine. The send engine accesses the memory at locations in accordance with the configured striding pattern so as to access a plurality of items of data that are arranged together in a regular pattern. In a similar manner as done for sends, the execution unit may set up a striding pattern for use by a receive engine. The receive engine, upon receiving a plurality of items of data, causes those items of data to be stored at locations in the memory, as determined in accordance with the configured striding pattern.
G06F 12/0804 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with main memory updating
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
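The striding patterns used by the send and receive engines in the abstract above can be sketched with a flat list standing in for memory. A single fixed stride is assumed here for simplicity; the function names and the `base`, `stride` and `count` parameters are illustrative, and real striding configurations may be richer.

```python
def strided_offsets(base, stride, count):
    """Memory offsets visited by a simple one-level striding pattern."""
    return [base + i * stride for i in range(count)]


def send_gather(memory, base, stride, count):
    """Model of the send engine: read the items selected by the pattern."""
    return [memory[offset] for offset in strided_offsets(base, stride, count)]


def receive_scatter(memory, items, base, stride):
    """Model of the receive engine: store incoming items along the pattern."""
    for offset, item in zip(strided_offsets(base, stride, len(items)), items):
        memory[offset] = item
```

Gathering with base 1, stride 4 and count 3 from a memory holding the integers 0 to 11 returns the items at offsets 1, 5 and 9.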
A computer comprising a plurality of processor devices connected in a ring, wherein each of the processor devices is connected to each of two neighbouring ones of the processor devices by a respective physical inter-processor link. Each of a set of external memory devices stores a local portion of an externally stored dataset. Each processor device executes instructions to: determine that a synchronisation point has been reached by the plurality of processor devices; responsive to the determination, access from its connected external memory device its local portion of the externally stored dataset; record a copy of its local portion of the externally stored dataset in its local memory; transmit its local portion of the externally stored dataset to at least one of its connected neighbouring processing devices; and receive an incoming portion of the externally stored dataset from at least one of its connected neighbouring processing devices.
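The exchange around the ring can be simulated in a few lines. The abstract describes each device sending its local portion to a neighbour and receiving an incoming portion; the sketch below goes one step further and assumes the portions keep circulating until every device has seen all of them (a ring all-gather), which is an assumption about the overall pattern. The function name is hypothetical.

```python
def ring_all_gather(local_portions):
    """Simulate passing portions one hop at a time around a ring.

    Device i starts with local_portions[i] and, on every step, forwards the
    portion it most recently received to device (i + 1) % n.
    """
    n = len(local_portions)
    gathered = [[portion] for portion in local_portions]  # local copies
    in_flight = list(local_portions)
    for _ in range(n - 1):
        # After this step, device i holds what device i - 1 just sent.
        in_flight = [in_flight[(i - 1) % n] for i in range(n)]
        for i in range(n):
            gathered[i].append(in_flight[i])
    return gathered
```

With n devices, n - 1 steps are enough for every device to hold every portion, and each step uses only the links between neighbours.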
A processor comprises at least one delay stage for each processing circuit and switching circuitry for selectively switching the delay stage into or out of a communication path involved in message exchange. For processing circuits up to a defective processing circuit in the column, the delay stage is switched into the communication path, and for processing circuits above the defective processing circuit in the column, including a repairing processing circuit which repairs the defective processing circuit, the delay stage is switched out of the communication path, whereby the fixed transmission time of processing circuits is preserved in the event of a repair of the column.
A computer comprising a plurality of processor devices connected in a ring, wherein each of the processor devices is connected to each of two neighbouring ones of the processor devices by a respective physical inter-processor link. Each of a set of external memory devices stores a local portion of an externally stored dataset. Each processor device executes instructions to: determine that a synchronisation point has been reached by the plurality of processor devices; responsive to the determination, access from its connected external memory device its local portion of the externally stored dataset; record a copy of its local portion of the externally stored dataset in its local memory; transmit its local portion of the externally stored dataset to at least one of its connected neighbouring processing devices; and receive an incoming portion of the externally stored dataset from at least one of its connected neighbouring processing devices.
G06F 15/173 - Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star or snowflake
G06F 15/167 - Interprocessor communication using a common memory, e.g. mailbox
G06F 13/16 - Handling requests for interconnection or transfer for access to memory bus
For certain applications, parts of the application data held in memory of a processing device (e.g. that are produced as a result of operations performed by the execution unit) are arranged in regular repeating patterns in the memory, and therefore, the execution unit may set up a suitable striding pattern for use by a send engine. The send engine accesses the memory at locations in accordance with the configured striding pattern so as to access a plurality of items of data that are arranged together in a regular pattern. In a similar manner as done for sends, the execution unit may set up a striding pattern for use by a receive engine. The receive engine, upon receiving a plurality of items of data, causes those items of data to be stored at locations in the memory, as determined in accordance with the configured striding pattern.
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 9/345 - Addressing or accessing the instruction operand or the result, of multiple operands or results
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
A gateway for interfacing a host with a subsystem for acting as a work accelerator to the host. The gateway enables the transfer of batches of data to the subsystem at precompiled data exchange synchronisation points. The gateway acts to route data between accelerators which are connected in a scaled system of multiple gateways and accelerators using a global address space set up at compile time of an application to run on the computer system.
G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
G06F 9/22 - Microcontrol or microprogram arrangements
G06F 9/52 - Program synchronisation; Mutual exclusion, e.g. by means of semaphores
G06F 12/0813 - Multiuser, multiprocessor or multiprocessing cache systems with a network or matrix configuration
G06F 12/0831 - Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
G06F 13/14 - Handling requests for interconnection or transfer
H04L 12/66 - Arrangements for connecting between networks having differing types of switching systems, e.g. gateways
H04L 67/1097 - Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
H04W 92/06 - Interfaces between hierarchically different network devices between gateways and public network devices
A processor comprises an exchange, a plurality of columns, and a plurality of exchange scan chains. The exchange comprises a plurality of exchange paths, each comprising a set of exchange path portions, for transmitting data between processing units. Each of the plurality of columns comprises processing units, each processing unit connected to output data to a respective exchange path, and column pipe circuitry for providing a controllable path between the exchange and the processing units. The column pipe circuitry comprises a column wrapper chain for preventing a scan test signal from passing between the exchange paths and the processing units. The exchange scan chains enable scan testing of the exchange paths. Each exchange scan chain comprises a plurality of scan chain segments, and each scan chain segment comprises an exchange path portion connected to at least one of the processing units of at least one of the columns of the processor.
G01R 31/3185 - Reconfiguring for testing, e.g. LSSD, partitioning
G06F 11/22 - Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
G06F 11/267 - Reconfiguring for testing, e.g. LSSD, partitioning
A method for repairing a processor. The processor comprises a plurality of processing units and an exchange comprising a plurality of exchange paths for transmitting data between the processing units. Each processing unit is connected to output data to a respective exchange path. An exchange path functional test of at least a portion of the exchange paths is carried out. Based on the exchange path functional test, it is identified that one or more of the exchange paths are defective, and the processing units connected to the one or more defective exchange paths are identified. The identified processing units are switched out of functional operation of the processor, and at least one repair processing unit connected to a non-defective exchange path is switched in for functional operation of the processor.
A device comprising: a processing unit comprising at least one processor configured to: participate in barrier synchronisations, each of which separates a compute phase of the at least one processor from an exchange phase for the at least one processor; and exchange sync messages with a sync controller hardware unit so as to co-ordinate each of the barrier synchronisations; and sync trace circuitry configured to: receive one or more of the sync messages; and in response to each of the one or more of the sync messages, provide sync trace information for output from the device, the sync trace information comprising timing information associated with the respective sync message.
In a stacked integrated circuit device, there are two components, one in a first of the die and another in a second of the die. Each of the components is provided with two output connections, one leading above and one leading below its die, and two input connections, one leading above and one leading below its die. As a result of this redundancy, either die may be used in either position in the stacked structure. If either of the die is used as the top die, it sends data on its second output path and receives data on its second input path. On the other hand, when one of the die is used as the bottom die, it sends data on its first output path and receives data on its first input path. In this way, the same design may be used for the connections between each of the die.
H01L 23/528 - Layout of the interconnection structure
H01L 25/065 - Assemblies consisting of a plurality of semiconductor or other solid state devices, the devices all being of a type provided for in a single one of the subclasses , , , , or , e.g. assemblies of rectifier diodes, the devices not having separate containers, the devices being of a type provided for in group
H01L 23/00 - Details of semiconductor or other solid state devices
G06F 13/20 - Handling requests for interconnection or transfer for access to input/output bus
63.
Tracing activity from multiple components of a device
A device comprising: a bus forming a ring path for circulation of one or more data packets around the bus, wherein the one or more data packets comprises a trace report packet for collecting trace data from a plurality of components attached to the bus, wherein the bus is configured to repeatedly circulate the trace report packet with a fixed time period taken for each circulation of the ring path performed by the trace report packet; and the plurality of components, each of which comprises circuitry configured to, upon reception of the trace report packet at the respective component, insert one or more items of the trace data that have been obtained by the respective component.
A multi-tile processing unit in which the tiles in the processing unit may be divided between two or more different external sync groups for performing barrier synchronisations. In this way, different sets of tiles of the same processing unit each sync with different sets of tiles external to that processing unit.
A method for securely terminating a distributed trusted execution environment spanning a plurality of work accelerators. Each accelerator is configured to self-isolate upon determining that the distributed TEE is to be terminated across the system of accelerators. The data is also wiped from the processor memory of each accelerator, such that the data cannot be read out from the processor memory once the accelerator's links are re-enabled. The self-isolation is performed on each accelerator prior to the step of terminating the TEE on that accelerator. An accelerator only re-enables its links to other accelerators once the data is wiped from its processor memory such that the secret data is removed from the accelerator memory.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems, during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure, by executing in a restricted environment, e.g. sandbox or secure virtual machine
66.
Terminating distributed trusted execution environment via confirmation messages
A method for securely terminating a distributed trusted execution environment (TEE) spanning a plurality of work accelerators. After wiping sensitive data from the memory of its accelerator, a root of trust for each accelerator is configured to receive confirmation that the data has been wiped from the processor memory in relevant other accelerators prior to moving on to the next stage at which the TEE on its associated accelerator is terminated. Since the data has been wiped from the other accelerators, even if a third party were to inject malicious code into the accelerator, they would be unable to read out the secret data from the other accelerators since the data has been wiped from those other accelerators. In this way, a mechanism is provided for ensuring that when the distributed TEE is terminated, malicious third parties are unable to read out confidential data from the accelerators.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems, during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure, by executing in a restricted environment, e.g. sandbox or secure virtual machine
G06F 21/64 - Protecting data integrity, e.g. using checksums, certificates or signatures
A set of configurable sync groupings (which may be referred to as sync zones) are defined. Any of the processors may belong to any of the sync zones. Each of the processors comprises a register indicating to which of the sync zones it belongs. If a processor does not belong to a sync zone, it continually asserts a sync request for that sync zone to the sync controller. If a processor does belong to a sync zone, it will only assert its sync request for that sync zone upon arriving at a synchronisation point for that sync zone indicated in its compiled code set.
G06F 9/52 - Program synchronisation; Mutual exclusion, e.g. by means of semaphores
G06F 15/173 - Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star or snowflake
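The sync-zone rule in the abstract above (non-members assert continually, members assert only at their synchronisation point) can be sketched in a few lines. The function names are hypothetical, and the idea that the sync controller simply ANDs every processor's request is an assumption made for illustration rather than something the abstract states.

```python
def sync_request(in_zone: bool, at_sync_point: bool) -> bool:
    """Request line one processor drives for one sync zone.

    A non-member asserts the request continually; a member asserts it only
    once it has reached its synchronisation point for that zone.
    """
    return (not in_zone) or at_sync_point


def zone_ready(membership, at_sync_point) -> bool:
    """Assumed controller behaviour: the zone syncs when all requests are high."""
    return all(sync_request(m, a) for m, a in zip(membership, at_sync_point))
```

Under this assumption the controller never needs to know which processors belong to a zone: non-members are transparent because their request is always asserted, so regrouping only requires rewriting each processor's membership register.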
A multi-tile processing unit in which the tiles in the processing unit may be divided between two or more different external sync groups for performing barrier synchronisations. In this way, different sets of tiles of the same processing unit each sync with different sets of tiles external to that processing unit.
G06F 9/52 - Program synchronisation; Mutual exclusion, e.g. by means of semaphores
G06F 15/173 - Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star or snowflake
The device implements a processing pipeline having distinct circuitry for performing encryption/decryption operations and authentication operations and having state stores associated with the respective operations. The state stores store state associated with a given encryption frame, enabling the respective operations to be performed when blocks of data reach that stage in the pipeline. Due to the complexity of operations in a block cipher encryption scheme, the pipeline is deep, which provides the possibility of processing multiple data packets at any one time. The provision of the state stores at the stages in the pipeline at which they are required prevents stalling when a new data packet is received.
H04L 9/06 - Arrangements for secret or secure communications; Network security protocols, the encryption apparatus using shift registers or memories for blockwise coding, e.g. D.E.S. systems
Signature generation circuitry is configured to update a signature in response to each of a plurality of writes to memory. The signature is updated by performing bitwise operations between current bit values of the signature and at least some of the bits written to memory in response to a write. The bitwise operations are order-independent such that the resulting signature is the same irrespective of the order in which the writes are used to update the signature. The signatures are formed in an order-independent manner such that, if no errors have occurred in generating the data to be written to memory, the signatures will match. In this way, a compact signature is developed that is suitable for export from the data processing device for checking against a corresponding data processing device of a machine running a duplicate application.
G06F 11/10 - Error detection or correction by redundancy in data representation, e.g. by using checking codes, adding special bits or symbols to the coded information, e.g. parity check, casting out nines or elevens
G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
H03K 19/21 - EXCLUSIVE-OR circuits, i.e. giving output if input signal exists at only one input; COINCIDENCE circuits, i.e. giving output only if all input signals are identical
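As a rough illustration of an order-independent signature, the sketch below folds each written word into a running value with XOR. XOR is chosen here only because it is commutative and associative and because the entry is classified under EXCLUSIVE-OR circuits; the abstract itself does not name the operation, and the function names and the 32-bit width are assumptions.

```python
from functools import reduce


def update_signature(signature: int, written_word: int, width: int = 32) -> int:
    """Fold one written word into the signature using bitwise XOR."""
    mask = (1 << width) - 1
    return (signature ^ written_word) & mask


def signature_of(writes, width: int = 32) -> int:
    """Signature after applying every write, starting from zero."""
    return reduce(lambda s, w: update_signature(s, w, width), writes, 0)
```

Two machines running a duplicate application can then compare a single word instead of every write. One limitation of plain XOR in this sketch is that a pair of identical erroneous writes cancels out, so a real design may mix in more than the raw data bits.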
72.
Control of data sending from a multi-processor device
A method for controlling the sending of data by a plurality of processors belonging to a device, the method comprising: sending a first message to a first processor of the plurality of processors to grant permission to the first processor of the plurality of processors to send a first set of data packets over at least one external interface of the device; receiving from the first processor, an identifier of a second processor of the plurality of processors; and in response to receipt of the identifier of the second processor, sending a second message to the second processor to grant permission to the second processor to send a second set of data packets over the at least one external interface.
G06F 13/364 - Handling requests for interconnection or transfer for access to a common bus or bus system with centralised access control using independent request or grant signals, e.g. using separate request and grant lines
G06F 9/52 - Program synchronisation; Mutual exclusion, e.g. by means of semaphores
A computer, including a plurality of processing nodes arranged in two-dimensional arrays in respective front and rear layers. Each processing node has a set of activatable links. When activated, transmission of data items between the nodes connected via the activated link is enabled. When not activated, transmission of data items between the nodes is prevented. The set of activatable links includes a respective link which connects the processing node to each adjacent node in the array, and to a facing processing node in the other layer. An allocation engine is configured to receive an allocation instruction and connected to the processing nodes to selectively activate the links in a configuration.
G06F 15/173 - Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star or snowflake
During normal operation of a processor, voltage droop is likely to occur and there is, therefore, a need for techniques for rapidly and accurately detecting this droop so as to reduce the probability of circuit timing failures. The droop detector described herein uses a tap sampled delay line in which a clock signal is split along two separate paths. The taps in each of the paths are separated by two inverter delays such that the set of samples produced represents sample values of the clock signal that are each separated by a single inverter delay without inversion of the first clock signal between the samples.
G06F 1/06 - Clock generators producing several clock signals
G01R 19/165 - Indicating that current or voltage is either above or below a predetermined value or within or outside a predetermined range of values
H03K 5/14 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of delay lines
G01R 17/02 - Arrangements in which the value to be measured is automatically compared with a reference value
75.
Method of debugging a processor that executes vertices of an application, each vertex being assigned to a programming thread of the processor
A method for debugging a processor which is executing vertices of a software application is described. Each vertex is assigned to a programming thread of the processor. The processor has debug hardware for raising exceptions in certain break conditions. The method comprises inspecting a vertex identifier, comparing the vertex identifier with a vertex break identifier in the debug hardware, and raising an instruction exception event for the programming thread if the vertex identifier assigned to the thread matches the vertex break identifier. Exceptions are raised based on identified vertices, rather than just individual instructions or instruction addresses.
Signature generation circuitry is configured to update a signature in response to each of a plurality of writes to memory. The signature is updated by performing bitwise operations between current bit values of the signature and at least some of the bits written to memory in response to a write. The bitwise operations are order-independent such that the resulting signature is the same irrespective of the order in which the writes are used to update the signature. The signatures are formed in an order-independent manner such that, if no errors have occurred in generating the data to be written to memory, the signatures will match. In this way, a compact signature is developed that is suitable for export from the data processing device for checking against a corresponding data processing device of a machine running a duplicate application.
G06F 11/00 - Error detection; Error correction; Monitoring
G06F 11/10 - Error detection or correction by redundancy in data representation, e.g. by using checking codes, adding special bits or symbols to the coded information, e.g. parity check, casting out nines or elevens
H03K 19/21 - EXCLUSIVE-OR circuits, i.e. giving output if input signal exists at only one input; COINCIDENCE circuits, i.e. giving output only if all input signals are identical
G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
A system and method for providing a set of data transfer instructions for converting one or more tensors between two different layouts. A first layout is used for storage of the data in host memory. A second layout is used for storage of the data in external memory accessible to a subsystem. The subsystem acts as a work accelerator to the host, and reads the external memory and processes the data read from the external memory. The first layout may be a logical representation of the tensor. The second layout is optimised for transfer to and processing by the subsystem. The data transfer instructions for converting between the two layouts are generated in dependence upon an analysis of the instructions to be executed by the subsystem.
A computer design verification system comprising a parsing module configured to receive output messages from a computer design testing tool and to compose from the output messages formatted objects comprising a set of fields having field descriptors and test values; a signoff module holding a plurality of signoff objects, each comprising a plurality of fields having a field descriptor, at least some fields populated with a signoff expression, each signoff object associated with a severity level indicative of the severity of a condition represented by the signoff object. The signoff module is configured to compare at least one test value in the formatted objects received from the parsing module with at least one signoff expression in the signoff objects to determine if a signoff object matches the formatted object, and in the case of a match, to associate the severity level of the signoff object with the formatted object.
The invention relates to a computer program comprising a sequence of instructions for execution on a processing unit having instruction storage for holding the computer program, an execution unit for executing the computer program and data storage for holding data, the computer program comprising one or more computer executable instructions which, when executed, implement: a send function which causes a data packet destined for a recipient processing unit to be transmitted on a set of connection wires connected to the processing unit, the data packet having no destination identifier but being transmitted at a predetermined transmit time; and a switch control function which causes the processing unit to control switching circuitry to connect a set of connection wires of the processing unit to a switching fabric to receive a data packet at a predetermined receive time.
G06F 15/173 - Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star or snowflake
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 15/80 - Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
80.
Execution unit for evaluating functions using Newton Raphson iterations
An execution unit for a processor, the execution unit comprising: a look up table having a plurality of entries, each of the plurality of entries comprising an initial estimate for a result of an operation; a preparatory circuit configured to search the look up table using an index value dependent upon the operand to locate an entry comprising a first initial estimate for a result of the operation; a plurality of processing circuits comprising at least one multiplier circuit; and control circuitry configured to provide the first initial estimate to the at least one multiplier circuit of the plurality of processing circuits so as to perform processing, by the plurality of processing units, of the first initial estimate to generate the function result, said processing comprising applying one or more Newton Raphson iterations to the first initial estimate.
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
G06F 7/552 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using non-contact-making devices, e.g. tube, solid state device; Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using unspecified devices, for evaluating functions by calculation of powers or roots
G06F 1/03 - Digital function generators working, at least partly, by table look-up
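A software sketch of the look-up-then-refine idea in this entry, using the reciprocal as the example operation. The choice of function, the 4-bit table index, the midpoint table entries and the iteration count are all assumptions made for illustration; the abstract only says that a table supplies an initial estimate which Newton Raphson iterations, built on multipliers, then refine.

```python
import math

TABLE_BITS = 4  # assumed table size: 16 entries indexed by the leading mantissa bits
# Each entry is 1/m evaluated at the midpoint of its sub-interval of [1, 2).
RECIP_TABLE = [
    1.0 / (1.0 + (i + 0.5) / (1 << TABLE_BITS)) for i in range(1 << TABLE_BITS)
]


def reciprocal(x: float, iterations: int = 3) -> float:
    """Approximate 1/x for finite x > 0 with a table seed and Newton-Raphson."""
    if not (math.isfinite(x) and x > 0.0):
        raise ValueError("this sketch only handles finite positive inputs")
    mantissa, exponent = math.frexp(x)  # x = mantissa * 2**exponent, mantissa in [0.5, 1)
    m = mantissa * 2.0                  # rescale so that m lies in [1, 2)
    index = int((m - 1.0) * (1 << TABLE_BITS))
    y = RECIP_TABLE[index]              # initial estimate of 1/m from the table
    for _ in range(iterations):
        y = y * (2.0 - m * y)           # Newton-Raphson step: multiplies and a subtract
    return math.ldexp(y, -(exponent - 1))  # undo the scaling: x = m * 2**(exponent - 1)
```

Each iteration roughly squares the relative error, so a seed that is accurate to a few percent reaches about twelve significant digits after three steps. That trade-off between table size and iteration count is the design choice such a unit has to make.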
Two or more die are stacked together in a stacked integrated circuit device. Each of the processors on these die is able to communicate with other processors on its die by sending data over the switching fabric of its respective die. The mechanism for sending data between processors on the same die (i.e. intradie communication) is reused for sending data between processors on different die (i.e. interdie communication). The reuse of the mechanism is enabled by assigning each processor a vertical neighbour on its opposing die. Each processor has an interdie connection that connects it to the output exchange bus of its neighbour. A processor is able to borrow the output exchange bus of its neighbour by sending data along the output exchange bus of its neighbour.
G06F 15/80 - Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
G06F 13/36 - Handling requests for interconnection or transfer for access to a common bus or bus system
A function approximation system is disclosed for determining output floating point values of functions calculated using floating point numbers. Complex functions have different shapes in different subsets of their input domain, making them difficult to predict for different values of the input variable. The function approximation system comprises an execution unit configured to determine corresponding values of a given function given a floating point input to the function; a plurality of look up tables for each function type; a correction table of values which determines if corrections to the output value are required; and a table selector for finding an appropriate table for a given function.
G06F 17/17 - Function evaluation by approximation methods, e.g. interpolation or extrapolation, smoothing or least mean square method
G06F 1/03 - Digital function generators working, at least partly, by table look-up
G06F 16/901 - Indexing; Data structures therefor; Storage structures
83.
Repeat Instruction for Loading and/or Executing Code in a Claimable Repeat Cache a Specified Number of Times
A processor is disclosed including: a barrel-threaded execution unit for executing concurrent threads, and a repeat cache shared between the concurrent threads. The processor's instruction set includes a repeat instruction which takes a repeat count operand. When the repeat cache is not claimed and the repeat instruction is executed in a first thread, a portion of code is cached from the first thread into the repeat cache, the state of the repeat cache is changed to record it as claimed, and the cached code is executed a number of times. When the repeat instruction is then executed in a further thread, then the already-cached portion of code is again executed a respective number of times, each time from the repeat cache. For each of the first and further instructions, the repeat count operand in the respective instruction specifies the number of times to execute the cached code.
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with dedicated cache, e.g. instruction or stack
84.
Data exchange pathways between pairs of processing units in columns in a computer
A time deterministic computer is architected so that exchange code compiled for one set of tiles, e.g., a column, can be reused on other sets. The computer comprises: a plurality of processing units each having an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by its associated processing unit; the processing units arranged in columns, each column having a base processing unit proximate the switching fabric and multiple processing units one adjacent the other in respective positions in the direction of the column.
G06F 15/80 - Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 9/52 - Program synchronisation; Mutual exclusion, e.g. by means of semaphores
A system comprising a gateway for interfacing external data sources with one or more accelerators. The gateway comprises a plurality of virtual gateways, each of which is configured to stream data from the external data sources to one or more associated accelerators. The plurality of virtual gateways are each configured to stream data from external data sources so that the data is received at an associated accelerator in response to a synchronisation point being obtained by a synchronisation zone. Each of the virtual gateways is assigned a virtual ID so that when data is received at the gateway, data can be delivered to the appropriate virtual gateway.
G06F 9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
86.
Controlling Warpage of a Substrate for Mounting a Semiconductor Die
A substrate and a method for manufacturing the substrate. The substrate is suitable for mounting at least one semiconductor die onto a printed circuit board. The substrate comprises two opposing stacks, with each stack comprising alternating layers of copper and electrically insulating film. The film and the copper have different co-efficients of thermal expansion, allowing the warpage behaviour of the substrate to be controlled by providing the substrate with different film thicknesses between the opposing stacks.
H01L 23/00 - Details of semiconductor or other solid-state devices
H01L 23/14 - Mountings, e.g. non-detachable insulating substrates, characterised by the material or its electrical properties
H01L 23/532 - Arrangements for conducting electric current within the device in operation from one component to another, including external interconnections consisting of a multilayer structure of conductive and insulating layers inseparably formed on the semiconductor body, characterised by the materials
H01L 21/768 - Applying interconnections to be used for carrying current between separate components within a device
H05K 1/09 - Use of materials for the metallic pattern
H05K 1/18 - Printed circuits structurally associated with non-printed electric components
H01L 23/498 - Leads on insulating substrates
87.
CONTROLLING WARPAGE OF A SUBSTRATE FOR MOUNTING A SEMICONDUCTOR DIE
A substrate and a method for manufacturing the substrate. The substrate is suitable for mounting at least one semiconductor die onto a printed circuit board. The substrate comprises two opposing stacks, with each stack comprising alternating layers of copper and electrically insulating film. The film and the copper have different co-efficients of thermal expansion, allowing the warpage behaviour of the substrate to be controlled by providing the substrate with different film thicknesses between the opposing stacks.
H01L 21/48 - Manufacture or treatment of parts, e.g. containers, prior to assembly of the devices, using processes not provided for in a single one of the groups or
A system comprising: a first subsystem comprising at least one first processor, and a second subsystem comprising one or more second processors. A first program is arranged to run on the at least one first processor, the first program being configured to send data from the first subsystem to the second subsystem. A second program is arranged to run on the one or more second processors, the second program being configured to operate on the data content from the first subsystem. The first program is configured to set a checkpoint at one or more points in time. At each checkpoint it records in memory of the first subsystem i) a program state of the second program, comprising a state of one or more registers on each of the second processors at the time of the checkpoint, and ii) a copy of the data content sent to the second subsystem since the respective checkpoint.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 9/52 - Program synchronisation; Mutual exclusion, e.g. by means of semaphores
G06F 11/36 - Prevention of errors by analysis, debugging or testing of software
A memory controller is provided for reading and writing to and from a memory module. The memory controller implements an error correction algorithm, which calculates error correction code for message data to be written to the memory module and checks the error correction code against the message data when the data is read out of the memory module. The memory controller spreads each codeword over at least four different beats sent over the interface with the memory module, with each beat comprising a symbol of error correction code. Bits of a particular symbol of message data occupy the same positions in different beats. Since the bits of the symbols occupy the same positions in different beats, the number of bits affected by a hardware error is minimised. With four symbols of error correction code available for use in the codeword.
H03M 13/00 - Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
G11C 29/42 - Response verification devices using error correcting codes [ECC] or parity check
H03M 13/11 - Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words, using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits, using multiple parity bits
H03M 13/15 - Cyclic codes, i.e. cyclic shifts of codewords produce other codewords, e.g. codes defined by a generator polynomial, Bose-Chaudhuri-Hocquenghem [BCH] codes
G11C 29/12 - Built-in arrangements for testing, e.g. built-in self testing [BIST]
A method of processing batches of data in a computer comprising a plurality of pipelined stages each providing one or more layers of a machine learning model. The method comprises: processing a first batch of data in the pipeline processing stages, each layer of the model using an activation function and weights for that layer to generate an output activation, wherein an output layer generates an output of the model. The method further comprises, for each layer: computing an estimated gradient of a loss function; generating updated weights by processing the estimated gradient with respect to the weights for the first batch using a learning rate for the model; and storing the updated weights for processing on the next batch of data. Updated weights are generated using a modulation factor based on the number of processing stages between that layer and the output layer.
A processor comprising: a register file comprising a group of operand registers for holding data values, each operand register being a fixed number of bits in length for holding a respective data value of that length; and processing logic comprising floating point logic for performing floating point operations on data values in the register file, the floating point logic being configured to process the fixed number of bits in the respective data value according to a floating point format comprising a set of mantissa bits and a set of exponent bits. The processing logic is operable to select between a plurality of different variants of the floating point format, at least some of the variants having differently sized sets of mantissa bits and exponent bits relative to one another.
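To make the idea of selectable format variants concrete, the sketch below decodes the same fixed-width bit pattern under two different exponent/mantissa splits. The presence of a sign bit, the IEEE-style bias and subnormal handling, the absence of infinities and NaNs, and the function name `decode` are all assumptions for illustration; the abstract does not specify the encodings.

```python
def decode(bits: int, exp_bits: int, man_bits: int) -> float:
    """Decode a sign | exponent | mantissa pattern (hypothetical layout).

    The total width is 1 + exp_bits + man_bits. An IEEE-like bias is assumed
    and every exponent code is treated as a finite value.
    """
    total = 1 + exp_bits + man_bits
    if not 0 <= bits < (1 << total):
        raise ValueError("bit pattern does not fit the selected format")
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    fraction = mantissa / (1 << man_bits)
    if exponent == 0:
        # Subnormal range: no implicit leading one.
        return sign * fraction * 2.0 ** (1 - bias)
    return sign * (1.0 + fraction) * 2.0 ** (exponent - bias)
```

For example, the 8-bit pattern `0b00111000` reads as 1.0 with a 4-bit exponent and 3-bit mantissa, but as 0.5 with a 5-bit exponent and 2-bit mantissa, which shows why the processing logic must know which variant is selected before operating on a register.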
A device comprising a processing unit having a plurality of processors is provided. At least one encryption unit is provided as part of the device for encrypting data written by the processors to external storage and decrypting data read from that storage. The processors are divided into different sets, with state information held in the encryption unit for performing encryption/decryption operations for requests for different sets of processors. This enables interleaved read completions or write requests from different sets of processors to be handled by the encryption unit, since associated state information for each set of processors is independently maintained.
G06F 21/72 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer, to assure secure computing or processing of information in cryptographic circuits
G06F 21/78 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer, to assure secure storage of data
H04L 9/06 - Arrangements for secret or secure communications; Network security protocols, the encryption apparatus using shift registers or memories for blockwise coding, e.g. D.E.S. systems
H04L 9/14 - Arrangements for secret or secure communications; Network security protocols using a plurality of keys or algorithms
G06F 3/06 - Digital input from, or digital output to, record carriers
H04L 9/32 - Arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system
Redundancy is provided in a sync network to protect the sync network against faults, such as broken cables. The gateway comprises a sync propagation module configured to provide redundant sync requests that are sent along different pathways in the sync network. These sync requests are sent towards different masters in the sync network. If a fault occurs at a point in one of the paths, the gateway will still receive a sync acknowledgment returned along the other path. Furthermore, the use of redundant sync networks, propagating the sync requests across different paths, allows faults in the wiring to be detected.
H04L 1/22 - Arrangements for detecting or preventing errors in the information received using redundant apparatus to increase reliability
A computer comprising a plurality of processing units, each processing unit having an execution unit and access to computer memory which stores code executable by the execution unit and input values of an input vector to be processed by the code, the code, when executed, configured to access the computer memory to obtain multiple pairs of input values of the input vector, determine a maximum or corrected maximum input value of each pair as a maximum result element, determine and store in a computer memory a maximum or corrected maximum result of each pair of maximum result elements as an approximation to the natural log of the sum of the exponents of the input values and access the computer memory to obtain each input value and apply it to the maximum or corrected maximum result to generate each output value of a Softmax output vector.
G06F 7/556 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation of logarithmic or exponential functions
G06F 7/22 - Arrangements for sorting or merging computer data on continuous record carriers, e.g. tape, drum, disc
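The pairwise reduction in the Softmax abstract above can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the helper names are hypothetical, the "corrected maximum" is taken to be max(a, b) + log(1 + exp(-|a - b|)), which equals log(exp(a) + exp(b)), and each output is assumed to be formed as exp(x - result).

```python
import math


def corrected_max(a, b):
    # Assumed correction term: max(a, b) + log(1 + exp(-|a - b|)),
    # which is exactly log(exp(a) + exp(b)).
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def tree_reduce(values, combine):
    """Combine values pair by pair, then pairs of results, down to one value."""
    level = list(values)
    if not level:
        raise ValueError("input vector must not be empty")
    while len(level) > 1:
        nxt = [combine(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd element carries over to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


def softmax(values, combine=corrected_max):
    # The reduced value approximates the log of the sum of exponentials.
    log_sum = tree_reduce(values, combine)
    return [math.exp(v - log_sum) for v in values]


inputs = [1.0, 2.0, 3.0, 4.0, 0.5]
exact = softmax(inputs)                  # corrected maximum at every pair
coarse = softmax(inputs, combine=max)    # plain maximum: cheaper, approximate
```

With the plain maximum the outputs no longer sum to one, but the largest input still maps to exactly 1.0 and the ordering of the outputs is preserved.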
A predictive clock controller is provided for modifying the frequency of a clock signal provided to a processing unit based on knowledge of the power usage by the application running on the processing unit during different execution periods. The predictive clock controller counts barrier syncs for the application, so as to determine where the application is in its sync schedule. The predictive clock controller is able to determine from the number of counted syncs, when the application will transition from one execution period to another execution period with different power requirements, and to adjust the clock frequency accordingly.
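A minimal sketch of the sync-counting idea follows. The class name, the dictionary-based schedule, and the frequency values are hypothetical; the abstract does not say how the sync schedule is encoded or which frequencies are used.

```python
class PredictiveClockController:
    """Toy model: change clock frequency at known points in a sync schedule."""

    def __init__(self, schedule, initial_mhz):
        # schedule (assumed format): barrier sync count -> frequency in MHz
        # to apply once that many syncs have been counted.
        self._schedule = dict(schedule)
        self.frequency_mhz = initial_mhz
        self.sync_count = 0

    def on_barrier_sync(self):
        """Count one barrier sync and adjust the clock if a transition is due."""
        self.sync_count += 1
        new_frequency = self._schedule.get(self.sync_count)
        if new_frequency is not None:
            self.frequency_mhz = new_frequency
        return self.frequency_mhz


# Hypothetical application profile: a power-hungry period begins after the
# 2nd sync and ends after the 5th, so the clock is lowered in between.
controller = PredictiveClockController({2: 1200, 5: 1600}, initial_mhz=1600)
trace = [controller.on_barrier_sync() for _ in range(6)]
```

The controller needs no power measurement at run time: the position in the schedule, inferred from the sync count alone, determines the frequency.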
A processor in a network has a plurality of processing units arranged on a chip. An on-chip interconnect enables data to be exchanged between the processing units. A plurality of external interfaces are configured to communicate data off chip in the form of packets, each packet having a destination address identifying a destination of the packet. The external interfaces are connected to respective additional connected processors. A routing bus routes packets between the processing units and the external interfaces. A routing register defines a routing domain for the processor, the routing domain comprising one or more of the additional processors, and at least a subset of further additional processors of the network, wherein the additional processors of the subset are directly or indirectly connected to the processor. The routing domain can be modified by changing the contents of the routing register as a sliding window domain.
A host system compiles a set of local programs which are provided over a network to a plurality of subsystems. By defining the synchronisation activity on the host, and then providing that information to the subsystems, the host can service a large number of subsystems. The defined synchronisation activity includes defining the synchronisation groups between which synchronisation barriers occur and the points during program execution at which data exchange with the host occurs. Defining synchronisation activity between the subsystems allows a large number of subsystems to be connected whilst minimising the required exchanges with the host.
G06F 15/173 - Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star or snowflake
A new apparatus and method are provided for securely distributing an application to processors of a processing unit. The processing unit is formed as part of an integrated circuit and comprises a plurality of processors (referred to as tiles), each having their own execution unit and storage for storing application data and additional executable instructions. The integrated circuit comprises a hardware module (referred to herein as the autoloader) that is configured to distribute a set of bootloader instructions (referred to herein as a secondary bootloader) to each of at least some of the tiles. Each of the tiles then executes instructions of the received secondary bootloader, which causes each tile to issue read requests to read a set of executable application instructions from a memory external to the integrated circuit. Each tile then performs operations using the received set of executable application instructions so as to execute the application using the processing unit.
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead