Instruction set architectures (ISAs) and apparatus and methods related thereto comprise an instruction set that includes one or more instructions which identify the global pointer (GP) register as an operand (e.g., base register or source register) of the instruction. Identification can be implicit. By implicitly identifying the GP register as an operand of the instruction, one or more bits of the instruction that were dedicated to explicitly identifying the operand (e.g., base register or source register) can be used to extend the size of one or more other operands, such as the offset or immediate, to provide longer offsets or immediates.
G06F 9/30 - Dispositions pour exécuter des instructions machines, p. ex. décodage d'instructions
G06F 5/01 - Procédés ou dispositions pour la conversion de données, sans modification de l'ordre ou du contenu des données maniées pour le décalage, p. ex. la justification, le changement d'échelle, la normalisation
G06F 9/32 - Formation de l'adresse de l'instruction suivante, p. ex. par incrémentation du compteur ordinal
G06F 9/34 - Adressage de l'opérande d'instruction ou du résultat ou accès à l'opérande d'instruction ou au résultat
09 - Appareils et instruments scientifiques et électriques
42 - Services scientifiques, technologiques et industriels, recherche et conception
Produits et services
Microprocessors; integrated circuits; computer central
processing units. Computer system design services; consulting services in the
field of design, selection, implementation and use of
computer hardware and software systems for others;
development of computer platforms.
3.
Neural network data computation using mixed-precision
Techniques for mixed-precision data manipulation for neural network data computation are disclosed. A first left group comprising eight bytes of data and a first right group of eight bytes of data are obtained for computation using a processor. A second left group comprising eight bytes of data and a second right group of eight bytes of data are obtained. A sum of products is performed between the first left and right groups and the second left and right groups. The sum of products is performed on bytes of 8-bit integer data. A first result is based on a summation of eight values that are products of the first group's left eight bytes and the second group's left eight bytes. A second result is based on the summation of eight values that are products of the first group's left eight bytes and the second group's right eight bytes. Results are output.
G06F 7/544 - Méthodes ou dispositions pour effectuer des calculs en utilisant exclusivement une représentation numérique codée, p. ex. en utilisant une représentation binaire, ternaire, décimale utilisant des dispositifs n'établissant pas de contact, p. ex. tube, dispositif à l'état solideMéthodes ou dispositions pour effectuer des calculs en utilisant exclusivement une représentation numérique codée, p. ex. en utilisant une représentation binaire, ternaire, décimale utilisant des dispositifs non spécifiés pour l'évaluation de fonctions par calcul
G06N 3/063 - Réalisation physique, c.-à-d. mise en œuvre matérielle de réseaux neuronaux, de neurones ou de parties de neurone utilisant des moyens électroniques
G06N 5/022 - Ingénierie de la connaissanceAcquisition de la connaissance
09 - Appareils et instruments scientifiques et électriques
42 - Services scientifiques, technologiques et industriels, recherche et conception
Produits et services
(1) Microprocessors; integrated circuits; computer central processing units. (1) Computer system design services; consulting services in the field of design, selection, implementation and use of computer hardware and software systems for others; development of computer platforms.
Aspects relate to Input/Output (IO) Memory Management Units (MMUs) that include hardware structures for implementing virtualization. Some implementations allow guests to setup and maintain device IO tables within memory regions to which those guests have been given permissions by a hypervisor. Some implementations provide hardware page table walking capability within the IOMMU, while other implementations provide static tables. Such static tables may be maintained by a hypervisor on behalf of guests. Some implementations reduce a frequency of interrupts or invocation of hypervisor by allowing transactions to be setup by guests without hypervisor involvement within their assigned device IO regions. Devices may communicate with IOMMU to setup the requested memory transaction, and completion thereof may be signaled to the guest without hypervisor involvement. Various other aspects will be evident from the disclosure.
G06F 3/06 - Entrée numérique à partir de, ou sortie numérique vers des supports d'enregistrement
G06F 12/0811 - Systèmes de mémoire cache multi-utilisateurs, multiprocesseurs ou multitraitement avec hiérarchies de mémoires cache multi-niveaux
G06F 12/0831 - Protocoles de cohérence de mémoire cache à l’aide d’un schéma de bus, p. ex. avec moyen de contrôle ou de surveillance
G06F 12/1027 - Traduction d'adresses utilisant des moyens de traduction d’adresse associatifs ou pseudo-associatifs, p. ex. un répertoire de pages actives [TLB]
6.
Neural network processing using specialized data representation
Techniques for neural network processing using specialized data representation are disclosed. Input data for manipulation in a layer of a neural network is obtained. The input data includes image data, where the image data is represented in bfloat16 format without loss of precision. The manipulation of the input data is performed on a processor that supports single-precision operations. The input data is converted to a 16-bit reduced floating-point representation, where the reduced floating-point representation comprises an alternative single-precision data representation mode. The input data is manipulated with one or more 16-bit reduced floating-point data elements. The manipulation includes a multiply and add-accumulate operation. The manipulation further includes a unary operation, a binary operation, or a conversion operation. A result of the manipulating is forwarded to a next layer of the neural network.
G06F 7/483 - Calculs avec des nombres représentés par une combinaison non linéaire de nombres codés, p. ex. nombres rationnels, système de numération logarithmique ou nombres à virgule flottante
In one embodiment, a method includes accessing a virtual address from a request to access a memory of the computing device, where virtual addresses are translated to physical addresses of physical memory in the computing device using an N-level page table, and the lowest level of the page table contains page-table entries specifying the physical address of a frame of physical memory. The method includes searching, using the virtual address, a translation lookaside buffer (TLB) including a plurality of TLB entries, each TLB entry including (1) a tag identifying a virtual address associated with the entry and (2) a page-table entry specifying the physical address of a lower-level page table or of a frame of physical memory associated with the virtual address identified in the tag; and iteratively performing, until the virtual address is translated to a physical address, an address-translation procedure that depends on the cached TLB entries.
G06F 12/1027 - Traduction d'adresses utilisant des moyens de traduction d’adresse associatifs ou pseudo-associatifs, p. ex. un répertoire de pages actives [TLB]
09 - Appareils et instruments scientifiques et électriques
42 - Services scientifiques, technologiques et industriels, recherche et conception
Produits et services
Microprocessors; Integrated circuits; Computer central processing units Computer system design services; Consulting services in the field of design, selection, implementation and use of computer hardware and software systems for others; Development of computer platforms
09 - Appareils et instruments scientifiques et électriques
42 - Services scientifiques, technologiques et industriels, recherche et conception
Produits et services
Microprocessors; Integrated circuits; Computer central processing units Computer system design services; Consulting services in the field of design, selection, implementation and use of computer hardware and software systems for others; Development of computer platforms
10.
Neural network data computation using mixed-precision
Techniques for mixed-precision data manipulation for neural network data computation are disclosed. A first left group comprising eight bytes of data and a first right group of eight bytes of data are obtained for computation using a processor. A second left group comprising eight bytes of data and a second right group of eight bytes of data are obtained. A sum of products is performed between the first left and right groups and the second left and right groups. The sum of products is performed on bytes of 8-bit integer data. A first result is based on a summation of eight values that are products of the first group's left eight bytes and the second group's left eight bytes. A second result is based on the summation of eight values that are products of the first group's left eight bytes and the second group's right eight bytes. Results are output.
G06F 7/544 - Méthodes ou dispositions pour effectuer des calculs en utilisant exclusivement une représentation numérique codée, p. ex. en utilisant une représentation binaire, ternaire, décimale utilisant des dispositifs n'établissant pas de contact, p. ex. tube, dispositif à l'état solideMéthodes ou dispositions pour effectuer des calculs en utilisant exclusivement une représentation numérique codée, p. ex. en utilisant une représentation binaire, ternaire, décimale utilisant des dispositifs non spécifiés pour l'évaluation de fonctions par calcul
G06N 3/063 - Réalisation physique, c.-à-d. mise en œuvre matérielle de réseaux neuronaux, de neurones ou de parties de neurone utilisant des moyens électroniques
Techniques are disclosed for address manipulation using indices and tags. A first index is generated from bits of a processor program counter, where the first index is used to access a branch predictor bimodal table. A first branch prediction is provided from the bimodal table, based on the first index. The first branch prediction is matched against N tables, where the tables contain prior branch histories, and where: the branch history in table T(N) is of greater length than the branch history of table T(N-1), and the branch history in table T(N-1) is of greater length than the branch history of table T(N-2). A processor address is manipulated using a greatest length of hits of branch prediction matches from the N tables, based on one or more hits occurring. The branch predictor address is manipulated using the first branch prediction from the bimodal table, based on zero hits occurring.
Aspects relate to Input/Output (IO) Memory Management Units (MMUs) that include hardware structures for implementing virtualization. Some implementations allow guests to setup and maintain device IO tables within memory regions to which those guests have been given permissions by a hypervisor. Some implementations provide hardware page table walking capability within the IOMMU, while other implementations provide static tables. Such static tables may be maintained by a hypervisor on behalf of guests. Some implementations reduce a frequency of interrupts or invocation of hypervisor by allowing transactions to be setup by guests without hypervisor involvement within their assigned device IO regions. Devices may communicate with IOMMU to setup the requested memory transaction, and completion thereof may be signaled to the guest without hypervisor involvement. Various other aspects will be evident from the disclosure.
G06F 3/06 - Entrée numérique à partir de, ou sortie numérique vers des supports d'enregistrement
G06F 12/1027 - Traduction d'adresses utilisant des moyens de traduction d’adresse associatifs ou pseudo-associatifs, p. ex. un répertoire de pages actives [TLB]
G06F 12/0811 - Systèmes de mémoire cache multi-utilisateurs, multiprocesseurs ou multitraitement avec hiérarchies de mémoires cache multi-niveaux
G06F 12/0831 - Protocoles de cohérence de mémoire cache à l’aide d’un schéma de bus, p. ex. avec moyen de contrôle ou de surveillance
Techniques are disclosed for address manipulation using indices and tags. A first index is generated from bits of a processor program counter, where the first index is used to access a branch predictor bimodal table. A first branch prediction is provided from the bimodal table, based on the first index. The first branch prediction is matched against N tables, where the tables contain prior branch histories, and where: the branch history in table T(N) is of greater length than the branch history of table T(N-1), and the branch history in table T(N-1) is of greater length than the branch history of table T(N-2). A processor address is manipulated using a greatest length of hits of branch prediction matches from the N tables, based on one or more hits occurring. The branch predictor address is manipulated using the first branch prediction from the bimodal table, based on zero hits occurring.
Techniques for mixed-precision data manipulation for neural network data computation are disclosed. A first left group comprising eight bytes of data and a first right group of eight bytes of data are obtained for computation using a processor. A second left group comprising eight bytes of data and a second right group of eight bytes of data are obtained. A sum of products is performed between the first left and right groups and the second left and right groups. The sum of products is performed on bytes of 8-bit integer data. A first result is based on a summation of eight values that are products of the first group's left eight bytes and the second group's left eight bytes. A second result is based on the summation of eight values that are products of the first group's left eight bytes and the second group's right eight bytes. Results are output.
G06F 7/544 - Méthodes ou dispositions pour effectuer des calculs en utilisant exclusivement une représentation numérique codée, p. ex. en utilisant une représentation binaire, ternaire, décimale utilisant des dispositifs n'établissant pas de contact, p. ex. tube, dispositif à l'état solideMéthodes ou dispositions pour effectuer des calculs en utilisant exclusivement une représentation numérique codée, p. ex. en utilisant une représentation binaire, ternaire, décimale utilisant des dispositifs non spécifiés pour l'évaluation de fonctions par calcul
G06N 3/063 - Réalisation physique, c.-à-d. mise en œuvre matérielle de réseaux neuronaux, de neurones ou de parties de neurone utilisant des moyens électroniques
Aspects relate to Input/Output (IO) Memory Management Units (MMUs) that include hardware structures for implementing virtualization. Some implementations allow guests to setup and maintain device IO tables within memory regions to which those guests have been given permissions by a hypervisor. Some implementations provide hardware page table walking capability within the IOMMU, while other implementations provide static tables. Such static tables may be maintained by a hypervisor on behalf of guests. Some implementations reduce a frequency of interrupts or invocation of hypervisor by allowing transactions to be setup by guests without hypervisor involvement within their assigned device IO regions. Devices may communicate with IOMMU to setup the requested memory transaction, and completion thereof may be signaled to the guest without hypervisor involvement. Various other aspects will be evident from the disclosure.
G06F 3/06 - Entrée numérique à partir de, ou sortie numérique vers des supports d'enregistrement
G06F 12/1027 - Traduction d'adresses utilisant des moyens de traduction d’adresse associatifs ou pseudo-associatifs, p. ex. un répertoire de pages actives [TLB]
G06F 12/0811 - Systèmes de mémoire cache multi-utilisateurs, multiprocesseurs ou multitraitement avec hiérarchies de mémoires cache multi-niveaux
G06F 12/0831 - Protocoles de cohérence de mémoire cache à l’aide d’un schéma de bus, p. ex. avec moyen de contrôle ou de surveillance
Techniques are disclosed for address manipulation using indices and tags. A first index is generated from bits of a processor program counter, where the first index is used to access a branch predictor bimodal table. A first branch prediction is provided from the bimodal table, based on the first index. The first branch prediction is matched against N tables, where the tables contain prior branch histories, and where: the branch history in table T(N) is of greater length than the branch history of table T(N−1), and the branch history in table T(N−1) is of greater length than the branch history of table T(N−2). A processor address is manipulated using a greatest length of hits of branch prediction matches from the N tables, based on one or more hits occurring. The branch predictor address is manipulated using the first branch prediction from the bimodal table, based on zero hits occurring.
Techniques are disclosed for address manipulation using indices and tags. A first index is generated from bits of a processor program counter, where the first index is used to access a branch predictor bimodal table. A first branch prediction is provided from the bimodal table, based on the first index. The first branch prediction is matched against N tables, where the tables contain prior branch histories, and where: the branch history in table T(N) is of greater length than the branch history of table T(N-1), and the branch history in table T(N-1) is of greater length than the branch history of table T(N-2). A processor address is manipulated using a greatest length of hits of branch prediction matches from the N tables, based on one or more hits occurring. The branch predictor address is manipulated using the first branch prediction from the bimodal table, based on zero hits occurring.
Techniques for neural network processing using mixed-precision data representation are disclosed. Access to a processor that supports single-precision operations is obtained, where the processor is used for neural network calculations. A first input data element and a second input data element are presented for manipulation on the processor, where the manipulation supports the neural network calculations. The first input data element is manipulated with the second input data element using the processor, where the first input data element comprises a 16-bit reduced floating point representation. A result of the manipulation is output, where the result comprises a single-precision data representation element. The result is forwarded to a next layer of the neural network, based on the outputting.
Techniques for neural network processing using specialized data representation are disclosed. Input data for manipulation in a layer of a neural network is obtained. The input data includes image data, where the image data is represented in bfloat16 format without loss of precision. The manipulation of the input data is performed on a processor that supports single-precision operations. The input data is converted to a 16-bit reduced floating-point representation, where the reduced floating-point representation comprises an alternative single-precision data representation mode. The input data is manipulated with one or more 16-bit reduced floating-point data elements. The manipulation includes a multiply and add-accumulate operation. The manipulation further includes a unary operation, a binary operation, or a conversion operation. A result of the manipulating is forwarded to a next layer of the neural network.
G06F 7/483 - Calculs avec des nombres représentés par une combinaison non linéaire de nombres codés, p. ex. nombres rationnels, système de numération logarithmique ou nombres à virgule flottante
G06N 3/04 - Architecture, p. ex. topologie d'interconnexion
Techniques for neural network processing using specialized data representation are disclosed. Input data for manipulation in a layer of a neural network is obtained. The input data includes image data, where the image data is represented in bfloatl6 format without loss of precision. The manipulation of the input data is performed on a processor that supports single-precision operations. The input data is converted to a 16-bit reduced floating-point representation, where the reduced floating-point representation comprises an alternative single-precision data representation mode. The input data is manipulated with one or more 16-bit reduced floating-point data elements. The manipulation includes a multiply and add- accumulate operation. The manipulation further includes a unary operation, a binary operation, or a conversion operation. A result of the manipulating is forwarded to a next layer of the neural network.
G06N 3/063 - Réalisation physique, c.-à-d. mise en œuvre matérielle de réseaux neuronaux, de neurones ou de parties de neurone utilisant des moyens électroniques
21.
Hardware virtualized input output memory management unit
Aspects relate to Input/Output (IO) Memory Management Units (MMUs) that include hardware structures for implementing virtualization. Some implementations allow guests to setup and maintain device IO tables within memory regions to which those guests have been given permissions by a hypervisor. Some implementations provide hardware page table walking capability within the IOMMU, while other implementations provide static tables. Such static tables may be maintained by a hypervisor on behalf of guests. Some implementations reduce a frequency of interrupts or invocation of hypervisor by allowing transactions to be setup by guests without hypervisor involvement within their assigned device IO regions. Devices may communicate with IOMMU to setup the requested memory transaction, and completion thereof may be signaled to the guest without hypervisor involvement. Various other aspects will be evident from the disclosure.
G06F 3/06 - Entrée numérique à partir de, ou sortie numérique vers des supports d'enregistrement
G06F 12/1027 - Traduction d'adresses utilisant des moyens de traduction d’adresse associatifs ou pseudo-associatifs, p. ex. un répertoire de pages actives [TLB]
G06F 12/0811 - Systèmes de mémoire cache multi-utilisateurs, multiprocesseurs ou multitraitement avec hiérarchies de mémoires cache multi-niveaux
G06F 12/0831 - Protocoles de cohérence de mémoire cache à l’aide d’un schéma de bus, p. ex. avec moyen de contrôle ou de surveillance
A processor is configured to implement an instruction set architecture for accessing data that includes loading data elements from a memory containing data blocks separated by block boundaries. The instruction set architecture includes a first type of data load instruction for loading an aligned data structure from the memory and a second type of data load instruction for loading an unaligned data structure from the memory. The loading includes fetching a data load instruction of the second type and loading from the memory according to the data load instruction of the second type. The resulting data structure formed of n consecutive data elements is determined from the data load instruction. The data structure loaded from memory is formed of n consecutive unaligned data elements. The processor is similarly configured to implement storing data elements from a set of registers to a memory containing data blocks separated by block boundaries.
Described herein are instruction set architectures (ISAs), and related data processing apparatuses and methods, with two or more non-contiguous blocks of preserved registers wherein the registers to be saved or restored are identified in a save or restore instruction via a number of registers to be saved/restored (Num_Reg) and a starting register (rStart). Specifically, in the ISAs, apparatuses, and methods described herein, the registers to be saved or restored are identified as the Num_Reg registers in a predetermined sequence starting with rStart wherein, in the predetermined sequence, each register is followed by the next highest numbered register except the highest numbered preserved register, which is followed by the lowest numbered preserved register.
Fault tolerant and fault detecting multi-threaded processors are described. Instructions from a program are executed by both a master thread and a slave thread and execution of the master thread is prioritized. If the master thread stalls or reaches a memory write after having executed a sequence of instructions, the slave thread executes a corresponding sequence of instructions, where at least the first and last instructions in the sequence are the same as the sequence executed by the master thread. When the slave thread reaches the point at which execution of the master thread stopped, the contents of register banks for both the threads are compared, and if they are the same, execution by the master thread is allowed to continue, and any buffered speculative writes are committed to the memory system.
A binary logic circuit for manipulating an input binary string includes a first stage of a first group of multiplexers arranged to select respective portions of an input binary string and configured to receive a respective first control. A second stage is included in which a plurality of a second group of multiplexers is arranged to select respective portions of the input binary string and configured to receive a respective second control signal. The control signals are provided such that each multiplexer of a second group is configured to select a respective second portion of the first binary string. Control circuitry is configured to generate the first and second control signals such that two or more of the first groups and/or two or more of the second groups of multiplexers are independently controllable.
Instruction set architectures (ISAs) and apparatus and methods related thereto comprise a variable length instruction set that includes one or more pointer-size controlled memory access instructions of a smaller length (e.g. 16 bits) wherein the size of the data accessed by such an instruction is dynamically determined based on the size of the pointer. Specifically, when a pointer-size controlled memory access instruction is received at a decode unit, the decode unit outputs one or more control signals to cause an execution unit to perform a memory access of a first size (e.g. 32 bits) when the pointer size is the first size (e.g. 32 bits), and output one or more control signals to cause the execution unit to perform a memory access of a second size (e.g. 64 bits) when the pointer size is the second size (e.g. 64 bits).
Instruction set architectures (ISAs) and apparatus and methods related thereto comprise an instruction set that includes one or more instructions which identify the global pointer (GP) register as an operand (e.g., base register or source register) of the instruction. Identification can be implicit. By implicitly identifying the GP register as an operand of the instruction, one or more bits of the instruction that were dedicated to explicitly identifying the operand (e.g., base register or source register) can be used to extend the size of one or more other operands, such as the offset or immediate, to provide longer offsets or immediates.
Instruction set architectures (ISAs) and apparatus and methods related thereto comprise an instruction set that includes one or more instructions which identify the global pointer (GP) register as an operand (e.g., base register or source register) of the instruction. Identification can be implicit. By implicitly identifying the GP register as an operand of the instruction, one or more bits of the instruction that were dedicated to explicitly identifying the operand (e.g., base register or source register) can be used to extend the size of one or more other operands, such as the offset or immediate, to provide longer offsets or immediates.
G06F 9/30 - Dispositions pour exécuter des instructions machines, p. ex. décodage d'instructions
G06F 5/01 - Procédés ou dispositions pour la conversion de données, sans modification de l'ordre ou du contenu des données maniées pour le décalage, p. ex. la justification, le changement d'échelle, la normalisation
G06F 9/32 - Formation de l'adresse de l'instruction suivante, p. ex. par incrémentation du compteur ordinal
G06F 9/34 - Adressage de l'opérande d'instruction ou du résultat ou accès à l'opérande d'instruction ou au résultat
A binary logic circuit for manipulating an input binary string includes a first stage of a first group of multiplexers arranged to select respective portions of an input binary string and configured to receive a respective first control. A second stage is included in which a plurality of a second group of multiplexers is arranged to select respective portions of the input binary string and configured to receive a respective second control signal. The control signals are provided such that each multiplexer of a second group is configured to select a respective second portion of the first binary string. Control circuitry is configured to generate the first and second control signals such that two or more of the first groups and/or two or more of the second groups of multiplexers are independently controllable.
G06F 5/01 - Procédés ou dispositions pour la conversion de données, sans modification de l'ordre ou du contenu des données maniées pour le décalage, p. ex. la justification, le changement d'échelle, la normalisation
G06F 9/30 - Dispositions pour exécuter des instructions machines, p. ex. décodage d'instructions
G06F 9/32 - Formation de l'adresse de l'instruction suivante, p. ex. par incrémentation du compteur ordinal
G06F 9/34 - Adressage de l'opérande d'instruction ou du résultat ou accès à l'opérande d'instruction ou au résultat
Fault tolerant and fault detecting multi-threaded processors are described. Instructions from a program are executed by both a master thread and a slave thread and execution of the master thread is prioritized. If the master thread stalls or reaches a memory write after having executed a sequence of instructions, the slave thread executes a corresponding sequence of instructions, where at least the first and last instructions in the sequence are the same as the sequence executed by the master thread. When the slave thread reaches the point at which execution of the master thread stopped, the contents of register banks for both the threads are compared, and if they are the same, execution by the master thread is allowed to continue, and any buffered speculative writes are committed to the memory system.
G06F 11/14 - Détection ou correction d'erreur dans les données par redondance dans les opérations, p. ex. en utilisant différentes séquences d'opérations aboutissant au même résultat
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
31.
Check pointing a shift register with a circular buffer
Hardware structures for check pointing a main shift register one or more times which include a circular buffer used to store the data elements most recently shifted onto the main shift register which has an extra data position for each check point and an extra data position for each restorable point in time; an update history shift register which has a data position for each check point which is used to store information indicating whether the circular buffer was updated in a particular clock cycle; a pointer that identifies a subset of the data positions of the circular buffer as active data positions; and check point generation logic that derives each check point by selecting a subset of the active data positions based on the information stored in the update history shift register.
G06F 3/06 - Entrée numérique à partir de, ou sortie numérique vers des supports d'enregistrement
G06F 5/01 - Procédés ou dispositions pour la conversion de données, sans modification de l'ordre ou du contenu des données maniées pour le décalage, p. ex. la justification, le changement d'échelle, la normalisation
G06F 9/30 - Dispositions pour exécuter des instructions machines, p. ex. décodage d'instructions
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
32.
Decoding instructions that are modified by one or more other instructions
Methods and apparatus are provided for decoding instructions in a computer program wherein the instructions include one or more base instructions that are subject to modification by one or more other instructions. A decoder determines whether a first received instruction was arrived at by a non-incremental change to a program counter (i.e. a jump in the program). If the first instruction was arrived at by a non-incremental change to the program counter the decoder decodes the immediately preceding instruction to determine if the original instruction is a base instruction subject to modification by one or more other instructions. If the preceding instruction indicates that the original instruction is a base instruction an error has occurred and exception handling code is invoked.
Methods and migration units for use in out-of-order processors for migrating data to register file caches associated with functional units of the processor to satisfy register read operations. The migration unit receives register read operations to be executed for a particular functional unit. The migration unit reviews entries in a register renaming table to determine if the particular functional unit has recently accessed the source register and thus is likely to comprise an entry for the source register in its register file cache. In particular, the register renaming table comprises entries for physical registers that indicate what functional units have accessed the physical register. If the particular functional unit has not accessed the particular physical register the migration unit migrates data to the register file cache associated with the particular functional unit.
G06F 12/0875 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p. ex. mémoires cache avec mémoire cache dédiée, p. ex. instruction ou pile
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
G06F 9/30 - Dispositions pour exécuter des instructions machines, p. ex. décodage d'instructions
Embodiments disclosed pertain to apparatuses, systems, and methods for Translation Lookaside Buffers (TLBs) that support visualization and multi-threading. Disclosed embodiments pertain to a TLB that includes a content addressable memory (CAM) with variable page size entries and a set associative memory with fixed page size entries. The CAM may include: a first set of logically contiguous entry locations, wherein the first set comprises a plurality of subsets, and each subset comprises logically contiguous entry locations for exclusive use of a corresponding virtual processing element (VPE); and a second set of logically contiguous entry locations, distinct from the first set, where the entry locations in the second set may be shared among available VPEs. The set associative memory may comprise a third set of logically contiguous entry locations shared among the available VPEs distinct from the first and second set of entry locations.
G06F 12/1027 - Traduction d'adresses utilisant des moyens de traduction d’adresse associatifs ou pseudo-associatifs, p. ex. un répertoire de pages actives [TLB]
G06F 12/0844 - Accès à une mémoire cache à accès multiples simultanés ou quasi-simultanés
G06F 12/1045 - Traduction d'adresses utilisant des moyens de traduction d’adresse associatifs ou pseudo-associatifs, p. ex. un répertoire de pages actives [TLB] associée à une mémoire cache de données
G06F 12/1009 - Traduction d'adresses avec tables de pages, p. ex. structures de table de page
Methods and indirect branch predictor logic units to predict the target addresses of indirect branch instructions. The method comprises storing in a table predicted target addresses for indirect branch instructions indexed by a combination of the indirect path history for previous indirect branch instructions and the taken/not-taken history for previous conditional branch instructions. When a new indirect branch instruction is received for prediction, the indirect path history and the taken/not-taken history are combined to generate an index for the indirect branch instruction. The generated index is then used to identify a predicted target address in the table. If the identified predicted target address is valid, then the target address of the indirect branch instruction is predicted to be the predicted target address.
A method in an instruction fetch unit configured to initiate a fetch of an instruction bundle from a first memory and to initiate a fetch of an instruction bundle from a second memory, wherein a fetch from the second memory takes a predetermined fixed plurality of processor cycles, the method comprising: identifying that an instruction bundle is to be selected for fetching from the second memory in a predetermined future processor cycle; and initiating a fetch of the identified instruction bundle from the second memory a number of processor cycles prior to the predetermined future processor cycle based upon the predetermined fixed plurality of processor cycles taken to fetch from the second memory.
A system and method process atomic instructions. A processor system includes a load store unit (LSU), first and second registers, a memory interface, and a main memory. In response to a load link (LL) instruction, the LSU loads first data from memory into the first register and sets an LL bit (LLBIT) to indicate a sequence of atomic instructions is being executed. The LSU further loads second data from memory into the second register in response to a load (LD) instruction. The LSU places a value of the second register into the memory interface in response to a store conditional coupled (SCX) instruction. When the LLBIT is set and in response to a store (SC) instruction, the LSU places a value of the second register into the memory interface and commits the first and second register values in the memory interface into the main memory when the LLBIT is set.
G06F 9/30 - Dispositions pour exécuter des instructions machines, p. ex. décodage d'instructions
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
G06F 12/0875 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p. ex. mémoires cache avec mémoire cache dédiée, p. ex. instruction ou pile
G06F 12/0897 - Mémoires cache caractérisées par leur organisation ou leur structure avec plusieurs niveaux de hiérarchie de mémoire cache
G06F 12/0817 - Protocoles de cohérence de mémoire cache à l’aide de méthodes de répertoire
G06F 12/0811 - Systèmes de mémoire cache multi-utilisateurs, multiprocesseurs ou multitraitement avec hiérarchies de mémoires cache multi-niveaux
Methods and systems for improved control of traffic generated by a processor are described. In an embodiment, when a device generates a pre-fetch request for a piece of data or an instruction from a memory hierarchy, the device includes a pre-fetch identifier in the request. This identifier flags the request as a pre-fetch request rather than a non-pre-fetch request, such as a time-critical request. Based on this identifier, the memory hierarchy can then issue an abort response at times of high traffic which suppresses the pre-fetch traffic, as the pre-fetch traffic is not fulfilled by the memory hierarchy. On receipt of an abort response, the device deletes at least a part of any record of the pre-fetch request and if the data/instruction is later required, a new request is issued at a higher priority than the original pre-fetch request.
G06F 12/0862 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p. ex. mémoires cache avec pré-lecture
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
G06F 12/0891 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p. ex. mémoires cache utilisant des moyens d’effacement, d’invalidation ou de réinitialisation
G06F 9/30 - Dispositions pour exécuter des instructions machines, p. ex. décodage d'instructions
G06F 12/08 - Adressage ou affectationRéadressage dans des systèmes de mémoires hiérarchiques, p. ex. des systèmes de mémoire virtuelle
H04L 12/851 - Actions liées au type de trafic, p.ex. qualité de service ou priorité
H04L 12/911 - Contrôle d’admission au réseau et allocation de ressources, p.ex. allocation de bande passante ou renégociation en cours de communication
H04L 29/06 - Commande de la communication; Traitement de la communication caractérisés par un protocole
H04L 29/08 - Procédure de commande de la transmission, p.ex. procédure de commande du niveau de la liaison
G06F 12/0897 - Mémoires cache caractérisées par leur organisation ou leur structure avec plusieurs niveaux de hiérarchie de mémoire cache
Techniques for executing a load instruction in a processor are described. In one example, load instructions which are detected to have an offset (or displacement) of zero are sent directly to a data cache, bypassing the address generation stage thereby reducing pipeline length. Load instructions having a nonzero offset can be executed in an address generation stage as is conventional. To avoid conflicts between a current load instruction with zero offset and a previous load instruction with nonzero offset, the current instruction can be rescheduled or sent through a separate dedicated load pipe. An alternative technique permits a load instruction with zero offset to be issued one cycle earlier than it would need to be if it had a nonzero offset, thus reducing load latency.
A method and apparatus are provided for executing instructions of a multi-threaded processor having multiple hardware threads with differing hardware resources comprising the steps of receiving a plurality of streams of instructions and determining which hardware threads are able to receive instructions for execution, determining whether a thread determined to be available for executing an instructions has the hardware resources available required by that instructions and executing the instruction in dependence on the result of the determination.
Methods and migration units for use in out-of-order processors for migrating data to register file caches associated with functional units of the processor to satisfy register read operations. The migration unit receives register read operations to be executed for a particular functional unit. The migration unit reviews entries in a register renaming table to determine if the particular functional unit has recently accessed the source register and thus is likely to comprise an entry for the source register in its register file cache. In particular, the register renaming table comprises entries for physical registers that indicate what functional units have accessed the physical register. If the particular functional unit has not accessed the particular physical register the migration unit migrates data to the register file cache associated with the particular functional unit.
G06F 12/0875 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p. ex. mémoires cache avec mémoire cache dédiée, p. ex. instruction ou pile
G06F 9/30 - Dispositions pour exécuter des instructions machines, p. ex. décodage d'instructions
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
A fetch ahead branch target buffer is used by a branch predictor to determine a target address for a branch instruction based on a fetch pointer for a previous fetch bundle, i.e. a fetch bundle which is fetched prior to a fetch bundle which includes the branch instruction. An entry in the fetch ahead branch target buffer corresponds to one branch instruction and comprises a data portion identifying the target address of that branch instruction. In various examples, an entry also comprises a tag portion which stores data identifying the fetch pointer by which the entry is indexed. Branch prediction is performed by matching an index generated using a received fetch pointer to the tag portions to identify a matching entry and then determining the target address for the branch instruction from the data portion of the matching entry.
G06F 9/30 - Dispositions pour exécuter des instructions machines, p. ex. décodage d'instructions
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
G06F 12/0875 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p. ex. mémoires cache avec mémoire cache dédiée, p. ex. instruction ou pile
Methods and systems that reduce the number of instance of a shared resource needed for a processor to perform an operation and/or execute a process without impacting function are provided. a method of processing in a processor is provided. Aspects include determining that an operation to be performed by the processor will require the use of a shared resource. A command can be issued to cause a second operation to not use the shared resources N cycles later. The shared resource can then be used for a first aspect of the operation at cycle X and then used for a second aspect of the operation at cycle X+N. The second operation may be rescheduled according to embodiments.
Methods and reservation stations for selecting instructions to issue to a functional unit of an out-of-order processor. The method includes classifying each instruction into one of a number of categories based on the type of instruction. Once classified an instruction is stored in an instruction queue corresponding to the category in which it was classified. Instructions are then selected from one or more of the instruction queues to issue to the functional unit based on a relative priority of the plurality of types of instructions. This allows certain types of instructions (e.g. control transfer instructions, flag setting instructions and/or address generation instructions) to be prioritized over other types of instructions even if they are younger.
A fetch unit configured to, in response to detecting a subroutine call and link instruction, calculate and store a predicted target address for the corresponding subroutine return instruction in a prediction stack, and if certain conditions are met, also cause to be stored in the prediction stack a predicted target instruction bundle. The fetch unit is also configured to, in response to detecting a subroutine return instruction, use the predicted target address in the prediction stack to determine the address of the next instruction bundle to be fetched, and if certain conditions are met, cause any valid predicted target instruction bundle in the prediction stack to be the next bundle to be decoded.
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
G06F 9/30 - Dispositions pour exécuter des instructions machines, p. ex. décodage d'instructions
G06F 12/0875 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p. ex. mémoires cache avec mémoire cache dédiée, p. ex. instruction ou pile
46.
Check pointing a shift register using a circular buffer
Hardware structures for check pointing a main shift register one or more times which include a circular buffer used to store the data elements most recently shifted onto the main shift register which has an extra data position for each check point and an extra data position for each restorable point in time; an update history shift register which has a data position for each check point which is used to store information indicating whether the circular buffer was updated in a particular clock cycle; a pointer that identifies a subset of the data positions of the circular buffer as active data positions; and check point generation logic that derives each check point by selecting a subset of the active data positions based on the information stored in the update history shift register.
G06F 3/06 - Entrée numérique à partir de, ou sortie numérique vers des supports d'enregistrement
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
G06F 5/01 - Procédés ou dispositions pour la conversion de données, sans modification de l'ordre ou du contenu des données maniées pour le décalage, p. ex. la justification, le changement d'échelle, la normalisation
G06F 9/30 - Dispositions pour exécuter des instructions machines, p. ex. décodage d'instructions
Embodiments disclosed pertain to apparatuses, systems, and methods for Translation Lookaside Buffers (TLBs) that support virtualization and multi-threading. Disclosed embodiments pertain to a TLB that includes a content addressable memory (CAM) with variable page size entries and a set associative memory with fixed page size entries. The CAM may include: a first set of logically contiguous entry locations, wherein the first set comprises a plurality of subsets, and each subset comprises logically contiguous entry locations for exclusive use of a corresponding virtual processing element (VPE); and a second set of logically contiguous entry locations, distinct from the first set, where the entry locations in the second set may be shared among available VPEs. The set associative memory may comprise a third set of logically contiguous entry locations shared among the available VPEs distinct from the first and second set of entry locations.
G06F 12/1027 - Traduction d'adresses utilisant des moyens de traduction d’adresse associatifs ou pseudo-associatifs, p. ex. un répertoire de pages actives [TLB]
G06F 12/0844 - Accès à une mémoire cache à accès multiples simultanés ou quasi-simultanés
G06F 12/1045 - Traduction d'adresses utilisant des moyens de traduction d’adresse associatifs ou pseudo-associatifs, p. ex. un répertoire de pages actives [TLB] associée à une mémoire cache de données
G06F 12/1009 - Traduction d'adresses avec tables de pages, p. ex. structures de table de page
A fault tolerant multi-threaded processor uses the temporal and/or spatial separation of instructions running in two or more different threads. An instruction is fetched, decoded and executed by each of two or more threads to generate a result for each of the two or more threads. These results are then compared using comparison hardware logic and if there is a mismatch between the results obtained, then an error or event is raised. The comparison is performed on an instruction by instruction basis so that errors are identified (and hence can be resolved) quickly.
G06F 11/00 - Détection d'erreursCorrection d'erreursContrôle de fonctionnement
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
G06F 11/14 - Détection ou correction d'erreur dans les données par redondance dans les opérations, p. ex. en utilisant différentes séquences d'opérations aboutissant au même résultat
Cache operation in a multi-threaded processor uses a small memory structure referred to as a way enable table that stores an index to an n-way set associative cache. The way enable table includes one entry for each entry in the n-way set associative cache and each entry in the way enable table is arranged to store a thread ID. The thread ID in an entry in the way enable table is the ID of the thread associated with a data item stored in the corresponding entry in the n-way set associative cache. Prior to reading entries from the n-way set associative cache identified by an index parameter, the ways in the cache are selective enabled based on a comparison of the current thread ID and the thread IDs stored in entries in the way enable table which are identified by the same index parameter.
A method and load and store buffer for issuing a load instruction to a data cache. The method includes determining whether there are any unresolved store instructions in the store buffer that are older than the load instruction. If there is at least one unresolved store instruction in the store buffer older than the load instruction, it is determined whether the oldest unresolved store instruction in the store buffer is within a speculation window for the load instruction. If the oldest unresolved store instruction is within the speculation window for the load instruction, the load instruction is speculatively issued to the data cache. Otherwise, the load instruction is stalled until any unresolved store instructions outside the speculation window are resolved. The speculation window is a short window that defines a number of instructions or store instructions that immediately precede the load instruction.
An integrated circuit implements a multistage processing pipeline, where control is passed in the pipeline with data to be processed according to the control. At least some of the different pipeline stages can be implemented by different circuits, being clocked at different frequencies. These frequencies may change dynamically during operation of the integrated circuit. Control and data to be processed according to such control can be offset from each other in the pipeline; e.g., control can precede data by a pre-set number of clock events. To cross a clock domain, control and data can be temporarily stored in respective FIFOs. Reading of control by the destination domain is delayed by a delay amount determined so that reading of control and data can be offset from each other by a minimum number of clock events of the destination domain clock, and control is read before data is available for reading.
G06F 13/00 - Interconnexion ou transfert d'information ou d'autres signaux entre mémoires, dispositifs d'entrée/sortie ou unités de traitement
G06F 13/42 - Protocole de transfert pour bus, p. ex. liaisonSynchronisation
H04L 7/00 - Dispositions pour synchroniser le récepteur avec l'émetteur
G06F 5/06 - Procédés ou dispositions pour la conversion de données, sans modification de l'ordre ou du contenu des données maniées pour modifier la vitesse de débit des données, c.-à-d. régularisation de la vitesse
G06F 15/00 - Calculateurs numériques en généralÉquipement de traitement de données en général
52.
Register file having a plurality of sub-register files
Register files for use in an out-of-order processor that have been divided into a plurality of sub-register files. The register files also have a plurality of buffers which are each associated with one of the sub-register files. Each buffer receives and stores write operations destined for the associated sub-register file which can be later issued to the sub-register file. Specifically, each clock cycle it is determined whether there is at least one write operation in the buffer that has not been issued to the associated sub-register file. If there is at least one write operation in the buffer that has not been issued to the associated sub-register file, one of the non-issued write operations is issued to the associated sub-register file. Each sub-register file may also have an arbitration logic unit which resolves conflicts between read and write operations that want to access the associated sub-register file in the same cycle by prioritizing read operations unless a conflicting write instruction has reached commit time.
G06F 12/00 - Accès à, adressage ou affectation dans des systèmes ou des architectures de mémoires
G06F 9/30 - Dispositions pour exécuter des instructions machines, p. ex. décodage d'instructions
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
G06F 12/0875 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p. ex. mémoires cache avec mémoire cache dédiée, p. ex. instruction ou pile
G06F 13/00 - Interconnexion ou transfert d'information ou d'autres signaux entre mémoires, dispositifs d'entrée/sortie ou unités de traitement
G06F 13/28 - Gestion de demandes d'interconnexion ou de transfert pour l'accès au bus d'entrée/sortie utilisant le transfert par rafale, p. ex. acces direct à la mémoire, vol de cycle
53.
Prioritising of instruction fetching in microprocessor systems
A method and a system are provided for prioritising the fetching of instructions for each of a plurality of executing instruction threads in a multi-threaded processor. Instructions come from at least one source of instructions. Each thread has a number of threads buffered for execution in an instruction buffer. A first metric for each thread is determined based on the number of instructions currently buffered. A second metric is then determined for each thread, this being an execution based metric. A priority order for the threads is determined from the first and second metrics, and an instruction is fetched from the source for the thread with the highest determined priority which is requesting an instruction.
Methods and migration units for use in out-of-order processors for migrating data to register file caches associated with functional units of the processor to satisfy register read operations. The migration unit receives register read operations to be executed for a particular functional unit. The migration unit reviews entries in a register renaming table to determine if the particular functional unit has recently accessed the source register and thus is likely to comprise an entry for the source register in its register file cache. In particular, the register renaming table comprises entries for physical registers that indicate what functional units have accessed the physical register. If the particular functional unit has not accessed the particular physical register the migration unit migrates data to the register file cache associated with the particular functional unit.
G06F 12/08 - Adressage ou affectationRéadressage dans des systèmes de mémoires hiérarchiques, p. ex. des systèmes de mémoire virtuelle
G06F 9/30 - Dispositions pour exécuter des instructions machines, p. ex. décodage d'instructions
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
G06F 12/0875 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p. ex. mémoires cache avec mémoire cache dédiée, p. ex. instruction ou pile
55.
Method and apparatus for scheduling the issue of instructions in a multithreaded processor
A method to dynamically determine which instructions from a plurality of available instructions to issue in each clock cycle in a multithreaded processor capable of issuing a plurality of instructions in each clock cycle, comprises the steps of: determining a highest priority instruction from the plurality of available instructions; determining the compatibility of the highest priority instruction with each of the remaining available instructions; and issuing the highest priority instruction together with other instructions compatible with the highest priority instruction in the same clock cycle; wherein the highest priority instruction cannot be a speculative instruction. The effect of this is that speculative instructions are only ever issued together with at least one non-speculative instruction.
In one aspect, a processor has a register file, a private Level 1 (L1) cache, and an interface to a shared memory hierarchy (e.g., an Level 2 (L2) cache and so on). The processor has a Load Store Unit (LSU) that handles decoded load and store instructions. The processor may support out of order and multi-threaded execution. As store instructions arrive at the LSU for processing, the LSU determines whether a counter, from a set of counters, is allocated to a cache line affected by each store. If not, the LSU allocates a counter. If so, then the LSU updates the counter. Also, in response to a store instruction, affecting a cache line neighboring a cache line that has a counter that meets a criteria, the LSU characterizes that store instruction as one to be effected without obtaining ownership of the effected cache line, and provides that store to be serviced by an element of the shared memory hierarchy.
G06F 12/08 - Adressage ou affectationRéadressage dans des systèmes de mémoires hiérarchiques, p. ex. des systèmes de mémoire virtuelle
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
G06F 12/084 - Systèmes de mémoire cache multi-utilisateurs, multiprocesseurs ou multitraitement avec mémoire cache partagée
G06F 12/0811 - Systèmes de mémoire cache multi-utilisateurs, multiprocesseurs ou multitraitement avec hiérarchies de mémoires cache multi-niveaux
G06F 12/0815 - Protocoles de cohérence de mémoire cache
G06F 12/0842 - Systèmes de mémoire cache multi-utilisateurs, multiprocesseurs ou multitraitement pour multitraitement ou multitâche
G06F 12/0888 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p. ex. mémoires cache utilisant la mémorisation cache sélective, p. ex. la purge du cache
57.
Method and apparatus for ensuring data cache coherency
A multithreaded processor can concurrently execute a plurality of threads in a processor core. The threads can access a shared main memory through a memory interface; the threads can generate read and write transactions that cause shared main memory access. An incoherency detection module prevents incoherency by maintaining a record of outstanding global writes, and detecting a conflicting global read. A barrier is sequenced with the conflicting global write. The conflicting global read is allowed to proceed after the sequence of the conflicting global write and the barrier are cleared. The sequence can be maintained by a separate queue for each thread of the plurality.
G06F 12/00 - Accès à, adressage ou affectation dans des systèmes ou des architectures de mémoires
G06F 12/0815 - Protocoles de cohérence de mémoire cache
G06F 9/45 - Compilation ou interprétation de langages de programmation évolués
G06F 12/0855 - Accès de mémoire cache en chevauchement, p. ex. pipeline
G06F 12/0817 - Protocoles de cohérence de mémoire cache à l’aide de méthodes de répertoire
G06F 12/0875 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p. ex. mémoires cache avec mémoire cache dédiée, p. ex. instruction ou pile
G06F 9/52 - Synchronisation de programmesExclusion mutuelle, p. ex. au moyen de sémaphores
G06F 12/0842 - Systèmes de mémoire cache multi-utilisateurs, multiprocesseurs ou multitraitement pour multitraitement ou multitâche
58.
Conditional branch prediction using a long history
Methods and conditional branch predictors for predicting an outcome of a conditional branch instruction in a program executed by a processor using a long conditional branch history include generating a first index from a first portion of the conditional branch history and a second index from a second portion of the conditional branch history. The first index is then used to identify an entry in a first pattern history table including first prediction information; and the second index is used to identify an entry in a second pattern history table including second prediction information. The outcome of the conditional branch is predicted based on the first and second prediction information.
Methods of running a 32-bit operating system on a 64-bit processor are described. In an embodiment, the processor comprises 64-bit hardware and when running a 64-bit operating system operates as a single-threaded processor. However, when running a 32-bit operating system (which may be a guest operating system running on a virtual machine), the processor operates as a two-threaded core. The register file is logically divided into two portions, one for each thread, and logic within a functional unit may be split between threads, shared between threads or duplicated to provide an instance of the logic for each thread. Configuration bits may be set to indicate whether the processor should operate as a single-threaded or multi-threaded device.
G06F 9/30 - Dispositions pour exécuter des instructions machines, p. ex. décodage d'instructions
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
G06F 9/455 - ÉmulationInterprétationSimulation de logiciel, p. ex. virtualisation ou émulation des moteurs d’exécution d’applications ou de systèmes d’exploitation
Methods and systems that reduce the number of instance of a shared resource needed for a processor to perform an operation and/or execute a process without impacting function are provided. a method of processing in a processor is provided. Aspects include determining that an operation to be performed by the processor will require the use of a shared resource. A command can be issued to cause a second operation to not use the shared resources N cycles later. The shared resource can then be used for a first aspect of the operation at cycle X and then used for a second aspect of the operation at cycle X+N. The second operation may be rescheduled according to embodiments.
Methods and apparatus are provided for decoding instructions in a computer program wherein the instructions include one or more base instructions that are subject to modification by one or more other instructions. A decoder determines whether a first received instruction was arrived at by a non-incremental change to a program counter (i.e. a jump in the program). If the first instruction was arrived at by a non-incremental change to the program counter the decoder decodes the immediately preceding instruction to determine if the original instruction is a base instruction subject to modification by one or more other instructions. If the preceding instruction indicates that the original instruction is a base instruction an error has occurred and exception handling code is invoked.
A technique for restoring a register renaming map is described. In one example, a restore table having a number of storage locations saves a copy of the register renaming map whenever a flow-risk instruction is passed to a re-order buffer. When all storage locations are full, further instructions still pass to the re-order buffer, but a copy of the map is not saved. A storage location subsequently becomes available when its associated flow-risk instruction is executed. A register renaming map state for an unrecorded flow-risk instruction passed to the re-order buffer whilst the storage locations were full is generated and stored in the available location. This is generated using the restore table entry for a previous flow-risk instruction and re-order buffer values for intervening instructions between the previous and unrecorded flow-risk instructions. The restore table can be used to restore the map if an unexpected change in instruction flow occurs.
G06F 7/38 - Méthodes ou dispositions pour effectuer des calculs en utilisant exclusivement une représentation numérique codée, p. ex. en utilisant une représentation binaire, ternaire, décimale
G06F 15/76 - Architectures de calculateurs universels à programmes enregistrés
G06F 9/30 - Dispositions pour exécuter des instructions machines, p. ex. décodage d'instructions
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
63.
Allocating resources to threads based on speculation metric
Methods, reservation stations and processors for allocating resources to a plurality of threads based on the extent to which the instructions associated with each of the threads are speculative. The method comprises receiving a speculation metric for each thread at a reservation station. Each speculation metric represents the extent to which the instructions associated with a particular thread are speculative. The more speculative an instruction, the more likely the instruction has been incorrectly predicted by a branch predictor. The reservation station then allocates functional unit resources (e.g. pipelines) to the threads based on the speculation metrics and selects a number of instructions from one or more of the threads based on the allocation. The selected instructions are then issued to the functional unit resources.
In one aspect, a pipelined ECC-protected cache access method and apparatus provides that during a normal operating mode, for a given cache transaction, a tag comparison action and a data RAM read are performed speculatively in a time during which an ECC calculation occurs. If a correctable error occurs, the tag comparison action and data RAM are repeated and an error mode is entered. Subsequent transactions are processed by performing the ECC calculation, without concurrent speculative actions, and a tag comparison and read are performed using only the tag data available after the ECC calculation. A reset to normal mode is effected by detecting a gap between transactions that is sufficient to avoid a conflict for use of tag comparison circuitry for an earlier transaction having a repeated tag comparison and a later transaction having a speculative tag comparison.
G06F 11/10 - Détection ou correction d'erreur par introduction de redondance dans la représentation des données, p. ex. en utilisant des codes de contrôle en ajoutant des chiffres binaires ou des symboles particuliers aux données exprimées suivant un code, p. ex. contrôle de parité, exclusion des 9 ou des 11
65.
Modeless instruction execution with 64/32-bit addressing
In an aspect, a processor supports modeless execution of 64 bit and 32 bit instructions. A Load/Store Unit (LSU) decodes an instruction that without explicit opcode data indicating whether the instruction is to operate in a 32 or 64 bit memory address space. LSU treats the instruction either as a 32 or 64 bit instruction in dependence on values in an upper 32 bits of one or more 64 bit operands supplied to create an effective address in memory. In an example, a 4 GB space addressed by 32-bit memory space is divided between upper and lower portions of a 64-bit address space, such that a 32-bit instruction is differentiated from a 64-bit instruction in dependence on whether an upper 32 bits of one or more operands is either all binary 1 or all binary 0. Such a processor may support decoding of different arithmetic instructions for 32-bit and 64-bit operations.
A method provides for decoding, in a microprocessor, an instruction into data identifying a first register, a second register, an immediate value, and an opcode identifier. The opcode identifier is interpreted as indicating that an arithmetic operation is to be performed on the first register and the second register, and that the microprocessor is to perform a change of control operation in response to the addition of the first register and the second register causing overflow or underflow. The change of control operation is to a location in a program determined based on the immediate value. A processor can be provided with a decoder and other supporting circuitry to implement such method. Overflow/underflow can be checked on word boundaries of a double-word operation.
A return stack buffer (RSB) is modified such that each entry comprises two or more address slots. When a function is called, the address following the function call is pushed to the RSB and stored in a selected one of the address slots in a top entry in the RSB. One or more pointer bits within the entry are set to indicate which slot the address was stored in.
A return stack buffers (RSB) is modified to store index values instead of addresses. When a function is called, the address following the function call is stored in a look-up table and the index at which the address is stored is pushed to the RSB. When a function returns, an index is popped from the RSB and used to identify an address in the look-up table. In another embodiment, the RSB is modified such that each entry comprises two or more address slots. When a function is called, the address following the function call is pushed to the RSB and stored in a selected one of the address slots in a top entry in the RSB. One or more pointer bits within the entry are set to indicate which slot the address was stored in.
Methods of predicting stack pointer values of variables stored in a stack are described. When an instruction is seen which stores a variable in the stack in a position offset from the stack pointer, an entry is added to a data structure which identifies the physical register which currently stores the stack pointer, the physical register which stores the value of the variable and the offset value. Subsequently when an instruction to load a variable from the stack from a position which is identified by reference to the stack pointer is seen, the data structure is searched to see if there is a corresponding entry which includes the same offset and the same physical register storing the stack pointer as the load instruction. If a corresponding entry is found the architectural register in the load instruction is mapped to the physical register storing the value of the variable from the entry.
Methods and apparatus for predicting the value of a stack pointer which store data when an instruction is seen which grows the stack. The information which is stored includes a size parameter which indicates by how much the stack is grown and one or both of: the register ID currently holding the stack pointer value or the current stack pointer value. When a subsequent instruction shrinking the stack is seen, the stored data is searched for one or more entries which has a corresponding size parameter. If such an entry is identified, the other information stored in that entry is used to predict the value of the stack pointer instead of using the instruction to calculate the new stack pointer value. Where register renaming is used, the information in the entry is used to remap the stack pointer to a different physical register.
G06F 9/30 - Dispositions pour exécuter des instructions machines, p. ex. décodage d'instructions
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
G06F 12/0875 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p. ex. mémoires cache avec mémoire cache dédiée, p. ex. instruction ou pile
71.
Processor with virtualized instruction set architecture and methods
A processor comprises a decoder for decoding an instruction based both on an explicit opcode identifier and on metadata encoded in the instruction. For example, a relative order of source register names may be used to decode the instruction. As an example, an instruction set may have a Branch Equal (BEQ) specifying two registers (r1 and r2) that store values that are compared for equality. An instruction set can provide a single opcode identifier for BEQ and a processor can determine whether to decode a particular instance of that opcode identifier as BEQ or another instruction, in dependence on an order of appearance of the source registers in that instance. For example, the BEQ opcode can be interpreted as a branch not equal, if a higher numbered register appears before a lower numbered register. Additional forms of metadata can include interpreting a constant included in an instruction, as well as determining equality of source registers, among other forms of metadata.
Systems and methods of user interface for facilitation of high level generation of processor extensions. In accordance with a method embodiment of the present invention, an instruction format is accessed at a graphical user interface. A programming language description of a computation element for an execution unit of the processor extension is accessed. A representation of a hardware design for the processor extension comprising the instruction format and the computation element is generated.
Global register protection in a multi-threaded processor is described. In an embodiment, global resources within a multi-threaded processor are protected by performing checks, before allowing a thread to write to a global resource, to determine whether the thread has write access to the particular global resource. The check involves accessing one or more local control registers or a global control field within the multi-threaded processor and in an example, a local register associated with each other thread in the multi-threaded processor is accessed and checked to see whether it contains an identifier for the particular global resource. Only if none of the accessed local resources contain such an identifier, is the instruction issued and the thread allowed to write to the global resource. Otherwise, the instruction is blocked and an exception may be raised to alert the program that issued the instruction that the write failed.
Methods and reservation stations for selecting instructions to issue to a functional unit of an out-of-order processor. The method includes classifying each instruction into one of a number of categories based on the type of instruction. Once classified an instruction is stored in an instruction queue corresponding to the category in which it was classified. Instructions are then selected from one or more of the instruction queues to issue to the functional unit based on a relative priority of the plurality of types of instructions. This allows certain types of instructions (e.g. control transfer instructions, flag setting instructions and/or address generation instructions) to be prioritized over other types of instructions even if they are younger.
An integrated circuit implements a multistage processing pipeline, where control is passed in the pipeline with data to be processed according to the control. At least some of the different pipeline stages can be implemented by different circuits, being clocked at different frequencies. These frequencies may change dynamically during operation of the integrated circuit. Control and data to be processed according to such control can be offset from each other in the pipeline; e.g., control can precede data by a pre-set number of clock events. To cross a clock domain, control and data can be temporarily stored in respective FIFOs. Reading of control by the destination domain is delayed by a delay amount determined so that reading of control and data can be offset from each other by a minimum number of clock events of the destination domain clock, and control is read before data is available for reading.
G06F 13/00 - Interconnexion ou transfert d'information ou d'autres signaux entre mémoires, dispositifs d'entrée/sortie ou unités de traitement
G06F 13/42 - Protocole de transfert pour bus, p. ex. liaisonSynchronisation
H04L 7/00 - Dispositions pour synchroniser le récepteur avec l'émetteur
G06F 5/06 - Procédés ou dispositions pour la conversion de données, sans modification de l'ordre ou du contenu des données maniées pour modifier la vitesse de débit des données, c.-à-d. régularisation de la vitesse
G06F 15/00 - Calculateurs numériques en généralÉquipement de traitement de données en général
76.
System for providing trace data in a data processor having a pipelined architecture
The invention is a method and system for providing trace data in a pipelined data processor. Aspects of the invention include providing a trace pipeline in parallel to the execution pipeline, providing trace information on whether conditional instructions complete or not, providing trace information on the interrupt status of the processor, replacing instructions in the processor with functionally equivalent instructions that also produce trace information and modifying the scheduling of instructions in the processor based on the occupancy of a trace output buffer.
Methods and systems for improved control of traffic generated by a processor are described. In an embodiment, when a device generates a pre-fetch request for a piece of data or an instruction from a memory hierarchy, the device includes a pre-fetch identifier in the request. This identifier flags the request as a pre-fetch request rather than a non-pre-fetch request, such as a time-critical request. Based on this identifier, the memory hierarchy can then issue an abort response at times of high traffic which suppresses the pre-fetch traffic, as the pre-fetch traffic is not fulfilled by the memory hierarchy. On receipt of an abort response, the device deletes at least a part of any record of the pre-fetch request and if the data/instruction is later required, a new request is issued at a higher priority than the original pre-fetch request.
G06F 3/00 - Dispositions d'entrée pour le transfert de données destinées à être traitées sous une forme maniable par le calculateurDispositions de sortie pour le transfert de données de l'unité de traitement à l'unité de sortie, p. ex. dispositions d'interface
G06F 9/46 - Dispositions pour la multiprogrammation
G06F 12/0862 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p. ex. mémoires cache avec pré-lecture
G06F 12/0891 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p. ex. mémoires cache utilisant des moyens d’effacement, d’invalidation ou de réinitialisation
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
G06F 12/0897 - Mémoires cache caractérisées par leur organisation ou leur structure avec plusieurs niveaux de hiérarchie de mémoire cache
Methods and indirect branch predictor logic units to predict the target addresses of indirect branch instructions. The method comprises storing in a table predicted target addresses for indirect branch instructions indexed by a combination of the indirect path history for previous indirect branch instructions and the taken/not-taken history for previous conditional branch instructions. When a new indirect branch instruction is received for prediction, the indirect path history and the taken/not-taken history are combined to generate an index for the indirect branch instruction. The generated index is then used to identify a predicted target address in the table. If the identified predicted target address is valid, then the target address of the indirect branch instruction is predicted to be the predicted target address.
A processor is configured to identify a branch instruction immediately followed by an architectural delay slot. A single bonded instruction comprising the branch instruction immediately followed by the architectural delay slot is created. The single bonded instruction is loaded into an instruction buffer.
An improved mechanism for copying data in memory is described which uses aliasing. In an embodiment, data is accessed from a first location in a memory and stored in a cache line associated with a second, different location in the memory. In response to a subsequent request for data from the second location in the memory, the cache returns the data stored in the cache line associated with the second location in the memory. The method may be implemented using additional hardware logic in the cache which is arranged to receive an aliasing request from a processor which identifies both the first and second locations in memory and triggers the accessing of data from the first location for storing in a cache line associated with the second location.
G06F 12/08 - Adressage ou affectationRéadressage dans des systèmes de mémoires hiérarchiques, p. ex. des systèmes de mémoire virtuelle
G06F 3/06 - Entrée numérique à partir de, ou sortie numérique vers des supports d'enregistrement
G06F 12/0804 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p. ex. mémoires cache avec mise à jour de la mémoire principale
G06F 12/0802 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p. ex. mémoires cache
G06F 12/0811 - Systèmes de mémoire cache multi-utilisateurs, multiprocesseurs ou multitraitement avec hiérarchies de mémoires cache multi-niveaux
Methods and migration units for use in out-of-order processors for migrating data to register file caches associated with functional units of the processor to satisfy register read operations. The migration unit receives register read operations to be executed for a particular functional unit. The migration unit reviews entries in a register renaming table to determine if the particular functional unit has recently accessed the source register and thus is likely to comprise an entry for the source register in its register file cache. In particular, the register renaming table comprises entries for physical registers that indicate what functional units have accessed the physical register. If the particular functional unit has not accessed the particular physical register the migration unit migrates data to the register file cache associated with the particular functional unit.
A processor includes a computation engine to produce a computed value for a set of operands. A cache stores the set of operands and the computed value. The cache is configured to selectively identify a match and a miss for a new set of operands. In the event of a match the computed value is supplied by the cache and a computation engine operation is aborted. In the event of a miss a new computed value for the new set of operands is computed by the computation engine and is stored in the cache.
G06F 12/00 - Accès à, adressage ou affectation dans des systèmes ou des architectures de mémoires
G06F 12/08 - Adressage ou affectationRéadressage dans des systèmes de mémoires hiérarchiques, p. ex. des systèmes de mémoire virtuelle
G06F 7/57 - Unités arithmétiques et logiques [UAL], c.-à-d. dispositions ou dispositifs pour accomplir plusieurs des opérations couvertes par les groupes ou pour accomplir des opérations logiques
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
Methods and systems that reduce the number of instance of a shared resource needed for a processor to perform an operation and/or execute a process without impacting function are provided. a method of processing in a processor is provided. Aspects include determining that an operation to be performed by the processor will require the use of a shared resource. A command can be issued to cause a second operation to not use the shared resources N cycles later. The shared resource can then be used for a first aspect of the operation at cycle X and then used for a second aspect of the operation at cycle X+N. The second operation may be rescheduled according to embodiments.
Methods and systems that identify and power up ways for future instructions are provided. A processor includes an n-way set associative cache and an instruction fetch unit. The n-way set associative cache is configured to store instructions. The instruction fetch unit is in communication with the n-way set associative cache and is configured to power up a first way, where a first indication is associated with an instruction and indicates the way where a future instruction is located and where the future instruction is two or more instructions ahead of the current instruction.
G06F 12/08 - Adressage ou affectationRéadressage dans des systèmes de mémoires hiérarchiques, p. ex. des systèmes de mémoire virtuelle
G06F 12/0864 - Adressage d’un niveau de mémoire dans lequel l’accès aux données ou aux blocs de données désirés nécessite des moyens d’adressage associatif, p. ex. mémoires cache utilisant des moyens pseudo-associatifs, p. ex. associatifs d’ensemble ou de hachage
G06F 12/0855 - Accès de mémoire cache en chevauchement, p. ex. pipeline
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
G06F 3/06 - Entrée numérique à partir de, ou sortie numérique vers des supports d'enregistrement
A method and load and store buffer for issuing a load instruction to a data cache. The method includes determining whether there are any unresolved store instructions in the store buffer that are older than the load instruction. If there is at least one unresolved store instruction in the store buffer older than the load instruction, it is determined whether the oldest unresolved store instruction in the store buffer is within a speculation window for the load instruction. If the oldest unresolved store instruction is within the speculation window for the load instruction, the load instruction is speculatively issued to the data cache. Otherwise, the load instruction is stalled until any unresolved store instructions outside the speculation window are resolved. The speculation window is a short window that defines a number of instructions or store instructions that immediately precede the load instruction.
Methods, reservation stations and processors for allocating resources to a plurality of threads based on the extent to which the instructions associated with each of the threads are speculative. The method comprises receiving a speculation metric for each thread at a reservation station. Each speculation metric represents the extent to which the instructions associated with a particular thread are speculative. The more speculative an instruction, the more likely the instruction has been incorrectly predicted by a branch predictor. The reservation station then allocates functional unit resources (e.g. pipelines) to the threads based on the speculation metrics and selects a number of instructions from one or more of the threads based on the allocation. The selected instructions are then issued to the functional unit resources.
Register files for use in an out-of-order processor that have been divided into a plurality of sub-register files. The register files also have a plurality of buffers which are each associated with one of the sub-register files. Each buffer receives and stores write operations destined for the associated sub-register file which can be later issued to the sub-register file. Specifically, each clock cycle it is determined whether there is at least one write operation in the buffer that has not been issued to the associated sub-register file. If there is at least one write operation in the buffer that has not been issued to the associated sub-register file, one of the non-issued write operations is issued to the associated sub-register file. Each sub-register file may also have an arbitration logic unit which resolves conflicts between read and write operations that want to access the associated sub-register file in the same cycle by prioritizing read operations unless a conflicting write instruction has reached commit time.
G06F 12/00 - Accès à, adressage ou affectation dans des systèmes ou des architectures de mémoires
G06F 12/08 - Adressage ou affectationRéadressage dans des systèmes de mémoires hiérarchiques, p. ex. des systèmes de mémoire virtuelle
G06F 9/30 - Dispositions pour exécuter des instructions machines, p. ex. décodage d'instructions
G06F 9/38 - Exécution simultanée d'instructions, p. ex. pipeline ou lecture en mémoire
G06F 13/00 - Interconnexion ou transfert d'information ou d'autres signaux entre mémoires, dispositifs d'entrée/sortie ou unités de traitement
G06F 13/28 - Gestion de demandes d'interconnexion ou de transfert pour l'accès au bus d'entrée/sortie utilisant le transfert par rafale, p. ex. acces direct à la mémoire, vol de cycle
Fill partitioning of a shared cache is described. In an embodiment, all threads running in a processor are able to access any data stored in the shared cache; however, in the event of a cache miss, a thread may be restricted such that it can only store data in a portion of the shared cache. The restrictions to storing data may be implemented for all cache miss events or for only a subset of those events. For example, the restrictions may be implemented only when the shared cache is full and/or only for particular threads. The restrictions may also be applied dynamically, for example, based on conditions associated with the cache. Different portions may be defined for different threads (e.g. in a multi-threaded processor) and these different portions may, for example, be separate and non-overlapping. Fill partitioning may be applied to any on-chip cache, for example, a L1 cache.
G06F 12/128 - Commande de remplacement utilisant des algorithmes de remplacement adaptée aux systèmes de mémoires cache multidimensionnelles, p. ex. associatives d’ensemble, à plusieurs mémoires cache, multi-ensembles ou multi-niveaux
G06F 12/127 - Commande de remplacement utilisant des algorithmes de remplacement avec maniement spécial des données, p. ex. priorité des données ou des instructions, erreurs de maniement ou repérage utilisant des algorithmes de remplacement supplémentaires
G06F 12/084 - Systèmes de mémoire cache multi-utilisateurs, multiprocesseurs ou multitraitement avec mémoire cache partagée
G06F 12/0842 - Systèmes de mémoire cache multi-utilisateurs, multiprocesseurs ou multitraitement pour multitraitement ou multitâche
Methods and branch predictors for predicting a target location of a jump table switch statement in a program. The method includes continuously monitoring instructions at the branch predictor to determine if they write to registers used to store an input variable to a jump table switch statement. Any update to a monitored register is stored in a register table maintained by the branch predictor. Then when it comes time to make a prediction for a jump table switch statement instruction the branch predictor uses the register value stored in the table is used to predict where the jump table switch statement will branch to.
Methods and apparatus for dynamically resizing circular buffers are described wherein circular buffers are dynamically allocated arrays from a pool of arrays. The method comprises receiving either a request to add data to a circular buffer or to remove data from a circular buffer. If the request is an addition request and the circular buffer is full, an array from the pool is allocated to the circular buffer. If, however, the request is a removal request and removal of the data creates an empty array, an array is de-allocated from the circular buffer and returned to the pool. Any arrays that are not allocated to a circular buffer may be disabled to conserve power.
G06F 5/10 - Procédés ou dispositions pour la conversion de données, sans modification de l'ordre ou du contenu des données maniées pour modifier la vitesse de débit des données, c.-à-d. régularisation de la vitesse ayant une séquence d'emplacements d'emmagasinage, chacun étant individuellement accessible à la fois pour des opérations de mise en file d'attente et pour des opérations de retrait de file d'attente, p. ex. utilisant une mémoire à accès aléatoire
91.
Global register protection in a multi-threaded processor
Global register protection in a multi-threaded processor is described. In an embodiment, global resources within a multi-threaded processor are protected by performing checks, before allowing a thread to write to a global resource, to determine whether the thread has write access to the particular global resource. The check involves accessing one or more local control registers or a global control field within the multi-threaded processor and in an example, a local register associated with each other thread in the multi-threaded processor is accessed and checked to see whether it contains an identifier for the particular global resource. Only if none of the accessed local resources contain such an identifier, is the instruction issued and the thread allowed to write to the global resource. Otherwise, the instruction is blocked and an exception may be raised to alert the program that issued the instruction that the write failed.
A method of sharing a plurality of registers in a shared register pool among a plurality of microprocessor threads begins with a determination that a first instruction to be executed by a microprocessor in a first microprocessor thread requires a first logical register. Next a determination is made that a second instruction to be executed by the microprocessor in a second microprocessor thread requires a second logical register. A first physical register in the shared register pool is allocated to the first microprocessor thread for execution of the first instruction and the first logical register is mapped to the first physical register. A second physical register in the shared register pool is allocated to the second microprocessor thread for execution of the second instruction. Finally, the second logical register is mapped to the second physical register.
A computer readable storage medium includes executable instructions to define a processor with guest mode control registers supporting guest mode operating behavior defined by guest context specified in the guest mode control registers. The guest mode control registers include a control bit to specify a guest access blocked register state and a shared register state. Root mode control registers support root mode operating behavior defined by root context specified in the root mode control registers. The root mode control registers include control bits to enable replicated register state access and shared register state access. The guest context and the root context support virtualization of hardware resources such that multiple operating systems supporting multiple applications are executed by the hardware resources.
G06F 9/455 - ÉmulationInterprétationSimulation de logiciel, p. ex. virtualisation ou émulation des moteurs d’exécution d’applications ou de systèmes d’exploitation
A technique for restoring a register renaming map is described. In one example, a restore table having a number of storage locations saves a copy of the register renaming map whenever a flow-risk instruction is passed to a re-order buffer. When all storage locations are full, further instructions still pass to the re-order buffer, but a copy of the map is not saved. A storage location subsequently becomes available when its associated flow-risk instruction is executed. A register renaming map state for an unrecorded flow-risk instruction passed to the re-order buffer while the storage locations were full is generated and stored in the available location. This is generated using the restore table entry for a previous flow-risk instruction and re-order buffer values for intervening instructions between the previous and unrecorded flow-risk instructions. The restore table can be used to restore the map if an unexpected change in instruction flow occurs.
A computer includes a memory and a processor connected to the memory. The processor includes memory segment configuration registers to store defined memory address segments and defined memory address segment attributes such that the processor operates in accordance with the defined memory address segments and defined memory address segment attributes to allow kernel mode access to user space virtual addresses for enhanced kernel mode memory capacity.
A method and apparatus are provided for executing instructions of a multi-threaded processor having multiple hardware threads (32, 34) with differing hardware resources comprising the steps of receiving a plurality of streams of instructions (38, 44) and determining which hardware threads are able to receive instructions for execution (40, 46), determining whether a thread determined to be available for executing an instructions has the hardware resources available required by that instructions (36) and executing the instruction in dependence on the result of the determination (50).
A processor includes a first translation look-aside buffer to support a guest operating mode. A second translation look-aside buffer supports a root operating mode. Hardware resources support the guest operating mode as controlled by guest mode control registers defining guest context. The guest context is used by the hardware resources to access the first translation look-aside buffer to translate a guest virtual address to a guest physical address. The hardware resources access the second translation look-aside buffer to translate the guest physical address to a physical address.
A processor includes guest mode control registers supporting guest mode operating behavior defined by guest context specified in the guest mode control registers. Root mode control registers support root mode operating behavior defined by root context specified in the root mode control registers. The guest context and the root context are simultaneously active to support virtualization of hardware resources such that multiple operating systems supporting multiple applications are executed by the hardware resources.
G06F 3/00 - Dispositions d'entrée pour le transfert de données destinées à être traitées sous une forme maniable par le calculateurDispositions de sortie pour le transfert de données de l'unité de traitement à l'unité de sortie, p. ex. dispositions d'interface
G06F 7/38 - Méthodes ou dispositions pour effectuer des calculs en utilisant exclusivement une représentation numérique codée, p. ex. en utilisant une représentation binaire, ternaire, décimale
G06F 9/00 - Dispositions pour la commande par programme, p. ex. unités de commande
G06F 9/44 - Dispositions pour exécuter des programmes spécifiques
G06F 9/455 - ÉmulationInterprétationSimulation de logiciel, p. ex. virtualisation ou émulation des moteurs d’exécution d’applications ou de systèmes d’exploitation
G06F 9/46 - Dispositions pour la multiprogrammation
G06F 12/00 - Accès à, adressage ou affectation dans des systèmes ou des architectures de mémoires
G06F 13/00 - Interconnexion ou transfert d'information ou d'autres signaux entre mémoires, dispositifs d'entrée/sortie ou unités de traitement
G06F 15/00 - Calculateurs numériques en généralÉquipement de traitement de données en général
G06F 21/00 - Dispositions de sécurité pour protéger les calculateurs, leurs composants, les programmes ou les données contre une activité non autorisée
99.
Horizontally-shared cache victims in multiple core processors
A processor includes multiple processor core units, each including a processor core and a cache memory. Victim lines evicted from a first processor core unit's cache may be stored in another processor core unit's cache, rather than written back to system memory. If the victim line is later requested by the first processor core unit, the victim line is retrieved from the other processor core unit's cache. The processor has low latency data transfers between processor core units. The processor transfers victim lines directly between processor core units' caches or utilizes a victim cache to temporarily store victim lines while searching for their destinations. The processor evaluates cache priority rules to determine whether victim lines are discarded, written back to system memory, or stored in other processor core units' caches. Cache priority rules can be based on cache coherency data, load balancing schemes, and architectural characteristics of the processor.
A method is provided for dynamically determining which instructions from a plurality of available instructions to issue in each clock cycle in a multithreaded processor capable of issuing a plurality of instructions in each clock cycle. The method includes the steps of: determining a highest priority instruction from the plurality of available instructions; determining the compatibility of the highest priority instruction with each of the remaining available instructions; and issuing the highest priority instruction together with other instructions compatible with the highest priority instruction in the same clock cycle. The highest priority instruction cannot be a speculative instruction. The effect of this method is that speculative instructions are only ever issued together with at least one non-speculative instruction.