The invention introduces a method for compressing texture tiles, which contains at least the following steps: lossless-compressing raw data of a texture tile; determining whether a length of the lossless-compression result of the raw data is greater than a target result; and when the length of the lossless-compression result of the raw data is greater than the target length, performing data-reduction control in layers for generating reduced data by reducing the raw data, and generating a lossless-compression result of the reduced data, thereby enabling the length of the lossless-compression result of the reduced data to be equal to the target length or shorter.
H04N 19/426 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements using memory downsizing methods
G06T 7/90 - Determination of colour characteristics
H04N 19/593 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
A data accessing method of a switch for transmitting data packets between a first source node and a first target node and between a second source node and a second target node includes: transmitting a data packet to the switch via at least one of the first communication link and the third communication link and configuring the control unit to store information contained in the data packet into the storage unit; and retrieving the information contained in the data packet from the storage unit via at least one of the second communication link and the fourth communication link. The first source node, the second source node, the first target node and the second target node share the same storage blocks.
A processor using an internal memory to store constants Kt required in a secure hash algorithm (SHA). The latency due to loading the constants Kt from an external memory, therefore, is eliminated. The processor further introduces an instruction set architecture that provides one instruction for the processor to read the constants Kt from the internal memory and perform a particular process on the read constants Kt. Thus, the SHA works efficiently.
H04L 9/00 - Arrangements for secret or secure communicationsNetwork security protocols
H04L 9/06 - Arrangements for secret or secure communicationsNetwork security protocols the encryption apparatus using shift registers or memories for blockwise coding, e.g. D.E.S. systems
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G09C 1/00 - Apparatus or methods whereby a given sequence of signs, e.g. an intelligible text, is transformed into an unintelligible sequence of signs by transposing the signs or groups of signs or by replacing them by others according to a predetermined system
H04L 9/30 - Public key, i.e. encryption algorithm being computationally infeasible to invert and users' encryption keys not requiring secrecy
A chipset implemented in a server node of a server system and including an embedded management controller (eMC) is disclosed. The eMC collects inner-node information of the server node for server system management. The eMC is coupled to a baseboard management controller (BMC) that is outside the server node and communicates with a remote console through a network. The eMC is specially designed for the corresponding server node to be differentiated from the other server nodes also coupled to the BMC. All eMCs coupled to the same BMC boot in a special way.
A schedule for refreshing a dynamic random access memory (DRAM). Access commands for a DRAM are queued in a command queue. A microcontroller uses a counter to count how many times a rank of the DRAM is refreshed entirely (whether by a one-time per-rank refresh operation or by a series of per-bank refresh operations). When the counter has not reached an upper limit and no access command corresponding to the rank is waiting in the command queue, the microcontroller repeatedly performs the per-rank refresh operation on the rank. Every refresh inspection interval, the microcontroller decreases the counter by 1.
A processing system and method for a data strobe signal (DQS). A counter circuit counts falling edges of the DQS within a valid region of the DQS and thereby generates a plurality of counting signals. An OR logic circuit receives the counting signals and a DQS window start signal and thereby generates a DQS window signal. A filter circuit is provided to gate the DQS according to the DQS window signal. The DQS window start signal is kept asserted until at least one of the counting signals changes due to the counting.
G11C 8/18 - Address timing or clocking circuitsAddress control signal generation or management, e.g. for row address strobe [RAS] or column address strobe [CAS] signals
G11C 7/10 - Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
G11C 7/22 - Read-write [R-W] timing or clocking circuitsRead-write [R-W] control signal generators or management
G11C 11/4093 - Input/output [I/O] data interface arrangements, e.g. data buffers
7.
Neural network unit with segmentable array width rotator
First/second memories hold rows of N weight/data words. Each of N processing units (PU) of index J have a register, an accumulator having an output, an arithmetic unit that performs an operation thereon to accumulate a result, the first input receives the output of the accumulator, the second input receives a respective first memory weight word, the third input receives a respective data word output by the register, and multiplexing logic receives a respective second memory data word and a data word output by the register of PU J−1 and outputs a selected data word to the register. PU J−1 for PU 0 is PU N−1. The multiplexing logic of PU N/4 also receives the data word output by the register of PU (3N/4)−1. The multiplexing logic of PU 3N/4 also receives the data word output by the register of PU (N/4)−1.
2 D bits and an extra bit. Each of N processing units (PU) of index J has first and second registers, an accumulator, an arithmetic unit that performs an operation thereon to accumulate a result, and multiplexing logic receiving memory word J, and for PUs 0 to (N/2)−1 also memory word J+(N/2). In a first mode, the multiplexing logic of PUs 0 to N−1 selects word J to output to the first register. In a second mode: when the extra bit is a zero, the multiplexing logic of PUs 0 to (N/2)−1 selects word J to output to the first register, and when the extra bit is a one, the multiplexing logic of PUs 0 through (N/2)−1 selects word J+(N/2) to output to the first register.
First/second memories hold rows of N weight/data words. Each of N processing units (PU) of index J have a register, an accumulator having an output, an arithmetic unit that performs an operation thereon to accumulate a result, the first input receives the output of the accumulator, the second input receives a respective first memory weight word, the third input receives a respective data word output by the register, and multiplexing logic receives a respective second memory data word and a data word output by the register of PU J−1 and outputs a selected data word to the register. PU J−1 for PU 0 is PU N−1. The multiplexing logic of PU 0 also receives the data word output by the register of PU (N/2)−1. The multiplexing logic of PU N/2 also receives the data word output by the register of PU N−1.
G06N 3/00 - Computing arrangements based on biological models
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
G06N 3/04 - Architecture, e.g. interconnection topology
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
10.
Neural network unit with segmentable array width rotator and re-shapeable weight memory to match segment width to provide common weights to multiple rotator segments
2 W bits and an extra bit. Each of N processing units (PU) of index J has first and second registers, an accumulator, an arithmetic unit performs an operation thereon to accumulate a result, first multiplexing logic for PUs 0 through (N/2)−1 receives first memory weight words J and J+(N/2) and for PUs N/2 through N−1 receives first memory weight words J and J−(N/2) and outputs a selected weight word to the first register, and second multiplexing logic receives second memory data word J and data word output by the second register of PU J−1 and outputs a selected data word to the second register. PU 0 second multiplexing logic also receives PU (N/2)−1 second register data word, and PU N/2 second multiplexing logic also receives PU N−1 second register data word.
In a neural network unit, each neural processing unit (NPU) of an array of N NPUs receives respective first and second upper and lower bytes of 2N bytes received from first and second RAMs. In a first mode, each NPU sign-extends the first upper byte to form a first 16-bit word and performs an arithmetic operation on the first 16-bit word and a second 16-bit word formed by the second upper and lower bytes. In a second mode, each NPU sign-extends the first lower byte to form a third 16-bit word and performs the arithmetic operation on the third 16-bit word and the second 16-bit word formed by the second upper and lower bytes. In a third mode, each NPU performs the arithmetic operation on a fourth 16-bit word formed by the first upper and lower bytes and the second 16-bit word formed by the second upper and lower bytes.
A neural network unit convolves a H×W×C input with F R×S×C filters to generate F Q×P outputs. N processing units (PU) each have a register receiving a memory word and a multiplexed-register selectively receiving a memory word or word rotated from an adjacent PU multiplexed-register. The N PUs are logically partitioned as G blocks each of B PUs. The PUs convolve in a column-channel-row order. For each filter column: the N registers read a memory row, each PU multiplies the register and the multiplexed-register to generate a product to accumulate, and the multiplexed-registers are rotated by one; the multiplexed-registers are rotated to align the input blocks with the adjacent PU block. This is performed for each channel. For each filter row, N multiplexed-registers read a memory row for the multiply-accumulations, F column-channel-row-sums are generated and written to the memory, then all steps are performed for each output row.
A processor comprising a plurality of processing cores, a last level cache memory (LLC) shared by the plurality of processing cores, and a neural network unit (NNU) comprising an array of neural processing units (NPU) and a memory array. The LLC comprises a plurality of slices. To transition from a first mode in which the memory array operates to store neural network weights read by the plurality of NPUs to a second mode in which the memory array operates as a slice of the LLC in addition to the plurality of slices, the processor write-back-invalidates the LLC and updates a hashing algorithm to include the memory array as a slice of the LLC in addition to the plurality of slices. To transition from the second mode to the first mode, the processor write-back-invalidates the LLC and updates the hashing algorithm to exclude the memory array from the LLC.
G06F 12/08 - Addressing or allocationRelocation in hierarchically structured memory systems, e.g. virtual memory systems
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06F 12/0811 - Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
G06F 12/0864 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
G06N 3/04 - Architecture, e.g. interconnection topology
G06F 12/084 - Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
14.
Neural network unit with neural memory and array of neural processing units that collectively perform multi-word distance rotates of row of data received from neural memory
N processing units (PU) each have an arithmetic unit (AU) that performs an operation on first, second and third inputs to generate a result to store in an accumulator having an output provided to the first input. A weight input is received by the AU second input. A multiplexed register has first, second, third and fourth data inputs and an output received by the third AU input. A first memory provides N weight words to the N weight inputs. A second memory provides N data words to the multiplexed register first data inputs. The multiplexed register output is also received by the second, third, and fourth data input of the multiplexed register one, 2{circumflex over ( )}J, and 2{circumflex over ( )}K PUs away, respectively. The N multiplexed registers collectively operate as an N-word rotater that rotates by one, 2{circumflex over ( )}J, or 2{circumflex over ( )}K words when the control input specifies the second, third, or fourth data input, respectively.
A processor comprising a mode indicator, a plurality of processing cores, and a neural network unit (NNU), comprising a memory array, an array of neural processing units (NPU), cache control logic, and selection logic that selectively couples the plurality of NPUs and the cache control logic to the memory array. When the mode indicator indicates a first mode, the selection logic enables the plurality of NPUs to read neural network weights from the memory array to perform computations using the weights. When the mode indicator indicates a second mode, the selection logic enables the plurality of processing cores to access the memory array through the cache control logic as a cache memory.
A neural network unit convolves an H×W×C input with F R×S×C filters to generate F Q×P outputs. N processing units (PU) each have a register receiving a respective word of an N-word row of a second memory and multiplexed-register selectively receiving a respective word of an N-word row of a first memory or word rotated from an adjacent PU multiplexed-register. H first memory rows hold input blocks of B words each of channels of respective 2-dimensional input row slices. R×S×C second memory rows hold filter blocks of B words each holding P copies of a filter weight. B is the smallest factor of N greater than W. The PU blocks multiply-accumulate input blocks and filter blocks in column-channel-row order; they read a row of input blocks and rotate it around the N PUs while performing multiply-accumulate operations so each PU block receives each input block before reading another row.
A processor comprises a neural network unit (NNU) and a processing complex (PC) comprising a processing core and cache memory. The NNU comprises neural processing units (NPU), cache control logic (CCL) and a memory array (MA). To transition from a first mode in which the MA operates to hold neural network weights for the array of NPUs to a second mode in which the MA and CCL operate as a victim cache, the CCL begins to cache evicted cache lines into the MA in response to eviction requests and begins to provide to the PC lines that hit in the MA in response to load requests. To transition from the second mode to the first mode, the CCL invalidates all lines of the MA, ceases to cache evicted lines into the MA in response to eviction requests, and ceases to provide to the PC lines in response to load requests.
G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06F 12/08 - Addressing or allocationRelocation in hierarchically structured memory systems, e.g. virtual memory systems
G06F 12/0811 - Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
G06F 12/0864 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
G06F 12/121 - Replacement control using replacement algorithms
G06N 3/04 - Architecture, e.g. interconnection topology
18.
Methods for executing a computer instruction and apparatuses using the same
The invention introduces a method for executing a computer instruction, which contains at least the following steps: decoding the computer instruction to generate a micro-instruction at least containing an opcode (operation code) and a packed operand, where the packed operand contains all n input parameters corresponding to the computer instruction; generating n addresses of the n input parameters according to the opcode and the packed operand; and reading n approximations corresponding to the n addresses from a lookup table.
The invention introduces a method for calculating floating-point operands, which contains at least the following steps: receiving an FP (floating-point) operand in a first format from a source register, wherein the first format is one of a group of first formats of different kinds; converting the FP operand in the first format into an FP operand in a second format; generating a calculation result in the second format by calculating the FP operand in the second format; converting the calculation result in the second format into a calculation result in the first format; and writing-back the calculation result of the first format.
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
A scannable data synchronizer including an input circuit, first and second pass gates, first and second inverters, and a gate controller. The input circuit drives the data nodes to opposite logic states in response to an asynchronous input data signal in a normal mode and in response to scan data in a scan test mode. Each pass gate is coupled between one of the data nodes and a corresponding one of the capture nodes, and each has at least one control terminal. The inverters are cross-coupled between the second capture nodes. The gate controller can keep the pass gates at least partially open during a metastable condition of the capture nodes, and can close the pass gates when both capture nodes stabilize to opposite logic states. In the scan test mode, the scan data is used to test the latch or register functions of the scannable data synchronizer.
G01R 31/3177 - Testing of logic operation, e.g. by logic analysers
H03K 19/20 - Logic circuits, i.e. having at least two inputs acting on one outputInverting circuits characterised by logic function, e.g. AND, OR, NOR, NOT circuits
H03K 19/003 - Modifications for increasing the reliability
The invention introduces a method for accelerating hash-based compression, performed in a compression accelerator, comprising: fetching a string to be compressed from a data buffer; storing instances corresponding to the string in an intermediary buffer; issuing a hash request to a hash matcher for each instance, issuing a data request to an LSM (longest string matcher) according to a first reply sent by the hash matcher, and updating a state, a match length and a match offset of the instance according to a second reply sent by the LSM; and outputting the result to a formatter according to the state, the match length and the match offset of each instance in the original order of the associated substrings that appeared in the string.
H03M 7/34 - Conversion to or from delta modulation, i.e. one-bit differential modulation adaptive
H03M 7/30 - CompressionExpansionSuppression of unnecessary data, e.g. redundancy reduction
G06F 5/06 - Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising
G06F 9/44 - Arrangements for executing specific programs
G06F 3/06 - Digital input from, or digital output to, record carriers
22.
Processor with instruction cache that performs zero clock retires
A method of retiring cache lines from a response buffer array to an icache array of a processor including providing sequential addresses to the icache array and to a response buffer array during successive clock cycles, detecting a first address hitting the response buffer array during a first clock cycle, during a second clock cycle that follows the first clock cycle, performing a first zero clock retire to write a first cache line from the response buffer array to the icache array, and during the second clock cycle, bypassing a second address which is one of the sequential addresses. The second address is bypassed given the assumption that it will likely hit the response buffer array in a subsequent cycle. If the second address missed the response buffer array, the bypassed address is replayed with a slight time penalty, which is outweighed by the time savings of zero clock retires.
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 12/0855 - Overlapped cache accessing, e.g. pipeline
G06F 12/0895 - Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
G06F 12/1045 - Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
23.
Efficient random number generation for update events in multi-bank conditional branch predictor
A branch predictor, has a plurality of memory banks having entries that hold prediction information used to predict a direction of branch instructions fetched and executed by a processor that comprises the branch predictor. A count of events that occur in the processor is provided to hardware logic that performs an arithmetic and/or logical operation, e.g., XOR, on predetermined bits of the count to generate a random value. In response to the processor determining a correct direction of a branch instruction predicted by the branch predictor, the branch predictor uses the random value generated by the hardware logic to make a decision about updating the memory banks. Bits of a branch history pattern, along with the count, may also be used to generate the random value. The event counted may be a retire of an instruction or a cycle of a core or bus clock.
A method of operating a processor including performing successive read cycles from an instruction cache array and a line buffer array including providing sequential memory addresses, detecting a read hit in the line buffer array, and performing a zero clock retire while performing successive read cycles. The zero clock retire includes switching the instruction cache array from a read cycle to a write cycle for one cycle, selecting a line buffer and providing a cache line stored in the selected line buffer to be stored into the instruction cache array at an address stored in the selected line buffer, and bypassing a sequential memory address being provided to the instruction cache array during the zero clock retire. If the bypassed address missed the line buffer array, the bypassed address may be replayed with a slight time penalty, which is outweighed by the time savings of zero clock retires.
G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
G06F 12/0888 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using selective caching, e.g. bypass
25.
Controller and control method for dynamic random access memory
A schedule for refreshing a dynamic random access memory (DRAM). Access commands for a DRAM are queued in a command queue. First-rank bank-refresh time points and second-rank bank-refresh time points are alternately provided within a refresh inspection interval for the microcontroller to alternately refresh a first rank and a second rank of the DRAM bank-by-bank based on the content contained in the command queue.
The invention introduces a method for prefetching data, which contains at least the following steps: receiving a first read request and a second read request from a first LD/ST (Load/Store) queue and a second LD/ST queue, respectively, in parallel; obtaining a first cache-line number and a first offset from the first read request and a second cache-line number of a second offset from the second read request in parallel; obtaining a third cache-line number from a cache-line number register; obtaining a third offset from an offset register; determining whether an offset trend is formed according to the first to third cache-line numbers and the first to third offsets; and directing an L1 (Level-1) data cache to prefetch data of a cache line when the offset trend is formed.
G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
G06F 12/0897 - Caches characterised by their organisation or structure with two or more cache hierarchy levels
27.
Branch predictor that uses multiple byte offsets in hash of instruction block fetch address and branch pattern to generate conditional branch predictor indexes
A branch predictor has a block address useable to access a block of instruction bytes of an instruction cache and first/second byte offsets within the block of instruction bytes. Hashing logic hashes a branch pattern and respective first/second address formed from the block address and the respective first/second byte offsets to generate respective first/second indexes. A conditional branch predictor receives the first/second indexes and in response provides respective first/second direction predictions of first/second conditional branch instructions in the block of instruction bytes. In one embodiment, a branch target address cache (BTAC) provides the byte offsets, and the first/second direction predictions are statically associated with first/second target addresses also provided by the BTAC. Alternatively, the byte offsets are predetermined values, and the first/second direction predictions are dynamically associated with the first/second target addresses based on the relative sizes of the byte offsets provided by the BTAC.
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
28.
Pipelined processor with multi-issue microcode unit having local branch decoder
A processor has an execution pipeline that executes microinstructions and an instruction translator that translates architectural instructions into the microinstructions. The instruction translator has a memory that holds microcode instructions and provides a fetch quantum of a plurality of microcode instructions per clock cycle, a queue that holds microcode instructions provided by the memory, and a branch decoder that decodes the fetch quantum to detect local branch instructions, causes microcode instructions of the fetch quantum up to but not including a first-in-program-order local branch instruction to be written to the queue, and prevents the first-in-program-order local branch instruction and following microcode instructions of the fetch quantum from being written to the queue. Local branch instructions are resolved by the instruction translator rather than the execution pipeline. Microcode translators translate multiple microcode instructions received from the queue per clock cycle into microinstructions for provision to the execution pipeline.
A microprocessor includes FMA execution logic that determines whether to accumulate an accumulator operand C to the partial products of multiplier and multiplicand operands A and B in the partial product adder or in a second accumulation stage. The logic calculates an exponent delta of Aexp+Bexp−Cexp and determines the number of leading zeroes in C, if C is denormal. The microprocessor accumulates C with the partial products of A and B when the accumulation of C to the product of A and B could result in mass cancellation, when ExpDelta is greater than or equal to −K (where K is related to a width of a datapath in the partial product adder), and when a C is denormal and its number of leading zeroes plus K exceeds −ExpDelta. The strategic use of resources in the partial product adder and second accumulation stage reduces latency.
A computer system including a processor and a memory is provided. The processor includes a microcode executing unit and a programmable fuse which stores trusted information which is pre-generated using China commercial cryptography algorithms. The memory is operatively coupled to the processor and is configured to store a trusted module and a digital certificate of the trusted module. The microcode executing unit uses the China commercial cryptography algorithms to authenticate the digital certificate according to the trusted information, and authenticates the trusted module according to the authenticated digital certificate.
G06F 9/32 - Address formation of the next instruction, e.g. by incrementing the instruction counter
G06F 9/44 - Arrangements for executing specific programs
G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
H04L 9/32 - Arrangements for secret or secure communicationsNetwork security protocols including means for verifying the identity or authority of a user of the system
H04L 9/06 - Arrangements for secret or secure communicationsNetwork security protocols the encryption apparatus using shift registers or memories for blockwise coding, e.g. D.E.S. systems
H04L 9/14 - Arrangements for secret or secure communicationsNetwork security protocols using a plurality of keys or algorithms
H04L 9/30 - Public key, i.e. encryption algorithm being computationally infeasible to invert and users' encryption keys not requiring secrecy
31.
Apparatuses and methods for trusted module execution
Apparatuses and methods for trusted module execution are proposed, which provide secure boot and trusted execution of system software by using the China commercial cryptography algorithms to establish the SRTM/DRTM. Conventionally, the Intel TXT which uses RSA or SHA-256 cryptography algorithms only authenticates the trusted modules. By contrast, the present application uses the China commercial cryptography algorithms and is able to authenticate the trusted modules and their digital certificates or certificate chains (which has a higher security level than just authenticating the digital certificates).
H04L 9/32 - Arrangements for secret or secure communicationsNetwork security protocols including means for verifying the identity or authority of a user of the system
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
H04L 9/14 - Arrangements for secret or secure communicationsNetwork security protocols using a plurality of keys or algorithms
H04L 9/30 - Public key, i.e. encryption algorithm being computationally infeasible to invert and users' encryption keys not requiring secrecy
H04L 9/06 - Arrangements for secret or secure communicationsNetwork security protocols the encryption apparatus using shift registers or memories for blockwise coding, e.g. D.E.S. systems
32.
Processor with improved alias queue and store collision detection to reduce memory violations and load replays
A register alias table for a processor including an alias queue, load and store comparators, and dependency logic. Each entry of the alias queue stores instruction pointers of a pair of colliding load and store instructions that caused a memory violation and a valid value. The store comparator compares the instruction pointer of a subsequent store instruction with those stored in the alias queue, and if a match occurs, indicates that a store index of the subsequent store instruction is valid. The load comparator determines whether the instruction pointer of a subsequent load instruction matches an instruction pointer stored in the alias queue. If so, dependency logic provides a store index, if valid, as dependency information for the subsequent load instruction.
The invention introduces a method for accelerating hash-based compression, performed in a compression accelerator, comprising: receiving, by a plurality of hash functions, a plurality of substrings from an FSM (Finite-State Machine) in parallel; mapping, by each hash function, the received substring to a hash index and directing a selector to connect to one of a plurality of match paths according to the hash index; transmitting, by a matcher of each connected match path, a no-match message to the FSM when determining that a hash table does not contain the received substring; and transmitting, by the matcher of each connected match path, a match message and a match offset of the hash table to the FSM when determining that the hash table contains the received substring, wherein the match offset corresponds to the received substring.
H03M 7/30 - CompressionExpansionSuppression of unnecessary data, e.g. redundancy reduction
G06F 5/06 - Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising
G06F 9/44 - Arrangements for executing specific programs
G06F 17/30 - Information retrieval; Database structures therefor
G06F 12/1018 - Address translation using page tables, e.g. page table structures involving hashing techniques, e.g. inverted page tables
A chip package array including a plurality of chip packages is provided. The chip packages are suitable for array arrangement to form the chip package array. Each of the chip packages includes a redistribution structure, a supporting structure, a chip, and an encapsulated material. The supporting structure is disposed on the redistribution structure and has an opening. The chip is disposed on the redistribution structure and located in the opening. The encapsulated material is located between the opening and the chip, wherein the encapsulated material is filled between the opening and the chip, and the chip and the supporting structure are respectively connected to the redistribution structure.
H01L 23/00 - Details of semiconductor or other solid state devices
H01L 21/56 - Encapsulations, e.g. encapsulating layers, coatings
H01L 25/04 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices all the devices being of a type provided for in a single subclass of subclasses , , , , or , e.g. assemblies of rectifier diodes the devices not having separate containers
H01L 23/31 - Encapsulation, e.g. encapsulating layers, coatings characterised by the arrangement
H01L 27/00 - Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate
35.
Electronic structure, and electronic structure array
An electronic structure is provided with a redistribution structure and the following elements. A first supporting structure has a first opening and is disposed on a first surface of the redistribution structure. A second supporting structure has a second opening and is disposed on a second surface of the redistribution structure opposite to the first surface. A first bonding protruding portions are disposed on the first surface of the redistribution structure and located in the first opening. A second bonding protruding portions are disposed on the second surface of the redistribution structure and located in the second opening. A first encapsulated material is filled between the first opening and the first bonding protruding portions. A second encapsulated material is filled between the second opening and the second bonding protruding portions. An electronic structure array is also provided.
H01L 23/00 - Details of semiconductor or other solid state devices
H01L 25/04 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices all the devices being of a type provided for in a single subclass of subclasses , , , , or , e.g. assemblies of rectifier diodes the devices not having separate containers
H01L 27/00 - Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate
A chip package process includes the following steps. A supporting structure and a carrier plate are provided. The supporting structure has a plurality of openings. The supporting structure is disposed on the carrier plate. A plurality of chips is disposed on the carrier plate. The chips are respectively located in the openings of the supporting structure. An encapsulated material is formed to cover the supporting structure and the chips. The supporting structure and the chips are located between the encapsulated material and the carrier plate. The encapsulated material is filled between the openings and the chips. The carrier plate is removed. A redistribution structure is disposed on the supporting structure, wherein the redistribution structure is connected to the chips.
H01L 21/56 - Encapsulations, e.g. encapsulating layers, coatings
H01L 23/00 - Details of semiconductor or other solid state devices
H01L 21/78 - Manufacture or treatment of devices consisting of a plurality of solid state components or integrated circuits formed in, or on, a common substrate with subsequent division of the substrate into plural individual devices
H01L 23/31 - Encapsulation, e.g. encapsulating layers, coatings characterised by the arrangement
An electronic structure process includes the following steps. A redistribution structure and a carrier plate are provided. A plurality of first bonding protruding portions and a first supporting structure are formed on the redistribution structure. A first encapsulated material is formed and filled between a first opening and the first bonding protruding portions. The carrier plate is removed. A plurality of second bonding protruding portions and a second supporting structure are formed on the redistribution structure. A second encapsulated material is formed and filled between a second opening and the second bonding protruding portions.
H01L 23/00 - Details of semiconductor or other solid state devices
H01L 25/04 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices all the devices being of a type provided for in a single subclass of subclasses , , , , or , e.g. assemblies of rectifier diodes the devices not having separate containers
H01L 27/00 - Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate
38.
Pre-driver and replica circuit and driver system with the same
A pre-driver includes a first inverter, a second inverter, an amplifier, a first capacitor, and a second capacitor. The first inverter has an input terminal for receiving an input signal at an input node, and an output terminal coupled to an inner node. The second inverter has an input terminal coupled to the inner node, and an output terminal for outputting an output signal at an output node. The amplifier is configured to amplify the input signal by a gain factor so as to generate an amplified signal and an inverted amplified signal. The first capacitor has a first terminal coupled to the output node, and a second terminal for receiving the amplified signal. The second capacitor has a first terminal coupled to the inner node, and a second terminal for receiving the inverted amplified signal.
H03K 19/003 - Modifications for increasing the reliability
H03K 19/20 - Logic circuits, i.e. having at least two inputs acting on one outputInverting circuits characterised by logic function, e.g. AND, OR, NOR, NOT circuits
H03K 19/0185 - Coupling arrangementsInterface arrangements using field-effect transistors only
A single-ended-to-differential converter for driving an LVDS (Low Voltage Differential Signaling) driving circuit includes a first converting circuit, a second converting circuit, and a controller. The first converting circuit converts an input signal into a first output signal. The first converting circuit has a tunable delay time. The second converting circuit converts the input signal into a second output signal. The second converting circuit has a fixed delay time. The controller generates a first control signal and a second control signal according to the first output signal and the second output signal, so as to adjust the tunable delay time of the first converting circuit.
H03K 3/00 - Circuits for generating electric pulsesMonostable, bistable or multistable circuits
H03K 17/687 - Electronic switching or gating, i.e. not by contact-making and -breaking characterised by the use of specified components by the use, as active elements, of semiconductor devices the devices being field-effect transistors
H03K 5/151 - Arrangements in which pulses are delivered at different times at several outputs, i.e. pulse distributors with two complementary outputs
H03K 5/135 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals by the use of time reference signals, e.g. clock signals
H03K 5/156 - Arrangements in which a continuous pulse train is transformed into a train having a desired pattern
An interpolator includes a first delay circuit, a second delay circuit, and a tunable delay circuit. The first delay circuit delays a first input signal for a fixed delay time, so as generate a first output signal. The second delay circuit delays a second input signal for the fixed delay time, so as to generate a second output signal. The tunable delay circuit delays the first input signal for a tunable delay time, so as to generate an output interpolation signal. The tunable delay time is determined according to the first output signal, the second output signal, and the output interpolation signal.
H03K 5/00 - Manipulation of pulses not covered by one of the other main groups of this subclass
H03D 13/00 - Circuits for comparing the phase or frequency of two mutually-independent oscillations
H03K 5/134 - Arrangements having a single output and transforming input signals into pulses delivered at desired time intervals using a chain of active-delay devices with field-effect transistors
H03K 17/687 - Electronic switching or gating, i.e. not by contact-making and -breaking characterised by the use of specified components by the use, as active elements, of semiconductor devices the devices being field-effect transistors
H03K 19/20 - Logic circuits, i.e. having at least two inputs acting on one outputInverting circuits characterised by logic function, e.g. AND, OR, NOR, NOT circuits
H03L 7/085 - Details of the phase-locked loop concerning mainly the frequency- or phase-detection arrangement including the filtering or amplification of its output signal
H03L 7/087 - Details of the phase-locked loop concerning mainly the frequency- or phase-detection arrangement including the filtering or amplification of its output signal using at least two phase detectors or a frequency and phase detector in the loop
H04L 7/033 - Speed or phase control by the received code signals, the signals containing no special synchronisation information using the transitions of the received signal to control the phase of the synchronising-signal- generating means, e.g. using a phase-locked loop
41.
Processor with slave free list that handles overflow of recycled physical registers and method of recycling physical registers in a processor using a slave free list
A processor including physical registers, a reorder buffer, a master free list, a slave free list, a master recycle circuit, and a slave recycle circuit. The reorder buffer includes instruction entries in which each entry stores physical register indexes for recycling physical registers. The reorder buffer retires up to N instructions in each processor cycle. Each master and slave free list includes N input ports and stores physical register indexes, in which the master free list stores indexes of physical registers to be allocated to instructions being issued. When an instruction is retired, the master recycle circuit routes a first physical register index stored in an instruction entry of the instruction to an input port of the master free list, and the slave recycle circuit routes a second physical register index stored in the instruction entry of the instruction to an input port of the slave free list.
A processor including a physical register file, a rename table, mapping logic, size tracking logic, and merge logic. The rename table maps an architectural register with a larger index and a smaller index. The mapping logic detects a partial write instruction that specifies an architectural register that is already identified by an entry of the rename table mapped to a second physical register allocated for a larger write operation, and includes an index for the allocated register for the partial write instruction into the smaller index location of the entry. The size tracking logic provides a merge indication for the partial write instruction if the write size of the previous write instruction is larger. The merge logic merges the result of the partial write instruction with the second physical register during retirement of the partial write instruction.
A processor includes an execution unit, a retirement module, a first retirement counter, a second retirement counter, and an adjustment module. The execution unit executes instructions of a first thread and a second thread by simultaneous multithreading. The retirement module retires the executed instructions of the first thread in order of the first-thread instruction sequence, and retires the executed instructions of the second thread in order of the second-thread instruction sequence. The first retirement counter determines a first multi-thread retirement rate of the first thread. The second retirement counter determines a second multi-thread retirement rate of the second thread. The adjustment module adjusts the proportions of hardware resources respectively occupied by the first thread and the second thread according to the first multi-thread retirement rate and the second multi-thread retirement rate, so that the processor executes at its most efficient level of performance.
A bias-current-control circuit is provided. The bias-current-control circuit includes a transconductance circuit, a constant-current source, and a current-mirror circuit. The transconductance circuit is connected to a node and detects a voltage signal to generate a first current. The constant-current source is connected to the node and generates a tail current. The current-mirror circuit includes a reference current terminal and a bias current terminal, and the reference current terminal is coupled to the node. A second current which flows through the reference current terminal is determined by a current difference between the tail current and the first current. A bias current which flows through the bias current terminal is generated based on the second current. Furthermore, the second current and the bias current are in a predetermined ratio.
H03L 5/00 - Automatic control of voltage, current, or power
H03L 5/02 - Automatic control of voltage, current, or power of power
G05F 1/56 - Regulating voltage or current wherein the variable actually regulated by the final control device is DC using semiconductor devices in series with the load as final control devices
H03B 5/32 - Generation of oscillations using amplifier with regenerative feedback from output to input with frequency-determining element being electromechanical resonator being a piezoelectric resonator
H03B 5/12 - Generation of oscillations using amplifier with regenerative feedback from output to input with frequency-determining element comprising lumped inductance and capacitance active element in amplifier being semiconductor device
H03B 5/36 - Generation of oscillations using amplifier with regenerative feedback from output to input with frequency-determining element being electromechanical resonator being a piezoelectric resonator active element in amplifier being semiconductor device
A switch for transmitting data packets between at least one source node and at least one target node is provided. The switch includes a storage unit, a control unit, at least one receiving port and at least one transmitting port. The storage unit includes a plurality of storage blocks and configured to cache the data packets. The control unit is configured to manage the storage blocks. The switch receives and caches the data packets transmitted from the at least one source node via the receiving port and transmits the cached data packets to the at least one target node via the transmitting port. A data accessing method adapted for the switch is also provided.
A set associative cache memory, comprising: an array of storage elements arranged as N ways; an allocation unit that allocates the storage elements of the array in response to memory accesses that miss in the cache memory; wherein each of the memory accesses has an associated memory access type (MAT) of a plurality of predetermined MATs, wherein the MAT is received by the cache memory; a mapping that, for each MAT of the plurality of predetermined MATs, associates the MAT with a subset of one or more ways of the N ways; wherein for each memory access of the memory accesses, the allocation unit allocates into a way of the subset of one or more ways that the mapping associates with the MAT of the memory access; and wherein the mapping is dynamically updatable during operation of the cache memory.
G06F 12/08 - Addressing or allocationRelocation in hierarchically structured memory systems, e.g. virtual memory systems
G06F 12/0895 - Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
G06F 12/0846 - Cache with multiple tag or data arrays being simultaneously accessible
G06F 12/0864 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
A processor including a memory controller for interfacing an external memory and a programmable functional unit (PFU). The PFU is programmed by a PFU program to modify operation of the memory controller, in which the PFU includes programmable logic elements and programmable interconnectors. For example, the PFU is programmed by the PFU program to add a function or otherwise to modify an existing function of the memory controller enhance its functionality during operation of the processor. In this manner, the functionality and/or operation of the memory controller is not fixed once the processor is manufactured, but instead the memory controller may be modified after manufacture to improve efficiency and/or enhance performance of the processor, such as when executing a corresponding process.
A system and method of determining memory ownership on a cache line basis for detecting self-modifying code including code with looping instructions. An ownership queue includes multiple entries for determining memory ownership on a cache line basis. An ownership index and a wrap bit are determined for each cache line in the ownership queue, which are provided with each instruction derived from the same cache line. When an instruction is issued for execution, the ownership index provided with the instruction is used to access the corresponding entry in the ownership queue. If the instruction and entry wrap bits do not match, then an overwrite of the cache line is detected. The instruction is marked to invoke a first exception, which is performed when the instruction is ready to retire. The first exception flushes the processor, prevents the instruction from being retired, and re-fetches the instruction to continue processing.
G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
G06F 12/109 - Address translation for multiple virtual address spaces, e.g. segmentation
G06F 12/0893 - Caches characterised by their organisation or structure
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
49.
System and method of determining memory ownership on cache line basis for detecting self-modifying code
System and method of determining memory ownership on cache line basis for detecting self-modifying code. An ownership queue stores cache line addresses and corresponding ownership indexes. The cache line data is translated into instructions, and each instruction is provided with an ownership index of an associated entry in the ownership queue. Each new cache line address is compared with the destination address of each store instruction, and each destination address, when determined, is compared with each cache line address in the ownership queue. Matching entries are marked as stale, and each instruction derived from a stale entry causes an exception when ready to retire. In this manner, a hit between a cache line and a corresponding store instruction causes an exception. An exception flushes the processor to resolve the potential modified code condition.
G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
50.
System and method of determining memory ownership on cache line basis for detecting self-modifying code including modification of a cache line with an executing instruction
A processor that determines memory ownership on a cache line basis for detecting self-modifying code including modification of a cache line with an executing instruction. An ownership index and corresponding cache line address are entered for each cache line into an ownership queue. The ownership index is provided with each instruction derived from the cache line. When the instruction is issued, an executing bit is set in the corresponding entry. When a destination address of a store instruction matches an entry in the ownership queue, the store instruction is marked to invoke an executing exception if the executing bit of the entry is set. When a store instruction that is ready to retire is marked to invoke the executing exception, the store instruction is allowed to retire, the processor is flushed, and the next instruction after the store instruction is re-fetched to continue processing.
G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
G06F 12/0891 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
A controller for controlling a dynamic random access memory (DRAM) comprising a plurality of blocks. A block is one or more units of storage in the DRAM for which the DRAM controller can selectively enable or disable refreshing. The DRAM controller includes flags each for association with a block of the blocks of the DRAM. A sanitize controller determines a block is to be sanitized and in response sets a flag associated with the block and disables refreshing the block. In response to subsequently receiving a request to read data from a location in the block, if the flag is clear, the DRAM controller reads the location and returns data read from it. If the flag is set, the DRAM controller refrains from reading the DRAM and returns a value of zero.
A set associative cache memory comprises an M×N memory array of storage entries arranged as M sets by N ways, both M and N are integers greater than one. Within each group of P mutually exclusive groups of the M sets, the N ways are separately powerable. A controller, for each group of the P groups, monitors a utilization trend of the group and dynamically causes power to be provided to a different number of ways of the N ways of the group during different time instances based on the utilization trend.
G06F 12/08 - Addressing or allocationRelocation in hierarchically structured memory systems, e.g. virtual memory systems
G06F 12/128 - Replacement control using replacement algorithms adapted to multidimensional cache systems, e.g. set-associative, multicache, multiset or multilevel
G06F 12/0864 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
G06F 12/0804 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with main memory updating
G06F 12/0846 - Cache with multiple tag or data arrays being simultaneously accessible
53.
Data synchronizer for registering a data signal into a clock domain
A data synchronizer that registers an input data signal into a clock domain of a clock signal. The data synchronizer includes in input circuit, first and second pass gates, first and second inverters, gate controller, and a register. The input circuit drives first and second data nodes to opposite logic states based on the input data signal. Each pass gate is coupled between a data node and a capture node. The inverters are cross-coupled between the capture nodes. The gate controller is capable of keeping the pass gates at least partially open during a metastable condition of the capture nodes, and closes the pass gates when the capture nodes resolve to opposite logic states. The register registers a capture node to provide a registered data output in response to the clock signal. The data synchronizer may be implemented using FinFET devices.
H03K 19/00 - Logic circuits, i.e. having at least two inputs acting on one outputInverting circuits
H03K 19/003 - Modifications for increasing the reliability
H03K 19/20 - Logic circuits, i.e. having at least two inputs acting on one outputInverting circuits characterised by logic function, e.g. AND, OR, NOR, NOT circuits
H01L 27/088 - Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate including integrated passive circuit elements with at least one potential-jump barrier or surface barrier the substrate being a semiconductor body including only semiconductor components of a single kind including field-effect components only the components being field-effect transistors with insulated gate
54.
System and method of determining memory ownership on cache line basis for detecting self-modifying code including code with instruction that overlaps cache line boundaries
A system and method for determining memory ownership on a cache line basis for detecting self-modifying code with instructions that overlap cache line boundaries. An ownership index and a cache line address are entered into the ownership queue for each cache line. The cache lines are translated into instructions, and a straddle bit is set for each instruction that was derived from cache line data that overlapped two cache lines. A stale bit is set for any entry of the ownership queue that collides with a store instruction. Each instruction issued for execution is marked with a first exception when the stale bit of the corresponding ownership queue entry is set, or when the straddle bit of the issued instruction and a stale bit of a next sequential entry are both set. A first exception is performed for each instruction ready to retire that is marked with the first exception.
G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
G06F 12/0891 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
55.
Data synchronizer for latching an asynchronous data signal relative to a clock signal
A data synchronizer that latches an asynchronous input data signal relative to a clock signal. The data synchronizer includes in input circuit, first and second pass gates, first and second inverters, and a gate controller. The input circuit drives first and second data nodes to opposite logic states based on the asynchronous input data signal. Each pass gate is coupled between an input data node and a capture node. The inverters are cross-coupled between the capture nodes. The gate controller is capable of keeping the pass gates at least partially open during a metastable condition of the capture nodes, and closes the pass gates when the capture nodes resolve to opposite logic states. The capture nodes may be buffered in a substantially balanced manner to provide a buffered output, and the buffered output may be registered into the clock domain. The data synchronizer may be implemented using FinFET devices.
H03K 19/003 - Modifications for increasing the reliability
H03K 19/20 - Logic circuits, i.e. having at least two inputs acting on one outputInverting circuits characterised by logic function, e.g. AND, OR, NOR, NOT circuits
H01L 27/092 - Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate including integrated passive circuit elements with at least one potential-jump barrier or surface barrier the substrate being a semiconductor body including only semiconductor components of a single kind including field-effect components only the components being field-effect transistors with insulated gate complementary MIS field-effect transistors
56.
Pre-driver for driving low voltage differential signaling (LVDS) driving circuit
A pre-driver for driving an LVDS (Low Voltage Differential Signaling) driving circuit is provided. The pre-driver includes a first inverter, a high-pass filter, and a second inverter. The first inverter has an input terminal coupled to an input node of the pre-driver, and an output terminal coupled to a first node. The high-pass filter is coupled between the first node and a second node. The second inverter has an input terminal coupled to the second node, and an output terminal coupled to an output node of the pre-driver. The high-pass filter is configured to improve a high-frequency response of the pre-driver.
A duty cycle calibration circuit includes a first signal-generating circuit, receiving a clock signal to generate a first signal and a second signal, wherein the second signal and the first signal are the inverse of each other and synchronous. The calibration circuit also includes a first transmission gate, supplying a supply voltage to an adjustment signal according to the first signal and the second signal, and a fourth transmission gate, coupling the inverse of the adjustment signal to a ground according to the first signal and the second signal.
H03K 3/017 - Adjustment of width or dutycycle of pulses
H03K 19/20 - Logic circuits, i.e. having at least two inputs acting on one outputInverting circuits characterised by logic function, e.g. AND, OR, NOR, NOT circuits
58.
Host interface controller and control method for storage device
A host interface controller with improved boot up efficiency, which uses a buffer mode setting register to set the operation mode of a first and a second buffer set provided within the host interface controller. When a cache memory of a central processing unit (CPU) at the host side has not started up, the first and second buffer sets operate in a cache memory mode to respond to read requests that the CPU repeatedly issues for data of specific addresses of the storage device. When the cache memory has started up, the first buffer set and the second buffer set operate in a ping-pong buffer mode to respond to read requests that the CPU issues for data of sequential addresses of the storage device.
G06F 9/44 - Arrangements for executing specific programs
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
A processor with an expandable instruction set architecture for dynamically configuring execution resources. The processor includes a programmable execution unit (PEU) that may be programmed to perform a user-defined function in response to a user-defined instruction (UDI). The PEU includes programmable logic elements and programmable interconnectors that are collectively programmed to perform at least one processing operation. A UDI loader is responsive to a UDI load instruction that specifies a UDI and a location of programming information that is used to program the PEU. The PEU may be programmed for one or more UDIs for one or more processes. An instruction table stores each UDI and corresponding information to identify the UDI and possibly to reprogram the PEU if necessary. A UDI handler consults the instruction table to identify a received UDI and to send corresponding information to the PEU to execute the corresponding user-defined function.
A processor including a programmable prefetcher for prefetching information from an external memory. The programmable prefetcher includes a load monitor, a programmable prefetch engine, and a prefetch requester. The load monitor tracks load requests issued by the processor to retrieve information from the external memory. The programmable prefetch engine is configured to be programmed by at least one prefetch program to operate as a programmed prefetcher, such that during operation of the processor, the programmed prefetcher generates at least one prefetch address based on the load requests issued by the processor. The requester uses each generated prefetch address to prefetch information from the external memory. A prefetch memory may store one or more prefetch programs and a prefetch programmer may be included to select from among stored prefetch programs to program the prefetcher based on an executing process. Each prefetch program may be configured according to a prefetch definition.
A host interface controller having a first buffer set and a second buffer set operated in a ping-pong buffer mode by a control module to alternately work as a pre-fetch buffer set. When one buffer set between the first buffer set and the second buffer set works as the pre-fetch buffer set, the control module pre-fetches and buffers data starting from a first address of a storage device into the pre-fetch buffer set and accesses the other buffer set between the first buffer set and the second buffer set to respond to a read request that the central processing unit issues to access data of a second address of the storage device.
G06F 9/44 - Arrangements for executing specific programs
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
A processor including a cache memory, processing logic, access logic, stride mask logic, count logic, arbitration logic, and a prefetcher. The processing logic submits load requests to access cache lines of a memory page. The access logic updates an access vector for the memory page, in which the access logic determines a minimum stride value between successive load requests. The stride mask logic provides a mask vector based on the minimum stride value. The count logic combines the mask vector with the access vector to provide an access count. The arbitration logic triggers a prefetch operation when the access count achieves a predetermined count threshold. The prefetcher performs the prefetch operation using a prefetch address determined by combining the minimum stride value with an address of a last one of the load requests. Direction of the stride may be determined, and a stable mode is described.
G06F 12/08 - Addressing or allocationRelocation in hierarchically structured memory systems, e.g. virtual memory systems
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
An I/O circuit includes buffers, a storage module, accumulators, timers, and an arbiter. Each buffer corresponds to a respective virtual channel. Each buffer corresponds to a respective token bucket, and outputs a normal transmission request according to the amount of tokens and an accumulating signal. The storage module stores a lookup table including a plurality of weightings. Each accumulator corresponds to a respective buffer, accumulates a data volume according to the corresponding weighting, and outputs the accumulating signal. Each timer corresponds to a respective buffer, times waiting period after the corresponding buffer outputs the normal transmission request, and outputs a time-out transmission request when the waiting period exceeds a predetermined period. The arbiter receives the time-out transmission requests and the normal transmission requests, and selects one of the buffers from all of the time-out transmission requests and the normal transmission requests.
A measurement device measuring a current passing through a detection resistor coupled between a first node and a second node is provided. An interference elimination unit is coupled to the first and second nodes and selectively outputs the voltage of at least one of the first and second nodes according to a control signal. A first voltage-dividing unit is coupled to the interference elimination unit and processes the voltage of the first or second node to generate a first processed signal. A second voltage-dividing unit is coupled to the interference elimination unit and processes the voltage of the first or second node to generate a second processed signal. A processing unit is coupled to the first and second voltage-dividing units to receive the first and second processed signals and calculates the first and second processed signals to obtain the current passing through the detection resistor.
A digital-to-analog converter (DAC) and a high-voltage tolerance circuit are provided. The DAC includes a high-voltage tolerance circuit. The high-voltage tolerance circuit is configured to generate a reference voltage, and select the reference voltage or a first power-source voltage to control the node voltage of each branch of an operational amplifier circuit of the high-voltage tolerance circuit according the logical signal level of an input signal.
Code upgrades for computer components. After being powered on, a central processing unit (CPU) of a computer system loads a start-up authenticated code module (start-up ACM) to an authenticated code execution area (ACEA) within the CPU to be authenticated. When the start-up ACM passes authentication, the CPU executes the start-up ACM to connect to a server and receive a code upgrade file for a computer component of the computer system from the server.
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
G06F 21/64 - Protecting data integrity, e.g. using checksums, certificates or signatures
G06F 21/73 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information by creating or determining hardware identification, e.g. serial numbers
G06F 21/33 - User authentication using certificates
H04L 9/30 - Public key, i.e. encryption algorithm being computationally infeasible to invert and users' encryption keys not requiring secrecy
67.
Axially and centrally symmetric current source array
A current source device having a current source array includes a plurality of current source units, a plurality of least significant bits, and a plurality of most significant bits. The current source units are arranged along a plurality rows and columns of a current source array. Each of the least significant bits includes a first amount of current source units is placed at the geometric center of the current source array. Each of the most significant bits includes a second amount of current source units. The second amount is the first amount multiplied by a positive integer. The two adjacent bits in the most significant bits are centrally symmetrical to the geometric center.
H03M 1/06 - Continuously compensating for, or preventing, undesired influence of physical parameters
H03M 1/68 - Digital/analogue converters with conversions of different sensitivity, i.e. one conversion relating to the more significant digital bits and another conversion to the less significant bits
A processor with an expandable instruction set architecture for dynamically configuring execution resources. The processor includes a programmable execution unit (PEU) that may be programmed to perform a user-defined function in response to a user-defined instruction (UDI). The PEU includes programmable logic elements and programmable interconnectors that are collectively programmed to perform at least one processing operation. A UDI loader is responsive to a UDI load instruction that specifies a UDI and a location of programming information that is used to program the PEU. The PEU may be programmed for one or more UDIs for one or more processes. An instruction table stores each UDI and corresponding information to identify the UDI and possibly to reprogram the PEU if necessary. A UDI handler consults the instruction table to identify a received UDI and to send corresponding information to the PEU to execute the corresponding user-defined function.
A compiler system that converts an application source program into an executable program according to a predetermined ISA executable by a general purpose processor. The processor includes a PEU that is programmable to execute a UDI. The compiler system includes a PEU programming tool that converts a functional description of a processing operation to be performed by the PEU of the processor into programming information for programming the PEU to perform the processing operation in response to the specified UDI. The compiler system includes a compiler that converts the application source program into the executable program, which includes an optimization routine that represents a portion of the application source program with the specified UDI and that inserts the UDI into the executable program, and that further inserts into the executable program a UDI load instruction that specifies the UDI and a location of the programming information in the executable program.
A conversion system that converts a standard executable program according to a predetermined ISA into a custom executable program executable by a general purpose processor. The processor includes a PEU that is programmable to execute a UDI. The conversion system includes a PEU programming tool that converts a functional description of a processing operation to be performed by the PEU of the processor into programming information for the PEU to perform the processing operation in response to the UDI. A converter converts the standard executable program into the custom executable program and includes an optimization routine that replaces a portion of the standard executable program with the specified UDI and that inserts the UDI into the custom executable program, and that further inserts a UDI load instruction that specifies the UDI and a location of the programming information in the custom executable program.
A processor including a programmable prefetcher for prefetching information from an external memory. The programmable prefetcher includes a load monitor, a programmable prefetch engine, and a prefetch requester. The load monitor tracks load requests issued by the processor to retrieve information from the external memory. The programmable prefetch engine is configured to be programmed by at least one prefetch program to operate as a programmed prefetcher, such that during operation of the processor, the programmed prefetcher generates at least one prefetch address based on the load requests issued by the processor. The requester uses each generated prefetch address to prefetch information from the external memory. A prefetch memory may store one or more prefetch programs and a prefetch programmer may be included to select from among stored prefetch programs to program the prefetcher based on an executing process. Each prefetch program may be configured according to a prefetch definition.
G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
G06F 12/0897 - Caches characterised by their organisation or structure with two or more cache hierarchy levels
G06F 12/0855 - Overlapped cache accessing, e.g. pipeline
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
72.
Processor with programmable prefetcher operable to generate at least one prefetch address based on load requests
A processor including a front end, at least one load pipeline, and a memory system that further includes a programmable prefetcher for prefetching information from an external memory. The front end converts fetched program instructions into microinstructions including load microinstructions and dispatches microinstructions for execution. The load pipeline executes dispatched load microinstructions and provides load requests to the memory system. The programmable prefetcher includes a load monitor, a programmable prefetch engine, and a prefetch requester. The load monitor tracks the load requests. The prefetch engine is configured to be programmed by at least one prefetch program to operate as a programmed prefetcher, such that during operation of the processor, the programmed prefetcher generates at least one prefetch address based on the load requests issued by the processor. The prefetch requester submits the at least one prefetch address to prefetch information from the memory system.
G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
G06F 12/0897 - Caches characterised by their organisation or structure with two or more cache hierarchy levels
G06F 12/0855 - Overlapped cache accessing, e.g. pipeline
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
73.
Host controller of high-speed data interface with clock-domain crossing
A host controller with suppressed data jitter is shown, which uses a logical physical layer (LPHY) to provide groups of low-speed data, uses a clock-domain-crossing transmitter (TXCDC) to transmit the groups of the low-speed data to the corresponding electrical physical layers (EPHYs), uses the EPHYs to convert the groups of the low-speed data to high-speed data and transmit the high-speed data to the corresponding external devices, and further has a multiplexer. Each EPHY corresponds to one clock signal and operates accordingly. The multiplexer receives the clock signals of the EPHYs to output a common clock signal for the LPHY to provide the groups of low-speed data and for the TXCDC to retrieve the groups of low-speed data. With respect to each of the external devices, the TXCDC uses the clock signal corresponding to the corresponding EPHY to output the corresponding group of low-speed data to the corresponding EPHY.
A signal-generating circuit includes a first P-type transistor, a second P-type transistor, a first N-type transistor, a second N-type transistor, a first inverter, a second inverter, and a third inverter. The first P-type transistor supplies a supply voltage to a first node according to an input signal. Both of the second P-type transistor and the first N-type transistor couple the first node to a second node according to the input signal. The second N-type transistor couples the first node to a ground according to the input signal. The first inverter is coupled to the second node to generate a first signal. The second inverter is coupled between the first node and a third node. The third inverter is coupled to the third node to generate a second signal. The second signal and the first signal are the reverse of each other and synchronous.
H03K 3/017 - Adjustment of width or dutycycle of pulses
H03K 19/20 - Logic circuits, i.e. having at least two inputs acting on one outputInverting circuits characterised by logic function, e.g. AND, OR, NOR, NOT circuits
A data reception chip coupled to an external memory including a first input-output pin configured to output first data and including a comparison module and a voltage generation module is provided. The comparison module is coupled to the first input-output pin to receive the first data and to compare the first data with a first reference voltage to identify the value of the first data. The voltage generation module is configured to generate the first reference voltage. The voltage generation module includes a first resistor and a second resistor. The second resistor is connected to the first resistor in series. The first and second resistors divide a first operation voltage to generate the first reference voltage.
A control method for a data reception chip. The data reception chip includes a voltage generation module including a plurality of resistors and a selection unit. The resistors are connected in series with one another and divide an operation voltage to generate a plurality of divided voltages. The selection unit selects one of the divided voltages as a reference voltage according to a control signal. The control method includes controlling the selection unit to set the level of the reference voltage to an initial level; receiving data and comparing the data with the reference voltage to generate a compared result; determining whether the compared result is equal to pre-determined data; and directing the selection unit to select another divided voltage when the compared result is not equal to the pre-determined data.
G11C 11/4074 - Power supply or voltage generation circuits, e.g. bias voltage generators, substrate voltage generators, back-up power, power control circuits
G05F 3/20 - Regulating voltage or current wherein the variable is DC using uncontrolled devices with non-linear characteristics being semiconductor devices using diode-transistor combinations
G05F 3/30 - Regulators using the difference between the base-emitter voltages of two bipolar transistors operating at different current densities
A data reception chip coupled to an external memory comprising a first input-output pin to output first data and including a comparison module and a voltage generation module is provided. The comparison module is coupled to the first input-output pin to receive the first data and compares the first data with a first reference voltage to identify the value of the first data. The voltage generation module is configured to generate the first reference voltage and includes a plurality of first resistors and a first selection unit. The first resistors are connected in series with one another and dividing a first operation voltage to generate a plurality of first divided voltages. The first selection unit selects one of the first divided voltages as the first reference voltage according to a first control signal.
A circuit substrate for a chip bonding thereon includes a core substrate having a chip-side surface and a bump-side surface opposite to the chip-side surface, a first through via plug passing through the core substrate, a pad disposed on the bump-side surface, in contact with the first through via plug, and a first thickness enhancing conductive pattern disposed on a surface of the pad, which is away from the bump-side surface.
A chipset implemented in a server node of a server system and including an embedded management controller is disclosed. The chipset also includes a northbridge and southbridge. The embedded management controller collects inner-node information of the server node for server system management. The embedded management controller is coupled to a baseboard management controller, and the baseboard management controller is outside the server node and communicates with a remote console through network.
A system and method of performing speculative parallel execution of a cache line unaligned load instruction including speculatively predicting whether a load instruction is unaligned with a cache memory, marking the load instruction as unaligned and issuing the instruction to a scheduler, dispatching the unaligned load instruction in parallel to first and second load pipelines, determining corresponding addresses for both load pipelines to retrieve data from first and second cache lines incorporating the target load data, and merging the data retrieved from both load pipelines. Prediction may be based on matching an instruction pointer of a previous iteration of the load instruction that was qualified as actually unaligned. Prediction may be further based on using a last address and a skip stride to predict a data stride between consecutive iterations of the load instruction. The addresses for both loads are selected to incorporate the target load data.
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 9/345 - Addressing or accessing the instruction operand or the result of multiple operands or results
G06F 9/32 - Address formation of the next instruction, e.g. by incrementing the instruction counter
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
81.
Prefetching with level of aggressiveness based on effectiveness by memory access type
A processor includes a prefetcher that prefetches data in response to memory accesses, wherein each memory access has an associated memory access type (MAT) of a plurality of predetermined MATs. The processor also includes a table that holds scores that indicate effectiveness of the prefetcher to prefetch data with respect to the plurality of predetermined MATs. The prefetcher prefetches data in response to memory accesses at a level of aggressiveness based on the scores held in the table and the associated MATs of the memory accesses.
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
G06F 9/345 - Addressing or accessing the instruction operand or the result of multiple operands or results
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 12/0875 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
G06F 12/1009 - Address translation using page tables, e.g. page table structures
G06F 12/127 - Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning using additional replacement algorithms
82.
Chipset and host controller with capability of disk encryption
A chipset and a host controller, including a storage host controller for a storage device and an encryption and decryption engine that is implemented by hardware. The storage host controller analyzes a write command to obtain write command information, and provides the write command information and write data to the encryption and decryption engine. The encryption and decryption engine combines a data drive key with the write command information to encrypt the write data and provides the encrypted write data to the storage host controller to be written into a storage device via a communication port.
G06F 21/78 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure storage of data
G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
G06F 3/06 - Digital input from, or digital output to, record carriers
A phase detector includes a clock delay circuit, a data delay circuit, a control circuit, a D flip-flop, and a logic circuit. The clock delay circuit delays a clock signal so as to generate a delay clock signal. The data delay circuit delays a data signal so as to generate a delay data signal. The control circuit adjusts the delay time of the clock delay circuit and the delay time of the data delay circuit according to the clock signal and the delay clock signal. The D flip-flop generates a register signal according to the data signal and the clock signal. The logic circuit generates an up control signal and a down control signal according to the data signal, the delay data signal, and the register signal so as to control a charge pump of a CDR (Clock Data Recovery) circuit.
H03L 7/087 - Details of the phase-locked loop concerning mainly the frequency- or phase-detection arrangement including the filtering or amplification of its output signal using at least two phase detectors or a frequency and phase detector in the loop
84.
Neural network unit with neural memory and array of neural processing units that collectively shift row of data received from neural memory
An array of N processing units (PU) each has: an accumulator; an arithmetic unit performs an operation on first, second and third inputs to generate a result to store in the accumulator, the first input receives the accumulator output; a weight input is received by the second input to the arithmetic unit; a multiplexed register has first and second data inputs, an output received by the third input to the arithmetic unit, and a control input that controls the data input selection. The multiplexed register output is also received by an adjacent PU's multiplexed register second data input. The N PU's multiplexed registers collectively operate as an N-word rotater when the control input specifies the second data input. Respective first/second memories hold W/D rows of N weight/data words and provide the N weight/data words to the corresponding weight/multiplexed register first data inputs of the N PUs.
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
85.
Processor with architectural neural network execution unit
A processor has an instruction fetch unit that fetches ISA instructions from memory and execution units that perform operations on instruction operands to generate results according to the processor's ISA. A hardware neural network unit (NNU) execution unit performs computations associated with artificial neural networks (ANN). The NNU has an array of ALUs, a first memory that holds data words associated with ANN neuron outputs, and a second memory that holds weight words associated with connections between ANN neurons. Each ALU multiplies a portion of the data words by a portion of the weight words to generate products and accumulates the products in an accumulator as an accumulated value. Activation function units normalize the accumulated values to generate outputs associated with ANN neurons. The ISA includes at least one instruction that instructs the processor to write data words and the weight words to the respective first and second memories.
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
G06F 9/32 - Address formation of the next instruction, e.g. by incrementing the instruction counter
86.
Neural network unit with shared activation function units
A neural network unit includes first and second memories that hold rows of respective N weight and data words and provides a row of them to N corresponding neural processing units (NPU), respectively. The N NPUs each have an accumulator and an arithmetic unit that performs a series of multiply operations on pairs of weight words and data words received from the first and second memories to generate a series of products. The arithmetic unit also performs a series of addition operations on the series of products to accumulate an accumulated value in the accumulator. Activation function units (AFU) are each shared by a corresponding plurality of the N NPUs. Each AFU, in a sequential fashion with respect to each NPU of the corresponding plurality of the N NPUs, receives the accumulated value from the NPU and performs an activation function on the accumulated value to generate a result.
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
G06F 9/32 - Address formation of the next instruction, e.g. by incrementing the instruction counter
87.
Mechanism for communication between architectural program running on processor and non-architectural program running on execution unit of the processor regarding shared resource
Functional units of a processor fetch and decode architectural instructions of an architectural program. The architectural instructions are of an architectural instruction set of the processor. An execution unit includes first and second memories, a register and processing units. The first memory holds data in rows with addresses. The second memory holds non-architectural instructions of a non-architectural program. The architectural and non-architectural instruction sets are distinct. The processing units execute the non-architectural program instructions to read data from the first memory, perform operations on the data read from the first memory to generate results, and to write the results to the first memory. The register holds information that indicates progress made by the non-architectural program during execution. The first memory is also readable and writable by the architectural program. The architectural program uses the information to decide where in the first memory to read/write data.
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
G06F 9/32 - Address formation of the next instruction, e.g. by incrementing the instruction counter
88.
Apparatus employing user-specified binary point fixed point arithmetic
An apparatus includes a plurality of arithmetic logic units each having an accumulator and an integer arithmetic unit that receives and performs integer arithmetic operations on integer inputs and accumulates integer results of a series of the integer arithmetic operations into the accumulator as an integer accumulated value. A register is programmable with an indication of a number of fractional bits of the integer accumulated values and an indication of a number of fractional bits of integer outputs. A first bit width of the accumulator is greater than twice a second bit width of the integer outputs. A plurality of adjustment units scale and saturate the first bit width integer accumulated values to generate the second bit width integer outputs based on the indications of the number of fractional bits of the integer accumulated values and outputs programmed into the register.
G06F 5/01 - Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
G06F 7/57 - Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups or for performing logical operations
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06F 9/00 - Arrangements for program control, e.g. control units
89.
Processor with hybrid coprocessor/execution unit neural network unit
A processor includes a front-end portion that issues instructions to execution units that execute the issued instructions. A hardware neural network unit (NNU) execution unit includes a first memory that holds data words associated with artificial neural networks (ANN) neuron outputs, a second memory that holds weight words associated with connections between ANN neurons, and a third memory that holds a program comprising NNU instructions that are distinct, with respect to their instruction set, from the instructions issued to the NNU by the front-end portion of the processor. The program performs ANN-associated computations on the data and weight words. A first instruction instructs the NNU to transfer NNU instructions of the program from architectural general purpose registers to the third memory. A second instruction instructs the NNU to invoke the program stored in the third memory.
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
90.
Neural network unit with output buffer feedback and masking capability
An output buffer holds N words arranged as N/J mutually exclusive output buffer word groups (OBWG) of J words each. N processing units (PU) are arranged as N/J mutually exclusive PU groups each having an associated OBWG. Each PU has an accumulator, an arithmetic unit, and first and second multiplexed registers each having at least J+1 inputs and an output. A first input receives a memory operand and the other J inputs receive the J words of the associated OBWG. Each accumulator provides its output to a respective output buffer word. Each arithmetic unit performs an operation on the first and second multiplexed register outputs and the accumulator output to generate a result for accumulation into the accumulator. A mask input to the output buffer controls which words, if any, of the N words retain their current value or are updated with their respective accumulator output.
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
G06F 9/32 - Address formation of the next instruction, e.g. by incrementing the instruction counter
91.
Neural network unit that performs concurrent LSTM cell calculations
An output buffer holds N words arranged as N/J mutually exclusive output buffer word groups (OBWG) of J words each of the N words. N processing units (PU) are arranged as N/J mutually exclusive PU groups. Each PU group has an associated OBWG. Each PU includes an accumulator and an arithmetic unit that performs operations on inputs, which include the accumulator output, to generate a first result for accumulation into the accumulator. Activation function units selectively perform an activation function on the accumulator outputs to generate results for provision to the N output buffer words. For each PU group, four of the J PUs and at least one of the activation function units compute an input gate, a forget gate, an output gate and a candidate state of a Long Short Term Memory (LSTM) cell, respectively, for writing to respective first, second, third and fourth words of the associated OBWG.
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06N 3/04 - Architecture, e.g. interconnection topology
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
G06F 9/00 - Arrangements for program control, e.g. control units
92.
Neural network unit with output buffer feedback and masking capability with processing unit groups that operate as recurrent neural network LSTM cells
An output buffer holds N words arranged as N/J mutually exclusive output buffer word groups (OBWG) of J words each. N processing units (PU) are arranged as N/J mutually exclusive PU groups each having an associated OBWG. Each PU has an accumulator, arithmetic unit, and first and second multiplexed registers each having at least J+1 inputs. A first input receives a memory operand and the other J inputs receive the J words of the associated OBWG. Each accumulator provides its output to a respective OBWG. Each arithmetic unit performs an operation on the first and second multiplexed register outputs and accumulator output to generate a result for accumulation into the accumulator. A mask input to the output buffer controls which words, if any, of the N words retain their current value or are updated with their respective accumulator output. Each PU group operates as a recurrent neural network LSTM cell.
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
G06F 9/32 - Address formation of the next instruction, e.g. by incrementing the instruction counter
A neural network unit configurable to first/second/third configurations has N narrow and N wide accumulators, multipliers and adders. Each multiplier performs a narrow/wide multiply on first and second narrow/wide inputs to generate a narrow/wide product. A first adder input receives a corresponding narrow/wide accumulator's output and third input receives a widened corresponding narrow multiplier's narrow product in the third configuration. In the first configuration, each narrow/wide adder performs a narrow/wide addition on the first and second inputs to generate a narrow/wide sum for storage into the corresponding narrow/wide accumulator. In the second configuration, each wide adder performs a wide addition on the first and a second input to generate a wide sum for storage into the corresponding wide accumulator. In the third configuration, each wide adder performs a wide addition on the first, second and third inputs to generate a wide sum for storage into the corresponding wide accumulator.
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
G06F 9/32 - Address formation of the next instruction, e.g. by incrementing the instruction counter
94.
Neural network unit with neural processing units dynamically configurable to process multiple data sizes
A neural network unit. A register holds an indicator that specifies narrow and wide configurations. A first memory holds rows of 2N/N narrow/wide weight words in the narrow/wide configuration. A second memory holds rows of 2N/N narrow/wide data words in the narrow/wide configuration. An array of neural processing units (NPU) is configured as 2N/N narrow/wide NPUs and to receive the 2N/N narrow/wide weight words of rows from the first memory and to receive the 2N/N narrow/wide data words of rows from the second memory in the narrow/wide configuration. In the narrow configuration, the 2N NPUs perform narrow arithmetic operations on the 2N narrow weight words and the 2N narrow data words received from the first and second memories. In the wide configuration, the N NPUs perform wide arithmetic operations on the N wide weight words and the N wide data words received from the first and second memories.
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
G06F 9/32 - Address formation of the next instruction, e.g. by incrementing the instruction counter
95.
Neural processing unit that selectively writes back to neural memory either activation function output or accumulator value
A neural network unit includes a programmable indicator, a first memory that holds first operands, a second memory that holds second operands, neural processing units (NPU), and activation units. Each NPU has an accumulator and an arithmetic unit that performs a series of multiply operations on pairs of the first and second operands received from the first and second memories to generate a series of products, and a series of addition operations on the series of products to accumulate an accumulated value in the accumulator. The activation units perform activation functions on the accumulated values in the accumulators to generate results. When the indicator specifies the first action, the neural network unit writes to the first memory the results generated by the activation units. When the indicator specifies the second action, the neural network unit writes to the first memory the accumulated values in the accumulators.
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
96.
Neural network unit employing user-supplied reciprocal for normalizing an accumulated value
A neural network unit including a register programmable with a representation of a reciprocal value of a divisor and a plurality of neural processing units (NPU). Each NPU has an ALU, an accumulator, and a reciprocal multiplier unit. The ALU performs arithmetic and logical operations on a sequence of operands to generate a sequence of results and accumulates the sequence of results as an accumulated value into the accumulator. The reciprocal multiplier unit receives the representation of the reciprocal value and the accumulated value and in response generates a result that is the quotient of the accumulated value and the divisor.
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
G06F 9/32 - Address formation of the next instruction, e.g. by incrementing the instruction counter
A processor has functional units that fetch and decode architectural instructions of an architectural instruction set at a first rate, a register that stores a value of an indicator programmable by execution of an architectural instruction of the architectural instruction set, and an execution unit. The execution unit includes a first memory that holds data, a second memory that holds instructions of a program, and a plurality of processing units that execute the program instructions at a second rate to perform operations on data received from the first memory to generate results to be written to the first memory. The instructions are of an instruction set that is distinct from the architectural instruction set. The second rate is the first rate when the indicator is programmed with a first value and the second rate is less than the first rate when the indicator is programmed with a second value.
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
G06F 9/32 - Address formation of the next instruction, e.g. by incrementing the instruction counter
98.
Direct execution by an execution unit of a micro-operation loaded into an architectural register file by an architectural instruction of a processor
A processor includes an architectural register file loadable with micro-operations by architectural instructions of an architectural instruction set of the processor and an execution unit that executes instructions. The instructions are either architectural instructions or microinstructions into which architectural instructions are translated. The execution unit includes a decoder that decodes the instructions into micro-operations, a mode indicator that indicates one of first and second modes, a pipeline of stages to which are provided micro-operations that control circuits of the stages of the pipeline, and a multiplexer. The multiplexer selects for provision to the pipeline a micro-operation received from the decoder when the mode indicator indicates the first mode and selects for provision to the pipeline a micro-operation received from the architectural register file when the mode indicator indicates the second mode.
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
G06F 9/32 - Address formation of the next instruction, e.g. by incrementing the instruction counter
A neural network unit (NNU) includes N neural processing units (NPU). Each NPU has an arithmetic unit and an accumulator. First and second multiplexed registers of the N NPUs collectively selectively operate as respective first and second N-word rotaters. First and second memories respectively hold rows of N weight/data words and provide the N weight/data words of a row to corresponding ones of the N NPUs. The NPUs selectively perform: multiply-accumulate operations on rows of N weight words and on a row of N data words, using the second N-word rotater; convolution operations on rows of N weight words, using the first N-word rotater, and on rows of N data words, the rows of weight words being a data matrix, and the rows of data words being elements of a convolution kernel; and pooling operations on rows of N weight words, using the first N-word rotater.
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
G06F 9/32 - Address formation of the next instruction, e.g. by incrementing the instruction counter
100.
Neural network unit that performs convolutions using collective shift register among array of neural processing units
A neural network unit has a first memory that holds elements of a data matrix and a second memory that holds elements of a convolution kernel. An array of neural processing units (NPU) each have a multiplexed register that receives a corresponding element of a row from the first memory and that also receives the multiplexed register output of an adjacent NPU. A register receives a corresponding element of a row from the second memory. An arithmetic unit receives the outputs of the register, the multiplexed register and an accumulator and performs a multiply-accumulate operation on them. For each sub-matrix of a plurality of sub-matrices of the data matrix, each arithmetic unit selectively receives either the element from the first memory or the adjacent NPU multiplexed register output and performs a series of the multiply-accumulate operations to accumulate into the accumulator a convolution of the sub-matrix with the convolution kernel.
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers