Rather than collecting flow logs at a central location and then processing them to create general-purpose or specialized data stores, the embodiments herein rely on the network appliances to create the flow logs and the metadata that indexes these flow logs. The flow logs and metadata can then be collected at the central location (e.g., a central analyzer) and merged with flow logs and metadata generated by other network appliances to yield a data store that can be used to analyze the flow logs in a computing environment (e.g., a data center).
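A minimal Python sketch of the idea, assuming each appliance emits its flow records plus an index keyed by source IP. The `ApplianceOutput` record, the `build_output` and `merge` helpers, and the flow-record schema are all hypothetical. The point it illustrates is that the central analyzer only rebases index positions when merging; it never rescans the flow records to build the index.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ApplianceOutput:
    flows: list  # flow records produced locally (hypothetical schema)
    index: dict  # appliance-built metadata: source IP -> positions in flows


def build_output(flows):
    """Runs on a network appliance: index flows by source IP as they are logged."""
    index = defaultdict(list)
    for pos, flow in enumerate(flows):
        index[flow["src"]].append(pos)
    return ApplianceOutput(flows, dict(index))


def merge(outputs):
    """Runs at the central analyzer: concatenate logs and rebase index positions."""
    merged_flows, merged_index = [], defaultdict(list)
    for out in outputs:
        offset = len(merged_flows)
        merged_flows.extend(out.flows)
        for key, positions in out.index.items():
            merged_index[key].extend(p + offset for p in positions)
    return merged_flows, dict(merged_index)
```

After `flows, index = merge([...])`, `index["10.0.0.1"]` lists every matching record across all appliances.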
An integrated circuit die stack is disclosed that includes a digital device layer, an underlying layer, and a cooling solution. The underlying layer has a lower power consumption relative to the digital device layer. The digital device layer is disposed closer to the cooling solution. In another example, memory layers and a digital device layer are configured into a three-dimensional memory stack. The digital device layer has a first surface (side) located closest to a cooling solution and the memory layers are located on a second surface (side) of the digital device layer opposite to the first surface (side) thereof. The cooling solution is adapted to receive and dissipate heat from the digital device layer and the memory layers.
H01L 23/427 - Cooling by change of state, e.g. use of heat pipes
H01L 25/065 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices all the devices being of a type provided for in a single subclass of subclasses , , , , or , e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group
H10B 80/00 - Assemblies of multiple devices comprising at least one memory device covered by this subclass
3.
DYNAMIC VOLTAGE DROP ANALYSIS USING SIMULTANEOUS SWITCHING
Dynamic voltage drop analysis for a circuit design includes generating, by computer hardware, bias information for a circuit design. The bias information specifies switching information for a plurality of instances of one or more standard cells of the circuit design. A schedule specifying switching for the plurality of instances of the circuit design is generated by the computer hardware based on the bias information. A dynamic voltage analysis is performed by the computer hardware on the circuit design to generate dynamic voltage analysis results by switching the plurality of instances of the circuit design based on the schedule.
G06F 30/367 - Design verification, e.g. using simulation, simulation program with integrated circuit emphasis [SPICE], direct methods or relaxation methods
G06F 119/06 - Power analysis or power optimisation
A data processor includes a memory controller and a physical interface circuit coupled to the memory controller. In response to a system startup, the memory controller controls the physical interface circuit to selectively train a memory based on whether a first memory clock frequency of a plurality of power states equals any other memory clock frequency of the plurality of power states.
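The selective-training decision can be sketched as follows, assuming power states are given as a mapping from state name to memory clock frequency and that training results obtained for one state can be reused by another state at the same frequency. `plan_training` is a hypothetical helper, not the disclosed firmware.

```python
def plan_training(power_state_freqs):
    """Map each power state to ("train", itself) or ("reuse", earlier state with the same MEMCLK)."""
    first_state_at_freq = {}
    plan = {}
    for state, freq_mhz in power_state_freqs.items():
        if freq_mhz in first_state_at_freq:
            # This frequency was already trained for another state: skip retraining.
            plan[state] = ("reuse", first_state_at_freq[freq_mhz])
        else:
            first_state_at_freq[freq_mhz] = state
            plan[state] = ("train", state)
    return plan
```

With three power states at 3200, 1600, and 3200 MHz, only two training runs are needed.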
Disclosed herein are chip packages that integrate multiple compute dies through a single interposer die to a memory stack. The interposer die includes memory controller circuitry that allows multiple compute dies to access the memory stack efficiently.
H01L 25/065 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices all the devices being of a type provided for in a single subclass of subclasses , , , , or , e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group
H01L 25/18 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices the devices being of types provided for in two or more different main groups of the same subclass of , , , , or
H10B 80/00 - Assemblies of multiple devices comprising at least one memory device covered by this subclass
H01L 23/42 - Fillings or auxiliary members in containers selected or arranged to facilitate heating or cooling
H01L 23/538 - Arrangements for conducting electric current within the device in operation from one component to another the interconnection structure between a plurality of semiconductor chips being formed on, or in, insulating substrates
H01L 23/31 - Encapsulation, e.g. encapsulating layers, coatings characterised by the arrangement
H01L 23/04 - Containers; Seals characterised by the shape
A processing system [100] includes one or more accelerator units (AUs) [114] each having a modular architecture. To this end, each AU includes a connection circuitry [116] and one or more memory stacks [122] disposed on the connection circuitry. Further, each AU includes one or more interposer dies [118] each disposed on the connection circuitry such that each interposer die of the one or more interposer dies is communicatively coupled to a corresponding memory stack via the connection circuitry. Further, each interposer die of each AU includes sets of circuitry [578, 580] configured to concurrently support two or more types of compute dies [300, 400].
H01L 25/065 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices all the devices being of a type provided for in a single subclass of subclasses , , , , or , e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group
H01L 25/18 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices the devices being of types provided for in two or more different main groups of the same subclass of , , , , or
H10B 80/00 - Assemblies of multiple devices comprising at least one memory device covered by this subclass
A technique for rendering is provided. The technique includes obtaining one or more samples for a pixel, the samples obtained for a microfacet surface from a spherical cap cut off by a lower plane positioned to exclude reflected rays that are occluded by the microfacet surface; obtaining one or more contributions corresponding to the one or more samples; and determining a color for the pixel based on the one or more contributions.
A processing unit includes a plurality of processing cores and is configured to arrange a sparse matrix so that the cores can operate in parallel on different rows of the matrix, at least in part by: calculating a respective quantity of non-zero elements in each row; assigning each row to a respective collection of a plurality of collections according to the quantity of non-zero elements in that row, wherein the processing unit is configured to assign at least one first row of the sparse matrix to a respective collection in parallel with assigning at least one second row of the sparse matrix to a respective collection; and performing at least one mathematical operation on at least a first collection of the plurality of collections in parallel with performing the at least one mathematical operation on at least a second collection of the plurality of collections.
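A minimal sketch of row binning for a CSR sparse matrix-vector product. It assumes bins are keyed by the bit length of each row's non-zero count (a power-of-two bucketing chosen here for illustration) and uses Python threads to stand in for cores. For brevity the binning loop runs serially, whereas the abstract describes assigning rows to collections in parallel; `bin_rows` and `spmv_binned` are hypothetical names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def bin_rows(row_ptr):
    """Group row indices by bucketed non-zero count (0, 1, 2-3, 4-7, ...)."""
    bins = defaultdict(list)
    for r in range(len(row_ptr) - 1):
        nnz = row_ptr[r + 1] - row_ptr[r]
        bins[nnz.bit_length()].append(r)
    return dict(bins)


def spmv_binned(row_ptr, col_idx, vals, x):
    """Compute y = A @ x, processing each bin as an independent parallel task."""
    y = [0.0] * (len(row_ptr) - 1)

    def run(rows):
        # Each row belongs to exactly one bin, so concurrent writes never collide.
        for r in rows:
            y[r] = sum(vals[k] * x[col_idx[k]] for k in range(row_ptr[r], row_ptr[r + 1]))

    with ThreadPoolExecutor() as pool:
        list(pool.map(run, bin_rows(row_ptr).values()))
    return y
```

Rows with similar amounts of work land in the same collection, so the rows handled by any one worker are roughly balanced.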
A processing system includes one or more accelerator units (AUs) each having a modular architecture. To this end, each AU includes a connection circuitry and one or more memory stacks disposed on the connection circuitry. Further, each AU includes one or more interposer dies each disposed on the connection circuitry such that each interposer die of the one or more interposer dies is communicatively coupled to a corresponding memory stack of the memory stacks via the connection circuitry. Further, each interposer die of each AU includes circuitry configured to concurrently support two or more types of compute dies.
As part of rendering a scene including at least one graphics object in a display space, the display space is divided into a plurality of tiles. A determination is made that contents of at least two of the plurality of tiles are no longer used after a current render pass. A write back memory address associated with a second tile is changed to match a write back memory address associated with a first tile. As a result, data is overwritten on the same physical page.
A method can include overriding settings of an integrated circuit device by reading one or more settings from a setting record that correspond to a part number of the integrated circuit device. The method can also include performing an override of the settings of the integrated circuit device based on the one or more settings of the setting record that correspond to the part number of the integrated circuit device. Various other methods and systems are also disclosed.
A method for synchronizing trusted operating systems can include receiving, at a first interconnect circuit, an operating system management instruction for a first trusted operating system that is associated with a first trusted memory region of a memory device, the first trusted memory region being allocated to the first interconnect circuit. The method can also include synchronizing the operating system management instruction with a second interconnect circuit such that the operating system management instruction is applied to a second trusted operating system. The second trusted operating system is associated with a second trusted memory region of the memory device and the second trusted memory region is allocated to the second interconnect circuit. Various other methods and systems are also disclosed.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
G06F 21/54 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by adding security routines or objects to programs
A chip package and a method for fabricating the same are provided that include IC dies bonded to a thermal carrier having a plurality of metallic pillars. In one example, a chip package includes an interconnect routing structure and a first die disposed on a first surface of the interconnect routing structure. The first die has circuitry connected to circuitry of the interconnect routing structure. The chip package also includes a second die at least partially disposed over the first die. The second die has circuitry connected to the circuitry of the first die. A thermal carrier is bonded on the second die. At least one of the thermal carrier, the first die, or the second die includes a plurality of metallic pillars configured to transfer heat, wherein the plurality of metallic pillars are electrically floating.
H01L 25/065 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices all the devices being of a type provided for in a single subclass of subclasses , , , , or , e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group
H01L 23/00 - Details of semiconductor or other solid state devices
H01L 23/36 - Selection of materials, or shaping, to facilitate cooling or heating, e.g. heat sinks
H01L 23/538 - Arrangements for conducting electric current within the device in operation from one component to another the interconnection structure between a plurality of semiconductor chips being formed on, or in, insulating substrates
H01L 25/00 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices
Disclosed herein are chip packages that integrate multiple compute dies through a single interposer die to a memory stack. The interposer die includes memory controller circuitry that allows multiple compute dies to access the memory stack efficiently.
H01L 25/18 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices the devices being of types provided for in two or more different main groups of the same subclass of , , , , or
H01L 23/00 - Details of semiconductor or other solid state devices
H01L 23/053 - Containers; Seals characterised by the shape the container being a hollow construction and having an insulating base as a mounting for the semiconductor body
Embodiments herein describe a circuit for detecting a single event upset (SEU). The circuit includes a latch having an output node, a first parity node, and a second parity node, and correction circuitry configured to correct an SEU at the output node using the first and second parity nodes.
The disclosed device includes a memory-semantic fabric comprising memory components accessible by multiple processors and a controller for the memory-semantic fabric. The controller receives, from multiple processes, memory requests for a memory-semantic fabric. The controller also identifies, within the processes, a source process for each of the memory requests and prioritizes forwarding the memory requests to the memory-semantic fabric based on the source processes. Various other methods, systems, and computer-readable media are also disclosed.
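One way to picture the controller's behavior, assuming a per-process priority table (lower value is forwarded first) and first-come order among requests of equal priority. `FabricController`, `receive`, and `forward_next` are hypothetical names, and the priority policy itself is an illustrative assumption.

```python
import heapq
import itertools


class FabricController:
    def __init__(self, process_priority):
        self.process_priority = process_priority  # source pid -> priority (lower = sooner)
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def receive(self, pid, request):
        # Identify the source process and tag the request with that process's priority.
        prio = self.process_priority.get(pid, float("inf"))
        heapq.heappush(self._heap, (prio, next(self._seq), pid, request))

    def forward_next(self):
        """Pop the request to send to the memory-semantic fabric next."""
        _, _, pid, request = heapq.heappop(self._heap)
        return pid, request
```

Requests from processes missing from the table are forwarded last, in arrival order.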
The disclosed device can receive a biosignal and, using user input predictions based on the biosignal, pre-render a display frame. The device can also subsequently receive a user input, output the pre-rendered display frame based on the user input confirming the user input predictions and flush the pre-rendered display frame otherwise. The device can also modulate computing performance and power based on computing demands predicted from the biosignal. Various other methods, systems, and computer-readable media are also disclosed.
The disclosed device includes a processor and an interconnect connecting the processor to a memory. The interconnect includes an interconnect agent that can forward memory requests from the processor to the memory and receive requested data returned by the memory. The requested data can include information for a next memory request such that the interconnect agent can send, to the memory, a speculative memory request using information for the next memory request that was received in response to the memory request. Various other methods, systems, and computer-readable media are also disclosed.
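A pointer-chasing sketch, assuming each memory location returns its data together with the address of the next request (as in a linked list). The memory layout, `InterconnectAgent`, and its counters are hypothetical.

```python
class InterconnectAgent:
    """Forwards reads and speculatively issues the read named in the returned data."""

    def __init__(self, memory):
        self.memory = memory        # addr -> (data, next_addr or None); hypothetical layout
        self.prefetched = {}        # speculative responses awaiting a demand request
        self.speculative_issued = 0
        self.hits = 0

    def read(self, addr):
        if addr in self.prefetched:
            self.hits += 1
            data, next_addr = self.prefetched.pop(addr)
        else:
            data, next_addr = self.memory[addr]
        # The response names the next address, so fetch it before the processor asks.
        if next_addr is not None and next_addr not in self.prefetched:
            self.prefetched[next_addr] = self.memory[next_addr]
            self.speculative_issued += 1
        return data, next_addr
```

Walking a three-node list issues two speculative requests, and both later demand reads are served from them.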
A memory controller includes a command queue for receiving memory access requests and an arbiter. The arbiter is operable to allow cross-mode activations during a streak of accesses of a current mode in response to a number of cross-mode accesses present in the command queue exceeding an adaptive threshold.
In accordance with the described techniques, a host processor receives a task graph that includes tasks and indicates dependencies between the tasks. The host processor formats the task graph, in part, by sorting the tasks of the task graph in an order based on the dependencies between the tasks. Further, the host processor submits the formatted task graph to a scalable input/output virtualization (SIOV) device, which directs the SIOV device to process the tasks of the task graph based on the order.
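The sorting step amounts to a topological sort. The sketch below uses Kahn's algorithm as a stand-in, assuming dependencies arrive as (prerequisite, dependent) pairs; `format_task_graph` is a hypothetical name, and submission to an SIOV device is not modeled.

```python
from collections import deque


def format_task_graph(tasks, deps):
    """Return tasks ordered so every prerequisite precedes its dependents."""
    indegree = {t: 0 for t in tasks}
    successors = {t: [] for t in tasks}
    for before, after in deps:
        successors[before].append(after)
        indegree[after] += 1
    ready = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for s in successors[t]:
            indegree[s] -= 1
            if indegree[s] == 0:  # all prerequisites of s are now scheduled
                ready.append(s)
    if len(order) != len(tasks):
        raise ValueError("task graph contains a cycle")
    return order
```

A cyclic graph has no valid order, so the sketch rejects it rather than submitting a partial list.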
Memory access validation for input/output operations using an interposer is described. In one or more implementations, an interposer is disposed logically between an input/output device and a memory. The interposer receives a plurality of requests from the input/output device to access the memory non-sequentially in association with an input/output operation. Responsive to each request, the interposer updates an accumulated error code using error-detection logic. Based upon the accumulated error code, the interposer outputs an I/O validity indicator.
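Because the requests arrive non-sequentially, the accumulated code must not depend on arrival order. A minimal sketch, assuming XOR-folding of a per-request CRC-32 (computed with Python's `zlib.crc32`) as the error-detection logic; the disclosed interposer may use a different code, and `ValidatingInterposer` is a hypothetical name. XOR cancels a request that appears twice, so a real design would need to account for duplicates.

```python
import zlib


def request_code(addr, length):
    """Per-request check value over the access's address and length."""
    return zlib.crc32(addr.to_bytes(8, "little") + length.to_bytes(4, "little"))


class ValidatingInterposer:
    def __init__(self):
        self.accumulated = 0

    def on_request(self, addr, length):
        # XOR is commutative, so the result is independent of request order.
        self.accumulated ^= request_code(addr, length)

    def io_valid(self, expected_ranges):
        """I/O validity indicator: compare against the code of the expected accesses."""
        expected = 0
        for addr, length in expected_ranges:
            expected ^= request_code(addr, length)
        return self.accumulated == expected
```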
A processing system includes a hardware synchronizer to synchronize the transmission of audio data from multiple I2S controllers of a processing system to one or more audio codecs. In some embodiments, each of the I2S controllers receives audio data from one or more audio data sources and stores the audio data at a buffer associated with the controller. The hardware synchronizer initiates synchronized transmission of the audio data from the plurality of controllers to the one or more codecs in response to the buffer associated with each controller being filled to a predetermined level. In some embodiments, until the controllers begin transmission of the audio data, the controllers transmit mute (null) data to the one or more codecs such that the one or more codecs receive a frame start followed by null data for each frame.
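A frame-by-frame sketch of the synchronizer, assuming one sample per controller per frame and a single fill level per controller; `I2SController` and `frame_tick` are hypothetical names. Until every buffer reaches its level, each controller sends a null sample; then all controllers start on the same frame.

```python
class I2SController:
    def __init__(self, start_level):
        self.buffer = []                # samples received from audio data sources
        self.start_level = start_level  # predetermined fill level (assumed per controller)

    def ready(self):
        return len(self.buffer) >= self.start_level


def frame_tick(controllers, running):
    """One frame period: returns each controller's sample and the updated running flag."""
    if not running and all(c.ready() for c in controllers):
        running = True  # the synchronizer releases every controller at once
    samples = []
    for c in controllers:
        if running and c.buffer:
            samples.append(c.buffer.pop(0))
        else:
            samples.append(0)  # mute (null) data following the frame start
    return samples, running
```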
A processing system includes dispatch circuitry that sends elements to one or more processing circuits such as shader circuitry for execution. The dispatch circuitry includes a dispatch queue and an arbitration circuit. The dispatch queue stores the elements to be sent to the one or more processing circuits. The arbitration circuit schedules the elements of the dispatch queue for execution based on priority indicators corresponding to the elements. As a result, prioritization of the elements is implemented at the dispatch circuitry in hardware without changing a design of the dispatch queue to store the priority information.
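A sketch of the arbitration, assuming priority indicators are kept in a side table keyed by element ID (a dictionary here) and that elements without an indicator default to priority 0. `arbitrate` and `priority_of` are hypothetical. The queue itself stores only element IDs, which mirrors the point that its storage format is left unchanged.

```python
def arbitrate(dispatch_queue, priority_of):
    """Remove and return the next element: highest priority first, oldest first among ties.

    dispatch_queue holds element IDs with the oldest at index 0; priorities live
    in the separate priority_of table.
    """
    best = max(range(len(dispatch_queue)),
               key=lambda i: (priority_of.get(dispatch_queue[i], 0), -i))
    return dispatch_queue.pop(best)
```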
An accelerated processor includes a processor core die including a plurality of compute units, the plurality of compute units including a first level (L1) cache. The accelerated processor also includes a plurality of memory cache dies coupled to the processor core die, the plurality of memory cache dies including a last level cache (LLC) such as a level 3 (L3) cache. The accelerated processor includes an LLC controller to issue a cache access request to the LLC and, based on a latency of the cache access request, direct the cache access request to a subset of the plurality of memory cache dies.
In accordance with the described techniques for data compression using reconfigurable hardware based on data redundancy patterns, a computing device includes a memory, processing-in-memory units, a host processing unit, and a compression unit having reconfigurable logic for performing multiple compression algorithms. The host processing unit issues processing-in-memory requests instructing the processing-in-memory units to scan a block of the memory for one or more data redundancy patterns, and to identify a compression algorithm of the multiple compression algorithms based on the one or more data redundancy patterns. Further, the host processing unit issues a memory request to access a memory address in the block of the memory. The memory request causes data of the memory address to be communicated from the block of the memory to the compression unit to be compressed using the compression algorithm.
A fused bounding volume hierarchy, which is a combination of a base bounding volume hierarchy and one or more non-base bounding volume hierarchies, is generated. For each non-base bounding volume hierarchy, multiple subtrees in the non-base bounding volume hierarchy that include fewer than a threshold number of child nodes are identified. Each of these subtrees is then fused with the base bounding volume hierarchy at one of the nodes of the base bounding volume hierarchy, and an identifier of the level of detail for the non-base bounding volume hierarchy is included in the node. When displaying a scene or image, the level of detail to use is identified for a particular portion of the scene or image. The fused bounding volume hierarchy is traversed and the geometric objects in the nodes of the fused bounding volume hierarchy corresponding to the identified level of detail are displayed.
The disclosed device includes a cache organized by sets and ways and a control circuit that selects a first way for a cache replacement from a first half of a set of ways. The control circuit also selects a second way from a second half of the set of ways and uses the second way for the cache replacement when the first way is unavailable. Various other methods, systems, and computer-readable media are also disclosed.
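A minimal sketch of the two-half selection, assuming an oldest-first policy within each half and a set of ways currently marked unavailable (for example, locked or awaiting a fill). Both the age counters and the availability rule are illustrative assumptions, and `pick_victim` is a hypothetical name.

```python
def pick_victim(age, unavailable):
    """Choose a replacement way from a set whose ways are split into two halves.

    age[w] is a recency counter (larger = older); unavailable holds ways that
    cannot be replaced right now.
    """
    n = len(age)
    half = n // 2
    first = max(range(half), key=lambda w: age[w])      # candidate from the first half
    second = max(range(half, n), key=lambda w: age[w])  # backup from the second half
    return second if first in unavailable else first
```

Selecting both candidates in the same step means the fallback requires no second search when the first way is unavailable.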
Scratchpad memory translation lookaside buffer techniques are described. In an implementation, the techniques described herein relate to a device including a memory management unit implemented in hardware of an integrated circuit to receive a mapping instruction from a mapping instruction source, the mapping instruction specifying a mapping between a virtual memory address and a physical memory address of a scratchpad memory and store a virtual-to-physical mapping entry in a translation lookaside buffer based on the mapping instruction.
G06F 12/1045 - Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache
29.
SYSTEMS AND METHODS FOR IMPROVING EMBEDDED SUBSTRATE THERMALS
A method can include embedding one or more thermal sources in a semiconductor package substrate and positioning one or more substrate buildup layers above the one or more thermal sources. The method can also include forming one or more thermal vias in the one or more substrate buildup layers. Various other methods and systems are also disclosed.
Efficient memory operation using a destructive read memory array is described. In accordance with the described techniques, a system may include a memory configured to store data of a first logic state in a ferroelectric capacitor when an electric polarization of the ferroelectric capacitor is in a first direction. A system may include a controller configured to erase the data from the memory by commanding the electric polarization of the ferroelectric capacitor in a second direction, opposite to the first direction, and skipping a subsequent write operation of a null value to the memory.
G06F 3/06 - Digital input from, or digital output to, record carriers
G11C 11/22 - Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using ferroelectric elements
H10B 53/30 - Ferroelectric RAM [FeRAM] devices comprising ferroelectric memory capacitors characterised by the memory core region
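A toy model of the erase path, assuming that a destructive read leaves a cell polarized in the second (logic 0) direction and that erasing is performed by such a read without write-back. `FeRAMArray` and its methods are hypothetical, and the write counter exists only to show the skipped operation.

```python
class FeRAMArray:
    """Toy ferroelectric array: bit 1 = polarization in the first direction, 0 = second."""

    def __init__(self, size):
        self.cells = [0] * size
        self.write_ops = 0

    def write(self, i, bit):
        self.cells[i] = bit
        self.write_ops += 1

    def read(self, i, restore=True):
        bit = self.cells[i]
        self.cells[i] = 0       # destructive read drives the cell to the second direction
        if restore and bit:
            self.write(i, bit)  # write back the 1 that the read destroyed
        return bit

    def erase(self, i):
        # The cell already ends in the second direction, so no null-value write is issued.
        self.read(i, restore=False)
```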
31.
TECHNIQUE FOR GENERATING A BOUNDING VOLUME HIERARCHY
A technique for building a bounding volume hierarchy is disclosed. The technique includes, for a subject node, selecting a dimension along which to perform a split to form child nodes of the subject node; assigning primitives of the subject node to the child nodes; and updating bounds for the child nodes in the next split dimension and not in the other dimensions.
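A minimal sketch of one split step, assuming axis-aligned primitive boxes given as `(lo, hi)` coordinate triples, a median split on centroids, and round-robin choice of the next split dimension (x, y, z). These heuristics and the `split_node` name are illustrative assumptions. Children inherit the parent's bounds in every dimension except the next split dimension, which is the only one recomputed.

```python
def centroid(prim, axis):
    lo, hi = prim
    return 0.5 * (lo[axis] + hi[axis])


def split_node(prims, bounds, axis):
    """Split prims (two or more AABBs) along axis; tighten child bounds on the next axis only."""
    ordered = sorted(prims, key=lambda p: centroid(p, axis))
    mid = len(ordered) // 2
    next_axis = (axis + 1) % 3
    children = []
    for group in (ordered[:mid], ordered[mid:]):
        lo, hi = list(bounds[0]), list(bounds[1])  # inherit parent bounds
        lo[next_axis] = min(p[0][next_axis] for p in group)
        hi[next_axis] = max(p[1][next_axis] for p in group)
        children.append((group, (tuple(lo), tuple(hi))))
    return children, next_axis
```

The children's x extents stay as loose as the parent's after an x split; only the y extents, needed for the next split, are tightened.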
A technique for rendering is provided. The technique includes determining a level of detail for a shade space texture and a screen space; shading the shade space texture having a resolution based on the level of detail; and for a reconstruction operation, performing sampling from the shade space texture, the sampling including a high frequency attenuation of samples of the shade space texture.
A technique for performing ray tracing operations is provided. The technique includes, for a ray being tested for intersection with geometry associated with a bounding volume hierarchy, traversing to a pre-filtering node that includes information for filtering out triangles of a leaf node of the bounding volume hierarchy; evaluating a quantized ray that corresponds to the ray against quantized triangles of the pre-filtering node to filter out one or more triangles of the leaf node from consideration; and testing the triangles of the leaf node that are not filtered out and not testing the triangles of the leaf node that are filtered out.
Systems and techniques for selectively transferring one or more portions of a cache block in response to a request are described. Computing system components are informed of instances where data transfer operations involve moving less than the entirety of the data included in a cache block. In one example, executable code for a computational task includes hints that identify when memory requests involve accessing and transmitting less than an entire cache block, and these hints cause system components to communicate a subset of the cache block during a memory access. In another example, a data differentiator unit is implemented to analyze a cache block and return a portion of the cache block that is selected based on one or more criteria specified for a computational task. The described techniques thus overcome a drawback of conventional systems, which transmit an entire cache block even when only a portion is needed.
G06F 12/0891 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
A technique for rendering is provided. The technique includes performing a visibility pass that designates portions of shade space textures visible in a scene, wherein the visibility pass generates tiles that cover shade space textures visible in the scene; performing a rate controller operation on output of the visibility pass using spatially-adaptive sampling; performing a sparse shade space shading operation on the tiles that cover the shade space textures visible in the scene based on a result of the spatially-adaptive sampling; performing a regularization operation based on an output of the sparse shade space shading operation; and performing a reconstruction operation using output from the regularization operation to produce a final scene.
A technique for rendering is provided. The technique includes performing a visibility pass that designates portions of shade space textures visible in a scene, wherein the visibility pass generates tiles that cover shade space textures visible in the scene; performing a rate controller operation on output of the visibility pass using spatiotemporal adaptive sampling; performing a shade space shading operation on the tiles that cover the shade space textures visible in the scene based on a result of the spatiotemporal adaptive sampling; performing a regularization operation based on an output of the shade space shading operation; and performing a reconstruction operation using output from the regularization operation to produce a final scene.
Devices and methods for rendering objects using ray tracing are provided which include during a build time: generating an accelerated hierarchy structure comprising data representing an approximate volume bounding a group of geometric shapes representing the objects in the scene and data representing the geometric shapes; and generating additional data used to transform rays, to be cast in the scene, from a high precision space to a low precision space; and during a render time occurring after the build time: performing ray intersection tests, using the additional data generated during the build time, for the rays in the scene; and rendering the scene based on the ray intersection tests. Because the additional data is generated prior to render time, the additional data can be used to perform the ray intersection testing more efficiently.
The disclosed device includes power circuits that can communicate with a control circuit. In response to a power circuit communicating a low efficiency state, the control circuit can redistribute at least a portion of a load of the power circuit to one or more other power circuits. Various other methods, systems, and computer-readable media are also disclosed.
The disclosed device includes a heterogeneous processor architecture having heterogeneous processors, and a control circuit that can assign, in response to an interrupt, the interrupt to one of the heterogeneous processors that is selected based on power efficiency. Various other methods, systems, and computer-readable media are also disclosed.
A data interface connector and method of manufacture and/or assembly thereof can include first electrical terminals at a first end of the data interface connector, the first electrical terminals being configured to interface with a mating data interface connector conforming to a first data interface specification. The data interface connector and method of manufacture and/or assembly thereof can include second electrical terminals at a second end of the data interface connector, the second electrical terminals being configured to interface with data interface pads on a circuit board; where the data interface pads have pitches and lengths according to a second data interface specification.
Systems, apparatuses, and methods for rendering textures by prefetching texture data are disclosed. Source texture data is identified based at least in part on one or more programmable instructions. A prefetch of the source texture data is caused based on levels of detail associated with the source texture data. Further, a list of data blocks of the source texture data, and a mapping between each data block and the memory address space allocated to that data block on the memory device, are maintained. Responsive to a request to load a given data block, the given data block is loaded from the memory device using the list.
Disclosed is a computer-implemented method for model ensemble acceleration in an active learning loop. The method includes receiving a set of datapoint inputs, where each datapoint input is an unlabeled equivalent of other datapoint inputs in the set of datapoint inputs and has a different applied weight value. The method then executes a set of neural network models, where the execution of each neural network model is based on the received set of datapoint inputs. The outputs from the set of neural network models are analyzed, where an inference computation is performed, and a label for the set of datapoints is determined. The method then stores the labeled set of datapoint inputs in a database. Various other methods, systems, and computer-readable media are also disclosed.
An apparatus and method for reducing the memory bandwidth of executing machine learning models. A computing system includes two or more processing nodes, each including one or more processors and a corresponding local memory. Switch circuitry communicates with at least the local memories and a system memory of the computing system. The switch includes multiple direct memory access (DMA) interfaces. Each of one or more processing nodes stores multiple embedding rows of embedding tables. A processor of the processing node identifies two or more embedding rows as source operands of a reduction operation. The switch executes memory access requests to retrieve data of the two or more embedding rows from the corresponding local memory, and generates a result by performing the reduction operation. The switch sends the result to the local memory.
An apparatus and method for efficiently managing memory requests. An integrated circuit includes multiple compute circuits, each capable of processing a data block of multiple data blocks. An amount of available data storage space of a cache is smaller than storage space in a memory for storing the multiple data blocks. In various implementations, the multiple compute circuits process data blocks in a contiguous manner, and pointer updating circuitry assigns data block identifiers in a contiguous manner. The circuitry updates the pointer of an initial data block to use for a particular stage of data processing to a value which increases cache hits during the particular stage of data processing. The circuitry accounts for the number of data blocks of intermediate results to increase or decrease for a particular stage of data processing when updating the pointers.
A hybrid bonding method includes fabricating plural semiconductor devices in a region of a bottom wafer adjacent to a front surface thereof, fusion bonding the front surface to a carrier substrate, thinning the bottom wafer opposite to the front surface to expose conductive regions of the semiconductor devices, forming a dielectric layer over a backside of the semiconductor devices, forming openings in the dielectric layer to expose the conductive regions, forming metal pads within the openings, dicing the bottom wafer and the carrier substrate to singulate the plural semiconductor devices, bonding the dielectric layer overlying the backside of the semiconductor devices to a dielectric layer overlying a front surface of a top wafer, bonding the metal pads within the openings in the dielectric layer to metal pads overlying the front surface of the top wafer, and removing the carrier substrate from the front surface of the bottom wafer.
H01L 21/20 - Deposition of semiconductor materials on a substrate, e.g. epitaxial growth
H01L 21/683 - Apparatus specially adapted for handling semiconductor or electric solid state devices during manufacture or treatment thereof; Apparatus specially adapted for handling wafers during manufacture or treatment of semiconductor or electric solid state devices or components for supporting or gripping
H01L 23/00 - Details of semiconductor or other solid state devices
46.
SYSTEMS AND METHODS FOR ENABLING A FEATURE OF A SEMICONDUCTOR DEVICE
A computer-implemented method for enabling a feature of a semiconductor device can include receiving, by at least one processor of a semiconductor device, a command to enable a feature of the semiconductor device. The method can also include burning, by the at least one processor and in response to the command, an electronic fuse of the semiconductor device. Various other methods, systems, and computer-readable media are also disclosed.
G06F 1/08 - Clock generators with changeable or programmable clock frequency
G06F 21/64 - Protecting data integrity, e.g. using checksums, certificates or signatures
G06F 21/73 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information by creating or determining hardware identification, e.g. serial numbers
H01L 23/525 - Arrangements for conducting electric current within the device in operation from one component to another including external interconnections consisting of a multilayer structure of conductive and insulating layers inseparably formed on the semiconductor body with adaptable interconnections
47.
VOLTAGE REGULATOR WITH PROGRAMMABLE TELEMETRY CONFIGURATION
An apparatus can include: a processor; a voltage regulator configured to provide a processor voltage and a processor current to the processor; and a voltage regulator controller that can include a current sensor comprising an analog-to-digital converter (ADC) having an ADC input range and configured to provide current data based on an ADC input voltage, and a configuration manager configured to receive processor power data and adjust the ADC input range based on the processor power data. Various other methods, systems, and computer-readable media are also disclosed.
H03M 1/36 - Analogue value compared with reference values simultaneously only, i.e. parallel type
H03M 1/16 - Conversion in steps with each step involving the same or a different conversion means and delivering more than one bit with scale factor modification, i.e. by changing the amplification between the steps
H04Q 9/00 - Arrangements in telecontrol or telemetry systems for selectively calling a substation from a main station, in which substation desired apparatus is selected for applying a control signal thereto or for obtaining measured values therefrom
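A sketch of the range adjustment for the voltage regulator above, assuming the configuration manager converts processor power data into an expected current and selects the narrowest full-scale range that still covers it with some headroom. The range table, the 20% margin, and `select_adc_range` are hypothetical values and names chosen for illustration.

```python
from typing import Sequence

# Hypothetical selectable full-scale current ranges (amperes) of the sensor ADC.
RANGES_A = (25.0, 50.0, 100.0, 200.0)


def select_adc_range(power_w: float, voltage_v: float, margin: float = 0.2,
                     ranges: Sequence[float] = RANGES_A) -> float:
    """Pick the smallest full-scale range that still covers the expected current."""
    # Expected current from processor power data, plus assumed headroom for transients.
    expected_a = power_w / voltage_v * (1.0 + margin)
    for full_scale in sorted(ranges):
        if expected_a <= full_scale:
            return full_scale
    return max(ranges)  # Saturate at the widest available range.
```

Choosing the narrowest adequate range spreads the ADC's output codes over a smaller span, which gives finer current resolution when the processor draws little power.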
A multi-chiplet system includes a first chiplet comprising a first transceiver and a first chiplet-to-chiplet (C2C) interface module, and a second chiplet comprising programmable logic circuitry and a second C2C interface module. The first transceiver is configured to generate a clock, which is transmitted from the first C2C interface module to the second C2C interface module, through a clock transmission wire, for data transfer between the first chiplet and the second chiplet.
A method and computing device are provided for filtering objects of interest of images. The computing device comprises an image capturing device and memory configured to store objects of interest. In one example, the computing device comprises a processor configured to, for a captured image, determine one or more regions of interest in the image based on the objects of interest and modify the image based on the determined regions of interest. In another example, the computing device comprises a first processor configured to determine one or more regions of interest to be modified in an image based on the one or more objects of interest and a second processor configured to convert the image to be processed by the first processor and modify the image based on regions of interest determined by the first processor. The image is displayed without the one or more objects of interest being viewable.
Methods and apparatus employ a plurality of heterogeneous compute units and a plurality of non-compute units operatively coupled to the plurality of compute units. Power management logic (PML) determines a memory bandwidth level associated with a respective workload running on each of the plurality of heterogeneous compute units on an integrated circuit (IC) and, based on the determined memory bandwidth levels, adjusts a power level of at least one non-compute unit of a memory system on the IC from a first power level to a second power level. In some examples, memory access latency is also taken into account when adjusting the power level of non-compute units.
The disclosed device includes a processor and an interconnect connecting the processor to a memory. The interconnect includes an interconnect agent that can forward memory requests from the processor to the memory and receive requested data returned by the memory. The requested data can include information for a next memory request such that the interconnect agent can send, to the memory, a speculative memory request using information for the next memory request that was received in response to the memory request. Various other methods, systems, and computer-readable media are also disclosed.
G06F 15/78 - Architectures of general purpose stored program computers comprising a single central processing unit
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 12/0811 - Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
Systems, apparatuses, and methods for rendering textures by prefetching texture data are disclosed. Source texture data is identified based at least in part on one or more programmable instructions. A prefetch of the source texture data is triggered based on levels of detail associated with the source texture data. Further, a list of data blocks of the source texture data is maintained, together with a mapping between each data block and the memory address space allocated to that data block on the memory device. Responsive to a request to load a given data block, the given data block is loaded from the memory device using the list.
G06F 12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
A technique for performing ray tracing operations is provided. For a ray being tested for intersection with geometry associated with a bounding volume hierarchy, the technique includes traversing to a pre-filtering node that includes information for filtering out triangles of a leaf node of the bounding volume hierarchy; evaluating a quantized ray that corresponds to the ray against quantized triangles of the pre-filtering node to filter out one or more triangles of the leaf node from consideration; and testing the triangles of the leaf node that are not filtered out, while not testing the triangles of the leaf node that are filtered out.
Devices and methods for rendering objects using ray tracing are provided. They include generating a low resolution version of a high resolution mesh representing objects in the scene; determining points on the curved surfaces of curved surface patches defined for either triangles or bilinear quadrangles of the low resolution version of the high resolution mesh; performing ray intersection testing by casting rays toward surfaces of the high resolution mesh, which are approximated from new points calculated by offset values along interpolated normals from the points on the curved surfaces of the curved surface patches; and rendering the objects in the scene based on the ray intersection testing.
A processing system includes two or more graphics cores, each disposed on a respective die and configured for concurrent processing of command packets. To this end, the processing system is configured to determine two or more command partitions associated with a command packet and to assign each command partition to a graphics core. Each graphics core then executes the same command packet by performing only the instructions of the command packet associated with the command partitions assigned to that graphics core. Further, after executing an instruction of the command packet based on one or more assigned partitions, each graphics core adjusts one or more counters used to synchronize the execution of the command packet across the graphics cores.
An apparatus and method for efficiently managing performance and power consumption among replicated functional blocks of an integrated circuit despite different circuit behavior amongst the functional blocks due to manufacturing variations. An integrated circuit includes multiple replicated functional blocks, each being a semiconductor die with a corresponding communication fabric for routing packets. A second functional block placed between a first functional block and a third functional block routes packets to destinations from at least the first and the third functional blocks, and provides higher performance than the first and the third functional blocks due to semiconductor manufacturing variations. A power manager assigns a single power supply voltage to the replicated functional blocks, and assigns a target clock frequency to the first and the third functional blocks. The power manager assigns another clock frequency greater than the target clock frequency to the second functional block.
In accordance with the described techniques, a host processor receives a task graph that includes tasks and indicates dependencies between the tasks. The host processor formats the task graph, in part, by sorting the tasks of the task graph in an order based on the dependencies between the tasks. Further, the host processor submits the formatted task graph to a scalable input/output virtualization (SIOV) device, which directs the SIOV device to process the tasks of the task graph based on the order.
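The sorting step can be illustrated with Kahn's topological sort. The abstract does not name a sorting algorithm, so this choice, the dictionary format for the task graph (each task mapped to the tasks it depends on), and the function name `format_task_graph` are assumptions.

```python
from collections import deque
from typing import Dict, List


def format_task_graph(deps: Dict[str, List[str]]) -> List[str]:
    """Order tasks so every task appears after the tasks it depends on.

    deps maps each task to its dependencies (hypothetical format); every task,
    including those without dependencies, must appear as a key.
    """
    # Count unmet dependencies per task and record the reverse edges.
    remaining = {task: len(parents) for task, parents in deps.items()}
    children: Dict[str, List[str]] = {task: [] for task in deps}
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)

    # Start from tasks with no dependencies; release children as parents finish.
    ready = deque(task for task, count in remaining.items() if count == 0)
    order: List[str] = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in children[task]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    if len(order) != len(deps):
        raise ValueError("task graph contains a cycle")
    return order
```

The sorted list is what would be submitted in place of the raw graph, so the device can process tasks in list order without re-checking dependencies.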
A computer-implemented method for ensuring processing unit hardware state integrity in live migration can include participating as a source, by a processing unit, in a live migration procedure by injecting, into a live migration data package containing a state of the processing unit, a signature verifying the state. The method can additionally include participating as a target, by the processing unit, in an additional live migration procedure migrating an additional live migration data package containing an additional state of an additional processing unit by performing an integrity check based on an additional signature, in the additional live migration data package, verifying the additional state. Various other methods, systems, and computer-readable media are also disclosed.
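One possible realization of the signature and integrity check is an HMAC over the serialized state, using Python's standard `hmac` module. The abstract does not name a signature scheme; the shared key, the package layout, and the function names below are assumptions.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical key provisioned to both the source and target processing units.
MIGRATION_KEY = b"example-shared-key"


def build_package(state: bytes, key: bytes = MIGRATION_KEY) -> Dict[str, bytes]:
    """Source side: inject a signature verifying the state into the package."""
    signature = hmac.new(key, state, hashlib.sha256).digest()
    return {"state": state, "signature": signature}


def verify_package(package: Dict[str, bytes], key: bytes = MIGRATION_KEY) -> bool:
    """Target side: integrity check before the migrated state is restored."""
    expected = hmac.new(key, package["state"], hashlib.sha256).digest()
    # compare_digest avoids leaking timing information about where a mismatch occurs.
    return hmac.compare_digest(expected, package["signature"])


package = build_package(b"\x01\x02register-file-snapshot")
```

A symmetric HMAC keeps verification cheap but requires a shared secret; an asymmetric signature would avoid that at the cost of more expensive checks.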
A processing system stores a boot image for a critical domain of a system-on-a-chip (SOC) at a bank of a static random-access memory (SRAM) that is shared by the critical domain and a non-critical domain and that is powered independently from the non-critical domain. The SOC includes a secure processor that loads the boot image to the bank of the SRAM and then blocks subsequent write access to the bank. Because the critical domain is powered independently from the non-critical domain, the bank of the SRAM retains the boot image without regard to the power state of the non-critical domain. In addition, the critical domain implements a boot process that is decoupled from a CPU at the non-critical domain, ensuring that the critical domain can initiate a re-boot sequence even if the non-critical domain is not powered.
A technique for rendering is provided. The technique includes performing a visibility operation to generate shade space visibility information and reconstruction information; performing a shade space shading operation based on the shade space visibility information to generate shaded shade space textures; and performing a reconstruction operation based on the reconstruction information and the shaded shade space textures.
The disclosed device includes a processing component having various compute blocks, and a control circuit that, in response to power gating the normal voltage rail of the processing component, switches at least one of the compute blocks from the normal voltage rail to a second voltage rail. Various other methods, systems, and computer-readable media are also disclosed.
Non-blocking processing systems are described. In accordance with the described techniques, a pending range store receives, at the start of a bulk memory operation, a pending memory range of the bulk memory operation. A logic unit includes at least one of check conflict logic or check address logic. The logic unit detects a conflicting memory access when a target address in the pending memory range conflicts with a memory access request separate from the bulk memory operation. The logic unit performs at least the portion of the bulk memory operation associated with the target address before the memory access request is allowed to proceed.
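The conflict detection can be sketched as follows, assuming the bulk operation proceeds in fixed-size chunks and the pending range is held as a half-open interval `[start, end)`. The 64-byte chunk size, the `PendingRangeStore` class, and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set

CHUNK = 64  # Hypothetical granularity (bytes) at which the bulk operation proceeds.


@dataclass
class PendingRangeStore:
    start: int  # First byte of the pending range, assumed CHUNK-aligned.
    end: int    # One past the last byte (half-open range).
    done: Set[int] = field(default_factory=set)  # Base addresses of finished chunks.

    def conflicts(self, addr: int) -> bool:
        """Check-address logic: does a separate access hit an unfinished chunk?"""
        chunk_base = addr - addr % CHUNK
        return self.start <= addr < self.end and chunk_base not in self.done

    def service_access(self, addr: int) -> bool:
        """Finish the conflicting chunk first; return True if that was needed."""
        if not self.conflicts(addr):
            return False  # No conflict: the access proceeds immediately.
        self.done.add(addr - addr % CHUNK)  # Stand-in for performing that chunk now.
        return True  # The chunk is finished, so the access may now proceed.
```

In this sketch, an access outside the pending range, or to a chunk that is already finished, proceeds at once, so only genuinely conflicting requests wait on the bulk operation.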
A technique for rendering is provided. The technique includes performing a visibility pass that designates portions of shade space textures visible in a scene, wherein the visibility pass generates tiles that cover the shade space textures visible in the scene; performing a temporal rate controller operation; performing a shade space shading operation on the tiles that cover the shade space textures visible in the scene based on a temporal shading rate output by the temporal rate controller operation, wherein only a subset of samples in the tiles that cover the shade space textures visible in the scene are shaded in the shade space shading operation; and performing a reconstruction operation using output from the shade space shading operation to produce a final scene.
An apparatus includes a controller that generates a configuration base address register (BAR) allocation lookup table in a first memory, such as SRAM. The configuration BAR allocation lookup table includes at least one BAR configuration entry associated with each of a plurality of peripheral devices wherein each BAR configuration entry includes configuration data for configuring at least one corresponding physical BAR associated with a corresponding peripheral device. The controller configures one or more physical BARs based on the configuration data in the configuration BAR allocation lookup table. Associated methods are also presented.
An apparatus and method for efficiently performing address translation requests. An integrated circuit includes a system memory that stores address mappings, and the circuitry of one or more clients processes one or more applications and generates address translation requests. A translation lookaside buffer (TLB) stores, in multiple entries, address mappings retrieved from the system memory. The entries of the TLB store address mappings corresponding to different address mapping types and different virtual functions, which avoids searches of multiple other lower-level TLBs that are significantly larger and have longer access latencies. In addition, the TLB is implemented with a relatively small number of entries and uses a fully associative data storage arrangement to further reduce access latencies.
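A minimal model of the small, fully associative TLB, assuming least-recently-used replacement and a tag that combines the virtual function, the address mapping type, and the virtual page number. The tag layout, capacity, and class name are hypothetical.

```python
from collections import OrderedDict
from typing import Optional, Tuple

# Hypothetical tag layout: (virtual function id, mapping type, virtual page number).
Tag = Tuple[int, str, int]


class FullyAssociativeTLB:
    def __init__(self, capacity: int = 8) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[Tag, int]" = OrderedDict()  # tag -> physical page

    def lookup(self, vf: int, mapping_type: str, vpn: int) -> Optional[int]:
        """Any entry may hold any tag (fully associative); a hit refreshes LRU order."""
        tag = (vf, mapping_type, vpn)
        if tag in self.entries:
            self.entries.move_to_end(tag)
            return self.entries[tag]
        return None  # Miss: the mapping would be fetched from system memory.

    def fill(self, vf: int, mapping_type: str, vpn: int, ppn: int) -> None:
        """Install a mapping retrieved from system memory."""
        tag = (vf, mapping_type, vpn)
        self.entries[tag] = ppn
        self.entries.move_to_end(tag)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # Evict the least recently used entry.
```

Because mappings of different types and virtual functions share one small structure, a single lookup covers them all rather than probing several larger per-type TLBs.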
To help enable the faster translation of identifiers during the replay of a captured workload, a processor is configured to generate and insert one or more translation commands into the captured workload. To this end, the processor includes one or more processor cores configured to capture a workload that includes graphics calls referencing an identifier stored in a memory object of a processing unit. Based on the graphics calls referencing the identifier stored in a memory object of a processing unit, the processor cores generate a translation command that includes instructions configured to translate the identifier stored in a memory object of a processing unit to a runtime identifier. After generating the translation command, the processor cores then insert the generated translation command in the captured workload at a location based on the graphics calls referencing the identifier.
Fast Fourier transforms for processing-in-memory are described. In accordance with the described techniques, a computing device includes a memory, a host processing unit, and a processing-in-memory unit that operates on data of one or more banks of the memory. The host processing unit stores interacting elements of a fast Fourier transform at locations in the one or more banks. The locations are mapped to a lane of the processing-in-memory unit. The host processing unit issues processing-in-memory commands instructing the processing-in-memory unit to load the interacting elements from the locations into the lane of the processing-in-memory unit, and execute an operation on the interacting elements.
G06F 17/14 - Fourier, Walsh or analogous domain transformations
G06F 7/48 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices
G06F 13/16 - Handling requests for interconnection or transfer for access to memory bus
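For a radix-2 FFT, the interacting elements of each stage are butterfly pairs whose indices differ by the stage span. The sketch below enumerates those pairs and places both members of each pair in the same processing-in-memory lane, so that a lane can combine them without cross-lane traffic. The round-robin lane assignment and the function names are assumptions, not the disclosed mapping.

```python
from typing import Dict, List, Tuple


def butterfly_pairs(n: int, stage: int) -> List[Tuple[int, int]]:
    """Index pairs combined in one stage of an iterative radix-2 FFT."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    span = 1 << stage  # Distance between the two elements of a butterfly.
    if 2 * span > n:
        raise ValueError("stage out of range")
    pairs = []
    for block in range(0, n, 2 * span):
        for offset in range(span):
            pairs.append((block + offset, block + offset + span))
    return pairs


def assign_lanes(pairs: List[Tuple[int, int]], lanes: int) -> Dict[int, List[int]]:
    """Place both elements of each pair in the same lane (hypothetical round-robin)."""
    layout: Dict[int, List[int]] = {lane: [] for lane in range(lanes)}
    for k, (a, b) in enumerate(pairs):
        layout[k % lanes].extend([a, b])
    return layout
```

The host would then store each lane's elements at bank locations mapped to that lane before issuing the load and butterfly commands.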
71.
Speculative Cache Invalidation for Processing-in-Memory Instructions
Speculative cache invalidation techniques for processing-in-memory instructions are described. In one example, a system includes a cache system including a plurality of cache levels and a cache coherence controller. The cache coherence controller is configured to perform a cache directory lookup using a cache directory. The cache directory lookup is configured to indicate whether data associated with a memory address specified by a processing-in-memory request is valid in memory. The system employs speculative evaluation logic to identify whether the data associated with the processing-in-memory request is stored in the cache system before the processing-in-memory request is transmitted to the cache coherence controller. If the data is stored in the cache system, the cache system locally invalidates or flushes the data to avoid stalling the processing-in-memory request during a cache directory lookup.
G06F 12/0897 - Caches characterised by their organisation or structure with two or more cache hierarchy levels
G06F 12/0891 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
72.
Preemptive Flushing of Processing-in-Memory Data Structures
Preemptive flushing of data involved in executing a processing-in-memory command, from a cache system to main memory that is accessible by a processing-in-memory component, is described. In one example, a system includes an asynchronous flush controller that receives an indication of a subsequent processing-in-memory command to be executed as part of performing a computational task. While earlier commands of the computational task are executed, the asynchronous flush controller evicts or invalidates data elements involved in executing the subsequent processing-in-memory command from the cache system, such that the processing-in-memory command can proceed without stalling.
G06F 12/0897 - Caches characterised by their organisation or structure with two or more cache hierarchy levels
G06F 12/0891 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
73.
Method and Apparatus for Collaborative Memory Accesses
Method and apparatus for collaborative memory accesses is described. A system includes a memory controller that receives a command from a host. The command is associated with at least one of a plurality of data elements. The memory controller causes execution of data casting operations that adjust a bit size of the plurality of data elements to generate casted data elements. The system includes an interface for communicating data between the host and a memory.
Methods and systems are provided for generating a stylized representation of a non-player character (NPC) in a virtual environment. A multimodal plurality of inputs regarding characteristics of the NPC is received, which is processed to generate visual data representing the NPC's appearance and to generate behavior data representing the NPC's actions. The generated visual data and behavior data are adapted to a selected character model to create an adapted configuration model, which is used to generate rendering information for the NPC.
A memory system includes a memory controller and memory circuitry. The memory controller outputs a first training signal. The memory circuitry is coupled to the memory controller. The memory circuitry includes a memory device and multiplexing data buffer circuitry. The multiplexing data buffer circuitry is coupled to the memory device. The multiplexing data buffer circuitry includes first circuitry and second circuitry. The second circuitry is coupled to the memory device. The second circuitry receives, from the memory controller, the first training signal comprising first training data associated with the first circuitry; writes the first training data to the memory device; reads the written first training data from the memory device; and outputs the written first training data to the memory controller. The memory controller is configured to determine equalization parameters for the first circuitry based on the written first training data.
The disclosed device includes power circuits that can communicate with a control circuit. In response to a power circuit communicating a low efficiency state, the control circuit can redistribute at least a portion of a load of the power circuit to one or more other power circuits. Various other methods, systems, and computer-readable media are also disclosed.
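A sketch of the redistribution, assuming loads are tracked per power circuit and that a fixed fraction of each low-efficiency circuit's load is split evenly across the remaining circuits. The even split, the `fraction` parameter, and the function name are hypothetical.

```python
from typing import Dict, Set


def redistribute(loads: Dict[str, float], low_efficiency: Set[str],
                 fraction: float = 1.0) -> Dict[str, float]:
    """Shift `fraction` of each low-efficiency circuit's load to the other circuits."""
    healthy = [name for name in loads if name not in low_efficiency]
    if not healthy:
        return dict(loads)  # Nowhere to move the load; leave it unchanged.
    new_loads = dict(loads)
    for name in low_efficiency:
        moved = loads[name] * fraction
        new_loads[name] -= moved
        for other in healthy:
            new_loads[other] += moved / len(healthy)  # Even split (assumption).
    return new_loads
```

The total load is preserved; only its distribution changes, which is the property the control circuit needs when it offloads a circuit that has reported a low efficiency state.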
The disclosed device includes a heterogeneous processor architecture having heterogeneous processors, and a control circuit that can assign, in response to an interrupt, the interrupt to one of the heterogeneous processors that is selected based on power efficiency. Various other methods, systems, and computer-readable media are also disclosed.
A processing system includes dispatch circuitry that sends elements to one or more processing circuits such as shader circuitry for execution. The dispatch circuitry includes a dispatch queue and an arbitration circuit. The dispatch queue stores the elements to be sent to the one or more processing circuits. The arbitration circuit schedules the elements of the dispatch queue for execution based on priority indicators corresponding to the elements. As a result, prioritization of the elements is implemented at the dispatch circuitry in hardware without changing a design of the dispatch queue to store the priority information.
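The arbitration step can be sketched by keeping the priority indicators in a side table, so the dispatch queue itself remains a plain FIFO of element identifiers, and dispatching the highest-priority element with ties broken by arrival order. The side table, the tie-break rule, and the class name are assumptions; element identifiers are assumed to be unique.

```python
from collections import deque
from typing import Deque, Dict, Optional


class DispatchArbiter:
    def __init__(self) -> None:
        self.queue: Deque[str] = deque()    # Unmodified FIFO of element ids.
        self.priority: Dict[str, int] = {}  # Priority indicators kept beside the queue.

    def enqueue(self, element: str, priority: int = 0) -> None:
        self.queue.append(element)
        self.priority[element] = priority

    def dispatch(self) -> Optional[str]:
        """Send the highest-priority element; FIFO order breaks ties."""
        if not self.queue:
            return None
        # max() returns the first maximal item, i.e. the oldest among equal priorities.
        chosen = max(self.queue, key=lambda e: self.priority[e])
        self.queue.remove(chosen)
        del self.priority[chosen]
        return chosen
```

Keeping priorities out of the queue entries mirrors the stated goal of adding prioritization without redesigning the dispatch queue's storage.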
A data processor that is operable to be coupled to a memory includes a memory operation array, a controller, a refresh logic circuit, and a selector. The memory operation array is for storing memory operations for a first power state of the memory. The controller is responsive to a power state change request to execute a plurality of memory operations from the memory operation array when the first power state is selected. The refresh logic circuit generates refresh cycles periodically for the memory. The selector is for multiplexing the refresh cycles with the memory operations during a power state change to the first power state.
Systems, apparatuses, and methods for implementing efficient power optimization in a computing system are disclosed. A system management unit is configured to track computing activity of a computing device while each frame of a plurality of frames is processed. The computing activity is tracked at least for a given period of time comprising a plurality of time slices. The system management unit further correlates a time slice associated with a given frame with a time slice associated with at least one previously processed frame from the plurality of frames, based at least in part on the tracked computing activity. The system management unit predicts a clock frequency for rendering the given frame, based at least in part on the correlation, and the given frame is rendered using the predicted clock frequency.
G09G 5/36 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of individual graphic patterns using a bit-mapped memory
81.
DEVICES, SYSTEMS, AND METHODS FOR DYNAMICALLY CHANGING FREQUENCIES OF CLOCKS FOR THE DATA LINK LAYER WITHOUT DOWNTIME
An exemplary method for dynamically changing frequencies of clocks for the data link layer without downtime involves switching a first queue on a first end of a data link and a second queue on a second end of the data link from a pacing mode to an asynchronous mode. The exemplary method also involves modifying a frequency of a clock associated with the data link. The exemplary method further involves returning the first queue and the second queue from the asynchronous mode to the pacing mode upon modifying the frequency of the clock. Various other devices, systems, and methods are also disclosed.
G06F 1/08 - Clock generators with changeable or programmable clock frequency
G06F 1/324 - Power saving characterised by the action undertaken by lowering clock frequency
G06F 5/06 - Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising
Enhanced methods for memory context restore are described. A device may include a physical layer (PHY) having an interface to support communication of command signals and data with a physical memory. The PHY implements a training mode to train the interface, detect values of a plurality of parameters as part of training the interface, and store the detected values as initial training data. The PHY also implements a retraining mode to use the initial training data as seed data to retrain the interface.
Selectively bypassing cache directory lookups for processing-in-memory instructions is described. In one example, a system maintains information describing the status (clean or dirty) of a memory address, where a dirty status indicates that the data at the memory address has been modified in a cache and thus differs from the data at that address in system memory. A processing-in-memory request involving the memory address is assigned a cache directory bypass bit based on the status of the memory address. The cache directory bypass bit for a processing-in-memory request controls whether a cache directory lookup is performed after the processing-in-memory request is issued by a processor core and before the processing-in-memory request is executed by a processing-in-memory component.
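The bypass decision reduces to a lookup in the clean/dirty status information: a request to a clean address can skip the cache directory lookup. The dictionary used as the status tracker, the request fields, and the choice to treat untracked addresses as dirty are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PimRequest:
    address: int
    bypass_directory: bool = False  # Cache directory bypass bit.


def tag_request(req: PimRequest, status: Dict[int, str]) -> PimRequest:
    """Set the bypass bit only when the address is known to be clean."""
    # Untracked addresses are treated as dirty (conservative assumption).
    req.bypass_directory = status.get(req.address, "dirty") == "clean"
    return req


def needs_directory_lookup(req: PimRequest) -> bool:
    """A lookup happens between issue and execution unless the bit is set."""
    return not req.bypass_directory
```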
A circuit design emulation system having a plurality of integrated circuits (ICs) includes a first IC. The first IC includes an originator circuit configured to issue a request of a transaction directed to a completer circuit. The request is specified in a communication protocol. The first IC includes a completer transactor circuit coupled to the originator circuit and configured to translate the request into request data. The first IC includes a first interface circuit configured to synchronize the request data from an originator clock domain to a transceiver clock domain operating at a higher frequency than the originator clock domain. The first IC includes a first transceiver circuit configured to convey the request data over a communication link that operates asynchronously to the originator clock domain.
Graph analytics systems are described. In accordance with the described techniques, a graph is received whose vertices, including a first vertex and a second vertex, are associated with access control metadata. An updated graph is output in which the first vertex and the second vertex are merged into a merged vertex of a group of vertices, based on a reordering technique and on the first vertex and the second vertex sharing common access control metadata. A single copy of the access control metadata is stored for the first vertex and the second vertex.
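The sketch below groups vertices by their access-control metadata. It assumes the reordering technique is a plain sort on the ACL, which makes vertices with identical metadata adjacent. Each run of adjacent vertices is then merged and stored with a single ACL copy. Edges are omitted for brevity.

```python
from typing import Dict, FrozenSet, List, Tuple

def merge_by_acl(vertex_acl: Dict[str, FrozenSet[str]]
                 ) -> Tuple[List[List[str]], List[FrozenSet[str]]]:
    """Return vertex groups and one shared ACL per group (same index)."""
    # Assumed reordering step: sort so vertices with equal ACLs become adjacent.
    ordered = sorted(vertex_acl, key=lambda v: (sorted(vertex_acl[v]), v))
    groups: List[List[str]] = []
    acl_store: List[FrozenSet[str]] = []  # exactly one copy per distinct ACL
    for v in ordered:
        acl = vertex_acl[v]
        if acl_store and acl_store[-1] == acl:
            groups[-1].append(v)           # merge into the current group
        else:
            groups.append([v])
            acl_store.append(acl)
    return groups, acl_store
```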
Techniques for detecting a digital pseudo-random sequence (PRS) using fast locking are described. The technique includes repeatedly computing a first PRS seed based on an ADC output, generating a PRS sequence based on the first seed, computing a second PRS seed based on the sequence, and comparing the sequence to the ADC output (the comparison result may be provided as a Boolean signal) until the sequence matches the ADC output. Thereafter, the technique may include re-computing the sequence based on the second seed, re-computing the second seed based on the re-computed sequence, and comparing the re-computed sequence to the ADC output. The technique may further include setting a lock when a threshold number of sequences computed from the second seed match the ADC output, and reverting to computing the sequence based on the first seed if a sequence computed from the second seed does not match the ADC output and the lock is not set.
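This bit-level sketch of the lock loop rests on several assumptions. The PRS is PRBS7 (x^7 + x^6 + 1). The ADC output has already been reduced to one bit per sample. The first seed is the most recent 7 ADC bits, and the second seed is the last 7 bits of the generated sequence. The block length and lock threshold are hypothetical parameters.

```python
from typing import List, Optional

ORDER = 7  # assumed PRBS7: b[n] = b[n-7] ^ b[n-6]

def prbs_extend(seed: List[int], length: int) -> List[int]:
    """Generate `length` bits continuing from the last ORDER bits of `seed`."""
    bits = list(seed[-ORDER:])
    for _ in range(length):
        bits.append(bits[-7] ^ bits[-6])
    return bits[ORDER:]

def detect_lock(adc_bits: List[int], block: int = 16, lock_after: int = 3) -> bool:
    """Return True once `lock_after` blocks predicted from the second seed match."""
    if block < ORDER:
        raise ValueError("block must be at least ORDER bits long")
    pos = ORDER
    second_seed: Optional[List[int]] = None
    streak = 0
    while pos + block <= len(adc_bits):
        target = adc_bits[pos:pos + block]
        if second_seed is None:
            seq = prbs_extend(adc_bits[pos - ORDER:pos], block)  # first seed: ADC bits
        else:
            seq = prbs_extend(second_seed, block)                # second seed: sequence
        if seq == target:                    # Boolean comparison result
            if second_seed is not None:
                streak += 1
                if streak >= lock_after:
                    return True              # lock set
            second_seed = seq[-ORDER:]
            pos += block
        else:
            second_seed, streak = None, 0    # not locked: revert to the first seed
            pos += 1
    return False
```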
A processor employs work items to manage traversal of an acceleration structure, such as a ray tracing structure, at a hardware traversal engine of a processing unit. The work items are structures with a relatively small memory footprint, and each work item is associated with both a ray and a corresponding portion of the acceleration structure. The hardware traversal engine employs the work items to manage the traversal of the corresponding portion of the acceleration structure for the corresponding ray.
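The following is a minimal sketch of work-item-driven traversal, in which each item is just a (ray, node) pair held in a queue. For brevity it uses a hypothetical one-dimensional structure in which every node has an interval `[lo, hi]` and every ray is a vertical line at some `x`. A real engine would use 3-D bounding boxes.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class WorkItem:
    """Small record pairing one ray with one node of the acceleration structure."""
    ray_id: int
    node_id: int

# Hypothetical structure: node -> (lo, hi, children); leaves have no children.
Tree = Dict[int, Tuple[float, float, List[int]]]

def traverse(tree: Tree, rays: Dict[int, float], root: int = 0) -> Dict[int, List[int]]:
    """Process work items until none remain; return the leaf hits per ray."""
    hits: Dict[int, List[int]] = {r: [] for r in rays}
    queue = deque(WorkItem(r, root) for r in rays)
    while queue:
        item = queue.popleft()
        lo, hi, children = tree[item.node_id]
        x = rays[item.ray_id]  # simplification: a ray is a vertical line at x
        if not lo <= x <= hi:
            continue           # ray misses this node's bounds
        if children:
            queue.extend(WorkItem(item.ray_id, c) for c in children)
        else:
            hits[item.ray_id].append(item.node_id)
    return hits
```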
Disclosed herein are a power cable and a power supply system including the power cable. The power cable includes a 12VHPWR connector disposed at a first end of the power cable; a plurality of PCIE 2×4 connectors disposed at a second end of the power cable; and a converting circuit coupled with the 12VHPWR connector via a first cable and coupled with the plurality of the PCIE 2×4 connectors via a plurality of second cables. The converting circuit includes at least one AND gate configured to couple sensing pins of the plurality of PCIE 2×4 connectors with a sensing pin of the 12VHPWR connector.
H03K 19/20 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits characterised by logic function, e.g. AND, OR, NOR, NOT circuits
92.
SYSTEMS AND METHODS FOR COOLING ACCELERATORS HAVING BACK SIDE POWER DELIVERY COMPONENTS
A method for cooling accelerators having back side power delivery components can include providing a printed circuit board having a first side that includes an integrated circuit and a first set of one or more power delivery components and a second side that is opposite the first side and that includes a second set of one or more power delivery components. The method can also include positioning a first cooling system to cool the integrated circuit and the first set of one or more power delivery components. The method can further include positioning a second cooling system to cool the second set of one or more power delivery components. Various other methods and systems are also disclosed.
The disclosed computing device can include host circuitry configured to provide a physical function and guest circuitry configured to provide a virtual function. The host circuitry is configured to dynamically assign request identifiers for accessing at least the host circuitry in a manner that allows the request identifiers to change on a command-by-command basis, instead of on a time-slice basis in which fixed request identifier values are used for each time slice. Various other methods, systems, and computer-readable media are also disclosed.
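One way to picture per-command assignment is a pool allocator. In the sketch below, each command takes a free request identifier (RID) when it is issued and returns it on completion, so a function's RID can differ from one command to the next. The class, the pool values and the function labels are hypothetical.

```python
from typing import Dict, List

class RidAllocator:
    """Hypothetical per-command request-identifier allocator."""

    def __init__(self, pool: List[int]):
        self.free = list(pool)
        self.in_use: Dict[int, str] = {}  # rid -> function that owns it

    def issue(self, function: str) -> int:
        """Assign a RID to one command (not to a whole time slice)."""
        if not self.free:
            raise RuntimeError("no request identifier available")
        rid = self.free.pop(0)
        self.in_use[rid] = function
        return rid

    def complete(self, rid: int) -> None:
        """Release the RID as soon as its command finishes."""
        del self.in_use[rid]
        self.free.append(rid)
```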
The disclosed computer-implemented method for video encoding rate control can include governing, by at least one processor, a video encoding rate at least partly in response to video encoding quality information. The method can additionally include generating, by the at least one processor, an encoded video data bitstream based on input pixel data and according to the video encoding rate. The method can also include determining, by the at least one processor, the video encoding quality information based on reconstructed pixel data. Various other methods, systems, and computer-readable media are also disclosed.
H04N 19/149 - Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
H04N 19/154 - Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
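The quality-feedback loop can be sketched with PSNR as the quality measure, computed on reconstructed pixels, and a quantization parameter (QP) as the rate control knob. PSNR, the 0 to 51 QP range (as in H.264/HEVC), the target and the tolerance band are all assumptions made for illustration, not the disclosed method.

```python
import math
from typing import Sequence

def psnr(original: Sequence[int], reconstructed: Sequence[int], peak: int = 255) -> float:
    """Quality of reconstructed pixels against the input (8-bit samples assumed)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak * peak / mse)

def next_qp(qp: int, quality_db: float, target_db: float, tolerance: float = 0.5,
            step: int = 1, qp_min: int = 0, qp_max: int = 51) -> int:
    """Lower QP (spend more bits) below target quality; raise it above target."""
    if quality_db < target_db - tolerance:
        qp -= step
    elif quality_db > target_db + tolerance:
        qp += step
    return max(qp_min, min(qp_max, qp))
```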
95.
METHOD AND APPARATUS FOR ENABLING MIMD-LIKE EXECUTION FLOW ON SIMD PROCESSING ARRAY SYSTEMS
A method, apparatus, and computer-readable medium use a lightweight finite state machine (FSM) control flow block to enable limited execution of data-dependent control flow, thereby enhancing the control flow flexibility of array-scale SIMD processors. In certain cases, the FSM block contains registers responsible for decoding single global instructions into multiple local instructions, which can incorporate data-dependent control flow, and for managing those local instructions.
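In this illustration, one hypothetical global instruction, "halve until below threshold", is expanded into per-lane local instructions by a small FSM in each lane. All lanes step in lockstep, but each lane's FSM selects `CMP`, `SHR` or `NOP` based on its own data, which gives a limited, MIMD-like divergence in loop counts. The instruction and state names are assumptions.

```python
from enum import Enum
from typing import List, Tuple

class State(Enum):
    CHECK = "check"
    HALVE = "halve"
    DONE = "done"

def run_global_instruction(values: List[int], threshold: int
                           ) -> Tuple[List[int], List[List[str]]]:
    """Expand one global instruction into local instructions, lane by lane."""
    if threshold < 1:
        raise ValueError("threshold must be >= 1 so every lane terminates")
    states = [State.CHECK] * len(values)
    vals = list(values)
    trace: List[List[str]] = []  # local instruction issued by each lane per step
    while any(s is not State.DONE for s in states):
        step = []
        for i, s in enumerate(states):
            if s is State.CHECK:
                # Data-dependent branch evaluated locally in each lane.
                states[i] = State.HALVE if vals[i] >= threshold else State.DONE
                step.append("CMP")
            elif s is State.HALVE:
                vals[i] //= 2
                states[i] = State.CHECK
                step.append("SHR")
            else:
                step.append("NOP")  # finished lanes idle while others loop
        trace.append(step)
    return vals, trace
```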
Devices and methods for rendering objects using ray tracing are provided. They include generating a low-resolution version of a high-resolution mesh representing objects in the scene; determining points on the curved surfaces of curved surface patches defined for triangles or bilinear quadrangles of the low-resolution mesh; performing ray intersection testing by casting rays toward surfaces of the high-resolution mesh, which are approximated from new points calculated by offsetting the points on the curved surfaces by offset values along interpolated normals; and rendering the objects in the scene based on the ray intersection testing.
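The offset step can be sketched for one triangular patch. The normal is interpolated from the three vertex normals using the point's barycentric coordinates, then the point is displaced along it. This sketch assumes the curved-surface point and the per-point offset value are already known, because the abstract does not say how they are computed.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]

def _normalize(v: Sequence[float]) -> Vec:
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n)

def displaced_point(surface_point: Vec, normals: Sequence[Vec],
                    bary: Vec, offset: float) -> Vec:
    """New point = curved-surface point + offset along the interpolated normal.

    `normals` are the three vertex normals of a low-resolution triangle and
    `bary` holds the barycentric coordinates (u, v, w) of the surface point.
    """
    interp = [sum(b * nv[i] for b, nv in zip(bary, normals)) for i in range(3)]
    n = _normalize(interp)
    return (surface_point[0] + offset * n[0],
            surface_point[1] + offset * n[1],
            surface_point[2] + offset * n[2])
```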
A computer-implemented method for dynamic resource management can include evaluating, by at least one processor, whether a priority of one or more processes associated with a request for one or more shared resources meets a threshold condition. The method can additionally include determining, by the at least one processor and in response to an evaluation that the priority meets the threshold condition, whether the one or more shared resources are available to meet the request. The method can further include completing, by the at least one processor and in response to a determination that the one or more shared resources are available, execution of the one or more processes. Various other methods, systems, and computer-readable media are also disclosed.
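The sketch below follows the three steps in order. It assumes the threshold condition is "priority at or above a fixed threshold" and that shared resources are counted units in a dictionary pool. The return labels, the resource names and the `run` callback are hypothetical.

```python
from typing import Callable, Dict

def handle_request(priority: int, request: Dict[str, int], available: Dict[str, int],
                   run: Callable[[], None], priority_threshold: int = 5) -> str:
    """Check priority, then availability, then run the processes to completion."""
    if priority < priority_threshold:
        return "deferred"                      # threshold condition not met
    if any(available.get(r, 0) < n for r, n in request.items()):
        return "waiting"                       # shared resources not available
    for r, n in request.items():
        available[r] -= n                      # allocate the shared resources
    try:
        run()                                  # complete execution of the processes
    finally:
        for r, n in request.items():
            available[r] += n                  # release them afterwards
    return "completed"
```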
A memory system includes a memory controller and memory circuitry. The memory controller outputs a first training signal. The memory circuitry is coupled to the memory controller. The memory circuitry includes a memory device and multiplexing data buffer circuitry. The multiplexing data buffer circuitry is coupled to the memory device. The multiplexing data buffer circuitry includes first circuitry and second circuitry. The second circuitry is coupled to the memory device. The second circuitry receives, from the memory controller, the first training signal comprising first training data associated with the first circuitry; writes the first training data to the memory device; reads the written first training data from the memory device; and outputs the written first training data to the memory controller. The memory controller is configured to determine equalization parameters for the first circuitry based on the written first training data.
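The training flow can be modeled as follows. The memory device is a plain dictionary, and a hypothetical `channel(pattern, eq)` function stands in for the link through the first circuitry, whose fidelity depends on the equalization setting. The controller sweeps the settings, compares each read-back against the pattern it sent, and (by assumption) picks the middle passing setting.

```python
from typing import Callable, Dict, List

Pattern = List[int]
Channel = Callable[[Pattern, int], Pattern]  # hypothetical link model

def loopback(pattern: Pattern, eq: int, channel: Channel) -> Pattern:
    """Buffer side: write the training data to the memory device, then read it back."""
    memory: Dict[int, int] = {}
    for addr, bit in enumerate(channel(pattern, eq)):
        memory[addr] = bit                       # write the first training data
    return [memory[a] for a in range(len(pattern))]  # read the written data

def pick_equalization(pattern: Pattern, settings: range, channel: Channel) -> int:
    """Controller side: keep settings whose read-back matches; choose the middle one."""
    good = [eq for eq in settings if loopback(pattern, eq, channel) == pattern]
    if not good:
        raise RuntimeError("no equalization setting passed")
    return good[len(good) // 2]
```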
An exemplary method for dynamically changing frequencies of clocks for the data link layer without downtime involves switching a first queue on a first end of a data link and a second queue on a second end of the data link from a pacing mode to an asynchronous mode. The exemplary method also involves modifying a frequency of a clock associated with the data link. The exemplary method further involves returning the first queue and the second queue from the asynchronous mode to the pacing mode upon modifying the frequency of the clock. Various other devices, systems, and methods are also disclosed.
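The ordering of the three steps can be captured in a short model. Both end queues leave pacing mode before the clock is retuned and return to it afterwards. `DataLink`, `LinkQueue` and the frequency values are hypothetical, and the model only records the sequence of steps, not the hardware behavior.

```python
from enum import Enum
from typing import List, Tuple

class Mode(Enum):
    PACING = "pacing"
    ASYNC = "async"

class LinkQueue:
    def __init__(self) -> None:
        self.mode = Mode.PACING

class DataLink:
    """Hypothetical link with one queue at each end and a shared clock."""

    def __init__(self, freq_mhz: float):
        self.freq_mhz = freq_mhz
        self.ends = (LinkQueue(), LinkQueue())
        self.log: List[Tuple[str, float]] = []

    def change_frequency(self, new_freq_mhz: float) -> None:
        for q in self.ends:
            q.mode = Mode.ASYNC          # 1. switch both ends out of pacing mode
        self.log.append(("async", self.freq_mhz))
        self.freq_mhz = new_freq_mhz     # 2. modify the clock frequency
        for q in self.ends:
            q.mode = Mode.PACING         # 3. return both ends to pacing mode
        self.log.append(("pacing", self.freq_mhz))
```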
A disclosed semiconductor device includes (1) a silicon stack comprising a front-side Back-End-of-Line (BEOL) stack and a back-side BEOL stack, the front-side BEOL stack comprising a plurality of signal routes and the back-side BEOL stack comprising a plurality of power delivery routes, and (2) a plurality of auxiliary power paths formed within the front-side BEOL stack and electrically coupled to the plurality of power delivery routes of the back-side BEOL stack via a plurality of programmable switches, the plurality of power delivery routes, the plurality of programmable switches, and the plurality of auxiliary power paths forming a programmable power delivery network (PDN). Various other apparatuses, systems, and methods of operation are also disclosed.