A system comprises a machine check architecture and a processor. The machine check architecture is configured to log hardware errors. The processor is configured to obtain a log of one or more of the hardware errors from the machine check architecture and/or to generate a copy of the log. The processor is further configured to either (1) deliver the log to an in-band agent and the copy of the log to an out-of-band agent or (2) deliver the copy of the log to the in-band agent and the log to the out-of-band agent. Various other devices, systems, and methods are also disclosed.
A power manager of an apparatus exposes an application programming interface(API) usable for applications to specify priority and quality-of-service (QoS) parameters (e.g., bandwidth requirements) for a workload. An application, for instance, specifies the priority and QoS parameters for a workload to be processed using a hardware compute unit. The power manager employs the priority and QoS parameters to configure the bandwidth allocation to access a memory system. In particular, the bandwidth allocation and prioritization are dynamically extended to real-time and best-effort workloads to satisfy specified QoS parameters for inference workloads and improve user experiences.
Temporary system adjustment for component overclocking is described. In accordance with the described techniques, a processor and/or memory are operated according to first settings. During operation of the processor and/or the memory according to the first settings, a signal triggers a temporary adjustment of operation of the processor and/or the memory according to second settings. Responsive to the request, operation of the processor and/or the memory is switched to the second settings without rebooting. After a duration, operation of the processor and/or the memory is switched back to the first settings. In one or more implementations, at least one of the first settings or the second settings overclock the processor and/or the memory.
An exemplary method for performing lane-specific error detection in high-speed data links involves receiving, at a receiver, an ordered set of data from a transmitter communicatively coupled to the receiver via a data link. The exemplary method also involves identifying, in the ordered set of data, an error-detection reference. The exemplary method further involves performing, in connection with one data lane of the data link, an error-detection operation on the ordered set of data based at least in part on the error-detection reference. Various other devices, systems, and methods are also disclosed.
An exemplary method for performing lane-specific error detection in high-speed data links involves receiving, at a receiver, an ordered set of data from a transmitter communicatively coupled to the receiver via a data link. The exemplary method also involves identifying, in the ordered set of data, an error-detection reference. The exemplary method further involves performing, in connection with one data lane of the data link, an error-detection operation on the ordered set of data based at least in part on the error-detection reference. Various other devices, systems, and methods are also disclosed.
A system includes a memory device and a memory controller operatively connected to the memory device via a physical layer interface (PHY). The PHY includes an active first-in-first-out (FIFO) buffer configured to receive commands from the memory controller. The PHY also includes one or more on-demand FIFO buffers configured to be selectively enabled by the active first-in-first-out buffer to handle a data payload. The system ensures efficient power usage by gating clocks and clock distribution to the one or more on-demand FIFO buffers.
A technique for rendering is provided. The technique includes mapping a randomization portion of an item of identifying information to a random block of an address space; mapping a linear portion of the item of identifying information to an element within the block; and accessing the element.
Systems and methods described herein use multiple reduced-precision intersection testers in parallel to determine candidate nodes to traverse in a wide BVH. Primitives are quantized to generate primitive packets, that are stored compactly in, with, or near a leaf node. At the leaves of the BVH, these intersection testers test a ray simultaneously against a plurality of triangles in the primitive packet to find candidate triangles that require full-precision intersection. Triangles or primitives that generate an inconclusive result during low-precision testing are retested using full-precision testers to definitively determine ray-triangle hits or misses. Testing the quantized triangles simultaneously using low-precision testers culls instances wherein the ray misses a box or a triangle that need not be tested using higher precision.
Devices and methods for allocating components of a safety critical system are provided. The processing device comprises resources including memory, a host processor and a plurality of processors connected to the resources via a shared pathway of a network and configured to execute an application based on instructions from the host processor. Each of the plurality of processors is assigned to one of a plurality of criticality domain levels and isolated pathways are created, via the shared pathway, between the plurality of processors and the plurality of resources based on which of the processors are assigned to one or more of the plurality of criticality domain levels to access one or more of the plurality of resources. The application is executed using the network. The isolated pathways are, for example, created by disabling one or more switches. Alternatively, the isolated pathways are created via programmable logic.
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 15/173 - Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star or snowflake
G06F 15/78 - Architectures of general purpose stored program computers comprising a single central processing unit
10.
A WAY TO LAUNCH A LARGE NUMBER OF GAME INSTANCES IN DIFFERENT LEVELS ON A CLOUD PLATFORM
Systems and methods for efficient sharing of memory space in cloud-based applications are described. Data that can be shared between multiple instances of an application is identified and a dedicated memory space is allocated to such data. Whether the data can be shared or not is determined based on the data's content, to avoid corruption and irregular allocations. In conditions where data needs to be shared, a processing circuitry can determine if the data is already in use by another application instance. If so, a shared memory comprising the data is identified and a reference counter for the shared memory is updated. If no other application instances currently use the data, a selected shared memory is assigned to the data and the data is copied from its dedicated memory space to the selected shared memory. In either condition, the original memory space is freed-up, thereby ensuring efficient memory usage.
A technique includes determining a base decision rate; monitoring for key events; and based on the base decision rate and the monitoring, determining a time at which to generate an action to be performed by an application entity of an application. The base decision rate includes a baseline rate at which the application is directed to determine new actions to be performed by the application entity. In some examples, the base decision rate is determined using a trained AI model, by applying information about the state of the application to the model and obtaining the base decision rate in response. In some examples, key events are unexpected events that occur in the application. In some examples, since the base decision rate represents a rate at which to generate actions, given the current state of the application, the key events, which represent unexpected events, override or modify the base decision rate.
A63F 13/67 - Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
A63F 13/56 - Computing the motion of game characters with respect to other game characters, game objects or elements of the game scene, e.g. for simulating the behaviour of a group of virtual soldiers or for path finding
12.
DETECTING ERRORS WITHIN DATA PATH CIRCUITRY OF A MEMORY DEVICE
A memory device includes core circuitry including memory cells, and write data path circuitry coupled to the core circuitry. The write data path circuitry determines a second parity bit from a second signal and a poison bit. The second signal and the poison bit are determined by processing a first data signal. Further, the write data path circuitry detects a first error within the second signal based on a comparison between a first parity bit and the second parity bit, and outputs a first error signal comprising the first error.
Devices and methods for allocating components of a safety critical system are provided. The processing device comprises resources including memory, a host processor and a plurality of processors connected to the resources via a shared pathway of a network and configured to execute an application based on instructions from the host processor. Each of the plurality of processors is assigned to one of a plurality of criticality domain levels and isolated pathways are created, via the shared pathway, between the plurality of processors and the plurality of resources based on which of the processors are assigned to one or more of the plurality of criticality domain levels to access one or more of the plurality of resources. The application is executed using the network. The isolated pathways are, for example, created by disabling one or more switches. Alternatively, the isolated pathways are created via programmable logic.
A processing unit identifies errors in a sequence of shaders by capturing snapshots at different points in the shader sequence, wherein each snapshot represents image data or other data at the corresponding point in the shader sequence. A snapshot comparer compares the snapshots to corresponding reference snapshots, wherein each reference snapshot represents the expected data at the corresponding point in the sequence of shaders. Errors in one or more of the shaders in the shader sequence are identified based on the comparison.
To perform ray traversals of displaced micro-meshes (DMMs), a processing system includes an accelerator unit (AU). The AU is configured to first generate a DMM including one or more base triangles. The AU then generates an initial bounding volume around a first base triangle of the DMM. Further, the AU bounds one or more sides of the initial bounding volume with respective bounding volumes to produce a prism bounding volume around the base triangle. The AU is then configured to determine whether a ray intersects the prism volume bounding the first base triangle of the DMM.
A memory device includes core circuitry including memory cells, and write data path circuitry coupled to the core circuitry. The write data path circuitry determines a second parity bit from a second signal and a poison bit. The second signal and the poison bit are determined by processing a first data signal. Further, the write data path circuitry detects a first error within the second signal based on a comparison between a first parity bit and the second parity bit, and outputs a first error signal comprising the first error.
The disclosed computer-implemented method includes transforming, from a time domain into a frequency domain, a sound signal into a transformed sound signal. The transformed sound signal has a phase component and a magnitude component. The method also includes filtering the phase component of the transformed sound signal by applying a quantized mask from a machine-learning model to the phase component, and generating a filtered sound signal by transforming, from the frequency domain into the time domain, the transformed sound signal comprising the magnitude component and the filtered phase component. Various other methods, systems, and computer-readable media are also disclosed.
A computing system includes a display controller and a processing device external to the display controller. The display controller includes a content verification circuit configured to generate a derived value representing visual content of interest (COI) within an image frame for a region of interest (ROI) on at least one display device. The processing device includes an error-detection circuit configured to perform an error-detection process for the visual COI based on the derived value.
A system utilizes a hybrid chroma subsampling process in which a source device generates a plurality of motion layers from an input image of a video stream. Each motion layer is associated with a different motion criterium and includes data from the image only for those regions of pixels that meet the corresponding motion criterium. The source device generates each motion layer with a different degree of chroma subsampling based on the motion criterium associated with the motion layer. The resulting plurality of motion layers are transmitted to a sink device. The sink device decodes the motion layers and then generates a composite image from the resulting motion layers, the composite image representing the input image with different degrees of chroma subsampling for different regions based on the degree of motion in each region.
H04N 19/132 - Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
H04N 19/139 - Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
H04N 19/17 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
20.
REDUCING 3D LOOKUP TABLE INTERPOLATION ERROR WHILE MINIMIZING ON-CHIP STORAGE
Systems, apparatuses, and methods for reducing three dimensional (3D) lookup table (LUT) interpolation error while minimizing on-chip storage are disclosed. A processor generates a plurality of mappings from a first gamut to a second gamut at locations interspersed throughout a 3D representation of the pixel component space. For example, in one implementation, the processor calculates mappings for 17×17×17 vertices within the 3D representation. Other implementations can include other numbers of vertices. Rather than increasing the number of vertices to reduce interpolation error, the processor calculates mappings for centroids of the sub-cubes defined by the vertices within the 3D representation of the first gamut. This results in a smaller increase to the LUT size as compared to increasing the number of vertices. The centroid mappings are used for performing tetrahedral interpolation to map source pixels in the first gamut into the second gamut with a reduced amount of interpolation error.
G09G 5/06 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the way in which colour is displayed using colour palettes, e.g. look-up tables
21.
DEVICES AND SYSTEMS FOR FLYING BITLINE WITH JUMPER CELL
The disclosed device can include a bitcell array located on a first metal layer including a first subarray of bitcells and a second subarray of bitcells; a first write driver device coupled to the first subarray of bitcells from a first end of the first subarray; a second write driver device coupled to the second subarray of bitcells from a first end of the second subarray; a third write driver device coupled to the first subarray of bitcells from a second end of the first subarray; and a fourth write driver device coupled to the second subarray of bitcells from the second end of the second subarray. Various other devices, systems, and methods of manufacture are also disclosed.
Systems and methods for efficient memory management during ray tracing are described. A ray tracing system assigns a memory stack to a ray. The ray, when intersection tested against objects of a node, accesses data that is stored in the memory stack. When data is to be consumed from the memory stack by the ray, the ray tracing system uses a memory pointer associated with the ray to locate the requested data. When data is to be stored to the memory stack, the memory allocation circuitry stores data in a free memory block and uses a linked list to link the memory block with other memory blocks storing additional data for the ray.
A display system includes a rendering device and a display device having a plurality of individually-controllable illumination regions. The rendering device is to render a frame for display at the display device during a frame period and to determine a brightness representation for each region of a plurality of regions of the frame, each region of the frame corresponding to an illumination region of the display device. The rendering device further is to set, for each illumination region, an illumination configuration to be applied by the display device for the illumination region during at least one of the frame period and a subsequent frame period based on the brightness representation for the corresponding region of the frame, wherein the illumination configuration controls at least one of an illumination level, a duration, and a position of an illumination strobe to be implemented for the corresponding illumination region.
G09G 3/20 - Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes for presentation of an assembly of a number of characters, e.g. a page, by composing the assembly by combination of individual elements arranged in a matrix
G09G 3/32 - Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes for presentation of an assembly of a number of characters, e.g. a page, by composing the assembly by combination of individual elements arranged in a matrix using controlled light sources using electroluminescent panels semiconductive, e.g. using light-emitting diodes [LED]
G09G 3/3208 - Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes for presentation of an assembly of a number of characters, e.g. a page, by composing the assembly by combination of individual elements arranged in a matrix using controlled light sources using electroluminescent panels semiconductive, e.g. using light-emitting diodes [LED] organic, e.g. using organic light-emitting diodes [OLED]
G09G 3/34 - Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes for presentation of an assembly of a number of characters, e.g. a page, by composing the assembly by combination of individual elements arranged in a matrix by control of light from an independent source
G09G 3/36 - Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes for presentation of an assembly of a number of characters, e.g. a page, by composing the assembly by combination of individual elements arranged in a matrix by control of light from an independent source using liquid crystals
24.
INTERMINGLING MEMORY RESOURCES USING MULTIPLE ADDRESSING MODES
Systems and techniques enable intermingled use of disparate addressing modes for memory access requests directed to system memory resources. Within a processing system, a memory access request indicating a multi-bit physical memory address is received. Based on a bit pattern indicated by a first subset of bits of the multi-bit physical memory address, an addressing mode to be used for fulfilling the memory access request is determined, such as by selecting an addressing mode table entry that is keyed to the bit pattern. The memory access request is fulfilled in accordance with the determined addressing mode.
An exemplary stiffener comprises an inner perimeter that substantially surrounds at least one dimension of an integrated circuit coupled to a substrate. The inner perimeter of stiffener comprises a set of boundaries and at least one recess formed into at least one of the boundaries. In addition, the exemplary stiffener also comprises an outer perimeter that extends further outward from the integrated circuit than the inner perimeter. Various other apparatuses, systems, and methods are also disclosed.
A system on a chip (SOC) includes a critical domain including components configured to perform critical operations and a non-critical domain including components configured to perform non-critical operations. To help perform such operations, the critical domain and non-critical domain share a static random-access memory (SRAM) that includes a first subset of memory banks assigned to the critical domain and a second subset of memory banks assigned to the non-critical domain. The SOC further includes a memory scrubbing circuitry configured to sequentially check each memory bank of the SRAM for errors. To this end, the memory scrubbing circuitry is configured to check a respective memory bank for errors each time an event trigger occurs by implementing one or more error correction codes.
A computing device can include circuitry configured to generate a set of two-dimensional textures based at least in part on a set of images that depict a head of a user. The circuitry can be further configured to generate a three-dimensional avatar that depicts the head of the user with substantially even illumination by applying a blend of the set of two-dimensional textures to a head model. The computing device can also include an output device configured to facilitate presentation of the three-dimensional avatar of the user. Various other devices, systems, and methods are also disclosed.
A multipurpose wordline underdrive circuit includes a wordline driver and a pulldown network. The pulldown network includes a first current-carrying terminal electrically coupled to the wordline driver and a second current-carrying terminal electrically coupled to a control signal. The pulldown network also includes a current-regulation terminal electrically coupled to an additional control signal. Various other devices, systems, and methods are also disclosed.
A processing system includes one or more sensors configured to generate sensor data while a memory of the processing system is in a low-power state. As the sensors generate the sensor data, the sensor data is stored in a buffer. The processing system further includes a sensor data management circuitry that tracks a usage of the buffer. Based on the usage of the buffer exceeding a threshold, the sensor data management circuitry is configured to wake at least a portion of the memory from the low-power state. Once the memory exits the low-power state, the processing system transfers the sensor data from the buffer to one or more locations within the memory. After writing the sensor data to the memory, the processing system then places at least a portion of the memory back in the low-power state.
A processing system includes one or more sensors configured to generate sensor data while a memory of the processing system is in a low-power state. As the sensors generate the sensor data, the sensor data is stored in a buffer. The processing system further includes a sensor data management circuitry that tracks a usage of the buffer. Based on the usage of the buffer exceeding a threshold, the sensor data management circuitry is configured to wake at least a portion of the memory from the low-power state. Once the memory exits the low-power state, the processing system transfers the sensor data from the buffer to one or more locations within the memory. After writing the sensor data to the memory, the processing system then places at least a portion of the memory back in the low-power state.
A method can include overriding settings of an integrated circuit device by reading one or more settings from a setting record that correspond to a part number of the integrated circuit device. The method can also include performing an override of the settings of the integrated circuit device based on the one or more settings of the setting record that correspond to the part number of the integrated circuit device. Various other methods and systems are also disclosed.
A method for synchronizing trusted operating systems can include receiving, at a first interconnect circuit, an operating system management instruction for a first trusted operating system that is associated with a first trusted memory region of a memory device, the first trusted memory region being allocated to the first interconnect circuit. The method can also include synchronizing the operating system management instruction with a second interconnect circuit such that the operating system management instruction is applied to a second trusted operating system. The second trusted operating system is associated with a second trusted memory region of the memory device and the second trusted memory region is allocated to the second interconnect circuit. Various other methods and systems are also disclosed.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
G06F 21/54 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by adding security routines or objects to programs
33.
Processing performance adjustment using biosignals
The disclosed device can receive a biosignal and, using user input predictions based on the biosignal, pre-render a display frame. The device can also subsequently receive a user input, output the pre-rendered display frame based on the user input confirming the user input predictions and flush the pre-rendered display frame otherwise. The device can also modulate computing performance and power based on computing demands predicted from the biosignal. Various other methods, systems, and computer-readable media are also disclosed.
A fused bounding volume hierarchy, which is a combination of a base bounding volume hierarchy and one or more non-base bounding volume hierarchies, is generated. For each non-base bounding volume hierarchy, multiple subtrees in the non-base bounding volume hierarchy that include less than a threshold number of child nodes are identified. Each of these subtrees is then fused with the base bounding volume hierarchy at one of the nodes of the base bounding volume hierarchy, and an identifier of the level of detail for the non-base bounding volume hierarchy is included in the node. When displaying a scene or image, for a particular portion of the scene or image the level of detail to use is identified. The fused bounding volume hierarchy is traversed and the geometric objects in the nodes of the fused bounding volume hierarchy corresponding to the identified level of detail are displayed.
A technique for rendering is provided. The technique includes determining a level of detail for a shade space texture and a screen space; shading the shade space texture having a resolution based on the level of detail; and for a reconstruction operation, performing sampling from the shade space texture, the sampling including a high frequency attenuation of samples of the shade space texture.
A technique for performing ray tracing operations is provided. The technique includes for a ray being tested for intersection with geometry associated with a bounding volume hierarchy, traversing to a pre-filtering node that includes information for filtering out triangles of a leaf node of the bounding volume hierarchy; evaluating a quantized ray that corresponds to the ray against quantized triangles of the pre-filtering node to filter out one or more triangles of the leaf node from consideration; and testing the triangles of the leaf node that are not filtered out and not testing the triangles of the leaf node that are filtered out.
A technique for rendering is provided. The technique includes performing a visibility pass that designates portions of shade space textures visible in a scene, wherein the visibility pass generates tiles that cover shade space textures visible in the scene; performing a rate controller operation on output of the visibility pass using spatially-adaptive sampling; performing a sparse shade space shading operation on the tiles that cover the shade space textures visible in the scene based on a result of the spatially-adaptive sampling; performing a regularization operation based on an output of the sparse shade space shading operation; and performing a reconstruction operation using output from the regularization operation to produce a final scene.
A technique for rendering is provided. The technique includes performing a visibility pass that designates portions of shade space textures visible in a scene, wherein the visibility pass generates tiles that cover shade space textures visible in the scene; performing a rate controller operation on output of the visibility pass using spatiotemporal adaptive sampling; performing a shade space shading operation on the tiles that cover the shade space textures visible in the scene based on a result of the spatiotemporal adaptive sampling; performing a regularization operation based on an output of the shade space shading operation; and performing a reconstruction operation using output from the regularization operation to produce a final scene.
Devices and methods for rendering objects using ray tracing are provided which include during a build time: generating an accelerated hierarchy structure comprising data representing an approximate volume bounding a group of geometric shapes representing the objects in the scene and data representing the geometric shapes; and generating additional data used to transform rays, to be cast in the scene, from a high precision space to a low precision space; and during a render time occurring after the build time: performing ray intersection tests, using the additional data generated during the build time, for the rays in the scene; and rendering the scene based on the ray intersection tests. Because the additional data is generated prior to render time, the additional data can be used to perform the ray intersection testing more efficiently.
A computer-implemented method for enabling a feature of a semiconductor device can include receiving, by at least one processor of a semiconductor device, a command to enable a feature of the semiconductor device. The method can also include burning, by the at least one processor and in response to the command, an electronic fuse of the semiconductor device. Various other methods, systems, and computer-readable media are also disclosed.
G06F 1/08 - Clock generators with changeable or programmable clock frequency
G06F 21/64 - Protecting data integrity, e.g. using checksums, certificates or signatures
G06F 21/73 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information by creating or determining hardware identification, e.g. serial numbers
H01L 23/525 - Arrangements for conducting electric current within the device in operation from one component to another including external interconnections consisting of a multilayer structure of conductive and insulating layers inseparably formed on the semiconductor body with adaptable interconnections
41.
IMAGING PRIVACY FILTER FOR OBJECTS OF INTEREST IN HARDWARE FIRMWARE PLATFORM
A method and computing device is provided for filtering objects of interest of images. The computing device comprises an image capturing device and memory configured to store objects of interest. In one example, the computing device comprises a processor configured to, for a captured image, determine one or more regions of interest in the image based on the objects of interest and modify the image based on the determined regions of interest. In another example, the computing device comprises a first processor configured to determine one or more regions of interest to be modified in an image based on the one or more objects of interest and a second processor configured to convert the image to be processed by the first processor and modify the image based on regions of interest determined by the first processor. The image is displayed without the one or more objects of interest being viewable.
Methods and apparatus employ a plurality of heterogeneous compute units and a plurality of non-compute units operatively coupled to the plurality of compute units. Power management logic (PML) determines a memory bandwidth level associated with a respective workload running on each of a plurality of heterogeneous compute units on the IC, and adjusts a power level of at least one non-compute unit of a memory system on the IC from a first power level to a second power level, based on the determined memory bandwidth levels. Memory access latency is also taken into account in some examples to adjust a power level of non-compute units.
A multi-chiplet system includes a first chiplet comprising a first transceiver and a first chiplet-to-chiplet (C2C) interface module, and a second chiplet comprising programmable logic circuitry and a second C2C interface module. The first transceiver is configured to generate a clock, which is transmitted from the first C2C interface module to the second C2C interface module, through a clock transmission wire, for data transfer between the first chiplet and the second chiplet.
A technique for performing ray tracing operations is provided. The technique includes for a ray being tested for intersection with geometry associated with a bounding volume hierarchy, traversing to a pre-filtering node that includes information for filtering out triangles of a leaf node of the bounding volume hierarchy; evaluating a quantized ray that corresponds to the ray against quantized triangles of the pre-filtering node to filter out one or more triangles of the leaf node from consideration; and testing the triangles of the leaf node that are not filtered out and not testing the triangles of the leaf node that are filtered out.
A processing system includes two or more graphics cores each disposed on respective dies and configured for concurrent processing of command packets. To this end, the processing system is configured to determine two or more command partitions associated with a command packet and to assign each command partition to a graphics core. Each graphics core then executes the same command packet by only performing instructions of the command packet associated with the command partitions assigned to the graphics core. Further, after executing an instructions of the command packet based on one or more assigned partitions, each graphics core adjusts one or more counters used to synchronize the execution of the command packet across the graphics cores.
A computer-implemented method for ensuring processing unit hardware state integrity in live migration can include participating as a source, by a processing unit, in a live migration procedure by injecting, into a live migration data package containing a state of the processing unit, a signature verifying the state. The method can additionally include participating as a target, by the processing unit, in an additional live migration procedure migrating an additional live migration data package containing an additional state of an additional processing unit by performing an integrity check based on an additional signature, in the additional live migration data package, verifying the additional state. Various other methods, systems, and computer-readable media are also disclosed.
A processing system includes two or more graphics cores each disposed on respective dies and configured for concurrent processing of command packets. To this end, the processing system is configured to determine two or more command partitions associated with a command packet and to assign each command partition to a graphics core. Each graphics core then executes the same command packet by only performing instructions of the command packet associated with the command partitions assigned to the graphics core. Further, after executing an instructions of the command packet based on one or more assigned partitions, each graphics core adjusts one or more counters used to synchronize the execution of the command packet across the graphics cores.
A technique for rendering is provided. The technique includes performing a visibility operation to generate shade space visibility information and reconstruction information; performing a shade space shading operation based on the shade space visibility information generate shaded shade space textures; and performing a reconstruction operation based on the reconstruction information and the shaded shade space textures.
A technique for rendering is provided. The technique includes performing a visibility pass that designates portions of shade space textures visible in a scene, wherein the visibility pass generates tiles that cover the shade space textures visible in the scene; performing a temporal rate controller operation; performing a shade space shading operation on the tiles that cover the shade space textures visible in the scene based on a temporal shading rate output by the temporal rate controller operation, wherein only a subset of samples in the tiles that cover the shade space textures visible in the scene are shaded in the shade space shading operation; and performing a reconstruction operation using output from the shade space shading operation to produce a final scene.
An apparatus and method for efficiently performing address translation requests. An integrated circuit includes a system memory that stores address mappings, and the circuitry of one or more clients processes one or more applications and generate address translation requests. A translation lookaside buffer (TLB) stores, in multiple entries, address mappings retrieved from the system memory. Circuitry of a client processes one or more applications and generates address translation requests. The entries of the TLB stores address mappings corresponding to different address mapping types and different virtual functions to avoid searches of multiple other lower-level TLBs that are significantly larger and have larger access. In addition, the TLB is implemented with a relatively small number of entries and uses fully associative data storage arrangement to further reduce access latencies.
A processing system stores a boot image for a critical domain of a system-on-a-chip (SOC) at a bank of a static random-access memory (SRAM) that is shared by the critical domain and a non-critical domain and that is powered independently from the non-critical domain. The SOC includes a secure processor that loads the boot image to the bank of the SRAM and then blocks subsequent write access to the bank. Because the critical domain is powered independently from the non-critical domain, the bank of the SRAM retains the boot image without regard to the power state of the non-critical domain. In addition, the critical domain implements a boot process that is decoupled from a CPU at the non-critical domain, ensuring that the critical domain can initiate a re-boot sequence even if the non-critical domain is not powered.
Systems, apparatuses, and methods for implementing efficient power optimization in a computing system are disclosed. A system management unit configured to track computing activity of a computing device while processing each frame of a plurality of frames. The computing activity is tracked at least for a given period of time comprising a plurality of time slices. The system management unit further correlates a time slice associated with a given frame with a time slice associated with at least one previously processed frame from the plurality of frames, based at least in part on the tracked computing activity. The system management unit predicts a clock frequency to render the given frame, based at least in part on the correlation and renders the given frame using the predicted clock frequency.
G09G 5/36 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of individual graphic patterns using a bit-mapped memory
53.
DEVICES, SYSTEMS, AND METHODS FOR DYNAMICALLY CHANGING FREQUENCIES OF CLOCKS FOR THE DATA LINK LAYER WITHOUT DOWNTIME
An exemplary method for dynamically changing frequencies of clocks for the data link layer without downtime involves switching a first queue on a first end of a data link and a second queue on a second end of the data link from a pacing mode to an asynchronous mode. The exemplary method also involves modifying a frequency of a clock associated with the data link. The exemplary method further involves returning the first queue and the second queue from the asynchronous mode to the pacing mode upon modifying the frequency of the clock. Various other devices, systems, and methods are also disclosed.
G06F 1/08 - Clock generators with changeable or programmable clock frequency
G06F 1/324 - Power saving characterised by the action undertaken by lowering clock frequency
G06F 5/06 - Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising
A multi-chiplet system includes a first chiplet comprising a first transceiver and a first chiplet-to-chiplet (C2C) interface module, and a second chiplet comprising programmable logic circuitry and a second C2C interface module. The first transceiver is configured to generate a clock, which is transmitted from the first C2C interface module to the second C2C interface module, through a clock transmission wire, for data transfer between the first chiplet and the second chiplet.
A computer-implemented method for ensuring processing unit hardware state integrity in live migration can include participating as a source, by a processing unit, in a live migration procedure by injecting, into a live migration data package containing a state of the processing unit, a signature verifying the state. The method can additionally include participating as a target, by the processing unit, in an additional live migration procedure migrating an additional live migration data package containing an additional state of an additional processing unit by performing an integrity check based on an additional signature, in the additional live migration data package, verifying the additional state. Various other methods, systems, and computer-readable media are also disclosed.
A processor employs work items to manage traversal of an acceleration structure, such as a ray tracing structure, at a hardware traversal engine of a processing unit. The work items are structures having a relatively small memory footprint, where each work item is associated both with a ray and with a corresponding portion of the acceleration structure. The hardware traversal engine employs a work items to manage the traversal of the corresponding portion of the acceleration structure for the corresponding ray.
The disclosed computing device can include host circuitry configured to provide a physical function and guest circuitry configured to provide a virtual function. The host circuitry is configured to dynamically assign request identifiers for accessing at least the host circuitry in a manner that allows the request identifiers to change on a command-to-command basis instead of a time-to-time basis that uses fixed value request identifiers in time slices. Various other methods, systems, and computer-readable media are also disclosed.
An exemplary method for dynamically changing frequencies of clocks for the data link layer without downtime involves switching a first queue on a first end of a data link and a second queue on a second end of the data link from a pacing mode to an asynchronous mode. The exemplary method also involves modifying a frequency of a clock associated with the data link. The exemplary method further involves returning the first queue and the second queue from the asynchronous mode to the pacing mode upon modifying the frequency of the clock. Various other devices, systems, and methods are also disclosed.
A method for increasing capacitance density within an integrated passive device can include forming a first trench capacitor within a substrate, forming a second trench capacitor within an insulating layer overlying the substrate, and connecting the first and second trench capacitors through connection vias that extend through the insulating layer to form an integrated passive device (IPD) capacitor. A high capacitance density device can include a stacked and co-integrated architecture of two or more tiers of trench capacitors.
H01L 23/522 - Arrangements for conducting electric current within the device in operation from one component to another including external interconnections consisting of a multilayer structure of conductive and insulating layers inseparably formed on the semiconductor body
A method for increasing capacitance density within an integrated passive device can include forming a first trench capacitor within a first insulating layer overlying a substrate, forming a second trench capacitor within a second insulating layer overlying the first insulating layer, and connecting the first and second trench capacitors through connection vias that extend through the second insulating layer to form an integrated passive device (IPD) capacitor. A high capacitance density device can include a stacked and co-integrated architecture of two or more such layers.
H01L 23/522 - Arrangements for conducting electric current within the device in operation from one component to another including external interconnections consisting of a multilayer structure of conductive and insulating layers inseparably formed on the semiconductor body
61.
AREA-OPTIMIZED CELLS FOR LOW POWER TECHNOLOGY NODES
Embodiments herein describe identifying voltage potentials in separate cells that can be combined so that a dummy gate between or in the cells can be removed. For example, some combinational logic cells such as XOR gates, XNOR gates, and half-adders are formed from coupling two combinational cells in sequence. Typically, a dummy gate is placed between those cells since they have different voltage potentials. However, if the cells have the same voltage potentials, then the dummy gate can be removed and the cells can overlap by sharing a net. This can reduce the overall size of the cell.
H01L 27/02 - Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate including integrated passive circuit elements with at least one potential-jump barrier or surface barrier
Examples herein describe techniques for directed refresh management (DRFM) address capture in high-bandwidth memory (HBM). Some examples are based on an activate command that includes a DRFM flag, including examples in which the activate command is received and processed while a bank is open, examples in which an address of a target row is captured without opening the corresponding bank, and examples in which the address of a target row is captured based further on a mode registers. Other examples are based on a precharge command that includes a DRFM flag.
Examples herein describe techniques for directed refresh management (DRFM) address capture in high-bandwidth memory (HBM). Some examples are based on an activate command that includes a DRFM flag, including examples in which the activate command is received and processed while a bank is open, examples in which an address of a target row is captured without opening the corresponding bank, and examples in which the address of a target row is captured based further on a mode registers. Other examples are based on a precharge command that includes a DRFM flag.
A device includes a plurality of processing elements (PEs). A symmetric memory is allocated in each of the plurality of PEs. The device includes a switch connected to the plurality of PEs. The switch is to: receive, from a first processing element (PE) of the plurality of PEs, a message that includes a buffer offset, compute, based on the buffer offset, a first memory address of a first buffer in a first symmetric memory of the first PE and a second memory address of a second buffer in a second symmetric memory of a second PE of the plurality of PEs, and initiate, based on the first memory address and the second memory address, a direct memory access operation to access the first buffer and the second buffer.
G06F 13/16 - Handling requests for interconnection or transfer for access to memory bus
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
A device includes a plurality of processing elements (PEs). A symmetric memory is allocated in each of the plurality of PEs. The device includes a switch connected to the plurality of PEs. The switch is to: receive, from a first processing element (PE) of the plurality of PEs, a message that includes a buffer offset, compute, based on the buffer offset, a first memory address of a first buffer in a first symmetric memory of the first PE and a second memory address of a second buffer in a second symmetric memory of a second PE of the plurality of PEs, and initiate, based on the first memory address and the second memory address, a direct memory access operation to access the first buffer and the second buffer.
Techniques for performing memory operations are disclosed herein. The techniques include obtaining statistics for operation of a device, the statistics including either or both of performance statistics and memory access statistics; generating a plurality of visualizations of the statistics in one of an overlay mode or a scene annotation mode; and displaying the plurality of visualizations.
The disclosed device includes a cache that stores sets of settings for memory states, and registers that store a current set of settings for a memory. The device also includes a control circuit that can read, from the cache in response to the memory transitioning to a new memory state, a new set of settings corresponding to the new memory state, and write, to the plurality of registers, the new set of settings. Various other methods, systems, and computer-readable media are also disclosed.
A computer-implemented method for electromagnetic imaging can include capturing, by at least one processor, electromagnetic image data of a sample. The method can also include converting, by the at least one processor, the electromagnetic image data to a multi-layer rasterized image. The method can further include comparing, by the at least one processor, the multi-layer rasterized image to a design file. Various other methods, systems, and computer-readable media are also disclosed.
A virtual function (VF) of a virtual machine is enabled to directly reset a processing portion of a processing unit. The VF initiates the reset of the processing portion directly and a host driver associated with the processing unit is bypassed during the reset process. By allowing for a direct reset of the processing portion, a processing system reduces the overhead associated with the reset process, enhances system security, and improves overall VM and hardware isolation at the processing system.
G06F 9/455 - EmulationInterpretationSoftware simulation, e.g. virtualisation or emulation of application or operating system execution engines
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
A graphics processing unit (GPU) of a processing system transmits pixel data for a frame to a display in a compressed burst, so that the pixel data is communicated at a rate that is higher than the rate at which the display scans out the pixel data to refresh the frame at a display panel. By transmitting pixel data for the frame in a compressed burst, the GPU shortens the time spent transmitting the pixel data and extends the time before the next frame of pixel data is to be transmitted. During the extended time before the next frame of pixel data is to be transmitted, the GPU saves power by placing portions of the processing system in a reduced power mode.
Techniques are described for avoiding reinitialization of attributes for successive redundant pixel waves (301, 302, 303) when rendering graphical primitives (105, 110). Attributes of a first primitive are read from a parameter cache (240) to initialize a first pixel wave. Attributes are stored in blocks of a local data store (250) associated with a compute unit (245) rendering the pixel wave. A tracking array is maintained to indicate the local data store blocks storing the attributes. When a second pixel wave associated with the first primitive is detected, reading of the attributes is omitted based on the tracking array.
G09G 5/36 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of individual graphic patterns using a bit-mapped memory
Techniques are described for avoiding reinitialization of attributes for successive redundant pixel waves when rendering graphical primitives. Attributes of a first primitive are read from a parameter cache to initialize a first pixel wave. Attributes are stored in blocks of a local data store associated with a compute unit rendering the pixel wave. A tracking array is maintained to indicate the local data store blocks storing the attributes. When a second pixel wave associated with the first primitive is detected, reading of the attributes is omitted based on the tracking array.
An apparatus and method for efficiently providing stability when updated graphics drivers are used in different hardware configurations. A client device includes one or more processors or another type of an integrated circuit that receives a given graphics driver package. When executed by the client device, the operating system stores the components of the authenticated graphics driver package in a protected system folder as part of a staging step. When the client device executes an application, the client device selects between a user mode driver (UMD) of the given graphics driver package and UMDs of the previously staged graphics driver packages. This selection is based on history information collected during past execution of the application. The client device executes the application using installations of the selected UMD and the kernel mode driver (KMD) of the given graphics driver package.
An apparatus translates transaction requests using a bus protocol translation lookup table (LUT) that comprises bus protocol translation data. A bus protocol translation controller generates the outgoing translated transaction request by translating the incoming transaction request using the bus protocol translation data from the bus protocol translation LUT. The controller translates a received response from the target unit to a response in a first bus protocol for a corresponding requesting unit. Associated methods are also presented. In some examples, the bus protocol translation data corresponds to each of a plurality of requesting units for translating between an incoming transaction request sent via a first bus protocol to an outgoing translated transaction request sent via a second bus protocol for the at least one target unit.
To leverage an amount of unused bandwidth at a hardware encoder to generate motion estimation data, a processing unit includes a hardware encoder configured to perform a first encoding job including encoder sessions to encode a captured frame, determine motion estimation data for a rendered frame, and encode the rendered frame. Further, the processing unit includes a pre-processing circuitry configured to determine a set of motion estimation parameters based on an encoder delay associated with the performance of the first encoding job by the hardware encoder. The hardware encoder is then configured to perform a second encoding job using the determined set of motion estimation parameters.
Systems and methods for crowdsourcing cloud application execution are described. An application system receives, from a client device, a first request to initiate an application session. The application system identifies a host device to fulfill the first request. The application system then initiates execution of the application session on the host device and generates, for the client device, a plurality of controls to control the application session executing on the host device. The host device is incentivized for each application session hosted.
A computer-implemented method for enabling debugging can include receiving, at a peripheral device connected through an expansion socket to a base CPU platform, a scan dump instruction from a network computing device connected to the base CPU platform across a network connection and executing, by a System-on-Chip at the peripheral device in response to the scan dump instruction, a debugging procedure. The debugging procedure can include capturing a snapshot of memory of the peripheral device and transmitting the snapshot to the network computing device through memory addresses that have been assigned to memory-mapped input/output. Various other methods, systems, and computer-readable media are also disclosed.
Techniques for implementing adaptive interpolation filter search in video encoding are disclosed. Conventional interpolation filter searching is simplified to implement adaptive interpolation filter search by selecting one or more first filter types to determine one or more initial interpolation costs. After identifying an MV that produces a target interpolation error for one of the one or more first filter types, one or more secondary interpolation costs are calculated for one or more additional filter types based on the identified MV, and one of the one or more first filter type and one or more additional filter types that results in minimal interpolation error is selected as the interpolation filter type.
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/139 - Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
H04N 19/156 - Availability of hardware or computational resources, e.g. encoding based on power-saving criteria
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
The disclosed computer-implemented method for multi-lane data bus inversion can include receiving data for transmission via a plurality of data lanes, each data lane corresponding to one of a plurality of inversion bits, and, for each data lane within the plurality of data lanes, applying the corresponding inversion bit to each bit within the data lane. Various other methods, apparatuses, and systems are also disclosed.
A method for hardware management of DMA transfer commands includes accessing, by a first DMA engine, a DMA transfer command and determining a first portion of a data transfer requested by the DMA transfer command. Transfer of a first portion of the data transfer by the first DMA engine is initiated based at least in part on the DMA transfer command. Similarly, a second portion of the data transfer by a second DMA engine is initiated based at least in part on the DMA transfer command. After transferring the first portion and the second portion of the data transfer, an indication is generated that signals completion of the data transfer requested by the DMA transfer command.
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
A technique for performing video operations is provided. The technique includes characterizing a frame as a flash frame; setting the flash frame as a non-intra frame; prohibiting encoding of frames other than the flash frame with reference to the flash frame; and applying a positive quantization parameter (“QP”) offset to the flash frame.
A system is provided that includes a computing device operable to render video content for display on a display device and to periodically refresh that display device. The video content includes at least one application window. A desktop compositor is operable to wake and execute commands to compose video frames that are composited surfaces that include the at least one application window and to initiate a buffer flip to deliver the video frames to the display device. A high resolution timer is operable to cause the desktop compositor to wake and execute the commands in multiple instances between display refreshes.
G09G 5/36 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of individual graphic patterns using a bit-mapped memory
G09G 5/399 - Control of the bit-mapped memory using two or more bit-mapped memories, the operations of which are switched in time, e.g. ping-pong buffers
83.
SYSTEMS AND METHODS FOR IMPLEMENTING SECURE PERFORMANCE COUNTERS FOR GUEST VIRTUAL MACHINES
The disclosed computing device can include guest circuitry configured to provide a virtual function, authorization circuitry configured to authorize host circuitry to access an architecture performance counter for the virtual function, and security circuitry configured to perform a security action based on the authorization. Various other methods, systems, and computer-readable media are also disclosed.
An apparatus and method for efficiently performance among replicated functional blocks of an integrated circuit despite different circuit behavior amongst the functional blocks due to manufacturing variations. An integrated circuit includes multiple replicated functional blocks, each being a semiconductor die with an instantiated copy of particular integrated circuitry for processing a work block. One or more of the functional blocks of the integrated circuit belong in a different performance category or bin than other functional blocks due to manufacturing variations across semiconductor dies. A scheduler assigns work blocks to the functional blocks based on whether a functional block is from a high-performance bin and whether a workload of a work block is a computation intensive workload. The scheduler assigns work blocks work blocks marked as having a memory access intensive workload to functional blocks from a lower performance bin.
A technique for performing ray tracing operations is provided. The technique includes, traversing through a bounding volume hierarchy for a ray to arrive at a well-fit bounding volume that is associated with first node, wherein the first node is one of a traversal node or a procedural node, and wherein the well-fit bounding volume comprises geometry other than a single axis-aligned bounding box for the first node; evaluating the ray for intersection with the well-fit bounding volume; determining whether to execute a first shader program associated with the first node based on the evaluating, wherein the first shader program comprises a traversal shader program or a procedural shader program; and executing or not executing the first shader program based on the determining.
A technique for performing ray tracing operations is provided. The technique includes detecting intersection of a ray with a split bounding volume of an instance of a bounding volume hierarchy; determining whether the split bounding volume meets an instance traversal limiting criterion; and continuing BVH traversal based on the determining.
A cache controller of a processing system implementing a non-uniform memory architecture (NUMA) adjusts a cache replacement priority of local and non-local data stored at a cache based on a cache replacement policy. Local data is data that is accessed by the cache via a local memory channel and non-local data is data that is accessed by the cache via a non-local memory channel. The cache controller assigns priorities to local and non-local data stored at the cache based on a cache replacement policy and selects data for replacement at the cache based, at least in part, on the assigned priorities.
G06F 12/08 - Addressing or allocationRelocation in hierarchically structured memory systems, e.g. virtual memory systems
G06F 12/0811 - Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
G06F 12/126 - Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
88.
INTERMEDIATE CACHE MANAGEMENT FOR NON-UNIFORM MEMORY ARCHITECTURE
A cache controller (104) of a processing system (100) implementing a non-uniform memory architecture (NUMA) adjusts a cache replacement priority of local and non-local data stored at a cache based on a cache replacement policy (112). Local data (326) is data that is accessed by the cache (102) via a local memory channel (106) and non-local data (324) is data that is accessed by the cache (102) via a non-local memory channel (116). The cache controller assigns priorities to local and non-local data stored at the cache based on a cache replacement policy and selects data for replacement at the cache based, at least in part, on the assigned priorities.
An apparatus and method for efficiently creating layout of standard cells to improve floor planning of a chip. In various implementations, an integrated circuit uses multiple standard cells with an absence of diffusion breaks at cell boundaries. The standard cells use vertically stacked non-planer transistors. Multiple transistors are formed with an active region having a length between a source region and a drain region of a single transistor. Therefore, the active regions of these transistors are not formed across multiple gate terminals. By having active regions of these transistors formed across a single gate terminal of a single transistor, there is sufficient spacing to provide electrical isolation between two active regions of the two adjoining standard cells. This is true even when the two adjoining standard cells share a source/drain region at the cell boundaries. Accordingly, forming diffusion breaks at the edges of these standard cells can be skipped.
Task scheduling based on component margins is described. In accordance with the described techniques, a scheduler of an operating system accesses a margin table when a request to perform tasks is received. The scheduler schedules tasks on various components of a system based on margins of those components. When a request to perform a task is received, for example, the scheduler accesses the margin table and selects a component to perform the task based on the margin information included in the margin table as well as based on the task, such as whether the task benefits more from being performed fast or being performed accurately. The scheduler then schedules the task using the selected component.
An apparatus and method for efficiently managing memory bandwidth within a communication fabric. A computing system includes multiple clients, a display controller, and a communication fabric that transfers data between the multiple clients, the display controller, and a memory subsystem. A control circuit with power management circuitry determines that one or more conditions are satisfied for changing a power-performance state (P-state) of the memory subsystem. The control circuit asserts indications on a sideband interface specifying to the communication fabric that the display controller is to have an increased bandwidth of data transfer between the display controller and the memory subsystem. Using the increased bandwidth provided by the communication fabric, the display controller prefetches display data from a frame buffer of the memory subsystem prior to the P-state change. Afterward, the memory subsystem performs the P-state change and the corresponding training of the memory interface.
A distribution system receives data access requests associated with at least two virtual channels over at least two physical data communication channels. The requests are distributed into at least two sequencers of the distribution system based on a virtual channel associated with each request. The distribution system includes at least two memory modules—one for each of the at least two physical data communication channels. Requests stored in the sequencers are written to the memory modules according to a pattern that assigns sequential requests associated with a common virtual channel to a sequential ordering of the memory modules. The requests are then granted by an arbiter of the distribution system by retrieving requests associated with a common virtual channel from the memory modules using the sequential ordering of the memory modules.
An apparatus and method for efficiently creating layout of standard cells to improve floor planning of a chip. In various implementations, an integrated circuit uses multiple standard cells with an absence of diffusion breaks at cell boundaries. The standard cells use vertically stacked non-planer transistors. Multiple transistors are formed with an active region having a length between a source region and a drain region of a single transistor. Therefore, the active regions of these transistors are not formed across multiple gate terminals. By having active regions of these transistors formed across a single gate terminal of a single transistor, there is sufficient spacing to provide electrical isolation between two active regions of the two adjoining standard cells. This is true even when the two adjoining standard cells share a source/drain region at the cell boundaries. Accordingly, forming diffusion breaks at the edges of these standard cells can be skipped.
H01L 27/02 - Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate including integrated passive circuit elements with at least one potential-jump barrier or surface barrier
H01L 21/822 - Manufacture or treatment of devices consisting of a plurality of solid state components or integrated circuits formed in, or on, a common substrate with subsequent division of the substrate into plural individual devices to produce devices, e.g. integrated circuits, each consisting of a plurality of components the substrate being a semiconductor, using silicon technology
H01L 27/06 - Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate including integrated passive circuit elements with at least one potential-jump barrier or surface barrier the substrate being a semiconductor body including a plurality of individual components in a non-repetitive configuration
H01L 29/423 - Electrodes characterised by their shape, relative sizes or dispositions not carrying the current to be rectified, amplified or switched
A distribution system receives data access requests associated with at least two virtual channels over at least two physical data communication channels. The requests are distributed into at least two sequencers of the distribution system based on a virtual channel associated with each request. The distribution system includes at least two memory modules – one for each of the at least two physical data communication channels. Requests stored in the sequencers are written to the memory modules according to a pattern that assigns sequential requests associated with a common virtual channel to a sequential ordering of the memory modules. The requests are then granted by an arbiter of the distribution system by retrieving requests associated with a common virtual channel from the memory modules using the sequential ordering of the memory modules.
H04L 67/60 - Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
95.
SHADER COMPILER AND SHADER PROGRAM CENTRIC MITIGATION OF CURRENT TRANSIENTS THAT CAUSE VOLTAGE TRANSIENTS ON A POWER RAIL
An apparatus and method for efficiently managing voltage transients on a power rail caused by current transients of an integrated circuit. In various implementations, a computing system includes a processing circuit that executes instructions of a compiler that includes a current transients mitigator. When executing the instructions of the current transients mitigator, the processing circuit generates an estimate of a time rate of current flow being drawn from or returned to the power rail based on instruction types of a first sequence of instructions. Based on the estimate exceeds a threshold, the processing circuit replaces the first sequence of instructions with a second sequence of instructions that provides a smaller estimate. The second sequence is issued to the one or more compute circuits that utilize the power rail, rather than the first sequence.
An electronic device includes a package substrate having a top surface and a bottom surface. An integrated circuit (“IC”) die is disposed on the top surface of the package substrate. The electronic device also includes a stiffener having a ring body and a plurality of support members. The ring body is secured to the top surface of the package substrate and circumscribes the IC die. The plurality of support members extend from the ring body to below the bottom surface of the package substrate.
H01L 25/065 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices all the devices being of a type provided for in a single subclass of subclasses , , , , or , e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group
A memory system includes a memory controller, a physical layer (PHY), and a memory (e.g., DRAM). Data is written to and read from the memory in different manners for different memory technologies, such as using different signals or signal timings for different memory technologies. Various runtime services specific to the memory technology are performed by the PHY rather than the memory controller. Examples of such runtime services include performing a training routine to train or re-train an interface between the PHY and the memory, performing a power management routine (e.g., to put the main memory in a self-refresh mode), and so forth.
G11C 11/406 - Management or control of the refreshing or charge-regeneration cycles
G11C 11/4074 - Power supply or voltage generation circuits, e.g. bias voltage generators, substrate voltage generators, back-up power, power control circuits
G11C 11/4093 - Input/output [I/O] data interface arrangements, e.g. data buffers
98.
System agnostic autonomous system state management
A computing device is provided which comprises memory and a processor in communication with the memory. The processor is configured to autonomously acquire input parameter values, comprising one of monitored device input parameter values from a component of the computing device and monitored user input parameter values. The processor is also configured to select, from a plurality of modes of operation, a mode of operation comprising parameter settings which are determined based on the acquired input parameter values, each of the plurality of modes of operation comprising different parameter settings configured to control the computing device to operate at a different level of performance. The processor is also configured to control operation of the computing device by tuning the parameter settings of the computing device according to the selected mode of operation comprising the determined parameter settings.
Apparatuses, systems and methods for routing requests and responses targeting a shared resource. A queue in a communication fabric is located in a path between the requesters and a shared resource. In some embodiments, the shared resource is a shared address translation cache stored in an endpoint. The physical channel between the queue and the shared resource supports multiple virtual channels. The queue assigns at least one entry to each virtual channel of a group of virtual channels where the group includes a virtual channel for each address translation request type from a single requester of the multiple requesters. When the at least one entry for a given requester is de-allocated, the queue allocates this entry only with requests from the assigned virtual channel even if the empty entry is the only available entry of the queue.
Systems, apparatuses, and methods for implementing duplicated registers for access by initiators across multiple semiconductor dies are disclosed. A system includes multiple initiators on multiple semiconductor dies of a chiplet processor. One of the semiconductor dies is the master die, and this master die has copies of registers which can be accessed by the multiple initiators on the multiple semiconductor dies. When a given initiator on a given secondary die generates a register access, the register access is routed to the master die and a particular duplicate copy of the register maintained for the given secondary die. From the point of view of software, the multiple semiconductor dies appear as a single die, and the multiple initiators appear as a single initiator. Multiple types of registers can be maintained by the master die, with a flush register being one of the register types.