A method can include overriding settings of an integrated circuit device by reading one or more settings from a setting record that correspond to a part number of the integrated circuit device. The method can also include performing an override of the settings of the integrated circuit device based on the one or more settings of the setting record that correspond to the part number of the integrated circuit device. Various other methods and systems are also disclosed.
A method for synchronizing trusted operating systems can include receiving, at a first interconnect circuit, an operating system management instruction for a first trusted operating system that is associated with a first trusted memory region of a memory device, the first trusted memory region being allocated to the first interconnect circuit. The method can also include synchronizing the operating system management instruction with a second interconnect circuit such that the operating system management instruction is applied to a second trusted operating system. The second trusted operating system is associated with a second trusted memory region of the memory device and the second trusted memory region is allocated to the second interconnect circuit. Various other methods and systems are also disclosed.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
G06F 21/54 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by adding security routines or objects to programs
3.
Processing performance adjustment using biosignals
The disclosed device can receive a biosignal and, using user input predictions based on the biosignal, pre-render a display frame. The device can also subsequently receive a user input, output the pre-rendered display frame based on the user input confirming the user input predictions and flush the pre-rendered display frame otherwise. The device can also modulate computing performance and power based on computing demands predicted from the biosignal. Various other methods, systems, and computer-readable media are also disclosed.
A technique for rendering is provided. The technique includes determining a level of detail for a shade space texture and a screen space; shading the shade space texture having a resolution based on the level of detail; and for a reconstruction operation, performing sampling from the shade space texture, the sampling including a high frequency attenuation of samples of the shade space texture.
A technique for performing ray tracing operations is provided. The technique includes for a ray being tested for intersection with geometry associated with a bounding volume hierarchy, traversing to a pre-filtering node that includes information for filtering out triangles of a leaf node of the bounding volume hierarchy; evaluating a quantized ray that corresponds to the ray against quantized triangles of the pre-filtering node to filter out one or more triangles of the leaf node from consideration; and testing the triangles of the leaf node that are not filtered out and not testing the triangles of the leaf node that are filtered out.
A technique for rendering is provided. The technique includes performing a visibility pass that designates portions of shade space textures visible in a scene, wherein the visibility pass generates tiles that cover shade space textures visible in the scene; performing a rate controller operation on output of the visibility pass using spatially-adaptive sampling; performing a sparse shade space shading operation on the tiles that cover the shade space textures visible in the scene based on a result of the spatially-adaptive sampling; performing a regularization operation based on an output of the sparse shade space shading operation; and performing a reconstruction operation using output from the regularization operation to produce a final scene.
A technique for rendering is provided. The technique includes performing a visibility pass that designates portions of shade space textures visible in a scene, wherein the visibility pass generates tiles that cover shade space textures visible in the scene; performing a rate controller operation on output of the visibility pass using spatiotemporal adaptive sampling; performing a shade space shading operation on the tiles that cover the shade space textures visible in the scene based on a result of the spatiotemporal adaptive sampling; performing a regularization operation based on an output of the shade space shading operation; and performing a reconstruction operation using output from the regularization operation to produce a final scene.
Devices and methods for rendering objects using ray tracing are provided which include during a build time: generating an accelerated hierarchy structure comprising data representing an approximate volume bounding a group of geometric shapes representing the objects in the scene and data representing the geometric shapes; and generating additional data used to transform rays, to be cast in the scene, from a high precision space to a low precision space; and during a render time occurring after the build time: performing ray intersection tests, using the additional data generated during the build time, for the rays in the scene; and rendering the scene based on the ray intersection tests. Because the additional data is generated prior to render time, the additional data can be used to perform the ray intersection testing more efficiently.
A computer-implemented method for enabling a feature of a semiconductor device can include receiving, by at least one processor of a semiconductor device, a command to enable a feature of the semiconductor device. The method can also include burning, by the at least one processor and in response to the command, an electronic fuse of the semiconductor device. Various other methods, systems, and computer-readable media are also disclosed.
G06F 1/08 - Clock generators with changeable or programmable clock frequency
G06F 21/64 - Protecting data integrity, e.g. using checksums, certificates or signatures
G06F 21/73 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information by creating or determining hardware identification, e.g. serial numbers
H01L 23/525 - Arrangements for conducting electric current within the device in operation from one component to another including external interconnections consisting of a multilayer structure of conductive and insulating layers inseparably formed on the semiconductor body with adaptable interconnections
10.
Fused Bounding Volume Hierarchy for Multiple Levels of Detail
A fused bounding volume hierarchy, which is a combination of a base bounding volume hierarchy and one or more non-base bounding volume hierarchies, is generated. For each non-base bounding volume hierarchy, multiple subtrees in the non-base bounding volume hierarchy that include less than a threshold number of child nodes are identified. Each of these subtrees is then fused with the base bounding volume hierarchy at one of the nodes of the base bounding volume hierarchy, and an identifier of the level of detail for the non-base bounding volume hierarchy is included in the node. When displaying a scene or image, for a particular portion of the scene or image the level of detail to use is identified. The fused bounding volume hierarchy is traversed and the geometric objects in the nodes of the fused bounding volume hierarchy corresponding to the identified level of detail are displayed.
A multi-chiplet system includes a first chiplet comprising a first transceiver and a first chiplet-to-chiplet (C2C) interface module, and a second chiplet comprising programmable logic circuitry and a second C2C interface module. The first transceiver is configured to generate a clock, which is transmitted from the first C2C interface module to the second C2C interface module, through a clock transmission wire, for data transfer between the first chiplet and the second chiplet.
A method and computing device is provided for filtering objects of interest of images. The computing device comprises an image capturing device and memory configured to store objects of interest. In one example, the computing device comprises a processor configured to, for a captured image, determine one or more regions of interest in the image based on the objects of interest and modify the image based on the determined regions of interest. In another example, the computing device comprises a first processor configured to determine one or more regions of interest to be modified in an image based on the one or more objects of interest and a second processor configured to convert the image to be processed by the first processor and modify the image based on regions of interest determined by the first processor. The image is displayed without the one or more objects of interest being viewable.
Methods and apparatus employ a plurality of heterogeneous compute units and a plurality of non-compute units operatively coupled to the plurality of compute units. Power management logic (PML) determines a memory bandwidth level associated with a respective workload running on each of a plurality of heterogeneous compute units on the IC, and adjusts a power level of at least one non-compute unit of a memory system on the IC from a first power level to a second power level, based on the determined memory bandwidth levels. Memory access latency is also taken into account in some examples to adjust a power level of non-compute units.
A technique for performing ray tracing operations is provided. The technique includes for a ray being tested for intersection with geometry associated with a bounding volume hierarchy, traversing to a pre-filtering node that includes information for filtering out triangles of a leaf node of the bounding volume hierarchy; evaluating a quantized ray that corresponds to the ray against quantized triangles of the pre-filtering node to filter out one or more triangles of the leaf node from consideration; and testing the triangles of the leaf node that are not filtered out and not testing the triangles of the leaf node that are filtered out.
A processing system includes two or more graphics cores each disposed on respective dies and configured for concurrent processing of command packets. To this end, the processing system is configured to determine two or more command partitions associated with a command packet and to assign each command partition to a graphics core. Each graphics core then executes the same command packet by only performing instructions of the command packet associated with the command partitions assigned to the graphics core. Further, after executing an instructions of the command packet based on one or more assigned partitions, each graphics core adjusts one or more counters used to synchronize the execution of the command packet across the graphics cores.
A computer-implemented method for ensuring processing unit hardware state integrity in live migration can include participating as a source, by a processing unit, in a live migration procedure by injecting, into a live migration data package containing a state of the processing unit, a signature verifying the state. The method can additionally include participating as a target, by the processing unit, in an additional live migration procedure migrating an additional live migration data package containing an additional state of an additional processing unit by performing an integrity check based on an additional signature, in the additional live migration data package, verifying the additional state. Various other methods, systems, and computer-readable media are also disclosed.
A technique for rendering is provided. The technique includes performing a visibility operation to generate shade space visibility information and reconstruction information; performing a shade space shading operation based on the shade space visibility information generate shaded shade space textures; and performing a reconstruction operation based on the reconstruction information and the shaded shade space textures.
A technique for rendering is provided. The technique includes performing a visibility pass that designates portions of shade space textures visible in a scene, wherein the visibility pass generates tiles that cover the shade space textures visible in the scene; performing a temporal rate controller operation; performing a shade space shading operation on the tiles that cover the shade space textures visible in the scene based on a temporal shading rate output by the temporal rate controller operation, wherein only a subset of samples in the tiles that cover the shade space textures visible in the scene are shaded in the shade space shading operation; and performing a reconstruction operation using output from the shade space shading operation to produce a final scene.
An apparatus and method for efficiently performing address translation requests. An integrated circuit includes a system memory that stores address mappings, and the circuitry of one or more clients processes one or more applications and generate address translation requests. A translation lookaside buffer (TLB) stores, in multiple entries, address mappings retrieved from the system memory. Circuitry of a client processes one or more applications and generates address translation requests. The entries of the TLB stores address mappings corresponding to different address mapping types and different virtual functions to avoid searches of multiple other lower-level TLBs that are significantly larger and have larger access. In addition, the TLB is implemented with a relatively small number of entries and uses fully associative data storage arrangement to further reduce access latencies.
A processing system includes two or more graphics cores each disposed on respective dies and configured for concurrent processing of command packets. To this end, the processing system is configured to determine two or more command partitions associated with a command packet and to assign each command partition to a graphics core. Each graphics core then executes the same command packet by only performing instructions of the command packet associated with the command partitions assigned to the graphics core. Further, after executing an instructions of the command packet based on one or more assigned partitions, each graphics core adjusts one or more counters used to synchronize the execution of the command packet across the graphics cores.
A processing system stores a boot image for a critical domain of a system-on-a-chip (SOC) at a bank of a static random-access memory (SRAM) that is shared by the critical domain and a non-critical domain and that is powered independently from the non-critical domain. The SOC includes a secure processor that loads the boot image to the bank of the SRAM and then blocks subsequent write access to the bank. Because the critical domain is powered independently from the non-critical domain, the bank of the SRAM retains the boot image without regard to the power state of the non-critical domain. In addition, the critical domain implements a boot process that is decoupled from a CPU at the non-critical domain, ensuring that the critical domain can initiate a re-boot sequence even if the non-critical domain is not powered.
Systems, apparatuses, and methods for implementing efficient power optimization in a computing system are disclosed. A system management unit configured to track computing activity of a computing device while processing each frame of a plurality of frames. The computing activity is tracked at least for a given period of time comprising a plurality of time slices. The system management unit further correlates a time slice associated with a given frame with a time slice associated with at least one previously processed frame from the plurality of frames, based at least in part on the tracked computing activity. The system management unit predicts a clock frequency to render the given frame, based at least in part on the correlation and renders the given frame using the predicted clock frequency.
G09G 5/36 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of individual graphic patterns using a bit-mapped memory
23.
DEVICES, SYSTEMS, AND METHODS FOR DYNAMICALLY CHANGING FREQUENCIES OF CLOCKS FOR THE DATA LINK LAYER WITHOUT DOWNTIME
An exemplary method for dynamically changing frequencies of clocks for the data link layer without downtime involves switching a first queue on a first end of a data link and a second queue on a second end of the data link from a pacing mode to an asynchronous mode. The exemplary method also involves modifying a frequency of a clock associated with the data link. The exemplary method further involves returning the first queue and the second queue from the asynchronous mode to the pacing mode upon modifying the frequency of the clock. Various other devices, systems, and methods are also disclosed.
G06F 1/08 - Clock generators with changeable or programmable clock frequency
G06F 1/324 - Power saving characterised by the action undertaken by lowering clock frequency
G06F 5/06 - Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising
A multi-chiplet system includes a first chiplet comprising a first transceiver and a first chiplet-to-chiplet (C2C) interface module, and a second chiplet comprising programmable logic circuitry and a second C2C interface module. The first transceiver is configured to generate a clock, which is transmitted from the first C2C interface module to the second C2C interface module, through a clock transmission wire, for data transfer between the first chiplet and the second chiplet.
A computer-implemented method for ensuring processing unit hardware state integrity in live migration can include participating as a source, by a processing unit, in a live migration procedure by injecting, into a live migration data package containing a state of the processing unit, a signature verifying the state. The method can additionally include participating as a target, by the processing unit, in an additional live migration procedure migrating an additional live migration data package containing an additional state of an additional processing unit by performing an integrity check based on an additional signature, in the additional live migration data package, verifying the additional state. Various other methods, systems, and computer-readable media are also disclosed.
The disclosed computing device can include host circuitry configured to provide a physical function and guest circuitry configured to provide a virtual function. The host circuitry is configured to dynamically assign request identifiers for accessing at least the host circuitry in a manner that allows the request identifiers to change on a command-to-command basis instead of a time-to-time basis that uses fixed value request identifiers in time slices. Various other methods, systems, and computer-readable media are also disclosed.
A processor employs work items to manage traversal of an acceleration structure, such as a ray tracing structure, at a hardware traversal engine of a processing unit. The work items are structures having a relatively small memory footprint, where each work item is associated both with a ray and with a corresponding portion of the acceleration structure. The hardware traversal engine employs a work items to manage the traversal of the corresponding portion of the acceleration structure for the corresponding ray.
An exemplary method for dynamically changing frequencies of clocks for the data link layer without downtime involves switching a first queue on a first end of a data link and a second queue on a second end of the data link from a pacing mode to an asynchronous mode. The exemplary method also involves modifying a frequency of a clock associated with the data link. The exemplary method further involves returning the first queue and the second queue from the asynchronous mode to the pacing mode upon modifying the frequency of the clock. Various other devices, systems, and methods are also disclosed.
A method for increasing capacitance density within an integrated passive device can include forming a first trench capacitor within a substrate, forming a second trench capacitor within an insulating layer overlying the substrate, and connecting the first and second trench capacitors through connection vias that extend through the insulating layer to form an integrated passive device (IPD) capacitor. A high capacitance density device can include a stacked and co-integrated architecture of two or more tiers of trench capacitors.
H01L 23/522 - Arrangements for conducting electric current within the device in operation from one component to another including external interconnections consisting of a multilayer structure of conductive and insulating layers inseparably formed on the semiconductor body
A method for increasing capacitance density within an integrated passive device can include forming a first trench capacitor within a first insulating layer overlying a substrate, forming a second trench capacitor within a second insulating layer overlying the first insulating layer, and connecting the first and second trench capacitors through connection vias that extend through the second insulating layer to form an integrated passive device (IPD) capacitor. A high capacitance density device can include a stacked and co-integrated architecture of two or more such layers.
H01L 23/522 - Arrangements for conducting electric current within the device in operation from one component to another including external interconnections consisting of a multilayer structure of conductive and insulating layers inseparably formed on the semiconductor body
31.
AREA-OPTIMIZED CELLS FOR LOW POWER TECHNOLOGY NODES
Embodiments herein describe identifying voltage potentials in separate cells that can be combined so that a dummy gate between or in the cells can be removed. For example, some combinational logic cells such as XOR gates, XNOR gates, and half-adders are formed from coupling two combinational cells in sequence. Typically, a dummy gate is placed between those cells since they have different voltage potentials. However, if the cells have the same voltage potentials, then the dummy gate can be removed and the cells can overlap by sharing a net. This can reduce the overall size of the cell.
H01L 27/02 - Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate including integrated passive circuit elements with at least one potential-jump barrier or surface barrier
Examples herein describe techniques for directed refresh management (DRFM) address capture in high-bandwidth memory (HBM). Some examples are based on an activate command that includes a DRFM flag, including examples in which the activate command is received and processed while a bank is open, examples in which an address of a target row is captured without opening the corresponding bank, and examples in which the address of a target row is captured based further on a mode registers. Other examples are based on a precharge command that includes a DRFM flag.
Examples herein describe techniques for directed refresh management (DRFM) address capture in high-bandwidth memory (HBM). Some examples are based on an activate command that includes a DRFM flag, including examples in which the activate command is received and processed while a bank is open, examples in which an address of a target row is captured without opening the corresponding bank, and examples in which the address of a target row is captured based further on a mode registers. Other examples are based on a precharge command that includes a DRFM flag.
A device includes a plurality of processing elements (PEs). A symmetric memory is allocated in each of the plurality of PEs. The device includes a switch connected to the plurality of PEs. The switch is to: receive, from a first processing element (PE) of the plurality of PEs, a message that includes a buffer offset, compute, based on the buffer offset, a first memory address of a first buffer in a first symmetric memory of the first PE and a second memory address of a second buffer in a second symmetric memory of a second PE of the plurality of PEs, and initiate, based on the first memory address and the second memory address, a direct memory access operation to access the first buffer and the second buffer.
A device includes a plurality of processing elements (PEs). A symmetric memory is allocated in each of the plurality of PEs. The device includes a switch connected to the plurality of PEs. The switch is to: receive, from a first processing element (PE) of the plurality of PEs, a message that includes a buffer offset, compute, based on the buffer offset, a first memory address of a first buffer in a first symmetric memory of the first PE and a second memory address of a second buffer in a second symmetric memory of a second PE of the plurality of PEs, and initiate, based on the first memory address and the second memory address, a direct memory access operation to access the first buffer and the second buffer.
Techniques for performing memory operations are disclosed herein. The techniques include obtaining statistics for operation of a device, the statistics including either or both of performance statistics and memory access statistics; generating a plurality of visualizations of the statistics in one of an overlay mode or a scene annotation mode; and displaying the plurality of visualizations.
The disclosed device includes a cache that stores sets of settings for memory states, and registers that store a current set of settings for a memory. The device also includes a control circuit that can read, from the cache in response to the memory transitioning to a new memory state, a new set of settings corresponding to the new memory state, and write, to the plurality of registers, the new set of settings. Various other methods, systems, and computer-readable media are also disclosed.
A virtual function (VF) of a virtual machine is enabled to directly reset a processing portion of a processing unit. The VF initiates the reset of the processing portion directly and a host driver associated with the processing unit is bypassed during the reset process. By allowing for a direct reset of the processing portion, a processing system reduces the overhead associated with the reset process, enhances system security, and improves overall VM and hardware isolation at the processing system.
G06F 9/455 - EmulationInterpretationSoftware simulation, e.g. virtualisation or emulation of application or operating system execution engines
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
A computer-implemented method for electromagnetic imaging can include capturing, by at least one processor, electromagnetic image data of a sample. The method can also include converting, by the at least one processor, the electromagnetic image data to a multi-layer rasterized image. The method can further include comparing, by the at least one processor, the multi-layer rasterized image to a design file. Various other methods, systems, and computer-readable media are also disclosed.
A graphics processing unit (GPU) of a processing system transmits pixel data for a frame to a display in a compressed burst, so that the pixel data is communicated at a rate that is higher than the rate at which the display scans out the pixel data to refresh the frame at a display panel. By transmitting pixel data for the frame in a compressed burst, the GPU shortens the time spent transmitting the pixel data and extends the time before the next frame of pixel data is to be transmitted. During the extended time before the next frame of pixel data is to be transmitted, the GPU saves power by placing portions of the processing system in a reduced power mode.
Techniques are described for avoiding reinitialization of attributes for successive redundant pixel waves (301, 302, 303) when rendering graphical primitives (105, 110). Attributes of a first primitive are read from a parameter cache (240) to initialize a first pixel wave. Attributes are stored in blocks of a local data store (250) associated with a compute unit (245) rendering the pixel wave. A tracking array is maintained to indicate the local data store blocks storing the attributes. When a second pixel wave associated with the first primitive is detected, reading of the attributes is omitted based on the tracking array.
G09G 5/36 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of individual graphic patterns using a bit-mapped memory
Techniques are described for avoiding reinitialization of attributes for successive redundant pixel waves when rendering graphical primitives. Attributes of a first primitive are read from a parameter cache to initialize a first pixel wave. Attributes are stored in blocks of a local data store associated with a compute unit rendering the pixel wave. A tracking array is maintained to indicate the local data store blocks storing the attributes. When a second pixel wave associated with the first primitive is detected, reading of the attributes is omitted based on the tracking array.
An apparatus and method for efficiently providing stability when updated graphics drivers are used in different hardware configurations. A client device includes one or more processors or another type of an integrated circuit that receives a given graphics driver package. When executed by the client device, the operating system stores the components of the authenticated graphics driver package in a protected system folder as part of a staging step. When the client device executes an application, the client device selects between a user mode driver (UMD) of the given graphics driver package and UMDs of the previously staged graphics driver packages. This selection is based on history information collected during past execution of the application. The client device executes the application using installations of the selected UMD and the kernel mode driver (KMD) of the given graphics driver package.
Systems and methods for crowdsourcing cloud application execution are described. An application system receives, from a client device, a first request to initiate an application session. The application system identifies a host device to fulfill the first request. The application system then initiates execution of the application session on the host device and generates, for the client device, a plurality of controls to control the application session executing on the host device. The host device is incentivized for each application session hosted.
A computer-implemented method for enabling debugging can include receiving, at a peripheral device connected through an expansion socket to a base CPU platform, a scan dump instruction from a network computing device connected to the base CPU platform across a network connection and executing, by a System-on-Chip at the peripheral device in response to the scan dump instruction, a debugging procedure. The debugging procedure can include capturing a snapshot of memory of the peripheral device and transmitting the snapshot to the network computing device through memory addresses that have been assigned to memory-mapped input/output. Various other methods, systems, and computer-readable media are also disclosed.
Techniques for implementing adaptive interpolation filter search in video encoding are disclosed. Conventional interpolation filter searching is simplified to implement adaptive interpolation filter search by selecting one or more first filter types to determine one or more initial interpolation costs. After identifying an MV that produces a target interpolation error for one of the one or more first filter types, one or more secondary interpolation costs are calculated for one or more additional filter types based on the identified MV, and one of the one or more first filter type and one or more additional filter types that results in minimal interpolation error is selected as the interpolation filter type.
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/139 - Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
H04N 19/156 - Availability of hardware or computational resources, e.g. encoding based on power-saving criteria
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
The disclosed computer-implemented method for multi-lane data bus inversion can include receiving data for transmission via a plurality of data lanes, each data lane corresponding to one of a plurality of inversion bits, and, for each data lane within the plurality of data lanes, applying the corresponding inversion bit to each bit within the data lane. Various other methods, apparatuses, and systems are also disclosed.
A method for hardware management of DMA transfer commands includes accessing, by a first DMA engine, a DMA transfer command and determining a first portion of a data transfer requested by the DMA transfer command. Transfer of a first portion of the data transfer by the first DMA engine is initiated based at least in part on the DMA transfer command. Similarly, a second portion of the data transfer by a second DMA engine is initiated based at least in part on the DMA transfer command. After transferring the first portion and the second portion of the data transfer, an indication is generated that signals completion of the data transfer requested by the DMA transfer command.
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access, cycle steal
49.
METHOD AND APPARATUS FOR DYNAMICALLY REDUCING APPLICATION RENDER-TO-ON SCREEN TIME IN A DESKTOP ENVIRONMENT
A system is provided that includes a computing device operable to render video content for display on a display device and to periodically refresh that display device. The video content includes at least one application window. A desktop compositor is operable to wake and execute commands to compose video frames that are composited surfaces that include the at least one application window and to initiate a buffer flip to deliver the video frames to the display device. A high resolution timer is operable to cause the desktop compositor to wake and execute the commands in multiple instances between display refreshes.
G09G 5/36 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of individual graphic patterns using a bit-mapped memory
G09G 5/399 - Control of the bit-mapped memory using two or more bit-mapped memories, the operations of which are switched in time, e.g. ping-pong buffers
A technique for performing video operations is provided. The technique includes characterizing a frame as a flash frame; setting the flash frame as a non-intra frame; prohibiting encoding of frames other than the flash frame with reference to the flash frame; and applying a positive quantization parameter (“QP”) offset to the flash frame.
The disclosed computing device can include guest circuitry configured to provide a virtual function, authorization circuitry configured to authorize host circuitry to access an architecture performance counter for the virtual function, and security circuitry configured to perform a security action based on the authorization. Various other methods, systems, and computer-readable media are also disclosed.
An apparatus and method for efficiently performance among replicated functional blocks of an integrated circuit despite different circuit behavior amongst the functional blocks due to manufacturing variations. An integrated circuit includes multiple replicated functional blocks, each being a semiconductor die with an instantiated copy of particular integrated circuitry for processing a work block. One or more of the functional blocks of the integrated circuit belong in a different performance category or bin than other functional blocks due to manufacturing variations across semiconductor dies. A scheduler assigns work blocks to the functional blocks based on whether a functional block is from a high-performance bin and whether a workload of a work block is a computation intensive workload. The scheduler assigns work blocks work blocks marked as having a memory access intensive workload to functional blocks from a lower performance bin.
A cache controller of a processing system implementing a non-uniform memory architecture (NUMA) adjusts a cache replacement priority of local and non-local data stored at a cache based on a cache replacement policy. Local data is data that is accessed by the cache via a local memory channel and non-local data is data that is accessed by the cache via a non-local memory channel. The cache controller assigns priorities to local and non-local data stored at the cache based on a cache replacement policy and selects data for replacement at the cache based, at least in part, on the assigned priorities.
G06F 12/08 - Addressing or allocationRelocation in hierarchically structured memory systems, e.g. virtual memory systems
G06F 12/0811 - Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
G06F 12/126 - Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
A technique for performing ray tracing operations is provided. The technique includes, traversing through a bounding volume hierarchy for a ray to arrive at a well-fit bounding volume that is associated with first node, wherein the first node is one of a traversal node or a procedural node, and wherein the well-fit bounding volume comprises geometry other than a single axis-aligned bounding box for the first node; evaluating the ray for intersection with the well-fit bounding volume; determining whether to execute a first shader program associated with the first node based on the evaluating, wherein the first shader program comprises a traversal shader program or a procedural shader program; and executing or not executing the first shader program based on the determining.
A technique for performing ray tracing operations is provided. The technique includes detecting intersection of a ray with a split bounding volume of an instance of a bounding volume hierarchy; determining whether the split bounding volume meets an instance traversal limiting criterion; and continuing BVH traversal based on the determining.
A cache controller (104) of a processing system (100) implementing a non-uniform memory architecture (NUMA) adjusts a cache replacement priority of local and non-local data stored at a cache based on a cache replacement policy (112). Local data (326) is data that is accessed by the cache (102) via a local memory channel (106) and non-local data (324) is data that is accessed by the cache (102) via a non-local memory channel (116). The cache controller assigns priorities to local and non-local data stored at the cache based on a cache replacement policy and selects data for replacement at the cache based, at least in part, on the assigned priorities.
Task scheduling based on component margins is described. In accordance with the described techniques, a scheduler of an operating system accesses a margin table when a request to perform tasks is received. The scheduler schedules tasks on various components of a system based on margins of those components. When a request to perform a task is received, for example, the scheduler accesses the margin table and selects a component to perform the task based on the margin information included in the margin table as well as based on the task, such as whether the task benefits more from being performed fast or being performed accurately. The scheduler then schedules the task using the selected component.
An apparatus and method for efficiently creating layout of standard cells to improve floor planning of a chip. In various implementations, an integrated circuit uses multiple standard cells with an absence of diffusion breaks at cell boundaries. The standard cells use vertically stacked non-planer transistors. Multiple transistors are formed with an active region having a length between a source region and a drain region of a single transistor. Therefore, the active regions of these transistors are not formed across multiple gate terminals. By having active regions of these transistors formed across a single gate terminal of a single transistor, there is sufficient spacing to provide electrical isolation between two active regions of the two adjoining standard cells. This is true even when the two adjoining standard cells share a source/drain region at the cell boundaries. Accordingly, forming diffusion breaks at the edges of these standard cells can be skipped.
An apparatus and method for efficiently managing memory bandwidth within a communication fabric. A computing system includes multiple clients, a display controller, and a communication fabric that transfers data between the multiple clients, the display controller, and a memory subsystem. A control circuit with power management circuitry determines that one or more conditions are satisfied for changing a power-performance state (P-state) of the memory subsystem. The control circuit asserts indications on a sideband interface specifying to the communication fabric that the display controller is to have an increased bandwidth of data transfer between the display controller and the memory subsystem. Using the increased bandwidth provided by the communication fabric, the display controller prefetches display data from a frame buffer of the memory subsystem prior to the P-state change. Afterward, the memory subsystem performs the P-state change and the corresponding training of the memory interface.
A distribution system receives data access requests associated with at least two virtual channels over at least two physical data communication channels. The requests are distributed into at least two sequencers of the distribution system based on a virtual channel associated with each request. The distribution system includes at least two memory modules—one for each of the at least two physical data communication channels. Requests stored in the sequencers are written to the memory modules according to a pattern that assigns sequential requests associated with a common virtual channel to a sequential ordering of the memory modules. The requests are then granted by an arbiter of the distribution system by retrieving requests associated with a common virtual channel from the memory modules using the sequential ordering of the memory modules.
An apparatus and method for efficiently creating layout of standard cells to improve floor planning of a chip. In various implementations, an integrated circuit uses multiple standard cells with an absence of diffusion breaks at cell boundaries. The standard cells use vertically stacked non-planer transistors. Multiple transistors are formed with an active region having a length between a source region and a drain region of a single transistor. Therefore, the active regions of these transistors are not formed across multiple gate terminals. By having active regions of these transistors formed across a single gate terminal of a single transistor, there is sufficient spacing to provide electrical isolation between two active regions of the two adjoining standard cells. This is true even when the two adjoining standard cells share a source/drain region at the cell boundaries. Accordingly, forming diffusion breaks at the edges of these standard cells can be skipped.
H01L 27/02 - Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate including integrated passive circuit elements with at least one potential-jump barrier or surface barrier
H01L 21/822 - Manufacture or treatment of devices consisting of a plurality of solid state components or integrated circuits formed in, or on, a common substrate with subsequent division of the substrate into plural individual devices to produce devices, e.g. integrated circuits, each consisting of a plurality of components the substrate being a semiconductor, using silicon technology
H01L 27/06 - Devices consisting of a plurality of semiconductor or other solid-state components formed in or on a common substrate including integrated passive circuit elements with at least one potential-jump barrier or surface barrier the substrate being a semiconductor body including a plurality of individual components in a non-repetitive configuration
H01L 29/423 - Electrodes characterised by their shape, relative sizes or dispositions not carrying the current to be rectified, amplified or switched
A distribution system receives data access requests associated with at least two virtual channels over at least two physical data communication channels. The requests are distributed into at least two sequencers of the distribution system based on a virtual channel associated with each request. The distribution system includes at least two memory modules – one for each of the at least two physical data communication channels. Requests stored in the sequencers are written to the memory modules according to a pattern that assigns sequential requests associated with a common virtual channel to a sequential ordering of the memory modules. The requests are then granted by an arbiter of the distribution system by retrieving requests associated with a common virtual channel from the memory modules using the sequential ordering of the memory modules.
H04L 67/60 - Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
63.
SHADER COMPILER AND SHADER PROGRAM CENTRIC MITIGATION OF CURRENT TRANSIENTS THAT CAUSE VOLTAGE TRANSIENTS ON A POWER RAIL
An apparatus and method for efficiently managing voltage transients on a power rail caused by current transients of an integrated circuit. In various implementations, a computing system includes a processing circuit that executes instructions of a compiler that includes a current transients mitigator. When executing the instructions of the current transients mitigator, the processing circuit generates an estimate of a time rate of current flow being drawn from or returned to the power rail based on instruction types of a first sequence of instructions. Based on the estimate exceeds a threshold, the processing circuit replaces the first sequence of instructions with a second sequence of instructions that provides a smaller estimate. The second sequence is issued to the one or more compute circuits that utilize the power rail, rather than the first sequence.
An electronic device includes a package substrate having a top surface and a bottom surface. An integrated circuit (“IC”) die is disposed on the top surface of the package substrate. The electronic device also includes a stiffener having a ring body and a plurality of support members. The ring body is secured to the top surface of the package substrate and circumscribes the IC die. The plurality of support members extend from the ring body to below the bottom surface of the package substrate.
H01L 25/065 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices all the devices being of a type provided for in a single subclass of subclasses , , , , or , e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group
A memory system includes a memory controller, a physical layer (PHY), and a memory (e.g., DRAM). Data is written to and read from the memory in different manners for different memory technologies, such as using different signals or signal timings for different memory technologies. Various runtime services specific to the memory technology are performed by the PHY rather than the memory controller. Examples of such runtime services include performing a training routine to train or re-train an interface between the PHY and the memory, performing a power management routine (e.g., to put the main memory in a self-refresh mode), and so forth.
G11C 11/406 - Management or control of the refreshing or charge-regeneration cycles
G11C 11/4074 - Power supply or voltage generation circuits, e.g. bias voltage generators, substrate voltage generators, back-up power, power control circuits
G11C 11/4093 - Input/output [I/O] data interface arrangements, e.g. data buffers
66.
SYSTEM AGNOSTIC AUTONOMOUS SYSTEM STATE MANAGEMENT
A computing device is provided which comprises memory and a processor in communication with the memory. The processor is configured to autonomously acquire input parameter values, comprising one of monitored device input parameter values from a component of the computing device and monitored user input parameter values. The processor is also configured to select, from a plurality of modes of operation, a mode of operation comprising parameter settings which are determined based on the acquired input parameter values, each of the plurality of modes of operation comprising different parameter settings configured to control the computing device to operate at a different level of performance. The processor is also configured to control operation of the computing device by tuning the parameter settings of the computing device according to the selected mode of operation comprising the determined parameter settings.
Apparatuses, systems and methods for routing requests and responses targeting a shared resource. A queue in a communication fabric is located in a path between the requesters and a shared resource. In some embodiments, the shared resource is a shared address translation cache stored in an endpoint. The physical channel between the queue and the shared resource supports multiple virtual channels. The queue assigns at least one entry to each virtual channel of a group of virtual channels where the group includes a virtual channel for each address translation request type from a single requester of the multiple requesters. When the at least one entry for a given requester is de-allocated, the queue allocates this entry only with requests from the assigned virtual channel even if the empty entry is the only available entry of the queue.
Systems, apparatuses, and methods for implementing duplicated registers for access by initiators across multiple semiconductor dies are disclosed. A system includes multiple initiators on multiple semiconductor dies of a chiplet processor. One of the semiconductor dies is the master die, and this master die has copies of registers which can be accessed by the multiple initiators on the multiple semiconductor dies. When a given initiator on a given secondary die generates a register access, the register access is routed to the master die and a particular duplicate copy of the register maintained for the given secondary die. From the point of view of software, the multiple semiconductor dies appear as a single die, and the multiple initiators appear as a single initiator. Multiple types of registers can be maintained by the master die, with a flush register being one of the register types.
Systems, apparatuses, and methods for performing machine learning content categorization leveraging video encoding pre-processing are disclosed. A system includes at least a motion vector unit and a machine learning (ML) engine. The motion vector unit pre-processes a frame to determine if there is temporal locality with previous frames. If the objects of the scene have not changed by a threshold amount, then the ML engine does not process the frame, saving computational resources that would typically be used. Otherwise, if there is a change of scene or other significant changes, then the ML engine is activated to process the frame. The ML engine can then generate a QP map and/or perform content categorization analysis on this frame and a subset of the other frames of the video sequence.
A processor supports secure memory access in a virtualized computing environment by employing requestor identifiers at bus devices (such as a graphics processing unit) to identify the virtual machine associated with each memory access request. The virtualized computing environment uses the requestor identifiers to control access to different regions of system memory, ensuring that each VM accesses only those regions of memory that the VM is allowed to access. The virtualized computing environment thereby supports efficient memory access by the bus devices while ensuring that the different regions of memory are protected from unauthorized access.
An apparatus and method for efficiently managing power consumption among multiple, replicated functional blocks of an integrated circuit. An integrated circuit includes multiple, replicated memories that use separate power domains. Data of a given type is stored in an interleaved manner among the multiple memories. When control circuitry detects an idle state, commands are sent to the multiple memories specifying storing data of the given type in a contiguous manner in the memories connected to multiple functional blocks. Subsequently, the control circuitry transitions all but one of the memories to the sleep state. The memories rotate amongst themselves with a single memory being in the active state and servicing requests based on which data of the given type is targeted by the requests.
An apparatus and method for efficiently managing power consumption among multiple, replicated functional blocks of an integrated circuit. An integrated circuit includes multiple, replicated functional blocks that use separate power domains. Data of a given type is stored in an interleaved manner among the multiple functional blocks. When control circuitry detects a low-performance mode, commands are sent to the multiple functional blocks specifying storing data of the given type in a contiguous manner in one or more of the caches of the multiple functional blocks and the memories connected to the multiple functional blocks. Following, the control circuitry transitions the memories to a sleep state and transitions all but one of the functional blocks to the sleep state. The functional blocks rotate amongst themselves with a single functional block being in the active state and servicing requests based on which data of the given type is targeted by the requests.
G09G 5/36 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of individual graphic patterns using a bit-mapped memory
A chip package includes a package substrate, an integrated circuit (IC) die disposed on the package substrate, and a lid assembly disposed over the IC die. The lid assembly includes a top plate having a lower surface facing the IC die and an outer shoulder. The lid assembly also includes a retainer having a lower surface secured to the package substrate and an inner shoulder retaining to the outer shoulder. The inner shoulder is configured to limit upward movement of the top plate, and expansion of the retainer is decoupled from expansion of the top plate.
H01L 23/051 - ContainersSeals characterised by the shape the container being a hollow construction and having a conductive base as a mounting as well as a lead for the semiconductor body another lead being formed by a cover plate parallel to the base plate, e.g. sandwich type
H01L 21/56 - Encapsulations, e.g. encapsulating layers, coatings
H01L 23/31 - Encapsulation, e.g. encapsulating layers, coatings characterised by the arrangement
74.
Clock Domain Phase Adjustment for Memory Operations
Clock domain phase adjustment techniques and systems for memory operations are described. In one example, a physical memory is communicatively coupled to a physical layer via a first clock domain and a memory controller is communicatively coupled to the physical layer via a second clock domain that is different than the first clock domain. A buffer is implemented in the physical layer. The buffer is configured to set a phase adjustment for a latency setting between the first and second clock domains. The phase adjustment is based on whether a mismatch has occurred in data output by the buffer to the memory controller based on a comparison to the latency setting.
Techniques for performing memory operations are disclosed herein. The techniques include generating a plurality of performance log entries based on observed operations; generating a plurality of memory access log entries based on the observed operations, wherein each performance log entry of the plurality of performance log entries are associated with one or more memory access log entries of the plurality of memory access log entries, wherein each performance log entry is associated with an epoch; and wherein each memory access log entry is associated with an epoch and a memory address range.
An apparatus and method for efficiently managing power consumption among multiple, replicated functional blocks of an integrated circuit. An integrated circuit includes multiple, replicated memories that use separate power domains. Data of a given type is stored in an interleaved manner among the multiple memories. When control circuitry detects an idle state, commands are sent to the multiple memories specifying storing data of the given type in a contiguous manner in the memories connected to multiple functional blocks. Subsequently, the control circuitry transitions all but one of the memories to the sleep state. The memories rotate amongst themselves with a single memory being in the active state and servicing requests based on which data of the given type is targeted by the requests.
An apparatus and method for efficiently performing power management for increasing reliable wireless signal transfer performed by mobile computing devices. In various implementations, a computing system includes a network interface and multiple components for processing tasks. The network interface sends, to at least a given component of the multiple components, an indication specifying the corresponding operating frequency ranges used by one or more radio modules used for wireless communication with an access point. The given component determines whether an operating clock frequency of the given component overlaps any of the received operating frequency ranges and associated harmonic frequencies. If so, then the given component changes the operating clock frequency to a frequency that does not overlap any of the received operating frequency ranges and associated harmonic frequencies.
An apparatus and method for efficiently managing voltage droop among replicated compute circuits of an integrated circuit. In various implementations, an integrated circuit includes multiple, replicated compute circuits, each including circuitry to process tasks grouped into a work block. When a scheduling window has begun, the scheduler determines a value for a threshold number of idle compute circuits that can be simultaneously activated based on one or more of a number of active compute circuits, an operating clock frequency, a measured operating temperature, a number of pending work blocks, and an application identifier. If the scheduler determines that there is a count of idle compute circuits that is equal to or greater than the threshold number of idle compute circuits, then the scheduler limits the number of idle compute circuits that can be activated at one time to the threshold number.
Techniques for performing memory operations are disclosed herein, The techniques include generating a plurality of performance log entries based on observed operations; generating a plurality of memory access log entries based on the observed operations, wherein each performance log entry of the plurality of performance log entries are associated with one or more memory access log entries of the plurality of memory access log entries, wherein each performance log entry is associated with an epoch; and wherein each memory access log entry is associated with an epoch and a memory address range.
A method for die pair partitioning can include providing a circuit die. The method can additionally include providing one or more additional circuit die having one or more fuses positioned therein, wherein the one or more fuses identify the circuit die. The method can also include connecting the one or more additional circuit die to the circuit die. Various other methods, systems, and computer-readable media are also disclosed.
H10B 80/00 - Assemblies of multiple devices comprising at least one memory device covered by this subclass
H01L 23/544 - Marks applied to semiconductor devices, e.g. registration marks, test patterns
H01L 25/00 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices
H01L 25/18 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices the devices being of types provided for in two or more different main groups of the same subclass of , , , , or
81.
AUTOMATED DATA-DRIVEN SYSTEM TO OPTIMIZE OVERCLOCKING
A processing device includes an automated overclocking system and a processor. The automated overclocking system is data-driven and includes an inference engine that executes a machine learning model configured to generate a first output based on a current configuration of the processing device. The first output includes a first set of overclocking parameters. The processor is configured to adjust one or more operating characteristics of at least one component of the processing device based on the first set of overclocking parameters.
Dynamic range aware conversion of sensor readings is described. A system includes one or more sensors to sense conditions of a component and output sensor readings and a system manager. The system manager is configured to convert the sensor readings into condition measurements by converting the sensor readings into the condition measurements using a first transformation while operating in a first conversion mode or converting the sensor readings into the condition measurements using a second transformation while operating in a second conversion mode. The system manager then adjusts operation of the component based on the condition measurements.
G05B 19/4155 - Numerical control [NC], i.e. automatically operating machines, in particular machine tools, e.g. in a manufacturing environment, so as to execute positioning, movement or co-ordinated operations by means of programme data in numerical form characterised by programme execution, i.e. part programme or machine function execution, e.g. selection of a programme
G01K 7/01 - Measuring temperature based on the use of electric or magnetic elements directly sensitive to heat using semiconducting elements having PN junctions
83.
REDUCING VOLTAGE DROOP BY LIMITING ASSIGNMENT OF WORK BLOCKS TO COMPUTE CIRCUITS
An apparatus and method for efficiently managing voltage droop among replicated compute circuits of an integrated circuit. In various implementations, an integrated circuit includes multiple, replicated compute circuits, each including circuitry to process tasks grouped into a work block. When a scheduling window has begun, the scheduler determines a value for a threshold number of idle compute circuits that can be simultaneously activated based on one or more of a number of active compute circuits, an operating clock frequency, a measured operating temperature, a number of pending work blocks, and an application identifier. If the scheduler determines that there is a count of idle compute circuits that is equal to or greater than the threshold number of idle compute circuits, then the scheduler limits the number of idle compute circuits that can be activated at one time to the threshold number.
An apparatus and method for efficient power management of multiple integrated circuits. In various implementations, a computing system includes an integrated circuit with a security processor. The security processor determines the integrated circuit transitions to an active state from a sleep state that is not intended to maintain configuration information to return to the active state without restarting an operating system. In the sleep state, multiple components of the integrated circuit have a power supply reference level turned off, which provides low power consumption for the integrated circuit. The security processor performs the bootup operation using information stored in persistent on-chip memory. By not using information stored in off-chip memory, the security processor reduces the latency of the transition. The persistent on-chip memory utilizes synchronous random-access memory that receives a standby power supply reference level that continually supplies a voltage magnitude by not being turned off.
A technique for performing video operations is provided. The technique includes decoding underlying content to obtain a decoded block; and applying a shade pattern to the decoded block to obtain a final block.
H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
H04N 19/12 - Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
H04N 19/13 - Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
H04N 19/156 - Availability of hardware or computational resources, e.g. encoding based on power-saving criteria
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
A method for die pair partitioning can include providing a circuit die that has a metal stack and that includes a majority of logic transistors of an integrated circuit. The method can also include providing one or more additional circuit die that have one or more additional metal stacks of which at least one is connected to the metal stack of the circuit die and a majority of static random access memory and analog devices of the integrated circuit. The method can further include connecting at least one of the one or more additional metal stacks to the metal stack of the circuit die. Various other methods, systems, and computer-readable media are also disclosed.
H10B 10/00 - Static random access memory [SRAM] devices
H01L 23/00 - Details of semiconductor or other solid state devices
H01L 23/48 - Arrangements for conducting electric current to or from the solid state body in operation, e.g. leads or terminal arrangements
H01L 23/522 - Arrangements for conducting electric current within the device in operation from one component to another including external interconnections consisting of a multilayer structure of conductive and insulating layers inseparably formed on the semiconductor body
Handling uncorrectable errors in memory is described. In accordance with the described techniques, a system includes a memory, a processor, and an interrupt handler. The memory detects an uncorrectable error in a portion of the memory and issues an interrupt request. The processor converts the uncorrectable error to a deferred error responsive to receiving the interrupt request issued by the memory. The interrupt handler identifies the process that accesses the uncorrectable error in the portion of the memory and handles the uncorrectable error by terminating the process that accesses the uncorrectable error. In one or more implementations, the interrupt handler terminates the process that accesses the uncorrectable error without terminating other processes that are not accessing the uncorrectable error in the portion of the memory.
In response to a video aspect ratio of a frame of video not matching an aspect ratio of a display panel of a display device, a source device of a processing system transmits only the frame to the display device and metadata indicating that the display device is to generate bars for letterboxing or pillarboxing. By generating the bars for letterboxing or pillarboxing at the display device instead of transmitting the bars from the source device to the display device or storing the bars at a frame buffer of the display device, the processing system conserves power and bandwidth.
A processor configured to execute one or more virtual machines (VMs) includes an input-output memory management unit (IOMMU) configured to handle memory-mapped input-output (MMIO) requests and direct memory access (DMA) requests from a processor core of the processor or one or more input/output (I/O) devices. In response to receiving an MMIO or DMA request, the IOMMU is configured to determine a VM associated with the request. The IOMMU then checks a security indicator field of an address space identifier (ASID) mask table to determine if the VM was previously the target of an attack by a malicious entity. In response to the VM previously being a target of an attack, the IOMMU denies the received MMIO or DMA request.
A processor [102] configured to execute one or more virtual machines (VMs) [106] includes an input-output memory management unit (IOMMU) [112] configured to handle memory-mapped input-output (MMIO) requests [122] and direct memory access (DMA) requests [123] from a processor core [104] of the processor or one or more input/output (I/O) devices. In response to receiving an MMIO or DMA request, the IOMMU is configured to determine a VM associated with the request. The IOMMU then checks a security indicator field of an address space identifier (ASID) mask table [114] to determine if the VM was previously the target of an attack by a malicious entity. In response to the VM previously being a target of an attack, the IOMMU denies the received MMIO or DMA request.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
G06F 21/55 - Detecting local intrusion or implementing counter-measures
G06F 9/455 - EmulationInterpretationSoftware simulation, e.g. virtualisation or emulation of application or operating system execution engines
G06F 12/14 - Protection against unauthorised use of memory
91.
SECURE MANAGEMENT OF DEVICE CONTROL INFORMATION IN CONFIDENTIAL COMPUTING ENVIRONMENTS
A processor (100) includes a security processor (110) and an input-output memory management unit (IOMMU) (112). The security processor is configured to maintain device control information (122-2) in a secure data structure (120-2) and prevent a hypervisor (116) from accessing the secure data structure. The IOMMU is configured to process at least one device request (124) targeting a virtual machine (114) from an input/output device (124) based on the secure data structure.
G06F 9/455 - EmulationInterpretationSoftware simulation, e.g. virtualisation or emulation of application or operating system execution engines
G06F 13/42 - Bus transfer protocol, e.g. handshakeSynchronisation
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
G06F 21/82 - Protecting input, output or interconnection devices
Sampling circuitry independently accesses channels of texture data that represent a set of pixels. One or more processing units separately compress the channels of the texture data and store compressed data representative of the channels of the texture data for the set of pixels. The channels can include a red channel, a blue channel, and a green channel that represent color values of the set of pixels and an alpha channel that represents degrees of transparency of the set of pixels. Storing the compressed data can include writing the compress data to portions of a cache. The processing units can identify a subset of the set of pixels that share a value of a first channel of the plurality of channels and represent the value of the first channel over the subset of the set of pixels using information representing the value, the first channel, and boundaries of the subset.
An apparatus and method for efficient power management of multiple integrated circuits. An apparatus includes an integrated circuit and a package substrate that includes an embedded inductor. The package substrate includes a glass-reinforced epoxy laminate material. The embedded inductor is formed within a cavity of the package substrate, and includes two negatively coupled inductors connected in a parallel combination. The embedded inductor receives an output voltage generated by a voltage regulator, and conveys this output voltage to a power supply input pin of the integrated circuit. Each of the two negatively coupled inductors utilizes a ferrite core. During a transient voltage condition of the output voltage generated by the voltage regulator, the embedded inductor provides an inductance that is less than an inductance it provides for a steady-state voltage condition of the output voltage generated by the voltage regulator.
H01L 25/065 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices all the devices being of a type provided for in a single subclass of subclasses , , , , or , e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group
94.
Apparatus and methods for managing outstanding transactions between one or more requesting units and a target unit
An apparatus includes logic circuitry that selects a retag transaction identifier for an original transaction identifier of an incoming transaction request, based on a plurality of sub-groups of retag tracking data. Each sub-group of retag tracking data is associated with a corresponding sub-group of retag transaction identifiers in a group of retag transaction identifiers. The logic circuitry retags the incoming transaction request with the selected retag transaction identifier (ID) by replacing the original transaction identifier associated with the incoming transaction request and sends the retagged incoming transaction request to a target. A returned response is received from the target. The retag transaction ID in the returned response is removed and replaced with the original transaction ID before being sent as a reply to the requesting unit. Associated methods are also presented.
In an implementation, a semiconductor chip includes a device layer, an interconnect layer fabricated on the device layer, the interconnect layer including a conductive pad, and a conductive pillar coupled to the conductive pad. The conductive pillar includes at least a first portion having a first width and a second portion having a second width, the first portion being disposed between the second portion and the conductive pad, wherein the first width of the first portion is greater than the second width of the second portion.
Techniques are described for adaptive device power management. The device interface application of a hardware computing unit detects a launch of an application by the operating system (OS) to be executed on the hardware computing unit, in an implementation. The device interface application identifies the launched application and determines whether a hardware profile exists that is associated with the application. The hardware profile includes one or more hardware parameters that yield the optimal performance for power consumption by the hardware computing unit when executing the launched application. Based on determining that the hardware profile exists, the power policy of the OS is updated for the launched application, and thereby, the driver updates the power state(s) of the hardware computing unit based on the new power policy.
The disclosed voltage regulator circuit includes a capacitor bank configured for a first voltage step corresponding to a voltage undershoot, and a shunt circuit configured for a second voltage step exceeding the first voltage step. Various other methods, systems, and computer-readable media are also disclosed.
The disclosed device for power management of chiplet interconnects includes multiple chiplets connected via multiple interconnects. The device also includes a control circuit that detects activity states of the chiplets and manages power states of the interconnects based on the detected activity states. Various other methods, systems, and computer-readable media are also disclosed.
Chip packages are described herein that includes chiplets embedded in a core of a substrate of the chip package, such as a package substrate or an interposer. In one example, the chiplet includes voltage regulation circuitry that is coupled through a substrate core embedded inductor to an integrated circuit (IC) die mounted to the substrate.
H01L 23/50 - Arrangements for conducting electric current to or from the solid state body in operation, e.g. leads or terminal arrangements for integrated circuit devices
H01L 23/538 - Arrangements for conducting electric current within the device in operation from one component to another the interconnection structure between a plurality of semiconductor chips being formed on, or in, insulating substrates
Chip packages are described herein that includes chiplets embedded in a core of a substrate of the chip package, such as a package substrate or an interposer. In one example, the chiplet includes voltage regulation circuitry that is coupled through a substrate core embedded inductor to an integrated circuit (IC) die mounted to the substrate.
H01L 23/522 - Arrangements for conducting electric current within the device in operation from one component to another including external interconnections consisting of a multilayer structure of conductive and insulating layers inseparably formed on the semiconductor body
H01L 23/00 - Details of semiconductor or other solid state devices
H01L 25/065 - Assemblies consisting of a plurality of individual semiconductor or other solid-state devices all the devices being of a type provided for in a single subclass of subclasses , , , , or , e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group