A method of managing resources in a graphics processing pipeline includes, in response to selecting a task for execution within a texture/shading unit, allocating to the task both a static allocation of temporary registers for the entire task and a dynamic allocation of temporary registers. The dynamic allocation comprises temporary registers used by a first phase of the task only and the static allocation of temporary registers comprises any temporary registers that are used by the program and are live at a boundary between two phases. When the task subsequently reaches a boundary between two phases, the dynamic allocation of temporary registers is freed and a new dynamic allocation of temporary registers for a next phase of the task is allocated to the task.
A graph convolutional network (GCN) having a GCN layer is configured. The GCN layer performs an operation in dependence on an adjacency matrix, a feature embedding matrix and a weight matrix. In response to determining that the weight matrix comprises more rows than columns, the GCN layer is configured to determine a first intermediate result of multiplying the feature embedding matrix and the weight matrix, and subsequently use the determined first intermediate result to determine a full result representing a result of multiplying the adjacency matrix, the feature embedding matrix and the weight matrix. In response to determining that the weight matrix comprises more columns than rows, the GCN layer is configured to determine a second intermediate result of multiplying the adjacency matrix and the feature embedding matrix, and subsequently use the determined second intermediate result to determine the full result representing the result of multiplying the adjacency, feature embedding and weight matrices.
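The ordering rule in the abstract above can be sketched in plain Python. This is a minimal illustration, not the patented hardware configuration: the helper names `matmul` and `gcn_layer` are hypothetical, matrices are lists of rows, and the handling of a square weight matrix (which the abstract does not address) is an assumption.

```python
def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def gcn_layer(adj, feat, weight):
    """Compute adj * feat * weight, choosing the multiplication order
    from the shape of the weight matrix. Returns (result, order_label)."""
    rows, cols = len(weight), len(weight[0])
    if rows > cols:
        # More rows than columns: feat * weight is narrower than feat,
        # so form it first and then apply the adjacency matrix.
        return matmul(adj, matmul(feat, weight)), "HW-first"
    # More columns than rows: form adj * feat first.
    # Assumption: a square weight matrix also takes this branch.
    return matmul(matmul(adj, feat), weight), "AH-first"


adj = [[1, 1], [0, 1]]
feat = [[1, 2, 3], [4, 5, 6]]
tall_weight = [[1], [0], [2]]          # 3 rows, 1 column
result, order = gcn_layer(adj, feat, tall_weight)
```

Both orders give the same product because matrix multiplication is associative; only the size of the intermediate result, and hence the work done, differs.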
A method of analyzing one or more objects in a set of frames. A first frame is segmented to produce a plurality of first masks each identifying pixels belonging to a potential object-instance detected in the first frame. A first feature vector is extracted from the first frame for each potential object-instance detected therein, characterizing the potential object-instance. A second frame is segmented to produce a plurality of second masks each identifying pixels belonging to a potential object-instance detected in the second frame. A second feature vector is extracted for each potential object-instance detected in the second frame, characterizing the potential object-instance. A potential object-instance in the first frame is matched with one of the potential object-instances in the second frame.
A graphics processing system includes a plurality of processing units, wherein the graphics processing system is configured to process a task first and second times at the plurality of processing units. Data identifying which processing unit of the plurality of processing units the task has been allocated to is consulted on allocating the task to a processing unit for processing for a second time, and, in response, the task is allocated for processing for the second time to any processing unit of the plurality of processing units other than the processing unit to which the task was allocated for processing for a first time.
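A minimal sketch of the allocation rule described above, assuming integer task IDs and a dictionary that records the first-pass allocation. The function name `allocate_second_pass` and the modulo tie-break among eligible units are hypothetical; the abstract only requires that the second pass runs on a different processing unit.

```python
def allocate_second_pass(task_id, first_pass_unit, num_units):
    """Pick a processing unit for the second pass of a task that differs
    from the unit recorded for its first pass."""
    used = first_pass_unit[task_id]            # consult the recorded allocation
    candidates = [u for u in range(num_units) if u != used]
    if not candidates:
        raise ValueError("at least two processing units are required")
    # Hypothetical tie-break: spread tasks over the remaining units.
    return candidates[task_id % len(candidates)]


first_pass_unit = {7: 2}                       # task 7 ran first on unit 2
second_unit = allocate_second_pass(7, first_pass_unit, num_units=4)
```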
Methods and image processing systems are provided for determining a dominant gradient orientation for a target region within an image. A plurality of gradient samples are determined for the target region, wherein each of the gradient samples represents a variation in pixel values within the target region. The gradient samples are converted into double-angle gradient vectors, and the double-angle gradient vectors are combined so as to determine a dominant gradient orientation for the target region.
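The double-angle step can be illustrated with a short sketch. It assumes each gradient sample is an `(gx, gy)` pair and weights each sample by its squared magnitude, which falls out of the identity (gx² − gy², 2·gx·gy) = |g|²·(cos 2θ, sin 2θ); the abstract does not specify a weighting, and the name `dominant_orientation` is hypothetical.

```python
import math


def dominant_orientation(gradients):
    """Return the dominant orientation in radians, in [-pi/2, pi/2],
    or None if the samples carry no orientation information."""
    sx = sy = 0.0
    for gx, gy in gradients:
        # Double-angle vector: gradients pointing in opposite directions
        # (theta and theta + pi) map to the same vector and reinforce
        # each other instead of cancelling.
        sx += gx * gx - gy * gy
        sy += 2 * gx * gy
    if sx == 0 and sy == 0:
        return None
    # Halve the angle of the combined vector to return to orientation space.
    return 0.5 * math.atan2(sy, sx)


# Two opposite gradients across an edge: a plain vector sum would be zero,
# but the double-angle sum still reports a horizontal gradient orientation.
angle = dominant_orientation([(1, 0), (-1, 0)])
```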
G06T 7/44 - Analysis of texture based on statistical description of texture using image operators, e.g. filters, edge density metrics or local histograms
G06F 18/22 - Matching criteria, e.g. proximity measures
G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
A graphics processing system includes a tiling unit configured to tile a scene into a plurality of tiles. A processing unit identifies tiles of the plurality of tiles that are each associated with at least a predetermined number of primitives. A memory management unit allocates a portion of memory to each of the identified tiles and does not allocate a portion of memory for each of the plurality of tiles that are not identified by the processing unit. A rendering unit renders each of the identified tiles and does not render tiles that are not identified by the processing unit.
Data in a processing system is compressed, the data comprising a plurality of values having a same multiple-byte format. Bytes with corresponding byte significance are grouped together to form a plurality of byte blocks, and a byte block of the plurality of byte blocks is compressed using a compression algorithm comprising storing at least one byte from the byte block as a byte origin, and storing at least one remaining byte in the byte block as a difference value from the byte origin.
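A minimal sketch of the byte-grouping and origin/difference scheme described above. The choice of the first byte of each block as the byte origin, big-endian byte order, and modulo-256 differences are assumptions made for illustration; the function names are hypothetical.

```python
def group_bytes(values, width):
    """Split each `width`-byte value and gather bytes of equal significance
    into one block per byte position (most significant first)."""
    raw = [v.to_bytes(width, "big") for v in values]
    return [bytes(r[i] for r in raw) for i in range(width)]


def compress_block(block):
    """Store the first byte as the origin and the rest as differences."""
    origin = block[0]
    diffs = [(b - origin) & 0xFF for b in block[1:]]   # wrap-around difference
    return origin, diffs


def decompress_block(origin, diffs):
    """Rebuild the byte block from the origin and its differences."""
    return bytes([origin] + [(origin + d) & 0xFF for d in diffs])


values = [0x12345678, 0x1234567A, 0x12345690]
blocks = group_bytes(values, 4)
compressed = [compress_block(b) for b in blocks]
```

For values that are numerically close, the high-significance blocks produce all-zero differences and only the low-significance block carries non-trivial deltas, which is what makes the grouping useful before further encoding.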
A method of processing an input task in a processing system involves duplicating the input task so as to form a first task and a second task; allocating memory including a first block of memory configured to store read-write data to be accessed during the processing of the first task; a second block of memory configured to store a copy of the read-write data to be accessed during the processing of the second task; and a third block of memory configured to store read-only data to be accessed during the processing of both the first task and the second task; and processing the first task and the second task at processing logic of the processing system so as to, respectively, generate first and second outputs.
Rendering systems that can use combinations of rasterization rendering processes and ray tracing rendering processes are disclosed. In some implementations, these systems perform a rasterization pass to identify visible surfaces of pixels in an image. Some implementations may begin shading processes for visible surfaces, before the geometry is entirely processed, in which rays are emitted. Rays can be culled at various points during processing, based on determining whether the surface from which the ray was emitted is still visible. Rendering systems may implement rendering effects as disclosed.
G09G 5/36 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of individual graphic patterns using a bit-mapped memory
Hardware is configured for implementing a Deep Neural Network (DNN) for performing an activation function. A programmable lookup table for storing lookup data approximating the activation function is provided at an activation module for performing the activation function. Training data is provided to an input layer of a representation of the hardware, wherein the representation of the hardware is configured to implement the DNN, to configure the DNN by using the training data, wherein configuring the DNN comprises determining lookup data for the lookup table representing the activation function. The lookup data is loaded into the lookup table of the hardware, thereby configuring the activation module of the hardware for performing the activation function during post-training operation.
Systems and methods for implementing a geometry processing phase of tile-based rendering. The systems include a plurality of parallel geometry pipelines, a plurality of tiling pipelines and a geometry to tiling arbiter situated between the plurality of geometry pipelines and the plurality of tiling pipelines. Each geometry pipeline is configured to generate one or more geometry blocks for each geometry group of a subset of ordered geometry groups; generate a corresponding primitive position block for each geometry block; and compress each geometry block to generate a corresponding compressed geometry block. The tiling pipelines are configured to generate, from the primitive position blocks, a list for each tile indicating primitives that fall within the bounds of that tile. The geometry to tiling arbiter is configured to forward the primitive position blocks generated by the plurality of geometry pipelines to the plurality of tiling pipelines in the correct order based on the order of the geometry groups.
A processor has first and second cores and a distributed cache that caches a copy of data stored at a plurality of memory addresses of a memory. A first cache slice is connected to the first core, and a second cache slice is connected to the second core. The first cache slice caches a copy of data stored at a first set of memory addresses, and the second cache slice caches a copy of data stored at a second, different, set of memory addresses.
A transaction processing circuit in a graphics rendering system receives information identifying a particular vertex of a plurality of vertices in a strip, each of which is associated with a viewport, and selects a plurality of viewports for viewport transformation of the particular vertex by selecting relevant vertices from the vertices in the strip based on a provoking vertex, and selecting the plurality of viewports to comprise the viewport associated with each relevant vertex. Viewport transformation instructions are sent to a viewport transformation module to perform a viewport transformation on untransformed coordinate data for the particular vertex for each of the viewports, wherein the viewport transformation instructions comprise a viewport transformation instruction for each of the plurality of viewports, each viewport transformation instruction comprising information identifying the particular vertex and information identifying one of the plurality of viewports.
A method of matching features in first and second images captured from respective camera viewpoints related by an epipolar geometry. The coordinate system of the second image is transformed so as to map an epipolar line in the second image corresponding to a first feature in the first image, to be parallel to one of the coordinate axes of the coordinate system. The epipolar line defines a geometrically-constrained region in the second image in the transformed coordinate system corresponding to the first feature in the first image; measures of similarity between the first feature in the first image and features in the second image are determined; and a best match feature is identified from the measures of similarity between the first feature in the first image and the respective features in the second image.
G06T 7/593 - Depth or shape recovery from multiple images from stereo images
G06F 18/22 - Matching criteria, e.g. proximity measures
G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
Hardware logic for implementing a convolutional neural network (CNN) is configured to receive input data values to be processed in a layer of the CNN. Addresses in banked memory of a buffer in which the received input data values are to be stored are determined based upon format data indicating a format parameter of the input data in the layer and indicating a format parameter of a filter which is to be used to process the input data in the layer, wherein the format parameter of the filter comprises a stride. The received input data values are then stored at the determined addresses in the buffer for retrieval for processing in the layer.
A method and system for generating and shading a computer graphics image in a tile based computer graphics system is provided. Geometry data is supplied and a plurality of primitives are derived from the geometry data. One or more modified primitives are then derived from at least one of the plurality of primitives. For each of a plurality of tiles, an object list is derived including data identifying the primitive from which each modified primitive located at least partially within that tile is derived. Alternatively, the object list may include data identifying each modified primitive located at least partially within that tile. Each tile is then shaded for display using its respective object list.
A cache system in a graphics processing system stores graphics data items for use in rendering primitives. It is determined whether graphics data items relating to primitives to be rendered are present in the cache, and if not then computation instances for generating the graphics data items are created. Computation instances are allocated to tasks using a task assembly unit which stores task entries for respective tasks. The task entries indicate which computation instances have been allocated to the respective tasks. The task entries are associated with characteristics of computation instances which can be allocated to the respective tasks. A computation instance to be executed is allocated to a task based on the characteristics of the computation instance. SIMD processing logic executes computation instances of a task outputted from the task assembly unit to thereby determine graphics data items, which can be used to render the primitives.
Graphics processing systems render items of geometry using a rendering space subdivided into a plurality of first regions. The items of geometry are stored in data blocks having a respective block ID. The items of geometry are rendered within a second region of a plurality of second regions using a first control list for the first region of which the second region is a part, and a second control list for the second region, each control list comprising entries associated with respective items of geometry, each of the entries comprising a block ID associated with a data block. The items of geometry are rendered within the second region by choosing, from the first control list and the second control list, the entry comprising the lowest block ID which has not previously been chosen, and fetching items of geometry from the data block associated with the block ID of the chosen entry.
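The selection rule above amounts to a two-way merge by block ID. The following is a minimal sketch assuming each control list is reduced to a list of block IDs already in ascending order; the name `fetch_order`, the list labels, and the rule that a tie goes to the first list are hypothetical.

```python
def fetch_order(first_list, second_list):
    """Return (source, block_id) pairs in the order data blocks would be
    fetched: always the lowest block ID not yet chosen from either list."""
    order, i, j = [], 0, 0
    while i < len(first_list) or j < len(second_list):
        first_ready = i < len(first_list)
        second_ready = j < len(second_list)
        if first_ready and (not second_ready
                            or first_list[i] <= second_list[j]):
            order.append(("first", first_list[i]))
            i += 1
        else:
            order.append(("second", second_list[j]))
            j += 1
    return order


# Block IDs referenced by the coarse (first) and fine (second) control lists.
order = fetch_order([0, 3, 4], [1, 2, 5])
```

Merging on block ID lets geometry referenced at two region granularities be fetched in a single ascending sequence.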
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
45 - Legal and security services; personal services for individuals.
Goods & Services
Computer software for microprocessors and electronic circuits; computer software for graphics processing; computer software for multimedia processing; computer software in relation to instruction set architectures; computer software in relation to neural networks processors; artificial intelligence and machine learning software in connection with microprocessors and electronic circuits; firmware and device drivers for microprocessors; interfaces between computer hardware and computer software; electronic databases featuring data and information relating to microprocessors, electronic circuits, graphics processing, instruction set architectures and neural networks processors; electronic publications in the field of microprocessors; microprocessors; electronic circuits; central processing units; graphics processing units; neural network processors. Providing online non-downloadable computer software, Software-as-a-Service, and Platform-as-a-Service in relation to microprocessors, electronic circuits, graphics processing, multimedia processing, instruction set architectures, neural networks processors, artificial intelligence and machine learning for microprocessors and electronic circuits; research, design and development services in relation to microprocessors, electronic circuits, graphics processing, multimedia processing, instruction set architectures, neural networks processors, artificial intelligence and machine learning for microprocessors and electronic circuits; research, design and development services in relation to microprocessor architecture. Licensing of Intellectual property and technology; licensing of know-how, namely practical knowledge, skill and expertise in relation to the development of microprocessors, electronic circuits, graphics processing, multimedia processing, instruction set architectures, neural networks processors, artificial intelligence and machine learning for microprocessors and electronic circuits.
45 - Legal and security services; personal services for individuals.
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods & Services
Licensing of Intellectual property and technology; licensing of know-how, namely practical knowledge, skill and expertise in relation to the development of microprocessors, electronic circuits, graphics processing, multimedia processing, instruction set architectures, neural networks processors, artificial intelligence and machine learning for microprocessors and electronic circuits. Computer software for microprocessors and electronic circuits; computer software for graphics processing; computer software for multimedia processing; computer software in relation to instruction set architectures; computer software in relation to neural networks processors; artificial intelligence and machine learning software in connection with microprocessors and electronic circuits; firmware and device drivers for microprocessors; interfaces between computer hardware and computer software; electronic databases featuring data and information relating to microprocessors, electronic circuits, graphics processing, instruction set architectures and neural networks processors; electronic publications in the field of microprocessors; microprocessors; electronic circuits; central processing units; graphics processing units; neural network processors. Providing online non-downloadable computer software, Software-as-a-Service, and Platform-as-a-Service in relation to microprocessors, electronic circuits, graphics processing, multimedia processing, instruction set architectures, neural networks processors, artificial intelligence and machine learning for microprocessors and electronic circuits; research, design and development services in relation to microprocessors, electronic circuits, graphics processing, multimedia processing, instruction set architectures, neural networks processors, artificial intelligence and machine learning for microprocessors and electronic circuits; research, design and development services in relation to microprocessor architecture.
An adder and a method for calculating 2^n + x are provided, where x is a variable input expressed in a floating point format and n is an integer. The adder comprises a first path configured to calculate 2^n + x for x < 0 and 2^(n−1) ≤ |x| < 2^(n+1); a second path configured to calculate 2^n + x for |x| < 2^n; a third path configured to calculate 2^n + x for |x| ≥ 2^n; and selection logic configured to cause the adder to output a result from one of the first, second, and third paths in dependence on the values of x and n.
A colour processor for mapping an image from source to destination colour gamuts has an input for receiving a source image including a plurality of source colour points expressed according to the source gamut; a colour characterizer configured to, for each source colour point in the source image, determine a position of intersection of a curve with the boundary of the destination gamut; and a gamut mapper configured to, for each source colour point in the source image: if the source colour point lies inside the destination gamut, apply a first translation factor to translate the source colour point to a destination colour point within a first range of values; or if the source colour point lies outside the destination gamut, apply a second translation factor, different to the first translation factor, to translate the source colour point to a destination colour point within a second range of values.
G09G 5/02 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the way in which colour is displayed
G09G 5/06 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the way in which colour is displayed using colour palettes, e.g. look-up tables
A multicore graphics processing unit (GPU) and a method of operating a GPU are provided. The GPU comprises at least a first core and a second core. At least one of the cores in the multicore GPU comprises a master unit configured to distribute geometry processing tasks between at least the first core and the second core.
Input/output filter units for use in a graphics processing unit include a first buffer configured to store data received from, and output to, a first component of the graphics processing unit; a second buffer configured to store data received from, and output to, a second component of the graphics processing unit; a weight buffer configured to store filter weights; a filter bank configurable to perform a plurality of types of filtering on a set of input data, the plurality of types of filtering comprising texture filtering types and pixel filtering types; and control logic configured to cause the filter bank to: (i) perform one of the plurality of types of filtering on a set of data stored in one of the first and second buffers using a set of weights stored in the weight buffer, and (ii) store the results of the filtering in one of the first and second buffers.
G06F 7/57 - Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups or for performing logical operations
G06F 13/12 - Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor
Methods and apparatus for generating a data structure for storing primitive data for a number of primitives and vertex data for a plurality of vertices, wherein each primitive is defined with reference to one or more of the plurality of vertices. The vertex data comprises data for more than one view, such as a left view and a right view, with vertex parameter values for a first group of vertex parameters being stored separately for each view and vertex parameter values for a second, non-overlapping group of vertex parameters being stored only once and used when rendering either or both views.
Methods and hardware for cube mapping comprise receiving fragment coordinates for an input block of fragments and texture instructions for the fragments and then determining, based on gradients of the input block of fragments, whether a first mode of cube mapping or a second mode of cube mapping is to be used, wherein the first mode of cube mapping performs calculations at a first precision for a subset of the fragments and calculations for remaining fragments at a second, lower, precision and the second mode of cube mapping performs calculations for all fragments at the first precision. Cube mapping is then performed using the determined mode and the gradients, wherein if the second mode is used and more than half of the fragments in the input block are valid, the cube mapping is performed over two clock cycles.
A method of decompression to determine data values from compressed data comprising representations of one or more difference values for the data values being decompressed, each difference value representing a difference between the respective data value and an origin value, wherein the representations of the one or more difference values are included in the compressed data using a second number of bits. Based on the representations of the one or more difference values in the compressed data and a first number of bits for representing the one or more difference values for the one or more data values, for each of the one or more data values being decompressed, a difference value is determined in accordance with the first number of bits. Each of the one or more data values being decompressed is determined using: (i) the origin value, and (ii) the determined difference value for the data value.
G06F 7/72 - Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations using residue arithmetic
G06T 3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
A binary logic circuit performs an interpolation calculation between two endpoint values E0 and E1 using a weighting index i for generating an interpolated result P, the values E0 and E1 being formed from Adaptive Scalable Texture Compression (ASTC) colour endpoint values C0 and C1 respectively, the colour endpoint values C0 and C1 being low-dynamic range (LDR) or high dynamic range (HDR) values. An interpolation unit performs an interpolation between the values C0 and C1 using the index i to generate a first intermediate interpolated result C2; combinational logic circuitry receives the result C2 and performs logical processing operations to calculate the interpolated result P according to the equation: (1) P = ⌊((C2<<8) + C2 + 32)/64⌋ when the interpolated result is not to be compatible with an sRGB colour space and the colour endpoint values are LDR values; (2) P = ⌊((C2<<8) + 128·64 + 32)/64⌋ when the interpolated result is to be compatible with an sRGB colour space and the colour endpoint values are LDR values; and (3) P = (C2+2)>>2 when the colour endpoint values are HDR values.
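The three equations can be checked numerically with a short sketch. It reads the constant in equation (2) as 128 × 64 and assumes the intermediate result has the weighted-sum form C2 = C0·(64 − i) + C1·i with i in 0..64, which the abstract does not state explicitly; the function names are hypothetical.

```python
def intermediate(c0, c1, i):
    """Assumed form of the first intermediate result C2 (weights sum to 64)."""
    return c0 * (64 - i) + c1 * i


def interpolated_result(c2, *, hdr=False, srgb=False):
    """Evaluate equations (1) to (3) of the abstract on an intermediate C2."""
    if hdr:
        return (c2 + 2) >> 2                              # equation (3)
    if srgb:
        return ((c2 << 8) + 128 * 64 + 32) // 64          # equation (2)
    return ((c2 << 8) + c2 + 32) // 64                    # equation (1)


# Equal endpoints reproduce the expected 16-bit expansions of the endpoint.
ldr = interpolated_result(intermediate(255, 255, 17))              # 0xFFFF
srgb = interpolated_result(intermediate(255, 255, 17), srgb=True)  # 0xFF80
hdr = interpolated_result(intermediate(4095, 4095, 17), hdr=True)  # 0xFFF0
```

Under the stated assumption, working on the 8-bit (or 12-bit) endpoints and expanding afterwards gives the same result as expanding each endpoint to 16 bits first and then interpolating, which is why a single interpolation unit can serve all three cases.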
Texture filtering is applied to a texture represented with a mipmap comprising a plurality of levels, wherein each level of the mipmap comprises an image representing the texture at a respective level of detail. A texture filtering unit has minimum and maximum limits on an amount by which it can alter the level of detail when it filters texels from an image of a single level of the mipmap. The range of level of detail between the minimum and maximum limits defines an intrinsic region of the texture filtering unit. If it is determined that a received input level of detail is in an intrinsic region of the texture filtering unit, texels are read from a single mipmap level of the mipmap, and the read texels from the single mipmap level are filtered to determine a filtered texture value representing part of the texture at the input level of detail. If it is determined that the received input level of detail is in an extrinsic region of the texture filtering unit: texels are read from two mipmap levels of the mipmap, and the read texels from the two mipmap levels are processed to determine a filtered texture value representing part of the texture at the input level of detail.
Methods and tiling engines for tiling primitives in a tile based graphics processing system in which a rendering space is divided into a plurality of tiles. The method includes generating a multi-level hierarchy of tile groups, each level of the multi-level hierarchy comprising one or more tile groups comprising one or more of the plurality of tiles; receiving a plurality of primitive blocks, each primitive block comprising geometry data for one or more primitives; associating each of the plurality of primitive blocks with one or more of the tile groups up to a maximum number of tile groups such that if at least one primitive of a primitive block falls, at least partially, within the bounds of a tile, the primitive block is associated with at least one tile group that includes that tile; and generating a control stream for each tile group based on the associations, wherein each control stream comprises a primitive block entry for each primitive block associated with the corresponding tile group.
A rendering system combines point sampling and volume sampling operations to produce rendering outputs. For example, to determine color information for a surface location in a 3-D scene, one or more point sampling operations are conducted in a volume around the surface location, and one or more sampling operations of volumetric light transport data are performed farther from the surface location. A transition zone between point sampling and volume sampling can be provided, in which both point and volume sampling operations are conducted. Data obtained from point and volume sampling operations can be blended in determining color information for the surface location. For example, point samples are obtained by tracing a ray for each point sample, to identify an intersection between another surface and the ray, to be shaded, and volume samples are obtained from nested 3-D grids of volume elements expressing light transport data at different levels of granularity.
Methods of encoding and decoding are described which use a variable number of instruction words to encode instructions from an instruction set, such that different instructions within the instruction set may be encoded using different numbers of instruction words. To encode an instruction, the bits within the instruction are re-ordered and formed into instruction words based upon their variance as determined using empirical or simulation data. The bits in the instruction words are compared to corresponding predicted values and some or all of the instruction words that match the predicted values are omitted from the encoded instruction.
Adder circuits and associated methods for processing a set of at least three floating-point numbers to be added together include identifying, from among the at least three numbers, at least two numbers that have the same sign—that is, at least two numbers that are both positive or both negative. The identified at least two numbers are added together using one or more same-sign floating-point adders. A same-sign floating-point adder comprises circuitry configured to add together floating-point numbers having the same sign and does not include circuitry configured to add together numbers having different signs.
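Among any three or more numbers, at least two must share a sign, which is what lets the first addition always be routed to a same-sign adder. The following is a minimal software sketch of that selection; the function names are hypothetical, zeros are classified by their sign bit, and the final addition with an ordinary `+` stands in for whatever adder handles the remaining operands.

```python
import math


def pick_same_sign_pair(values):
    """Return the indices of two values that share a sign bit.
    Requires at least three values, so such a pair always exists."""
    negatives = [k for k, v in enumerate(values) if math.copysign(1.0, v) < 0]
    positives = [k for k, v in enumerate(values) if math.copysign(1.0, v) > 0]
    group = negatives if len(negatives) >= 2 else positives
    return group[0], group[1]


def same_sign_add(a, b):
    """Model of a same-sign adder: it only ever sees operands whose signs
    match, so no subtraction or cancellation path is needed."""
    assert math.copysign(1.0, a) == math.copysign(1.0, b)
    return a + b


def add_three_or_more(values):
    i, j = pick_same_sign_pair(values)
    partial = same_sign_add(values[i], values[j])
    rest = [v for k, v in enumerate(values) if k not in (i, j)]
    return partial + sum(rest)      # assumption: general addition for the rest


total = add_three_or_more([1.5, -2.0, 3.0])
```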
G06F 7/24 - Sorting, i.e. extracting data from one or more carriers, re-arranging the data in numerical or other ordered sequence, and re-recording the sorted data on the original carrier or on a different carrier or set of carriers
G06F 7/501 - Half or full adders, i.e. basic adder cells for one denomination
H03K 19/20 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits characterised by logic function, e.g. AND, OR, NOR, NOT circuits
Modifying Processing of Commands in a Command Queue Based on Accessed Data Related to a Command
Processing of commands at a graphics processor is controlled by receiving input data and generating a command for processing at the graphics processor from the input data, wherein the command will cause the graphics processor to write out at least one buffer of data to an external memory, and submitting the command to a queue for later processing at the graphics processor. Subsequent to submitting the command, but before the write to external memory has been completed, further input data is received and it is determined that the buffer of data does not need to be written to external memory. The graphics processor is then signalled to prevent at least a portion of the write to external memory from being performed for the command.
A method of filtering a target pixel in an image forms, for a kernel of pixels comprising the target pixel and its neighbouring pixels, a data model to model pixel values within the kernel; calculates a weight for each pixel of the kernel comprising: (i) a geometric term dependent on a difference in position between that pixel and the target pixel; and (ii) a data term dependent on a difference between a pixel value of that pixel and its predicted pixel value according to the data model; and uses the calculated weights to form a filtered pixel value for the target pixel, e.g. by updating the data model with a weighted regression analysis technique using the calculated weights for the pixels of the kernel; and evaluating the updated data model at the target pixel position so as to form the filtered pixel value for the target pixel.
Neural network accelerators with one or more neural network accelerator cores. Each neural network accelerator core has hardware accelerators configured to accelerate neural network operations, an embedded processor, a command decoder, and a hardware feedback path between the embedded processor and the command decoder. The command decoder is configured to control the hardware accelerators and the embedded processor of that core in accordance with commands of a command stream, and when the command stream comprises a set of one or more branch commands that indicate a conditional branch is to be performed, cause the embedded processor to determine a next command stream, and in response to receiving information from the embedded processor identifying the next command stream via the hardware feedback path, control the one or more hardware accelerators and the embedded processor in accordance with commands of the next command stream.
A method of GPU virtualization comprises allocating each virtual machine (or operating system running on a VM) an identifier by the hypervisor and then this identifier is used to tag every transaction deriving from a GPU workload operating within a given VM context (i.e. every GPU transaction on the system bus which interconnects the CPU, GPU and other peripherals). Additionally, dedicated portions of a memory resource (which may be GPU registers or RAM) are provided for each VM and whilst each VM can only see their allocated portion of the memory, a microprocessor within the GPU can see all of the memory. Access control is achieved using root memory management units which are configured by the hypervisor and which map guest physical addresses to actual memory addresses based on the identifier associated with the transaction.
Shader processing units for a graphics processing unit execute ray tracing shaders that generate ray data associated with rays. The ray data includes a plurality of ray data elements. Store logic receives, as part of a ray tracing shader, a ray store instruction that includes: (i) information identifying a store group of a plurality of store groups, each store group comprising one or more ray data elements of the plurality of ray data elements, and (ii) information identifying one or more ray data elements of the identified store group to be stored in an external unit. In response to receiving the ray store instruction, the store logic retrieves the identified ray data elements for one or more rays from the storage. The store logic then sends one or more store requests to an external unit which cause the external unit to store the identified ray data elements for the one or more rays.
Methods and intersection testing modules are provided for determining, in a ray tracing system, whether a ray intersects a 3D axis-aligned box representing a volume defined by a front-facing plane and a back-facing plane for each dimension. The front-facing plane of the box which intersects the ray furthest along the ray is identified. It is determined whether the ray intersects the identified front-facing plane at a position that is no further along the ray than positions at which the ray intersects the back-facing planes in a subset of the dimensions, and this determination is used to determine whether the ray intersects the axis-aligned box. The subset of dimensions comprises the two dimensions for which the front-facing plane was not identified, but does not comprise the dimension for which the front-facing plane was identified. It is determined whether the ray intersects the box without performing a test to determine whether the ray intersects the identified front-facing plane at a position that is no further along the ray than a position at which the ray intersects the back-facing plane in the dimension for which the front-facing plane was identified.
A graphics processing system with a data store includes processing units for processing tasks. A check unit forms a signature which is characteristic of an output from processing a task on a processing unit, and a fault detection unit compares signatures formed at the check unit. Each task is processed first and second times at the processing units to generate first and second processed outputs. The graphics processing system writes out the first processed output to the data store, reads back the first processed output from the data store and forms at the check unit a first signature characteristic of the first processed output as read back from the data store; forms at the check unit a second signature characteristic of the second processed output, compares the first and second signatures at the fault detection unit, and raises a fault signal if the signatures do not match.
G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 11/10 - Adding special bits or symbols to the coded information, e.g. parity check, casting out nines or elevens
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 11/16 - Error detection or correction of the data by redundancy in hardware
G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
G06F 21/64 - Protecting data integrity, e.g. using checksums, certificates or signatures
A graphics processing system for performing tile-based rendering of a scene that comprises safety-related primitives. The system comprises a plurality of graphics processing units (GPUs), each configured to i) receive tile data identifying one or more protected tiles comprising at least part of a safety-related primitive, ii) process two respective sets of protected tiles, and iii) based on said processing, generate two respective checksums for each respective set of protected tiles. The two respective sets of protected tiles are mutually exclusive, with each respective set and each protected tile being processed by two different GPUs. The system comprises a comparison unit configured to compare one or more pairs of checksums, each pair comprising a respective checksum generated based on a same respective set of protected tiles and generated by different GPUs. The graphics processing system is configured to perform one or more actions based on an outcome of said comparison.
A computer-implemented method of compressing one or more data values, uses a first number of bits to determine a second number of bits, wherein the first number of bits is for representing a maximum difference value of one or more difference values representing one or more differences between the one or more data values and an origin value; and forms compressed data, wherein the compressed data comprises one or more representations of the one or more difference values, wherein each of the one or more representations of the one or more difference values uses said determined second number of bits.
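A minimal Python sketch of this kind of difference-based packing follows. It makes several assumptions that the abstract does not state: the data values are non-negative integers, the origin is the minimum value, the second number of bits is simply set equal to the first, and the names `compress` and `decompress` are hypothetical.

```python
def compress(values):
    """Pack integer values as fixed-width differences from an origin value.

    Assumes a non-empty list of integers. Returns (origin, bits, packed).
    """
    origin = min(values)  # assumption: use the minimum as the origin
    diffs = [v - origin for v in values]
    first_bits = max(diffs).bit_length()  # bits needed for the maximum difference
    second_bits = first_bits  # assumption: every difference uses that same width
    packed = 0
    for i, d in enumerate(diffs):
        packed |= d << (i * second_bits)
    return origin, second_bits, packed


def decompress(origin, bits, packed, count):
    """Recover `count` values from the packed differences."""
    mask = (1 << bits) - 1
    return [origin + ((packed >> (i * bits)) & mask) for i in range(count)]
```

With the values 100, 103, 101 and 107, the maximum difference is 7, so each difference occupies 3 bits rather than the 7 bits needed to store each raw value.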
G06F 7/72 - Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations using residue arithmetic
G06T 3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
43.
CPU FOR IMPLEMENTING A GRAPHICS PROCESSING PIPELINE
A central processing unit for implementing a graphics processing pipeline which comprises a plurality of graphics processing tasks, the central processing unit comprising: one or more distinct graphics processing modules configured in dedicated hardware, wherein each of the one or more distinct graphics processing modules is configured to perform one of the graphics processing tasks of the graphics processing pipeline; and an execution unit configured to execute instructions of an instruction set for implementing the graphics processing pipeline, wherein the execution unit is configured to call each of the one or more distinct graphics processing modules using a respective instruction of the instruction set to perform its respective graphics processing task of the graphics processing pipeline.
A hierarchical acceleration structure for use in a ray tracing system. When generating a node for the hierarchical acceleration structure, the primitives in a particular portion of the 3D scene may be alternatively bounded by different shaped volumes. These bounding volumes or ‘bounding regions’ can be Axis Aligned Bounding Boxes (AABBs), although other bounding volumes can be used. The ray tracing system may use sets of two or more bounding volumes in a 3D scene to bound all the primitives within that portion. The choice of how to create sets of multiple bounding volumes within a portion of the 3D scene may be done by using a binary space partition (BSP). Different sets of bounding regions may present different amounts of surface area for a hypothetical ray entering the portion of the 3D scene dependent upon the expected ray direction or angle.
A graphics processing system is configured to render primitives using a rendering space that is sub-divided into sections, wherein the graphics processing system includes assessment logic configured to make an assessment regarding the presence of primitive edges in a section, and determination logic configured to determine an anti-aliasing setting for the section based on the assessment.
A computer implemented method of training a neural network configured to combine a set of coefficients with respective input data values. So as to train a test implementation of the neural network, sparsity is applied to one or more of the coefficients according to a sparsity parameter, the sparsity parameter indicating a level of sparsity to be applied to the set of coefficients; the test implementation of the neural network is operated on training input data using the coefficients so as to form training output data; in dependence on the training output data, assessing the accuracy of the neural network; the sparsity parameter is updated in dependence on the accuracy of the neural network; and a runtime implementation of the neural network is configured in dependence on the updated sparsity parameter.
G06N 3/084 - Backpropagation, e.g. using gradient descent
G06F 18/2136 - Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on sparsity criteria, e.g. with an overcomplete basis
G06N 3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
47.
Unified Rasterization and Ray Tracing Rendering Environments
A graphics processor architecture provides for scan conversion and ray tracing approaches to visible surface determination as concurrent and separate processes. Surfaces can be identified for shading by scan conversion and ray tracing. Data produced by each can be normalized, so that instances of shaders, being executed on a unified shading computation resource, can shade surfaces originating from both ray tracing and rasterization. Such resource also may execute geometry shaders. The shaders can emit rays to be tested for intersection by the ray tracing process. Such shaders can complete, without waiting for those emitted rays to complete. Where scan conversion operates on tiles of 2-D screen pixels, the ray tracing can be tile aware, and controlled to prioritize testing of rays based on scan conversion status. Ray population can be controlled by feedback to any of scan conversion, and shading.
A hardware-implemented method of indexing data elements in a source array in a memory, generates a number of shifted copy arrays based on the source array, each shifted copy array comprising the data elements of the source array at a respective shifted position. A plurality of indices for indexing the source array are received, each index of the plurality of indices indicating a target location in the source array, and for each index of the plurality of indices, a data element is retrieved from each of the shifted copy arrays. The retrieved elements are gated based on the index, to thereby select a data element.
G06F 17/11 - Complex mathematical operations for solving equations
49.
PROCESSOR HAVING FIRST AND SECOND PIPELINES AND BLOCKING CIRCUIT ENABLING SECOND PIPELINE TO PROCESS TASKS DURING DEALLOCATION OF MEMORY ALLOCATED TO FIRST PIPELINE
A processor includes a first processing pipeline, a second processing pipeline and a memory management circuit that allocates memory regions from memory for the first processing pipeline to write the data of each of a first of a sequence of tasks, and deallocates each of the memory regions after the data therein has been processed by the second processing pipeline. A blocking circuit enables the second processing pipeline to start processing a second sequence of tasks while the memory management circuit is still deallocating some of the memory regions allocated to the data portions of the first of said sequence of tasks, the blocking circuit preventing identifiers of the data portions of the second task being passed to the memory management circuit until the memory management circuit indicates that it has completed deallocating the memory regions allocated to all the data portions of the first task.
A method for generating an augmented reality image from first and second images, wherein at least a portion of at least one of the first and the second image is captured from a real scene, identifies a confidence region in which a confident determination as to which of the first and second image to render in that region of the augmented reality image can be made, and identifies an uncertainty region in which it is uncertain as to which of the first and second image to render in that region of the augmented reality image. At least one blending factor value in the uncertainty region is determined based upon a similarity between a first colour value in the uncertainty region and a second colour value in the confidence region, and an augmented reality image is generated by combining, in the uncertainty region, the first and second images using the at least one blending factor value.
Methods and primitive block generators for generating primitive blocks in a graphics processing system. The methods include receiving transformed position data for a current primitive, the transformed position data indicating a position of the current primitive in rendering space; determining a distance between the position of the current primitive and a position of a current primitive block based on the transformed position data for the current primitive; determining whether to add the current primitive to the current primitive block based on the distance and a fullness of the current primitive block; in response to determining that the current primitive is to be added to the current primitive block, adding the current primitive to the current primitive block; and in response to determining that the current primitive is not to be added to the current primitive block, flushing the current primitive block and adding the current primitive to a new current primitive block.
A ray tracing system determines whether a ray intersects a three-dimensional axis-aligned box by determining whether a minimum distance condition and a maximum distance condition are satisfied, wherein the determining comprises determining whether a single distance condition is satisfied. The determination is used to determine whether the ray intersects the axis-aligned box. A point on the ray is at a position O+Dt where O is a vector which represents an origin of the ray, and t represents a distance along the ray from the origin of the ray, and wherein D is a 3D vector defining a direction vector of the ray.
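For orientation, the conventional slab test that this kind of method builds on can be sketched in Python as follows. This is the textbook formulation, not the single-condition variant the abstract describes; the function name `ray_hits_box` and its parameters are hypothetical. Points on the ray are O + Dt, and the running `t_min` and `t_max` values correspond to the minimum and maximum distance conditions.

```python
def ray_hits_box(origin, direction, box_min, box_max,
                 t_min=0.0, t_max=float("inf")):
    """Conventional slab test for a ray O + D*t against an axis-aligned box."""
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this slab: it must start between the planes.
            if o < lo or o > hi:
                return False
            continue
        t0 = (lo - o) / d
        t1 = (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0  # order the entry and exit distances
        t_min = max(t_min, t0)  # latest entry across the dimensions
        t_max = min(t_max, t1)  # earliest exit across the dimensions
        if t_min > t_max:
            return False
    return True
```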
A ray tracing unit and method for processing a ray in a ray tracing system performs intersection testing for the ray by performing one or more intersection testing iterations. Each intersection testing iteration includes: (i) traversing an acceleration structure to identify the nearest intersection of the ray with a primitive that has not been identified as the nearest intersection in any previous intersection testing iterations for the ray; and (ii) if, based on a characteristic of the primitive, a traverse shader is to be executed in respect of the identified intersection: executing the traverse shader in respect of the identified intersection; and if the execution of the traverse shader determines that the ray does not intersect the primitive at the identified intersection, causing another intersection testing iteration to be performed. When the intersection testing for the ray is complete, an output shader is executed to process a result of the intersection testing for the ray.
A method of performing intersection testing in a ray tracing system, for a ray with respect to a set of two or more primitives. Each primitive is defined by an ordered set of edges, and each edge is defined by a respective pair of vertices. A set of distinct edges is determined for the set of primitives, each distinct edge being part of at least one primitive of the set of primitives and being defined by a different pair of vertices to the other distinct edges in the set, wherein every edge in the ordered sets of edges that define the set of primitives is represented by a distinct edge of the set of distinct edges. For each distinct edge in the set of distinct edges, an edge test is performed to determine which side of the distinct edge the ray passes on. For each primitive in the set of primitives, a result of the edge test is used for each distinct edge that defines that primitive to determine whether or not the ray intersects that primitive.
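The sharing of edge tests between primitives can be sketched in Python. The sketch below assumes the vertices have already been projected into a 2D space in which the ray passes through (0, 0), uses the sign of a 2D cross product as the edge test, and treats a ray lying exactly on an edge as a miss. All function names are hypothetical.

```python
def cross2(a, b):
    """2D cross product: its sign says which side of edge a->b the origin lies on."""
    return a[0] * b[1] - a[1] * b[0]


def intersect_primitives(vertices, primitives):
    """Test a ray through (0, 0) against primitives that share edges.

    vertices: 2D points in the projected space.
    primitives: tuples of vertex indices in winding order.
    Returns (per-primitive hit flags, number of edge tests performed).
    """
    # Perform one edge test per distinct (unordered) pair of vertices.
    edge_sign = {}
    for prim in primitives:
        for a, b in zip(prim, prim[1:] + prim[:1]):
            key = (min(a, b), max(a, b))
            if key not in edge_sign:
                edge_sign[key] = cross2(vertices[key[0]], vertices[key[1]])

    hits = []
    for prim in primitives:
        signs = []
        for a, b in zip(prim, prim[1:] + prim[:1]):
            s = edge_sign[(min(a, b), max(a, b))]
            signs.append(s if a < b else -s)  # flip when the edge is traversed in reverse
        hits.append(all(s > 0 for s in signs) or all(s < 0 for s in signs))
    return hits, len(edge_sign)
```

Two triangles that share one edge reference six edges in total but need only five edge tests, because the shared edge is evaluated once and its sign is reused with the orientation flipped.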
A method of processing a primitive as part of intersection testing in a ray tracing system, the primitive being defined by an ordered set of vertices. Data defining a direction and an origin of a ray to be tested against the primitive, and coordinate data for a set of vertices are received. The coordinate data for the set of vertices is projected into ray space using the ray data, wherein the ray space has two non-parallel axes that are transverse to the direction of the ray, wherein a ray-space frame of reference associated with the axes is centered at a point on the ray such that the ray is represented as that point on the axes in the ray space, and wherein the point is an origin of the ray space. Then, the signs of the coordinate data for the set of vertices are analysed to determine whether a non-intersection condition is fulfilled, wherein fulfilment of the non-intersection condition indicates that the ray does not intersect the primitive. In response to determining that the non-intersection condition is fulfilled, it is determined that the ray does not intersect the primitive.
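One plausible form of such a sign-based check is sketched below in Python. It assumes the vertices are already expressed in ray space with the ray at the origin, and that the primitive is convex. The specific condition used here (all vertices strictly on one side of an axis) is an assumption for illustration, and `non_intersection_by_signs` is a hypothetical name.

```python
def non_intersection_by_signs(ray_space_vertices):
    """Return True when the coordinate signs alone prove that the ray misses.

    ray_space_vertices: (x, y) pairs in a frame whose origin is the ray.
    A False result is inconclusive; a full intersection test is still needed.
    """
    xs = [x for x, _ in ray_space_vertices]
    ys = [y for _, y in ray_space_vertices]

    def same_strict_sign(coords):
        return all(c > 0 for c in coords) or all(c < 0 for c in coords)

    # If every vertex lies strictly on one side of an axis, the convex
    # primitive cannot contain the origin, so the ray cannot intersect it.
    return same_strict_sign(xs) or same_strict_sign(ys)
```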
An image of a 3-D scene is rendered by first rendering a noisy image at a first resolution. One or more guide channels at the first resolution and one or more corresponding guide channels at a second resolution are obtained. When the two resolutions are the same, the guide channels at the first resolution and the corresponding guide channels at the second resolution may be provided by a single set of guide channels. For each of a plurality of local neighborhoods, the parameters of a model that approximates the noisy image as a function of the one or more guide channels (at the first resolution) are calculated, and the calculated parameters are applied to the one or more guide channels (at the second resolution), to produce a denoised image at the second resolution. The one or more guide channels include at least one guide channel characterizing a spatial dependency of incident light on global lighting over the surface of one or more 3-D models in the scene.
A processing unit configured to perform parallel processing includes a parallel processing engine, the parallel processing engine including a plurality of processing instances configured to process instructions in parallel. Test instruction insertion logic identifies an idle cycle of the parallel processing engine and inserts a test instruction for processing during the idle cycle by each of the plurality of processing instances so as to generate a respective plurality of test outputs. Check logic compares a test output generated during the idle cycle by a first processing instance of the plurality of processing instances with a test output generated during the idle cycle by a second processing instance of the plurality of processing instances, and raises a fault signal if the compared test outputs do not match.
A first pixel in a first group of neighbouring pixels is classified as acceptable or not acceptable with respect to a second pixel in a second group of neighbouring pixels. It is determined whether a difference between the first pixel and the second pixel is greater than a threshold difference, and in response to determining that the difference between the first pixel and the second pixel is greater than the threshold difference, the pixels in at least one of the first and second groups of neighbouring pixels are analysed to determine whether the difference is indicative of the first pixel being erroneous. The first pixel is classified as acceptable or not acceptable based on whether the difference is determined to be indicative of the first pixel being erroneous, and the classification of the first pixel is outputted.
A decoder for decoding texels of a p by q sub-block from a block of ASTC encoded texture data representing an n by m block of texels, where p≤n and q≤m. The decoder determines a position of a first texel of the sub-block with respect to the rows and/or columns of a weight grid comprising a first plurality of weights in a first plane; determines a position of a second texel of the sub-block with respect to the rows and/or columns of the weight grid; compares the positions of the first and second texels; extracts fewer than (p+1)(q+1) weights in response to determining that the positions of the first and second texels are between the upper-most and lower-most rows of a predetermined number of adjacent rows of the weight grid and/or the left-most and right-most columns of a predetermined number of adjacent columns of the weight grid; and decodes the texels in dependence on the extracted weights.
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 19/167 - Position within a video image, e.g. region of interest [ROI]
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
60.
Reduced Acceleration Structures for Ray Tracing Systems
Ray tracing units, processing modules and methods are described for generating one or more reduced acceleration structures to be used for intersection testing in a ray tracing system for processing a 3D scene. Nodes of the reduced acceleration structure(s) are determined, wherein a reduced acceleration structure represents a subset of the 3D scene. The reduced acceleration structure(s) are stored for use in intersection testing. Since the reduced acceleration structures represent a subset of the scene (rather than the whole scene) the memory usage for storing the acceleration structure is reduced, and the latency in the traversal of the acceleration structure is reduced.
A computer-implemented method and a processing module for identifying that a fault has occurred in a finite-state machine (FSM). It is determined whether a set of one or more transitions that have occurred between states of the FSM is allowable. In response to determining that the set of one or more transitions is not allowable, it is identified that a fault has occurred in the FSM.
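The check can be illustrated with a short Python sketch. The state names and the `ALLOWED` transition set are invented for the example, and `fault_detected` is a hypothetical name.

```python
# Hypothetical FSM: the set of transitions that the design permits.
ALLOWED = {
    ("IDLE", "FETCH"),
    ("FETCH", "EXECUTE"),
    ("EXECUTE", "FETCH"),
    ("EXECUTE", "IDLE"),
}


def fault_detected(observed_states, allowed=ALLOWED):
    """Return True if any observed state-to-state transition is not allowable."""
    transitions = zip(observed_states, observed_states[1:])
    return any(step not in allowed for step in transitions)
```

A trace such as IDLE, FETCH, EXECUTE, IDLE passes, whereas IDLE followed directly by EXECUTE is flagged because that transition is not in the allowed set.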
A method of processing instructions at a processing unit having a parallel processing engine. During a mission cycle, a first set of mission operand values is processed in accordance with a mission instruction at a first processing instance to generate a first mission output. In parallel, a second set of mission operand values is processed in accordance with the mission instruction at a second processing instance to generate a second mission output. During a test cycle, a first set of test operand values is processed in accordance with a test instruction at the first processing instance to generate a first test output, and in parallel, a second set of test operand values is processed in accordance with the test instruction at the second processing instance to generate a second test output, where the first set of test operand values is the same as the second set of test operand values. The first test output and the second test output are compared and a fault signal is raised if the compared test outputs do not match.
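A software analogy for the test cycle is sketched below in Python. The processing instances are modelled as plain functions, and the stuck-at fault in `faulty_instance` is an invented fault model used only to show a mismatch. All names are hypothetical.

```python
import operator


def healthy_instance(instruction, operands):
    """Model of a processing instance that executes the instruction correctly."""
    return instruction(*operands)


def faulty_instance(instruction, operands):
    """Hypothetical fault model: bit 0 of every result is stuck at 1."""
    return instruction(*operands) | 1


def run_test_cycle(instances, test_instruction, test_operands):
    """Run one test instruction on identical operands at every instance.

    Returns True (fault signal raised) when any test output differs.
    """
    outputs = [run(test_instruction, test_operands) for run in instances]
    return any(out != outputs[0] for out in outputs[1:])
```

Because every instance receives the same test operands, any difference between the test outputs indicates a fault rather than a difference in the data. A fault that happens not to change the result for the chosen operands goes undetected in that cycle.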
A method of rounding a floating-point number in an Extended Exponent Range (EER), that would be a denormal floating-point number represented in an Unextended Exponent Range (UER) includes the steps of receiving, at an arithmetic unit, a plurality of input numbers in the EER representation, each input number comprising a sign bit (si), exponent bits (ei) and mantissa bits (mi); performing an arithmetic operation to produce an output number in the EER representation comprising a sign bit (sa), exponent bits (ea) and mantissa bits (ma); constructing a rounding mask based on the exponent bits (ea) computed by the arithmetic operation; and applying the rounding mask to the output number in the EER representation to round the output number to the correct position as if rounding in the UER representation.
A graphics processing unit for graphics data in a tile-based rendering space includes a plurality of processing cores, cost indication logic which obtains a cost indication parameter from a set of tiles, similarity indication logic which obtains similarity indications between sets of tiles of the rendering space, and scheduling logic that assigns the sets of tiles to the processing cores for rendering in dependence on the cost indications and the similarity indications. The similarity indication logic is configured to assign a single similarity indication to each of a plurality of sets of one or more tiles, the similarity indication assigned to each set of one or more tiles being indicative of a level of similarity between that set of one or more tiles and another set of one or more tiles specified according to a spatial order of the tiles within the rendering space.
A method of scheduling tasks in a processor comprises receiving a plurality of tasks that are ready to be executed, i.e. all their dependencies have been met and all the resources required to execute the task are available, and adding the received tasks to a task queue (or “task pool”). The number of tasks that are executing is monitored and in response to determining that an additional task can be executed by the processor, a task is selected from the task pool based at least in part on a comparison of indications of resources used by tasks being executed and indications of resources used by individual tasks in the task pool and the selected task is then sent for execution.
A graphics processing system for generating a rendering output includes geometry processing logic having first transformation logic configured to transform a plurality of untransformed primitives into a plurality of transformed primitives, the first transformation logic configured to implement one or more expansion transformation stages which generate one or more sub-primitives; a primitive block generator configured to divide the plurality of transformed primitives into a plurality of groups; and generate an untransformed primitive block for each group comprising (i) information identifying the untransformed primitives related to the transformed primitives in the group; and (ii) an expansion transformation stage mask for at least one or more expansion transformation stages that indicates the sub-primitives generated for the untransformed primitives in that untransformed primitive block used in generating the rendering output. Rasterization logic includes second transformation logic configured to re-transform the plurality of untransformed primitives into the plurality of transformed primitives on an untransformed primitive block-basis in accordance with the expansion transformation stage mask for the one or more expansion transformation stages; and logic configured to render the transformed primitives to generate the rendering output.
A control stream decoder decodes a control stream for a tile group comprising at least two tiles of a rendering space. A primitive block entry analyser receives a primitive block entry of the control stream and identifies a location in memory of a control data block for a corresponding primitive block. For the received primitive block entry, in response to determining that a current tile is a valid tile for the corresponding primitive block, the control data block for the corresponding primitive block is retrieved from the identified location in memory. An address of the corresponding primitive block in memory is identified from the control data block and primitives of that primitive block relevant for rendering the current tile, and information identifying the address of the corresponding primitive block and the primitives of that primitive block relevant for rendering the current tile is outputted.
An image processing method and an image processing unit for performing image processing determines a set of one or more filtered pixel values, wherein the one or more filtered pixel values represent a result of processing image data using a set of one or more filtering functions. A total covariance of the set of one or more filtering functions is identified. A refinement filtering function is applied to the set of one or more filtered pixel values to determine a set of one or more refined pixel values, wherein the refinement filtering function has a covariance that is determined based on the total covariance of the set of one or more filtering functions.
In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation, which can include a LookUp Table (LUT). The LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited-precision multiplier goes to a full-precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the limited-precision multiplier and the initial approximation circuitry complete within the first number of clock cycles.
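The numerical idea (a table lookup followed by Newton-Raphson refinement of a reciprocal) can be sketched in Python. This is a floating-point software illustration, not a model of the multiplier circuitry or its timing. The 16-entry table, the iteration count and the names `LUT`, `reciprocal` and `divide` are assumptions.

```python
import math

LUT_BITS = 4  # assumption: index the table with the top 4 fraction bits
# Entry i approximates 1/m for m in [1 + i/16, 1 + (i + 1)/16) via the midpoint.
LUT = [1.0 / (1.0 + (i + 0.5) / (1 << LUT_BITS)) for i in range(1 << LUT_BITS)]


def reciprocal(d: float, iterations: int = 4) -> float:
    """Approximate 1/d for finite d > 0 using a LUT seed and Newton-Raphson."""
    if not (d > 0.0 and math.isfinite(d)):
        raise ValueError("this sketch handles finite positive inputs only")
    m, e = math.frexp(d)  # d = m * 2**e with 0.5 <= m < 1
    m, e = m * 2.0, e - 1  # renormalise so that 1 <= m < 2
    x = LUT[int((m - 1.0) * (1 << LUT_BITS))]  # initial approximation of 1/m
    for _ in range(iterations):
        x = x * (2.0 - m * x)  # each step roughly squares the relative error
    return math.ldexp(x, -e)  # 1/d = (1/m) * 2**(-e)


def divide(n: float, d: float) -> float:
    """Division expressed as multiplication by the reciprocal of the divisor."""
    return n * reciprocal(d)
```

With a seed accurate to about 3%, the relative error falls to roughly 1e-3, 1e-6 and 1e-12 after one, two and three iterations, which is why a more precise table reduces the number of full-precision multiplications needed.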
Aspects relate to tracing rays in 3-D scenes that comprise objects that are defined by or with implicit geometry. In an example, a trapping element defines a portion of 3-D space in which implicit geometry exists. When a ray is found to intersect a trapping element, a trapping element procedure is executed. The trapping element procedure may comprise marching a ray through a 3-D volume and evaluating a function that defines the implicit geometry for each current 3-D position of the ray. An intersection detected with the implicit geometry may be found concurrently with intersections for the same ray with explicitly-defined geometry, and data describing these intersections may be stored with the ray and resolved.
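The marching step of a trapping element procedure might look like the sketch below. It assumes a fixed step size, a signed-distance style function that is non-positive inside the implicit surface, and entry/exit distances already known from the ray's intersection with the trapping element; `sphere_field` and `march_ray` are hypothetical names used only for illustration.

```python
import math


def sphere_field(p, radius=1.0):
    """Example implicit function: negative inside a sphere at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius


def march_ray(origin, direction, t_enter, t_exit, field, step=0.01):
    """March from t_enter to t_exit (the span inside the trapping element).

    Returns the first sampled distance t at which field(position) <= 0,
    or None if the ray leaves the trapping element without a hit.
    """
    t = t_enter
    while t <= t_exit:
        # Current 3-D position of the ray at distance t.
        p = tuple(o + t * d for o, d in zip(origin, direction))
        if field(p) <= 0.0:
            return t
        t += step
    return None
```

A returned distance could then be stored with the ray alongside any hits on explicit geometry, and the closest one kept when the intersections are resolved.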
3-D rendering systems include a rasterization section that can fetch untransformed geometry, transform geometry and cache data for transformed geometry in a memory. As an example, the rasterization section can transform the geometry into screen space. The geometry can include one or more of static geometry and dynamic geometry. The rasterization section can query the cache for presence of data pertaining to a specific element or elements of geometry, and use that data from the cache, if present, and otherwise perform the transformation again, for actions such as hidden surface removal. The rasterization section can receive, from a geometry processing section, tiled geometry lists and perform the hidden surface removal for pixels within respective tiles to which those lists pertain.
A graphics processing system is configured to perform ray tracing. Rays are bundled together and processed together. When differential data is needed by a shader, the data of a true ray in the bundle can be used rather than processing separate tracker rays.
A binary logic circuit for determining y = x mod (2^m − 1), where x is an n-bit integer, y is an m-bit integer, and n>m, includes reduction logic configured to reduce x to a sum of a first m-bit integer β and a second m-bit integer γ; and addition logic configured to calculate an addition output represented by the m least significant bits of the following sum right-shifted by m: a first binary value of length 2m, the m most significant bits and the m least significant bits each being the string of bit values represented by β; a second binary value of length 2m, the m most significant bits and the m least significant bits each being the string of bit values represented by γ; and the binary value 1.
G06F 7/72 - Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations using residue arithmetic
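The addition step of the modulo circuit above can be checked in software. The sketch below is a behavioural model only: the addition logic follows the abstract, while the reduction of x to β and γ uses a carry-save accumulation with end-around carry, which is an assumption about how such reduction logic could work. The function names `reduce_to_two` and `mod_mersenne` are hypothetical.

```python
def reduce_to_two(x, m):
    """Reduce x to two m-bit integers (beta, gamma) with
    beta + gamma congruent to x modulo 2**m - 1.

    Assumed scheme: fold x in m-bit chunks through a 3:2 carry-save
    compressor, wrapping the carry-out bit round to bit 0 because
    2**m is congruent to 1 modulo 2**m - 1.
    """
    mask = (1 << m) - 1
    beta, gamma = 0, 0
    while x:
        chunk = x & mask
        x >>= m
        partial_sum = beta ^ gamma ^ chunk
        majority = (beta & gamma) | (beta & chunk) | (gamma & chunk)
        carry = ((majority << 1) & mask) | (majority >> (m - 1))
        beta, gamma = partial_sum, carry
    return beta, gamma


def mod_mersenne(x, m):
    """Compute x mod (2**m - 1) using the addition described in the abstract."""
    mask = (1 << m) - 1
    beta, gamma = reduce_to_two(x, m)
    doubled_beta = (beta << m) | beta      # beta repeated in both halves
    doubled_gamma = (gamma << m) | gamma   # gamma repeated in both halves
    total = doubled_beta + doubled_gamma + 1
    # The m least significant bits of the sum right-shifted by m.
    return (total >> m) & mask
```

The addition returns the all-ones pattern rather than zero if β and γ are both all ones; the reduction assumed here never produces that pair, because γ starts at zero and can only become all ones when all three compressor inputs are all ones.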
74.
Synchronising Devices Using Clock Signal Delay Comparison
A time difference between an occurrence of a first event and an occurrence of a second event is estimated. A first time marker indicating the occurrence of the first event and a second time marker indicating the occurrence of the second event are received, wherein at least one event is one of playing a media frame or receiving a beacon. A plurality of delayed versions of the first time marker are provided, each being delayed by a different amount of delay to the other delayed versions. Each of the delayed versions of the first time marker is compared with the second time marker to identify which of the delayed versions of the first time marker is the closest temporally matching time marker to the second time marker. The time difference between the first and second time markers is estimated in dependence on the identified delayed version.
H04L 7/033 - Speed or phase control by the received code signals, the signals containing no special synchronisation information using the transitions of the received signal to control the phase of the synchronising-signal-generating means, e.g. using a phase-locked loop
H03K 5/15 - Arrangements in which pulses are delivered at different times at several outputs, i.e. pulse distributors
H04L 7/00 - Arrangements for synchronising receiver with transmitter
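The delay comparison described in the abstract above reduces to picking, from a set of candidate delays, the one whose delayed first marker lands closest to the second marker. A minimal sketch, assuming time markers are plain numeric timestamps and using the hypothetical name `estimate_time_difference`:

```python
def estimate_time_difference(first_marker, second_marker, delays):
    """Estimate second_marker - first_marker from a set of candidate delays.

    Each candidate delay yields a delayed version of the first marker; the
    delay whose version is temporally closest to the second marker is
    returned as the estimate.
    """
    delayed_versions = [(delay, first_marker + delay) for delay in delays]
    best_delay, _ = min(delayed_versions,
                        key=lambda pair: abs(pair[1] - second_marker))
    return best_delay
```

The resolution of the estimate is set by the spacing of the candidate delays, so a finer delay line gives a finer estimate at the cost of more comparisons.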
75.
IMMERSIVE VIRTUAL REALITY SYSTEM USING RAY TRACING
Ray tracing, and more generally, graphics operations taking place in a 3-D scene, involve a plurality of constituent graphics operations. Responsibility for executing these operations can be distributed among different sets of computation units. The sets of computation units each can execute a set of instructions on a parallelized set of input data elements and produce results. These results can be that the data elements can be categorized into different subsets, where each subset requires different processing as a next step. The data elements of these different subsets can be coalesced so that they are contiguous in a results set. The results set can be used to schedule additional computation, and if there are empty locations of a scheduling vector (after accounting for the members of a given subset), then those empty locations can be filled with other data elements that require the same further processing as that subset.
Aspects include a multistage collector to receive outputs from plural processing elements. Processing elements may comprise (each or collectively) a plurality of clusters, with one or more ALUs that may perform SIMD operations on a data vector and produce outputs according to the instruction stream being used to configure the ALU(s). The multistage collector includes substituent components each with at least one input queue, a memory, a packing unit, and an output queue; these components can be sized to process groups of input elements of a given size, and can have multiple input queues and a single output queue. Some components couple to receive outputs from the ALUs and others receive outputs from other components. Ultimately, the multistage collector can output groupings of input elements. Each grouping of elements (e.g., at input queues, or stored in the memories of the components) can be formed based on matching of index elements.
Apparatus identifies a set of M output memory addresses from a larger set of N input memory addresses containing a non-unique memory address. A comparator block performs comparisons of memory addresses from a set of N input memory addresses to generate a binary classification dataset that identifies a subset of addresses, where each address in the subset identified by the binary classification dataset is unique within that subset. Combination logic units each receive a selection of bits of the binary classification dataset and sort the received selection of bits into an intermediary binary string in which the bits are ordered into a first group identifying addresses belonging to the identified subset, and a second group identifying addresses not belonging to the identified subset. Output generating logic selects between bits belonging to different intermediary binary strings to generate a binary output identifying a set of output memory addresses containing at least one address in the identified subset.
A memory interface for interfacing between a memory bus and a cache memory. A plurality of bus interfaces are configured to transfer data between the memory bus and the cache memory, and a plurality of snoop processors are configured to receive snoop requests from the memory bus. Each snoop processor is associated with a respective bus interface and each snoop processor is configured, on receiving a snoop request, to determine whether the snoop request relates to the bus interface associated with that snoop processor and to process the snoop request in dependence on that determination.
Graphics processing systems can include lighting effects when rendering images. “Light probes” are directional representations of lighting at particular probe positions in the space of a scene which is being rendered. Light probes can be determined iteratively, which can allow them to be determined dynamically, in real-time over a sequence of frames. Once the light probes have been determined for a frame then the lighting at a pixel can be determined based on the lighting at the nearby light probe positions. Pixels can then be shaded based on the lighting determined for the pixel positions.
Input data for a convolutional neural network (CNN) is stored in a buffer comprising a plurality of banks, by receiving input data comprising input data values to be processed in the CNN, determining addresses in the buffer in which the received input data values are to be stored, keeping a cursor for one or more salient positions to reduce arithmetic performed to determine the addresses in the buffer in which the received input data values are to be stored, and storing the received input data values at the determined addresses in the buffer.
Out-of-bounds recovery circuits are configured to detect an out-of-bounds violation in an electronic device, and cause the electronic device to transition to a predetermined safe state when an out-of-bounds violation is detected. The out-of-bounds recovery circuits include detection logic configured to detect that an out-of-bounds violation has occurred when a processing element of the electronic device has fetched an instruction from an unallowable memory address range for the current operating state of the electronic device; and transition logic configured to cause the electronic device to transition to a predetermined safe state when an out-of-bounds violation has been detected by the detection logic.
G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
G06F 21/52 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure
Rendering systems that can use combinations of rasterization rendering processes and ray tracing rendering processes are disclosed. In some implementations, these systems perform a rasterization pass to identify visible surfaces of pixels in an image. Some implementations may begin shading processes for visible surfaces, before the geometry is entirely processed, in which rays are emitted. Rays can be culled at various points during processing, based on determining whether the surface from which the ray was emitted is still visible. Rendering systems may implement rendering effects as disclosed.
Systems and methods of geometry processing for rasterization and ray tracing processes provide for pre-processing of source geometry, such as by tessellating or other procedural modification of source geometry, to produce final geometry on which a rendering will be based. An acceleration structure (or portion thereof) for use during ray tracing is defined based on the final geometry. Only coarse-grained elements of the acceleration structure may be produced or retained, and a fine-grained structure within a particular coarse-grained element may be produced in response to a collection of rays being ready for traversal within the coarse-grained element. Final geometry can be recreated in response to demand from a rasterization engine, and from ray intersection units that require such geometry for intersection testing with primitives. Geometry at different resolutions can be generated to respond to demands from different rendering components.
A windowed operation is implemented in at least three traversed dimensions. The windowed operation applies a window having at least three dimensions to data having at least three traversed dimensions, with shifts of the window in all three traversed dimensions. Two dimensions of the at least three traversed dimensions are selected, and the windowed operation is mapped to a plurality of constituent 2-D windowed operations in the selected two dimensions, the 2-D windowed operations applying a slice of the window to a slice of the data, with shifts of the slice of the window in only two dimensions. Each of the plurality of 2-D windowed operations is implemented by at least one hardware accelerator, each 2-D windowed operation producing a respective partial result, and the partial results are assembled to produce the result of the windowed operation.
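The decomposition described above can be illustrated for a 3-D sliding-window sum of products. The sketch below assumes unit stride, no padding, and nested Python lists indexed as `[z][y][x]`; `window_2d` stands in for the 2-D hardware accelerator and `window_3d_via_2d` for the mapping and assembly steps. Both names are hypothetical.

```python
def window_2d(data, kernel):
    """2-D windowed operation: slide a kernel slice over a data slice."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(data) - kh + 1
    out_w = len(data[0]) - kw + 1
    return [[sum(data[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


def window_3d_via_2d(volume, window):
    """3-D windowed operation built from constituent 2-D operations.

    For each output depth z, slice k of the window is applied to slice
    z + k of the data, and the partial results are summed.
    """
    depth = len(window)
    result = []
    for z in range(len(volume) - depth + 1):
        partials = [window_2d(volume[z + k], window[k]) for k in range(depth)]
        rows, cols = len(partials[0]), len(partials[0][0])
        result.append([[sum(p[y][x] for p in partials) for x in range(cols)]
                       for y in range(rows)])
    return result
```

Summation is only one way to assemble the partial results; it suits convolution-like operations, whereas a windowed maximum would combine partials with `max` instead.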
Graphics processing systems and methods provide soft shadowing effects into rendered images. This is achieved in a simple manner which can be implemented in real-time without incurring high processing costs so it is suitable for implementation in low-cost devices. Rays are cast from positions on visible surfaces corresponding to pixel positions towards the center of a light, and occlusions of the rays are determined. The results of these determinations are used to apply soft shadows to the rendered pixel values.
A method of synchronizing a group of scheduled tasks within a parallel processing unit into a known state is described. The method uses a synchronization instruction in a scheduled task which triggers, in response to decoding of the instruction, an instruction decoder to place the scheduled task into a non-active state and forward the decoded synchronization instruction to an atomic ALU for execution. When the atomic ALU executes the decoded synchronization instruction, the atomic ALU performs an operation and check on data assigned to the group ID of the scheduled task and if the check is passed, all scheduled tasks having the particular group ID are removed from the non-active state.
G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
G06F 7/575 - Basic arithmetic logic units, i.e. devices selectable to perform either addition, subtraction or one of several logical operations, using, at least partially, the same circuitry
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 9/52 - Program synchronisation; Mutual exclusion, e.g. by means of semaphores
G06F 15/173 - Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star or snowflake
G06F 15/76 - Architectures of general purpose stored program computers
87.
Graphics Processor with Non-Blocking Concurrent Architecture
In some aspects, systems and methods provide for forming groupings of a plurality of independently-specified computation workloads, such as graphics processing workloads, and in a specific example, ray tracing workloads. The workloads include a scheduling key, which is one basis on which the groupings can be formed. Workloads grouped together can all execute from the same source of instructions, on one or more different private data elements. Such workloads can recursively instantiate other workloads that reference the same private data elements. In some examples, the scheduling key can be used to identify a data element to be used by all the workloads of a grouping. Memory conflicts to private data elements are handled through scheduling of non-conflicted workloads or specific instructions and/or deferring conflicted workloads instead of locking memory locations.
G06T 15/00 - 3D [Three Dimensional] image rendering
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
G06F 9/52 - Program synchronisation; Mutual exclusion, e.g. by means of semaphores
G06F 15/80 - Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
G06T 1/20 - Processor architectures; Processor configuration, e.g. pipelining
Aspects comprise systems implementing 3-D graphics processing functionality in a multiprocessing system. Control flow structures are used in scheduling instances of computation in the multiprocessing system, where different points in the control flow structure serve as points where deferral of some instances of computation can be performed in favor of scheduling other instances of computation. In some examples, the control flow structure identifies particular tasks, such as intersection testing of a particular portion of an acceleration structure, and a particular element of shading code. In some examples, the aspects are used in 3-D graphics processing systems that can perform ray tracing based rendering.
During tracing of a primary ray in a 3-D space (e.g., a 3-D scene in graphics rendering), a ray is found to intersect a primitive (e.g., a triangle) located in the 3-D space. Secondary ray(s) may be generated for a variety of purposes. For example, occlusion rays may be generated to test whether a point of intersection between the primary ray and the primitive is illuminated by any of the light(s). An origin for each secondary ray can be modified from the intersection point based on characteristics of the primitive intersected. For example, an offset from the intersection point can be calculated using barycentric coordinates of the intersection point and interpolation of one or more parameters associated with vertices defining the primitive. These parameters may include a size of the primitive and differences between a geometric normal for the primitive and a respective additional vector supplied with each vertex.
A method and apparatus are provided for tessellating patches of surfaces in a tile based three dimensional computer graphics rendering system. For each tile in an image a per tile list of primitive indices is derived for tessellated primitives which make up a patch. Hidden surface removal is then performed on the patch and any domain points which remain after hidden surface removal are derived. The primitives are then shaded for display.
A method of processing rays in a ray tracing system allocates a block of memory for a task on a per-task basis. Processing rays in the task causes at least one child ray to be emitted such that intermediate data for the task is written to the block of memory, the intermediate data being written to and read from the block of memory in one or more finite-sized data bursts. Processing of the task is suspended, and when the task is ready to resume, the intermediate data or updated intermediate data for the task is read from the block of memory, and the processing of the task is resumed.
A method of configuring a hardware implementation of a Convolutional Neural Network (CNN), the method comprising: determining, for each of a plurality of layers of the CNN, a first number format for representing weight values in the layer based upon a distribution of weight values for the layer, the first number format comprising a first integer of a first predetermined bit-length and a first exponent value that is fixed for the layer; determining, for each of a plurality of layers of the CNN, a second number format for representing data values in the layer based upon a distribution of expected data values for the layer, the second number format comprising a second integer of a second predetermined bit-length and a second exponent value that is fixed for the layer; and storing the determined number formats for use in configuring the hardware implementation of a CNN.
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06F 7/00 - Methods or arrangements for processing data by operating upon the order or content of the data handled
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
G06N 3/04 - Architecture, e.g. interconnection topology
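A per-layer number format of the kind described in the CNN configuration abstract above (a fixed-length integer with an exponent shared across the layer) could be selected as sketched below. The selection rule used here, the smallest exponent at which the largest magnitude still fits a signed integer of the given bit-length, is an assumption chosen for illustration; the abstract only states that the format is based on the distribution of values. `choose_exponent` and `quantise` are hypothetical names.

```python
import math


def choose_exponent(values, bit_length):
    """Pick a layer-wide exponent e so that max|v| / 2**e fits in a signed
    integer of the given bit-length (assumed heuristic)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0
    max_int = (1 << (bit_length - 1)) - 1
    return math.ceil(math.log2(max_abs / max_int))


def quantise(values, bit_length, exponent):
    """Represent each value as integer * 2**exponent, clamped to the range."""
    low = -(1 << (bit_length - 1))
    high = (1 << (bit_length - 1)) - 1
    scale = 2.0 ** exponent
    return [min(high, max(low, round(v / scale))) for v in values]
```

For example, the weights `[0.5, -0.25, 0.1]` with an 8-bit integer give an exponent of -7 and the integers `[64, -32, 13]`. The same two functions could be applied separately to a layer's expected data values to obtain the second format.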
A graphics processing system renders primitives using a rendering space which is sub-divided into a plurality of regions. Geometry processing logic performs a geometry processing phase for a current render wherein for each region in the plurality of regions it is determined, for each of a plurality of primitives which are present in the region, whether the primitive totally covers the region, and total coverage data is stored indicating which of the primitives which are present in the region totally cover the region. Rendering logic performs, after the geometry processing logic has completed the geometry processing phase for the current render, a rendering phase for each of the regions of the plurality of regions on a region-by-region basis for the current render using the total coverage data for the region.
A plurality of work items are processed through a processing pipeline comprising a plurality of stages in processing logic. The processing of a work item includes: (i) reading data in accordance with a memory address associated with the work item, (ii) updating the read data, and (iii) writing the updated data in accordance with the memory address associated with the work item. The method includes processing a first work item and a second work item through the processing pipeline, wherein the processing of the first work item through the pipeline is initiated earlier than the processing of the second work item, and where it is determined that the first and second work items are associated with the same memory address, first updated data of the first work item is written to a register in the processing logic, and the processing of the second work item comprises reading the first updated data from the register instead of reading data from the memory.
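The hazard that the register forwarding above avoids can be shown with a small behavioural model. The sketch assumes a pipeline in which a work item's read happens before the previous item's write-back, and that each work item is an `(address, increment)` pair; `run_pipeline` and the `forward` flag are hypothetical and exist only to contrast the two behaviours.

```python
def run_pipeline(memory, work_items, forward=True):
    """Process read-modify-write work items with one item in flight.

    memory: dict mapping address -> value.
    work_items: iterable of (address, increment) pairs.
    forward: when True, a work item that targets the same address as the
             in-flight item reads the updated value from the register
             instead of reading (stale) data from memory.
    """
    memory = dict(memory)
    in_flight = None  # (address, updated value) held in a register
    for address, increment in work_items:
        if forward and in_flight is not None and in_flight[0] == address:
            value = in_flight[1]            # read from the register
        else:
            value = memory.get(address, 0)  # read from memory
        if in_flight is not None:
            memory[in_flight[0]] = in_flight[1]  # earlier item writes back
        in_flight = (address, value + increment)
    if in_flight is not None:
        memory[in_flight[0]] = in_flight[1]
    return memory
```

Starting from `{0: 10}` with work items `[(0, 1), (0, 2), (4, 5)]`, forwarding yields `{0: 13, 4: 5}`, whereas disabling it loses the first update and leaves address 0 at 12.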
Apparatus includes hardware logic arranged to normalise an n-bit input number. The hardware logic comprises at least a first hardware logic stage, an intermediate hardware logic stage and a final hardware logic stage. Each stage comprises a left shifting logic element, the first and intermediate stages each also comprise a plurality of OR-reduction logic elements and the intermediate and final stages each also comprise one or more multiplexers. The OR-reduction logic elements operate on different subsets of bits from the number input to the particular stage. In the intermediate and final hardware logic stages, a first of the multiplexers selects an OR-reduction result received from a previous hardware logic stage and the left shifting logic element is arranged to perform left shifting on the updated binary number received from an immediately previous hardware logic stage dependent upon the selected OR-reduction result.
G06F 5/01 - Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
G06F 7/499 - Denomination or exception handling, e.g. rounding or overflow
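The staged structure of the normaliser above can be modelled in software as a sequence of conditional left shifts, each gated by an OR-reduction of the leading bits. This is a simplified sketch that assumes n is a power of two and uses shift amounts n/2, n/4, ..., 1; it does not reproduce the abstract's forwarding of OR-reduction results from one stage to the multiplexers of the next. `normalise` is a hypothetical name.

```python
def normalise(value, n):
    """Left-shift an n-bit value until its most significant bit is set.

    Returns (normalised value, total shift). Each loop iteration models
    one hardware stage. An input of zero stays zero.
    """
    mask = (1 << n) - 1
    total_shift = 0
    shift = n // 2
    while shift >= 1:
        leading_bits = value >> (n - shift)
        if leading_bits == 0:  # OR-reduction of the leading bits is 0
            value = (value << shift) & mask
            total_shift += shift
        shift //= 2
    return value, total_shift
```

For the 8-bit input `0b00010110` the stages shift by 0, 2 and 1 bits in turn, giving `0b10110000` with a total shift of 3.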
96.
ASSESSING PERFORMANCE OF A HARDWARE DESIGN USING FORMAL EVALUATION LOGIC
A formal verification tool is used to assess the performance of a hardware design for an integrated circuit to complete a set of tasks. The tool monitors one or more control signals and/or data signals of an instantiation of the hardware design to identify start and completion of a symbolic task by the instantiation of the hardware design, the symbolic task representing the set of tasks. A number of cycles between the start and the completion of the symbolic task is counted, and it is verified that one or more formal properties related to the counted number of cycles are true for the hardware design. An indication of whether or not each of the one or more formal properties was successfully verified is outputted, the indication providing an exhaustive assessment of the performance of the instantiation of the hardware design in completing the set of tasks.
A reduced noise image can be formed from a set of images. One of the images of the set can be selected to be a reference image and other images of the set are transformed such that they are better aligned with the reference image. A measure of the alignment of each image with the reference image is determined. At least some of the transformed images can then be combined using weights which depend on the alignment of the transformed image with the reference image to thereby form the reduced noise image. By weighting the images according to their alignment with the reference image the effects of misalignment between the images in the combined image are reduced. Furthermore, motion correction may be applied to the reduced noise image.
Ray tracing systems have computation units (“RACs”) adapted to perform ray tracing operations (e.g. intersection testing). There are multiple RACs. A centralized packet unit controls the allocation and testing of rays by the RACs. This allows RACs to be implemented without Content Addressable Memories (CAMs), which are expensive to implement, but the functionality of CAMs can still be achieved by implementing them in the centralized controller.
A method of improving texture fetching by a texturing/shading unit in a GPU pipeline by performing efficient convolution operations includes receiving a shader and determining whether the shader is a kernel shader. In response to receiving a kernel shader, the kernel shader is modified to perform a collective fetch of all texels for a group of output pixels instead of performing independent fetches of texels for each output pixel in the group of output pixels.
Task building logic builds a plurality of tasks each comprising a group of rays. When a new ray is received into ray storage, if an existing task exists for the new ray, the new ray is added to an existing respective list. The task building logic indicates when any of the tasks is ready for scheduling, and task scheduling logic identifies a task ready for scheduling based on the indication from the task building logic, and in response traverses the respective list in order to schedule at least some of the rays of the respective task for processing in parallel.
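The interplay between task building and task scheduling described above can be sketched as follows. It assumes that rays are grouped by a key (for example, the acceleration structure node they are to be tested against) and that a task counts as ready once its list reaches a threshold; `TaskBuilder`, `ready_threshold` and `schedule` are hypothetical, and Python lists stand in for whatever list structure the ray storage actually uses.

```python
from collections import defaultdict


class TaskBuilder:
    """Groups incoming rays into per-task lists."""

    def __init__(self, ready_threshold):
        self.ready_threshold = ready_threshold
        self.tasks = defaultdict(list)  # task key -> list of ray ids

    def add_ray(self, ray_id, task_key):
        # Appends to the existing list for this task, or starts a new one.
        self.tasks[task_key].append(ray_id)

    def ready_tasks(self):
        # Indicates which tasks have enough rays to be worth scheduling.
        return [key for key, rays in self.tasks.items()
                if len(rays) >= self.ready_threshold]


def schedule(builder, task_key, width):
    """Traverse a ready task's list and take up to `width` rays for
    processing in parallel, leaving the remainder in the list."""
    rays = builder.tasks[task_key]
    batch, builder.tasks[task_key] = rays[:width], rays[width:]
    return batch
```

Keeping the readiness indication separate from the list traversal means the scheduler only walks a list once it knows a full batch is likely to be available.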