Pixel values are determined at respective upsampled pixel locations for a current frame of a sequence of frames. Depth values are obtained for locations of pixels of a reference frame of the sequence of frames. For each of the upsampled pixel locations: (a) a depth value of the current frame is obtained; (b) a motion vector is obtained to indicate motion between the reference frame and the current frame; (c) the motion vector is used to identify one or more of the pixels of the reference frame; (d) a weight is determined for each of the identified pixels of the reference frame in dependence on: (i) the depth value of the current frame for the upsampled pixel location, and (ii) the depth value for the location of the identified pixel of the reference frame; and (e) the pixel value for the upsampled pixel location is determined using the determined weight for each of the identified pixels.
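The per-location weighting in steps (d) and (e) can be illustrated with a minimal Python sketch. The exponential fall-off, the `sigma` parameter and the function names are assumptions made for illustration; the abstract only states that each weight depends on the two depth values.

```python
import math

def depth_weight(depth_current, depth_reference, sigma=0.05):
    """Hypothetical weight: 1 when the depths agree, falling towards 0 as they diverge."""
    return math.exp(-abs(depth_current - depth_reference) / sigma)

def upsampled_pixel_value(depth_current, reference_pixels, sigma=0.05):
    """Blend the reference-frame pixels identified by the motion vector.

    reference_pixels is a list of (pixel_value, depth) pairs.
    """
    weights = [depth_weight(depth_current, d, sigma) for _, d in reference_pixels]
    total = sum(weights)
    if total == 0.0:
        return None  # no usable reference pixel; the caller needs a fallback
    return sum(w * v for w, (v, _) in zip(weights, reference_pixels)) / total
```

With this choice, a reference pixel whose depth differs strongly from the current depth (for example a disoccluded background pixel) contributes almost nothing to the result.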
G06V 10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; Image or video pattern matching; Proximity measures in feature spaces using context analysis; Selection of dictionaries
A computer-implemented method for performing a vector bitwise rotation, wherein a processing system comprises a byte-wise anything-to-anything mux and one or more bitwise right shifters, wherein the byte-wise anything-to-anything mux includes a plurality of byte-sized inputs and a plurality of byte-sized outputs, each input being associated with a respective input position and each output being associated with a respective output position. A combination of a byte-wise anything-to-anything mux and one or more bitwise shifts is used to perform vector bitwise rotations, with even and odd elements of the vector operated on separately.
A neural network block includes a plurality of layers arranged sequentially. Each layer includes an expansion layer having a first number of input channels and a second number of output channels, where the second number is larger than the first number, a compression layer, having a third number of input channels and a fourth number of output channels, wherein the fourth number is smaller than the third number, and a grouped convolution layer.
An indication of one or more weighting parameters is determined for use in applying upsampling to input pixel values representing an image region to determine a block of one or more upsampled pixel values. A horizontal edge filter determines a first filtered value. A vertical edge filter determines a second filtered value. A horizontal line filter determines a third filtered value. A vertical line filter determines a fourth filtered value. The first, second, third and fourth filtered values are used to determine the indication of one or more weighting parameters, wherein the one or more weighting parameters are indicative of relative horizontal and vertical variation of the input pixel values within the image region. The determined indication of the one or more weighting parameters is output for use in applying upsampling to the input pixel values representing the image region to determine a block of one or more upsampled pixel values.
G06T 3/4046 - Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
G06T 3/403 - Edge-driven scaling; Edge-based scaling
G06T 3/4053 - Scaling of whole images or parts thereof, e.g. expanding or contracting, based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
A graph attention network including a graph attention network layer arranged to perform an operation in dependence on an adjacency matrix mask having a plurality of elements representative of connected graph nodes is compressed by rearranging the rows and/or columns of the adjacency matrix mask so as to gather the plurality of elements representative of connected graph nodes into one or more adjacency sub-matrix masks, the one or more adjacency sub-matrix masks having a greater number of elements representative of connected graph nodes per total number of elements of the one or more adjacency sub-matrix masks than the number of elements representative of connected graph nodes per total number of elements of the adjacency matrix mask. A compressed graph attention network comprising a compressed graph attention network layer arranged to perform a compressed operation in dependence on the one or more adjacency sub-matrix masks is outputted.
An attention layer of an attention-based neural network is arranged to implement an attention function in dependence on a Key matrix, a Query matrix and a Value matrix. The attention layer uses a Key weight matrix to determine the Key matrix, a Query weight matrix to determine the Query matrix, and a Value weight matrix to determine the Value matrix. A compressed attention-based neural network is outputted which comprises a compressed attention layer arranged to implement the attention function by performing a compressed operation in dependence on: (i) a set of one or more Key weight sub-matrices, (ii) a set of one or more Query weight sub-matrices, and (iii) a set of one or more Value weight sub-matrices.
A graphics processing unit has a shader core including a main processing portion and a sub-processor. The main processing portion comprises a scheduler, an instruction cache, a plurality of registers and a plurality of arithmetic logic units (ALUs). The sub-processor operates independently of the main processing portion and comprises a burst scheduler, a plurality of registers and a plurality of ALUs. The sub-processor is arranged to execute bursts, wherein a burst comprises at least one group of instructions that can be executed atomically and which are extracted from a program. The main processing portion executes a modified version of the program, wherein the modified program is created from the program by replacing the instructions in a burst with an instruction that triggers the execution of the burst. The registers in the sub-processor are used to store one or more sources and/or results for bursts that are being executed by the sub-processor.
A method of converting 10-bit pixel data (e.g. 10:10:10:2 data) into 8-bit pixel data involves converting the 10-bit values to 7-bits or 8-bits and generating error values for each of the converted values. Two of the 8-bit output channels comprise a combination of a converted 7-bit value and one of the bits from the fourth input channel. A third 8-bit output channel comprises the converted 8-bit value and the fourth 8-bit output channel comprises the error values. In various examples, the bits of the error values may be interleaved when they are packed into the fourth output channel.
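The bit budget works out exactly: two 3-bit errors plus one 2-bit error fill the 8-bit fourth channel. The following Python sketch shows one possible packing. It assumes conversion by truncation, assumes the first two colour channels are the ones reduced to 7 bits, and does not interleave the error bits; all of these are illustrative choices, not details taken from the abstract.

```python
def pack_1010102(r, g, b, a):
    """Pack 10:10:10:2 input into four 8-bit output channels (illustrative layout)."""
    r7, r_err = r >> 3, r & 0b111   # 10-bit -> 7-bit value plus 3-bit error
    g7, g_err = g >> 3, g & 0b111   # 10-bit -> 7-bit value plus 3-bit error
    b8, b_err = b >> 2, b & 0b11    # 10-bit -> 8-bit value plus 2-bit error
    out0 = (r7 << 1) | (a >> 1)     # 7-bit value + high bit of the 2-bit channel
    out1 = (g7 << 1) | (a & 1)      # 7-bit value + low bit of the 2-bit channel
    out2 = b8                       # the converted 8-bit value
    out3 = (r_err << 5) | (g_err << 2) | b_err  # 3 + 3 + 2 error bits
    return out0, out1, out2, out3

def unpack_1010102(out0, out1, out2, out3):
    """Recover the original 10:10:10:2 values from the packed channels."""
    r = ((out0 >> 1) << 3) | (out3 >> 5)
    g = ((out1 >> 1) << 3) | ((out3 >> 2) & 0b111)
    b = (out2 << 2) | (out3 & 0b11)
    a = ((out0 & 1) << 1) | (out1 & 1)
    return r, g, b, a
```

A consumer that only understands 8-bit data can read the first three channels directly as an approximation, while a consumer aware of the layout can use the fourth channel to restore the full 10-bit values.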
H04N 19/132 - Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
G06T 1/20 - Processor architectures; Processor configuration, e.g. pipelining
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
H04N 19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
H04N 19/89 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving methods or arrangements for detection of transmission errors at the decoder
9.
Anisotropic Texture Filtering by Combining Results of Isotropic Filtering at a Plurality of Sampling Points with a Gaussian Filter
A method of performing anisotropic texture filtering includes generating one or more parameters describing an elliptical footprint in texture space; performing isotropic filtering at each of a plurality of sampling points along a major axis of the elliptical footprint, wherein a spacing between adjacent sampling points of the plurality of sampling points is proportional to $\sqrt{1-\eta^{-2}}$ units, wherein $\eta$ is a ratio of a major radius of an ellipse to be sampled and a minor radius of the ellipse to be sampled, wherein the ellipse to be sampled is based on the elliptical footprint; and combining results of the isotropic filtering at the plurality of sampling points with a Gaussian filter to generate at least a portion of a filter result.
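As a rough illustration of the spacing rule and the Gaussian combination, the Python sketch below places sampling points symmetrically about the footprint centre and blends their isotropic results with normalised Gaussian weights. The proportionality constant `k`, the width `sigma`, the symmetric placement and the normalisation are assumptions, not details given in the abstract.

```python
import math

def sample_spacing(eta, k=1.0):
    """Spacing between adjacent sampling points, proportional to sqrt(1 - eta^-2)."""
    if eta < 1.0:
        raise ValueError("eta is major radius / minor radius, so it must be >= 1")
    return k * math.sqrt(1.0 - eta ** -2)

def sample_offsets(eta, count, k=1.0):
    """Signed offsets along the major axis, centred on the footprint centre."""
    spacing = sample_spacing(eta, k)
    return [(i - (count - 1) / 2.0) * spacing for i in range(count)]

def gaussian_combine(samples, offsets, sigma):
    """Blend isotropic filter results using normalised Gaussian weights."""
    weights = [math.exp(-0.5 * (t / sigma) ** 2) for t in offsets]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, samples)) / total
```

When the footprint is circular (ratio of 1) the spacing collapses to zero, so all sampling points coincide and the result reduces to a single isotropic lookup.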
Within a graphical processing system a plurality of different shading programs may be executed by a single processor over multiple threads. For each shading program a plurality of registers are used to store data for the respective shading program. Thus, for multiple shading programs executed over multiple threads a plurality of registers are allocated to each program, or thread, being executed. However, there are a limited number of registers available and therefore efficient allocation of the registers optimises performance. Often an unnecessary number of registers is allocated to each shading program but the present invention provides a method of allocating the correct number of registers based on the size of the fragments being shaded.
Intersection testing in a ray tracing system is performed for a ray with respect to a primitive. An intersection attribute value is determined for a primary sample of the ray relating to an intersection between the ray and the primitive in a ray coordinate system. The ray coordinate system has two non-parallel axes that are both transverse to the direction of the ray, and an origin of the ray coordinate system is on the ray. For one or both of the two non-parallel axes of the ray coordinate system, data is determined indicating a change to the intersection attributes in a direction parallel to that axis. The intersection between the ray and the primitive is processed using the determined value of the intersection attributes for the primary sample of the ray and the determined data indicating a change to the intersection attributes in the directions parallel to the two non-parallel axes.
Shader programs may include conditional portions, executed only in response to a specific condition being met. A program can require different numbers of registers depending on whether its conditional portions are executed. Thus, the use of conditional portions potentially results in the over-allocation of registers. Accordingly, there is provided a method of rendering in a graphics processing system using a shader program having a conditional section applied only in response to fulfilment of a condition, the method comprising: compiling the program, by a compiler, the compiling comprising identifying a conditional section; reading, by a resource allocator, a constant which determines the result of the condition; determining, by the resource allocator, whether the condition is met or not met; and allocating, by the resource allocator, a number of registers.
A method and a processing module are provided for applying upsampling to input pixels representing an image region to determine a block of upsampled pixels. The input pixels have locations corresponding to a repeating quincunx arrangement of upsampled pixel locations. The input pixels are analysed to determine one or more weighting parameters, the one or more weighting parameters being indicative of a directionality of filtering to be applied when upsampling is applied to the input pixels within the image region. One or more of the upsampled pixels of the block of upsampled pixels are determined in accordance with the directionality of filtering indicated by the determined one or more weighting parameters.
G06T 3/4053 - Scaling of whole images or parts thereof, e.g. expanding or contracting, based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
G06T 3/4046 - Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
G06T 5/10 - Image enhancement or restoration using non-spatial domain filtering
G06T 5/60 - Image enhancement or restoration using machine learning, e.g. neural networks
G06T 5/73 - Deblurring; Sharpening
A method of managing resources in a GPU comprises allocating a region of off-chip storage to a geometry task on creation of the geometry task and receiving, at an on-chip store in the GPU, a memory allocation request for the geometry task from a shader core in the GPU, wherein the memory allocation request is received after generation of geometry data for the geometry task. In response to receiving the memory allocation request, the method comprises determining, by the on-chip store, whether to allocate a region of the on-chip store to the geometry task. In response to allocating the region of the on-chip store, geometry data for the geometry task is written to the on-chip store and in response to determining not to allocate the region of the on-chip store, the geometry data is written to the allocated region of off-chip storage.
A method of operating a GPU uses input attributes in executing a first part of a geometry task fetched by a shader core. The first part of the task executes a first part of a shader to calculate position data for each instance of the task. The first part of the task is executed to output the position data for each instance of the task. The task is then descheduled until cull results are received for each instance. In response to receiving cull results indicating at least one remaining instance in the task, input attributes used in executing a second part of a task are fetched. The second part of the task executes a second part of a shader to calculate varyings for each remaining instance. The second part of the task is executed and the varyings for each remaining instance are output.
An elementwise operations hardware accelerator for use in a neural network accelerator comprises one or more processing pipelines and a control module. Each processing pipeline includes: an arithmetic logic unit module comprising a plurality of different arithmetic logic unit blocks, each arithmetic logic unit block of the plurality of arithmetic logic unit blocks configured to receive one or more inputs, selectively perform one or more elementwise operations on the one or more inputs, and output a result of the one or more elementwise operations; and an interconnection module configured to receive elements of one or more input tensors and selectively provide the elements of at least one of the one or more input tensors to an arithmetic logic unit block of the plurality of arithmetic logic unit blocks as an input. The control module is configured to receive a set of commands identifying an arithmetic logic unit block of the plurality of arithmetic logic unit blocks and one or more elementwise operations to be performed by the identified arithmetic logic unit block and control the operation of the one or more processing pipelines to cause the identified arithmetic logic unit block to perform the identified one or more elementwise operations.
G06F 7/57 - Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by the groups or for performing logical operations
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
A compressed attention-based neural network comprises a compressed attention layer implementing an attention function. The compressed attention layer rearranges and partitions an embedded tensor to form embedded sub-matrices. The compressed attention layer applies Key weight sub-matrices to the respective embedded sub-matrices and concatenates the results to determine a Key matrix. The compressed attention layer applies Query weight sub-matrices to the respective embedded sub-matrices and concatenates the results to determine a Query matrix. The compressed attention layer applies a set of one or more Value weight sub-matrices to the respective one or more embedded sub-matrices and concatenates the results to determine a Value matrix. The compressed attention layer implements the attention function using the determined Key matrix, the determined Query matrix and the determined Value matrix.
Segment load operations are performed by processing data through an anything-to-anything mux and writing the elements, in sections, to respective storage locations based on corresponding indices of the elements and the storage locations. Once all of the elements are loaded into the correct storage location, each location is read again, with the elements of that storage location being sent through the mux, arranged into the correct order, and written back to the same register.
A hardware design for a main component is verified, the main component being representable as a hierarchical set of components comprising parent components which each comprise leaf components in the hierarchical set. For each of the parent components it is verified that an instantiation of an abstracted hardware design for the parent component generates an expected output transaction in response to each of a plurality of test input transactions. The abstracted hardware design comprises, for each leaf component of the parent component, a corresponding abstracted component that is configured to, for a specific input transaction to the leaf component, produce a specific output transaction with a causal deterministic relationship to the specific input transaction, wherein a formal verification tool is configured to select the specific input transaction and the specific output transaction pair to be each possible valid input transaction and valid output transaction pair for the leaf component.
G06F 30/33 - Design verification, e.g. functional simulation or model checking
G01R 31/3183 - Generation of test inputs, e.g. test vectors, patterns or sequences
G06F 30/3308 - Design verification, e.g. functional simulation or model checking, using simulation
G06F 30/3323 - Design verification, e.g. functional simulation or model checking, using formal methods, e.g. equivalence checking or property checking
Methods of arbitrating between requestors and a shared resource are described, wherein for each processing cycle a plurality of select signals are generated and then used by decision nodes in a binary decision tree to select a requestor. The select signals are generated using valid bits and priority bits. Each valid bit corresponds to one of the requestors and indicates whether, in the processing cycle, the requestor is requesting access to the shared resource. Each priority bit corresponds to one of the requestors and indicates whether, in the processing cycle, the requestor has priority. Corresponding valid and priority bits are combined in an AND logic element to generate a valid_and_priority bit for each requestor. Pair-wise OR-reduction is then performed on both the valid bits and the valid_and_priority bits to generate additional valid bits and valid_and_priority bits for sets of requestors, and these are then used to generate the select signals.
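The AND step, the pair-wise OR-reduction and the decision tree can be modelled in software as follows. This Python sketch folds select-signal generation and the tree traversal into one loop, and the specific select rule (prefer the left branch unless only the right branch holds a prioritised request, or the left branch has no request at all) is an assumption chosen for illustration.

```python
def arbitrate(valid, priority):
    """Return the index of the selected requestor, or None if nobody is requesting.

    valid and priority are equal-length lists of 0/1 bits; the length must be
    a power of two so that the requestors form a complete binary tree.
    """
    # Leaf level: (valid, valid_and_priority, candidate index) per requestor.
    level = [(v, v & p, i) for i, (v, p) in enumerate(zip(valid, priority))]
    while len(level) > 1:
        next_level = []
        for j in range(0, len(level), 2):
            (v_l, vp_l, idx_l), (v_r, vp_r, idx_r) = level[j], level[j + 1]
            # Select signal for this decision node (assumed rule).
            select_right = (not vp_l) and (vp_r or not v_l)
            # Pair-wise OR-reduction gives the bits for the combined set.
            next_level.append((v_l | v_r, vp_l | vp_r, idx_r if select_right else idx_l))
        level = next_level
    any_valid, _, index = level[0]
    return index if any_valid else None
```

Under this rule the arbiter picks the lowest-indexed requestor that is both valid and prioritised, and falls back to the lowest-indexed valid requestor when no prioritised request exists.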
A method of managing resources in a graphics processing pipeline includes, in response to selecting a task for execution within a texture/shading unit, allocating to the task both a static allocation of temporary registers for the entire task and a dynamic allocation of temporary registers. The dynamic allocation comprises temporary registers used by a first phase of the task only and the static allocation of temporary registers comprises any temporary registers that are used by the program and are live at a boundary between two phases. When the task subsequently reaches a boundary between two phases, the dynamic allocation of temporary registers is freed and a new dynamic allocation of temporary registers for a next phase of the task is allocated to the task.
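The lifetime of the two allocations can be sketched with a simple counter-based model in Python. `RegisterPool`, `run_task` and the per-phase register counts are hypothetical names and inputs used only to show when registers are claimed and released.

```python
class RegisterPool:
    """Counts free temporary registers; a stand-in for real allocation hardware."""

    def __init__(self, size):
        self.free = size

    def allocate(self, count):
        if count > self.free:
            raise RuntimeError("not enough temporary registers")
        self.free -= count

    def release(self, count):
        self.free += count

def run_task(pool, static_count, dynamic_counts_per_phase):
    """Hold the static allocation for the whole task; swap the dynamic one per phase.

    Returns the number of free registers observed during each phase.
    """
    pool.allocate(static_count)          # live across phase boundaries
    free_during_phase = []
    for dynamic_count in dynamic_counts_per_phase:
        pool.allocate(dynamic_count)     # registers used by this phase only
        free_during_phase.append(pool.free)
        pool.release(dynamic_count)      # phase boundary: free the dynamic allocation
    pool.release(static_count)
    return free_during_phase
```

The benefit shown by the model is that a task only holds its peak register count during the phase that needs it, leaving more registers available to other tasks the rest of the time.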
A graph convolutional network (GCN) having a GCN layer is configured. The GCN layer performs an operation in dependence on an adjacency matrix, a feature embedding matrix and a weight matrix. In response to determining that the weight matrix comprises more rows than columns, the GCN layer is configured to determine a first intermediate result of multiplying the feature embedding matrix and the weight matrix, and subsequently use the determined first intermediate result to determine a full result representing a result of multiplying the adjacency matrix, the feature embedding matrix and the weight matrix. In response to determining that the weight matrix comprises more columns than rows, the GCN layer is configured to determine a second intermediate result of multiplying the adjacency matrix and the feature embedding matrix, and subsequently use the determined second intermediate result to determine the full result representing the result of multiplying the adjacency, feature embedding and weight matrices.
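Because matrix multiplication is associative, both orders give the same full result and only the size of the intermediate differs. A minimal pure-Python sketch of the selection rule follows; the handling of a square weight matrix is an assumption, since the abstract does not cover that case.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer_product(adjacency, features, weights):
    """Compute adjacency @ features @ weights, choosing the order from the weight shape."""
    rows, cols = len(weights), len(weights[0])
    if rows > cols:
        # More rows than columns: features @ weights is narrower than features.
        intermediate = matmul(features, weights)
        return matmul(adjacency, intermediate)
    # More columns than rows (a square matrix is also sent here, by assumption).
    intermediate = matmul(adjacency, features)
    return matmul(intermediate, weights)
```

For example, with a 3-row, 1-column weight matrix the intermediate has a single column, so the multiplication by the adjacency matrix touches far fewer values than it would if the adjacency matrix were applied first.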
Methods and image processing systems are provided for determining a dominant gradient orientation for a target region within an image. A plurality of gradient samples are determined for the target region, wherein each of the gradient samples represents a variation in pixel values within the target region. The gradient samples are converted into double-angle gradient vectors, and the double-angle gradient vectors are combined so as to determine a dominant gradient orientation for the target region.
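The double-angle step matters because gradients pointing in opposite directions describe the same edge orientation and would otherwise cancel when summed. A small Python sketch of one common formulation is shown below; weighting each sample by its squared magnitude is an assumption about how the vectors are combined.

```python
import math

def dominant_gradient_orientation(gradients):
    """Estimate the dominant orientation (radians) from (gx, gy) gradient samples."""
    sum_x = 0.0
    sum_y = 0.0
    for gx, gy in gradients:
        # Double-angle vector: |g|^2 * (cos 2t, sin 2t) for a gradient at angle t.
        sum_x += gx * gx - gy * gy
        sum_y += 2.0 * gx * gy
    # Halve the angle of the combined vector to return to orientation space.
    return 0.5 * math.atan2(sum_y, sum_x)
```

Two samples such as (1, 1) and (-1, -1) sum to zero as ordinary vectors, but their double-angle vectors reinforce each other and yield an orientation of 45 degrees.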
G06T 7/44 - Analysis of texture based on statistical description of texture using image operators, e.g. filters, edge density metrics or local histograms
G06F 18/22 - Matching criteria, e.g. proximity measures
G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
A graphics processing system includes a plurality of processing units, wherein the graphics processing system is configured to process a task first and second times at the plurality of processing units. Data identifying which processing unit of the plurality of processing units the task has been allocated to is consulted on allocating the task to a processing unit for processing for a second time, and, in response, the task is allocated for processing for the second time to any processing unit of the plurality of processing units other than the processing unit to which the task was allocated for processing for a first time.
A method of analyzing one or more objects in a set of frames. A first frame is segmented to produce a plurality of first masks each identifying pixels belonging to a potential object-instance detected in the first frame. A first feature vector is extracted from the first frame for each potential object-instance detected therein, characterizing the potential object-instance. A second frame is segmented to produce a plurality of second masks each identifying pixels belonging to a potential object-instance detected in the second frame. A second feature vector is extracted for each potential object-instance detected in the second frame, characterizing the potential object-instance. A potential object-instance in the first frame is matched with one of the potential object-instances in the second frame.
A graphics processing system includes a tiling unit configured to tile a scene into a plurality of tiles. A processing unit identifies tiles of the plurality of tiles that are each associated with at least a predetermined number of primitives. A memory management unit allocates a portion of memory to each of the identified tiles and does not allocate a portion of memory for each of the plurality of tiles that are not identified by the processing unit. A rendering unit renders each of the identified tiles and does not render tiles that are not identified by the processing unit.
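The selection and allocation steps can be sketched in Python as below. The dictionary input, the `min_primitives` parameter name and the use of consecutive block indices as "memory portions" are illustrative assumptions.

```python
def plan_tiles(primitives_per_tile, min_primitives=1):
    """Map each identified tile to a memory block index; other tiles get nothing.

    primitives_per_tile maps a tile coordinate to its primitive count.
    """
    identified = [tile for tile, count in primitives_per_tile.items()
                  if count >= min_primitives]
    # Only identified tiles receive memory and are later rendered.
    return {tile: block for block, tile in enumerate(identified)}
```

Tiles that are absent from the returned mapping consume no memory and are skipped by the renderer.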
Data in a processing system is compressed, the data comprising a plurality of values having a same multiple-byte format. Bytes with corresponding byte significance are grouped together to form a plurality of byte blocks, and a byte block of the plurality of byte blocks is compressed using a compression algorithm comprising storing at least one byte from the byte block as a byte origin, and storing at least one remaining byte in the byte block as a difference value from the byte origin.
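A minimal Python sketch of the grouping and origin-plus-difference steps follows. It assumes unsigned little-endian values, a non-empty input, the first byte of each block as the origin, and differences taken modulo 256; it shows the data layout only and does not pack the differences into fewer bits, which is where an actual size reduction would come from.

```python
def compress_values(values, width=4):
    """Group bytes of equal significance, then store each block as origin + differences."""
    raw = [v.to_bytes(width, "little") for v in values]
    blocks = []
    for k in range(width):
        byte_block = [r[k] for r in raw]   # all bytes of significance k
        origin = byte_block[0]             # byte origin (assumed: first byte)
        diffs = [(b - origin) & 0xFF for b in byte_block[1:]]
        blocks.append((origin, diffs))
    return blocks

def decompress_values(blocks):
    """Invert compress_values."""
    byte_blocks = [[origin] + [(origin + d) & 0xFF for d in diffs]
                   for origin, diffs in blocks]
    return [int.from_bytes(bytes(column), "little") for column in zip(*byte_blocks)]
```

For values that are numerically close, the high-significance blocks contain only zero differences, which is what makes the grouped representation attractive to compress.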
A method of processing an input task in a processing system involves duplicating the input task so as to form a first task and a second task; allocating memory including a first block of memory configured to store read-write data to be accessed during the processing of the first task; a second block of memory configured to store a copy of the read-write data to be accessed during the processing of the second task; and a third block of memory configured to store read-only data to be accessed during the processing of both the first task and the second task; and processing the first task and the second task at processing logic of the processing system so as to, respectively, generate first and second outputs.
Rendering systems that can use combinations of rasterization rendering processes and ray tracing rendering processes are disclosed. In some implementations, these systems perform a rasterization pass to identify visible surfaces of pixels in an image. Some implementations may begin shading processes for visible surfaces, before the geometry is entirely processed, in which rays are emitted. Rays can be culled at various points during processing, based on determining whether the surface from which the ray was emitted is still visible. Rendering systems may implement rendering effects as disclosed.
G09G 5/36 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of individual graphic patterns using a bit-mapped memory
Hardware is configured for implementing a Deep Neural Network (DNN) for performing an activation function. A programmable lookup table for storing lookup data approximating the activation function is provided at an activation module for performing the activation function. Training data is provided to an input layer of a representation of the hardware, wherein the representation of the hardware is configured to implement the DNN, to configure the DNN by using the training data, wherein configuring the DNN comprises determining lookup data for the lookup table representing the activation function. The lookup data is loaded into the lookup table of the hardware, thereby configuring the activation module of the hardware for performing the activation function during post-training operation.
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
Systems and methods to implement a geometry processing phase of tile-based rendering are described. The systems include a plurality of parallel geometry pipelines, a plurality of tiling pipelines and a geometry to tiling arbiter situated between the plurality of geometry pipelines and the plurality of tiling pipelines. Each geometry pipeline is configured to generate one or more geometry blocks for each geometry group of a subset of ordered geometry groups, generate a corresponding primitive position block for each geometry block, and compress each geometry block to generate a corresponding compressed geometry block. The tiling pipelines are configured to generate, from the primitive position blocks, a list for each tile indicating primitives that fall within the bounds of that tile. The geometry to tiling arbiter is configured to forward the primitive position blocks generated by the plurality of geometry pipelines to the plurality of tiling pipelines in the correct order based on the order of the geometry groups.
A processor has first and second cores and a distributed cache that caches a copy of data stored at a plurality of memory addresses of a memory. A first cache slice is connected to the first core, and a second cache slice is connected to the second core. The first cache slice caches a copy of data stored at a first set of memory addresses, and the second cache slice caches a copy of data stored at a second, different, set of memory addresses.
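One way to obtain two disjoint address sets is to interleave cache lines between the slices. The Python sketch below assumes that policy; the 64-byte line size, the `owning_slice` mapping and the `DistributedCache` class are all hypothetical, and for brevity the model caches individual addresses rather than whole lines.

```python
LINE_SIZE = 64  # bytes per cache line (assumed)

def owning_slice(address, num_slices=2):
    """Map a memory address to the single slice allowed to cache it."""
    return (address // LINE_SIZE) % num_slices

class DistributedCache:
    """Toy model: each slice caches a disjoint set of addresses."""

    def __init__(self, memory, num_slices=2):
        self.memory = memory
        self.slices = [dict() for _ in range(num_slices)]

    def read(self, address):
        cache_slice = self.slices[owning_slice(address, len(self.slices))]
        if address not in cache_slice:
            cache_slice[address] = self.memory[address]  # miss: fill from memory
        return cache_slice[address]
```

Because every address has exactly one owning slice, no data is duplicated between the slices regardless of which core issues the read.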
A transaction processing circuit in a graphics rendering system receives information identifying a particular vertex of a plurality of vertices in a strip, each of which is associated with a viewport, and selects a plurality of viewports for viewport transformation of the particular vertex by selecting relevant vertices from the vertices in the strip based on a provoking vertex, and selecting the plurality of viewports to comprise the viewport associated with each relevant vertex. One or more viewport transformation instructions are sent to a viewport transformation module to perform a viewport transformation on untransformed coordinate data for the particular vertex for each of the viewports, wherein the one or more viewport transformation instructions comprise a viewport transformation instruction for each of the plurality of viewports, and each viewport transformation instruction comprises information identifying the particular vertex and information identifying one of the plurality of viewports.
G06T 17/10 - Volume description, e.g. cylinders, cubes or using CSG [constructive solid geometry]
G06F 30/327 - Logic synthesis; Behaviour synthesis, e.g. mapping logic, hardware description language [HDL] to netlist, high-level language to register transfer language [RTL] or netlist
G06F 119/18 - Manufacturability analysis or optimisation for manufacturability
A method of matching features in first and second images captured from respective camera viewpoints related by an epipolar geometry. The coordinate system of the second image is transformed so as to map an epipolar line in the second image corresponding to a first feature in the first image, to be parallel to one of the coordinate axes of the coordinate system. The epipolar line defines a geometrically-constrained region in the second image in the transformed coordinate system corresponding to the first feature in the first image; measures of similarity between the first feature in the first image and features in the second image are determined; and a best match feature is identified from the measures of similarity between the first feature in the first image and the respective features in the second image.
G06T 7/593 - Depth or shape recovery from multiple images, from stereo images
G06F 18/22 - Matching criteria, e.g. proximity measures
G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
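The epipolar-constrained matching described in the abstract above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the epipolar line is given as coefficients (a, b, c) of a*x + b*y + c = 0, the transform is a plain rotation, the constrained region is a band of hypothetical half-width `band` around the line, and similarity is a negative squared descriptor distance.

```python
import math

def rotate_to_axis(line, points):
    """Rotate coordinates so the line a*x + b*y + c = 0 becomes parallel to the x-axis.

    Returns the rotated points and the constant y' value of the rotated line.
    """
    a, b, c = line
    norm = math.hypot(a, b)
    # The unit normal (a, b)/norm is mapped onto the y' axis, so y' = (a*x + b*y)/norm.
    cos_t, sin_t = b / norm, a / norm
    rotated = [(cos_t * x - sin_t * y, sin_t * x + cos_t * y) for x, y in points]
    return rotated, -c / norm

def best_match(query_desc, line, features, band=1.5):
    """Return the index of the best-matching feature near the epipolar line, or None.

    `features` is a list of ((x, y), descriptor) pairs from the second image.
    """
    rotated, y_line = rotate_to_axis(line, [pos for pos, _ in features])
    best_idx, best_score = None, -math.inf
    for idx, ((_, y_rot), (_, desc)) in enumerate(zip(rotated, features)):
        # In the rotated frame the constrained region is a simple range test on y'.
        if abs(y_rot - y_line) > band:
            continue
        score = -sum((q - d) ** 2 for q, d in zip(query_desc, desc))
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx

features = [
    ((3.0, 9.0), [1.0, 0.0]),   # perfect descriptor, but far from the line
    ((5.0, 5.2), [0.9, 0.1]),   # close to the line, good descriptor
    ((2.0, 2.0), [0.2, 0.8]),   # on the line, poor descriptor
]
match = best_match([1.0, 0.0], (1.0, -1.0, 0.0), features)
```

The point of the rotation is that the geometric constraint reduces to comparing one coordinate against a constant, which is cheaper than a point-to-line distance per candidate.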
Hardware logic for implementing a convolutional neural network (CNN) is configured to receive input data values to be processed in a layer of the CNN. Addresses in banked memory of a buffer in which the received input data values are to be stored are determined based upon format data indicating a format parameter of the input data in the layer and indicating a format parameter of a filter which is to be used to process the input data in the layer, wherein the format parameter of the filter comprises a stride. The received input data values are then stored at the determined addresses in the buffer for retrieval for processing in the layer.
G06F 12/06 - Addressing a physical block of locations, e.g. base addressing, module addressing, address space extension, memory dedication
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
A method and system for generating and shading a computer graphics image in a tile based computer graphics system is provided. Geometry data is supplied and a plurality of primitives are derived from the geometry data. One or more modified primitives are then derived from at least one of the plurality of primitives. For each of a plurality of tiles, an object list is derived including data identifying the primitive from which each modified primitive located at least partially within that tile is derived. Alternatively, the object list may include data identifying each modified primitive located at least partially within that tile. Each tile is then shaded for display using its respective object list.
A cache system in a graphics processing system stores graphics data items for use in rendering primitives. It is determined whether graphics data items relating to primitives to be rendered are present in the cache, and if not then computation instances for generating the graphics data items are created. Computation instances are allocated to tasks using a task assembly unit which stores task entries for respective tasks. The task entries indicate which computation instances have been allocated to the respective tasks. The task entries are associated with characteristics of computation instances which can be allocated to the respective tasks. A computation instance to be executed is allocated to a task based on the characteristics of the computation instance. SIMD processing logic executes computation instances of a task outputted from the task assembly unit to thereby determine graphics data items, which can be used to render the primitives.
Graphics processing systems render items of geometry using a rendering space subdivided into a plurality of first regions. The items of geometry are stored in data blocks having a respective block ID. The items of geometry are rendered within a second region of a plurality of second regions using a first control list for the first region of which the second region is a part, and a second control list for the second region, each control list comprising entries associated with respective items of geometry, each of the entries comprising a block ID associated with a data block. The items of geometry are rendered within the second region by choosing, from the first control list and the second control list, the entry comprising the lowest block ID which has not previously been chosen, and fetching items of geometry from the data block associated with the block ID of the chosen entry.
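The selection rule in the abstract above (repeatedly take the entry with the lowest block ID not yet chosen) behaves like a two-way merge. The sketch below is illustrative only and rests on assumptions not stated in the abstract: each control list is already sorted by block ID, entries are reduced to bare integer block IDs, and ties go to the first control list.

```python
def fetch_order(first_list, second_list):
    """Yield (source, block_id) in ascending block-ID order from two control lists.

    Both lists are assumed to be sorted by block ID already.
    """
    i = j = 0
    while i < len(first_list) or j < len(second_list):
        # Take from the first list when the second is exhausted or has a larger ID.
        take_first = j >= len(second_list) or (
            i < len(first_list) and first_list[i] <= second_list[j]
        )
        if take_first:
            yield ("first", first_list[i])
            i += 1
        else:
            yield ("second", second_list[j])
            j += 1

order = list(fetch_order([1, 4, 6], [2, 3, 7]))
```

Merging by block ID means geometry is fetched in one consistent order even though it is referenced from two separate lists.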
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
45 - Legal services; security services; personal services for individuals
Goods and services
Computer software for microprocessors and electronic circuits; computer software for graphics processing; computer software for multimedia processing; computer software in relation to instruction set architectures; computer software in relation to neural networks processors; artificial intelligence and machine learning software in connection with microprocessors and electronic circuits; firmware and device drivers for microprocessors; interfaces between computer hardware and computer software; electronic databases featuring data and information relating to microprocessors, electronic circuits, graphics processing, instruction set architectures and neural networks processors; electronic publications in the field of microprocessors; microprocessors; electronic circuits; central processing units; graphics processing units; neural network processors. Providing online non-downloadable computer software, Software-as-a-Service, and Platform-as-a-Service in relation to microprocessors, electronic circuits, graphics processing, multimedia processing, instruction set architectures, neural networks processors, artificial intelligence and machine learning for microprocessors and electronic circuits; research, design and development services in relation to microprocessors, electronic circuits, graphics processing, multimedia processing, instruction set architectures, neural networks processors, artificial intelligence and machine learning for microprocessors and electronic circuits; research, design and development services in relation to microprocessor architecture. Licensing of Intellectual property and technology; licensing of know-how, namely practical knowledge, skill and expertise in relation to the development of microprocessors, electronic circuits, graphics processing, multimedia processing, instruction set architectures, neural networks processors, artificial intelligence and machine learning for microprocessors and electronic circuits.
45 - Legal services; security services; personal services for individuals
09 - Scientific and electric apparatus and instruments
42 - Scientific, technological and industrial services, research and design
Goods and services
Licensing of Intellectual property and technology; licensing of know-how, namely practical knowledge, skill and expertise in relation to the development of microprocessors, electronic circuits, graphics processing, multimedia processing, instruction set architectures, neural networks processors, artificial intelligence and machine learning for microprocessors and electronic circuits. Computer software for microprocessors and electronic circuits; computer software for graphics processing; computer software for multimedia processing; computer software in relation to instruction set architectures; computer software in relation to neural networks processors; artificial intelligence and machine learning software in connection with microprocessors and electronic circuits; firmware and device drivers for microprocessors; interfaces between computer hardware and computer software; electronic databases featuring data and information relating to microprocessors, electronic circuits, graphics processing, instruction set architectures and neural networks processors; electronic publications in the field of microprocessors; microprocessors; electronic circuits; central processing units; graphics processing units; neural network processors. Providing online non-downloadable computer software, Software-as-a-Service, and Platform-as-a-Service in relation to microprocessors, electronic circuits, graphics processing, multimedia processing, instruction set architectures, neural networks processors, artificial intelligence and machine learning for microprocessors and electronic circuits; research, design and development services in relation to microprocessors, electronic circuits, graphics processing, multimedia processing, instruction set architectures, neural networks processors, artificial intelligence and machine learning for microprocessors and electronic circuits; research, design and development services in relation to microprocessor architecture.
An adder and a method for calculating 2^n + x are provided, where x is a variable input expressed in a floating point format and n is an integer. The adder comprises a first path configured to calculate 2^n + x for x < 0 and 2^(n−1) ≤ |x| < 2^(n+1); a second path configured to calculate 2^n + x for |x| < 2^n; a third path configured to calculate 2^n + x for |x| ≥ 2^n; and selection logic configured to cause the adder to output a result from one of the first, second, and third paths in dependence on the values of x and n.
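The three path conditions in the abstract above overlap, so some priority among them is needed. The sketch below assumes one plausible priority (first path checked before the other two); that ordering, and the function name `select_path`, are illustrative assumptions rather than anything the abstract specifies.

```python
import math

def select_path(x: float, n: int) -> str:
    """Pick which of the three adder paths would handle 2**n + x (assumed priority)."""
    mag = abs(x)
    two_n = math.ldexp(1.0, n)  # 2**n as a float
    # First path: negative x with magnitude close to 2**n, where the sum can
    # lose leading bits through cancellation.
    if x < 0 and math.ldexp(1.0, n - 1) <= mag < math.ldexp(1.0, n + 1):
        return "first"
    # Second path: x is smaller in magnitude than 2**n.
    if mag < two_n:
        return "second"
    # Third path: x is at least as large in magnitude as 2**n.
    return "third"

paths = [select_path(-3.0, 1), select_path(0.5, 3), select_path(20.0, 3), select_path(-0.25, 3)]
```

Here `select_path(-3.0, 1)` falls in the first range because 2^0 ≤ 3 < 2^2 and x is negative, while `-0.25` with n = 3 is too small for that range and goes to the second path.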
A colour processor for mapping an image from source to destination colour gamuts has an input for receiving a source image including a plurality of source colour points expressed according to the source gamut; a colour characterizer configured to, for each source colour point in the source image, determine a position of intersection of a curve with the boundary of the destination gamut; and a gamut mapper configured to, for each source colour point in the source image: if the source colour point lies inside the destination gamut, apply a first translation factor to translate the source colour point to a destination colour point within a first range of values; or if the source colour point lies outside the destination gamut, apply a second translation factor, different to the first translation factor, to translate the source colour point to a destination colour point within a second range of values.
G09G 5/02 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators, characterised by the way in which colour is displayed
G09G 5/06 - Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators, characterised by the way in which colour is displayed, using colour palettes, e.g. look-up tables
A multicore graphics processing unit (GPU) and a method of operating a GPU are provided. The GPU comprises at least a first core and a second core. At least one of the cores in the multicore GPU comprises a master unit configured to distribute geometry processing tasks between at least the first core and the second core.
Input/output filter units for use in a graphics processing unit include a first buffer configured to store data received from, and output to, a first component of the graphics processing unit; a second buffer configured to store data received from, and output to, a second component of the graphics processing unit; a weight buffer configured to store filter weights; a filter bank configurable to perform a plurality of types of filtering on a set of input data, the plurality of types of filtering comprising texture filtering types and pixel filtering types; and control logic configured to cause the filter bank to: (i) perform one of the plurality of types of filtering on a set of data stored in one of the first and second buffers using a set of weights stored in the weight buffer, and (ii) store the results of the filtering in one of the first and second buffers.
G06F 7/57 - Arithmetic logic units [ALU], i.e. arrangements or devices for performing several of the operations covered by the groups, or for performing logical operations
G06F 13/12 - Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor
Methods and apparatus for generating a data structure for storing primitive data for a number of primitives and vertex data for a plurality of vertices, wherein each primitive is defined with reference to one or more of the plurality of vertices. The vertex data comprises data for more than one view, such as a left view and a right view, with vertex parameter values for a first group of vertex parameters being stored separately for each view and vertex parameter values for a second, non-overlapping group of vertex parameters being stored only once and used when rendering either or both views.
Methods and hardware for cube mapping comprise receiving fragment coordinates for an input block of fragments and texture instructions for the fragments and then determining, based on gradients of the input block of fragments, whether a first mode of cube mapping or a second mode of cube mapping is to be used, wherein the first mode of cube mapping performs calculations at a first precision for a subset of the fragments and calculations for remaining fragments at a second, lower, precision and the second mode of cube mapping performs calculations for all fragments at the first precision. Cube mapping is then performed using the determined mode and the gradients, wherein if the second mode is used and more than half of the fragments in the input block are valid, the cube mapping is performed over two clock cycles.
A method of decompression to determine data values from compressed data comprising representations of one or more difference values for the data values being decompressed, each difference value representing a difference between the respective data value and an origin value, wherein the representations of the one or more difference values are included in the compressed data using a second number of bits. Based on the representations of the one or more difference values in the compressed data and a first number of bits for representing the one or more difference values for the one or more data values, for each of the one or more data values being decompressed, a difference value is determined in accordance with the first number of bits. Each of the one or more data values being decompressed is determined using: (i) the origin value, and (ii) the determined difference value for the data value.
G06F 7/72 - Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, using residue arithmetic
G06T 3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
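The decompression abstract above leaves open how the stored width (the "second number of bits") relates to the width needed for the differences (the "first number of bits"). The sketch below assumes one possible relationship: when fewer bits are stored than needed, the low-order bits were dropped, so the stored value is shifted back up. The little-endian packing and all function names are likewise hypothetical.

```python
def unpack(packed: int, count: int, width: int) -> list:
    """Split an integer into `count` fields of `width` bits, lowest field first."""
    mask = (1 << width) - 1
    return [(packed >> (k * width)) & mask for k in range(count)]

def decompress(origin: int, reps: list, first_bits: int, second_bits: int) -> list:
    """Rebuild data values from stored difference representations.

    first_bits:  width needed to represent the differences (assumed meaning).
    second_bits: width actually used in the compressed data (assumed meaning).
    """
    shift = max(first_bits - second_bits, 0)
    mask = (1 << first_bits) - 1
    values = []
    for rep in reps:
        diff = (rep << shift) & mask   # difference expressed in first_bits
        values.append(origin + diff)   # data value = origin + difference
    return values

exact = decompress(100, unpack(0b111_011_000, 3, 3), first_bits=3, second_bits=3)
lossy = decompress(100, [0, 1, 3], first_bits=4, second_bits=2)
```

With equal widths the values are recovered exactly; with a narrower stored width the result is only as precise as the bits that were kept.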
A binary logic circuit performs an interpolation calculation between two endpoint values E0 and E1 using a weighting index i for generating an interpolated result P, the values E0 and E1 being formed from Adaptive Scalable Texture Compression (ASTC) colour endpoint values C0 and C1 respectively, the colour endpoint values C0 and C1 being low-dynamic range (LDR) or high dynamic range (HDR) values. An interpolation unit performs an interpolation between the values C0 and C1 using the index i to generate a first intermediate interpolated result C2; combinational logic circuitry receives the result C2 and performs logical processing operations to calculate the interpolated result P according to the equation: (1) P = ⌊((C2 << 8) + C2 + 32) / 64⌋ when the interpolated result is not to be compatible with an sRGB colour space and the colour endpoint values are LDR values; (2) P = ⌊((C2 << 8) + 128·64 + 32) / 64⌋ when the interpolated result is to be compatible with an sRGB colour space and the colour endpoint values are LDR values; and (3) P = (C2 + 2) >> 2 when the colour endpoint values are HDR values.
G06T 3/4007 - Scaling of whole images or parts thereof, e.g. expanding or contracting, based on interpolation, e.g. bilinear interpolation
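The three equations in the ASTC interpolation abstract above can be checked numerically. The sketch below assumes that the intermediate result is C2 = C0·(64 − i) + C1·i with i in 0..64, which the abstract does not state explicitly. The `reference` function reflects my reading of the usual expand-then-interpolate approach (8-bit LDR endpoints replicated or padded with 0x80, 12-bit HDR endpoints shifted left by 4) and is included only as a cross-check under that assumption.

```python
def interpolate(c0: int, c1: int, i: int, *, hdr: bool = False, srgb: bool = False) -> int:
    """Interpolate first, then expand, using equations (1) to (3)."""
    c2 = c0 * (64 - i) + c1 * i          # assumed form of the intermediate result C2
    if hdr:
        return (c2 + 2) >> 2             # equation (3)
    if srgb:
        return ((c2 << 8) + 128 * 64 + 32) // 64   # equation (2)
    return ((c2 << 8) + c2 + 32) // 64   # equation (1)

def reference(c0: int, c1: int, i: int, *, hdr: bool = False, srgb: bool = False) -> int:
    """Expand each endpoint to 16 bits first, then interpolate (assumed rules)."""
    if hdr:
        e0, e1 = c0 << 4, c1 << 4
    elif srgb:
        e0, e1 = (c0 << 8) | 0x80, (c1 << 8) | 0x80
    else:
        e0, e1 = (c0 << 8) | c0, (c1 << 8) | c1
    return (e0 * (64 - i) + e1 * i + 32) // 64

same_ldr = interpolate(12, 200, 21) == reference(12, 200, 21)
same_srgb = interpolate(12, 200, 21, srgb=True) == reference(12, 200, 21, srgb=True)
same_hdr = interpolate(100, 4000, 21, hdr=True) == reference(100, 4000, 21, hdr=True)
```

The two forms agree because the expansion is linear in the endpoint: replicating an 8-bit value multiplies it by 257, and the constant 0x80 contributes 128·(64 − i) + 128·i = 128·64 regardless of i. Doing the interpolation on the narrow values first therefore needs only one multiplier stage on 8-bit or 12-bit inputs.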
Texture filtering is applied to a texture represented with a mipmap comprising a plurality of levels, wherein each level of the mipmap comprises an image representing the texture at a respective level of detail. A texture filtering unit has minimum and maximum limits on an amount by which it can alter the level of detail when it filters texels from an image of a single level of the mipmap. The range of level of detail between the minimum and maximum limits defines an intrinsic region of the texture filtering unit. If it is determined that a received input level of detail is in an intrinsic region of the texture filtering unit, texels are read from a single mipmap level of the mipmap, and the read texels from the single mipmap level are filtered to determine a filtered texture value representing part of the texture at the input level of detail. If it is determined that the received input level of detail is in an extrinsic region of the texture filtering unit: texels are read from two mipmap levels of the mipmap, and the read texels from the two mipmap levels are processed to determine a filtered texture value representing part of the texture at the input level of detail.
Methods and tiling engines for tiling primitives in a tile based graphics processing system in which a rendering space is divided into a plurality of tiles. The method includes generating a multi-level hierarchy of tile groups, each level of the multi-level hierarchy comprising one or more tile groups comprising one or more of the plurality of tiles; receiving a plurality of primitive blocks, each primitive block comprising geometry data for one or more primitives; associating each of the plurality of primitive blocks with one or more of the tile groups up to a maximum number of tile groups such that if at least one primitive of a primitive block falls, at least partially, within the bounds of a tile, the primitive block is associated with at least one tile group that includes that tile; and generating a control stream for each tile group based on the associations, wherein each control stream comprises a primitive block entry for each primitive block associated with the corresponding tile group.
A rendering system combines point sampling and volume sampling operations to produce rendering outputs. For example, to determine color information for a surface location in a 3-D scene, one or more point sampling operations are conducted in a volume around the surface location, and one or more sampling operations of volumetric light transport data are performed farther from the surface location. A transition zone between point sampling and volume sampling can be provided, in which both point and volume sampling operations are conducted. Data obtained from point and volume sampling operations can be blended in determining color information for the surface location. For example, point samples are obtained by tracing a ray for each point sample to identify an intersection between another surface and the ray, which is to be shaded, and volume samples are obtained from nested 3-D grids of volume elements expressing light transport data at different levels of granularity.
Methods of encoding and decoding are described which use a variable number of instruction words to encode instructions from an instruction set, such that different instructions within the instruction set may be encoded using different numbers of instruction words. To encode an instruction, the bits within the instruction are re-ordered and formed into instruction words based upon their variance as determined using empirical or simulation data. The bits in the instruction words are compared to corresponding predicted values and some or all of the instruction words that match the predicted values are omitted from the encoded instruction.
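The omission step in the encoding abstract above can be sketched as follows. This covers only the comparison against predicted values, not the variance-based bit re-ordering. The presence mask used to tell the decoder which words were kept is an assumption added for illustration; the abstract does not say how omitted words are signalled.

```python
def encode(words, predicted):
    """Drop instruction words that equal their predicted value.

    Returns (mask, kept) where bit `pos` of mask is set when word `pos` was kept.
    """
    mask = 0
    kept = []
    for pos, (word, guess) in enumerate(zip(words, predicted)):
        if word != guess:
            mask |= 1 << pos
            kept.append(word)
    return mask, kept

def decode(mask, kept, predicted):
    """Rebuild the full word list, filling omitted words from the predictions."""
    remaining = iter(kept)
    return [
        next(remaining) if (mask >> pos) & 1 else guess
        for pos, guess in enumerate(predicted)
    ]

predicted = [0, 0, 0, 0]
mask, kept = encode([7, 0, 3, 0], predicted)
restored = decode(mask, kept, predicted)
```

In this example two of the four words match their predictions, so only two words plus the mask need to be stored.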
Adder circuits and associated methods for processing a set of at least three floating-point numbers to be added together include identifying, from among the at least three numbers, at least two numbers that have the same sign—that is, at least two numbers that are both positive or both negative. The identified at least two numbers are added together using one or more same-sign floating-point adders. A same-sign floating-point adder comprises circuitry configured to add together floating-point numbers having the same sign and does not include circuitry configured to add together numbers having different signs.
G06F 7/24 - Sorting, i.e. extracting data from one or more carriers, rearranging the data in numerical or other ordered sequence, and rerecording the sorted data on the original carrier or on a different carrier or set of carriers
G06F 7/501 - Half or full adders, i.e. basic adder cells for one denomination
H03K 19/20 - Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits, characterised by logic function, e.g. AND, OR, NOR, NOT circuits
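The same-sign adder abstract above relies on the fact that among any three numbers at least two share a sign, since there are only two possible sign values. A minimal software sketch of the identification step follows, restricted to exactly three inputs; the function names are hypothetical and ordinary float addition stands in for the dedicated adder hardware.

```python
import math

def same_sign_pair(a: float, b: float, c: float):
    """Return indices of two inputs with the same sign bit (one always exists)."""
    signs = [math.copysign(1.0, v) for v in (a, b, c)]
    for i, j in ((0, 1), (0, 2), (1, 2)):
        if signs[i] == signs[j]:
            return i, j
    raise AssertionError("unreachable: three values but only two possible signs")

def add3(a: float, b: float, c: float) -> float:
    """Add three numbers, pairing the same-sign operands first."""
    vals = (a, b, c)
    i, j = same_sign_pair(a, b, c)
    k = 3 - i - j                      # index of the remaining operand
    partial = vals[i] + vals[j]        # same-sign addition, no cancellation possible
    return partial + vals[k]           # the final addition may be mixed-sign

pair = same_sign_pair(1.0, -2.0, 3.0)
total = add3(1.0, -2.0, 3.0)
```

Using `math.copysign` rather than a comparison with zero means negative zero is treated by its sign bit, which matches how a hardware sign check would behave.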
54.
Modifying Processing of Commands in a Command Queue Based on Accessed Data Related to a Command
Processing of commands at a graphics processor is controlled by receiving input data and generating a command for processing at the graphics processor from the input data, wherein the command will cause the graphics processor to write out at least one buffer of data to an external memory, and submitting the command to a queue for later processing at the graphics processor. Subsequent to submitting the command, but before the write to external memory has been completed, further input data is received and it is determined that the buffer of data does not need to be written to external memory. The graphics processor is then signalled to prevent at least a portion of the write to external memory from being performed for the command.
G06F 12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
G06T 1/20 - Processor architectures; Processor configuration, e.g. pipelining
A method of filtering a target pixel in an image forms, for a kernel of pixels comprising the target pixel and its neighbouring pixels, a data model to model pixel values within the kernel; calculates a weight for each pixel of the kernel comprising: (i) a geometric term dependent on a difference in position between that pixel and the target pixel; and (ii) a data term dependent on a difference between a pixel value of that pixel and its predicted pixel value according to the data model; and uses the calculated weights to form a filtered pixel value for the target pixel, e.g. by updating the data model with a weighted regression analysis technique using the calculated weights for the pixels of the kernel; and evaluating the updated data model at the target pixel position so as to form the filtered pixel value for the target pixel.
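The weighting scheme in the filtering abstract above can be illustrated in one dimension. The sketch assumes the simplest possible data model (a constant, fitted as the kernel mean), Gaussian forms for both terms, and arbitrary `sigma_g` and `sigma_d` values. None of these choices come from the abstract, which only requires a geometric term and a data term.

```python
import math

def filter_pixel(kernel, target_idx, sigma_g=1.5, sigma_d=10.0):
    """Filter one pixel of a 1-D kernel; returns (filtered_value, weights)."""
    # Initial data model: a constant equal to the kernel mean (assumed model).
    model = sum(kernel) / len(kernel)
    weights = []
    for idx, value in enumerate(kernel):
        # Geometric term: falls off with distance from the target pixel.
        geometric = math.exp(-((idx - target_idx) ** 2) / (2 * sigma_g ** 2))
        # Data term: falls off with the residual against the model's prediction.
        data = math.exp(-((value - model) ** 2) / (2 * sigma_d ** 2))
        weights.append(geometric * data)
    # Weighted regression for a constant model reduces to a weighted mean,
    # and evaluating that model at the target position returns the same constant.
    total = sum(weights)
    filtered = sum(w * v for w, v in zip(weights, kernel)) / total
    return filtered, weights

flat_value, _ = filter_pixel([5.0] * 5, 2)
robust_value, robust_weights = filter_pixel([10.0, 10.0, 10.0, 10.0, 200.0], 2)
```

In the second call the outlier at the end of the kernel disagrees strongly with the model, so its data term is almost zero and the filtered value stays close to 10. A higher-order model (for example a linear fit) would follow the same pattern with a weighted least-squares update in place of the weighted mean.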
Neural network accelerators with one or more neural network accelerator cores. Each neural network accelerator core has hardware accelerators configured to accelerate neural network operations, an embedded processor, a command decoder, and a hardware feedback path between the embedded processor and the command decoder. The command decoder is configured to control the hardware accelerators and the embedded processor of that core in accordance with commands of a command stream, and when the command stream comprises a set of one or more branch commands that indicate a conditional branch is to be performed, cause the embedded processor to determine a next command stream, and in response to receiving information from the embedded processor identifying the next command stream via the hardware feedback path, control the one or more hardware accelerators and the embedded processor in accordance with commands of the next command stream.
A method of GPU virtualization comprises the hypervisor allocating an identifier to each virtual machine (or operating system running on a VM); this identifier is then used to tag every transaction deriving from a GPU workload operating within a given VM context (i.e. every GPU transaction on the system bus which interconnects the CPU, GPU and other peripherals). Additionally, dedicated portions of a memory resource (which may be GPU registers or RAM) are provided for each VM and, whilst each VM can only see its allocated portion of the memory, a microprocessor within the GPU can see all of the memory. Access control is achieved using root memory management units which are configured by the hypervisor and which map guest physical addresses to actual memory addresses based on the identifier associated with the transaction.
G06F 12/1009 - Address translation using page tables, e.g. page table structures
G06F 9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
G06F 13/16 - Handling requests for interconnection or transfer for access to memory bus
G06T 1/20 - Processor architectures; Processor configuration, e.g. pipelining
Shader processing units for a graphics processing unit execute ray tracing shaders that generate ray data associated with rays. The ray data includes a plurality of ray data elements. Store logic receives, as part of a ray tracing shader, a ray store instruction that includes: (i) information identifying a store group of a plurality of store groups, each store group comprising one or more ray data elements of the plurality of ray data elements, and (ii) information identifying one or more ray data elements of the identified store group to be stored in an external unit. In response to receiving the ray store instruction, the store logic retrieves the identified ray data elements for one or more rays from the storage. The store logic then sends one or more store requests to an external unit which cause the external unit to store the identified ray data elements for the one or more rays.
Methods and intersection testing modules are provided for determining, in a ray tracing system, whether a ray intersects a 3D axis-aligned box representing a volume defined by a front-facing plane and a back-facing plane for each dimension. The front-facing plane of the box which intersects the ray furthest along the ray is identified. It is determined whether the ray intersects the identified front-facing plane at a position that is no further along the ray than positions at which the ray intersects the back-facing planes in a subset of the dimensions, and this determination is used to determine whether the ray intersects the axis-aligned box. The subset of dimensions comprises the two dimensions for which the front-facing plane was not identified, but does not comprise the dimension for which the front-facing plane was identified. It is determined whether the ray intersects the box without performing a test to determine whether the ray intersects the identified front-facing plane at a position that is no further along the ray than a position at which the ray intersects the back-facing plane in the dimension for which the front-facing plane was identified.
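The reduced test in the ray-box abstract above can be sketched with a slab-style computation. The sketch assumes every ray direction component is non-zero, and it adds a final check that the box is not entirely behind the ray origin; that last check is an addition of mine and is not part of the abstract.

```python
def ray_hits_box(origin, direction, box_min, box_max):
    """Ray versus axis-aligned box using only two front/back comparisons."""
    t_front, t_back = [], []
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        t1, t2 = (lo - o) / d, (hi - o) / d
        t_front.append(min(t1, t2))   # front-facing plane: where the ray enters the slab
        t_back.append(max(t1, t2))    # back-facing plane: where the ray leaves the slab
    # Identify the front-facing plane that is hit furthest along the ray.
    far_dim = max(range(3), key=lambda k: t_front[k])
    # Compare against the back-facing planes of the other two dimensions only.
    # The same-dimension comparison is skipped because t_front <= t_back always
    # holds within a single dimension.
    others = [k for k in range(3) if k != far_dim]
    enters_before_exit = all(t_front[far_dim] <= t_back[k] for k in others)
    # Extra assumption: reject boxes that lie wholly behind the ray origin.
    return enters_before_exit and min(t_back) >= 0.0

unit_min, unit_max = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
hit = ray_hits_box((-1.0, 0.5, 0.5), (1.0, 0.1, 0.1), unit_min, unit_max)
miss = ray_hits_box((-1.0, 2.0, 0.5), (1.0, 0.1, 0.1), unit_min, unit_max)
behind = ray_hits_box((2.0, 0.5, 0.5), (1.0, 0.1, 0.1), unit_min, unit_max)
```

A conventional slab test compares the latest entry against the earliest of three exits; identifying the entry dimension first removes one of those comparisons.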
A graphics processing system with a data store includes processing units for processing tasks. A check unit forms a signature which is characteristic of an output from processing a task on a processing unit, and a fault detection unit compares signatures formed at the check unit. Each task is processed first and second times at the processing units to generate first and second processed outputs. The graphics processing system writes out the first processed output to the data store, reads back the first processed output from the data store, and forms at the check unit a first signature characteristic of the first processed output as read back from the data store. It also forms at the check unit a second signature characteristic of the second processed output, compares the first and second signatures at the fault detection unit, and raises a fault signal if the signatures do not match.
G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 11/10 - Error detection or correction by redundancy in data representation, e.g. by using checking codes, adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 11/16 - Error detection or correction of the data by redundancy in hardware
G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
G06F 21/64 - Protecting data integrity, e.g. using checksums, certificates or signatures
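The read-back check in the fault-detection abstract above can be sketched in software. CRC-32 is used here as a stand-in signature, `process` is a hypothetical placeholder for the real task, a dictionary plays the role of the data store, and the `corrupt` hook exists only to simulate a memory fault; none of these come from the abstract.

```python
import zlib

def process(task: bytes) -> bytes:
    """Hypothetical stand-in for processing a task on a processing unit."""
    return bytes((b * 3 + 1) & 0xFF for b in task)

def run_with_check(task: bytes, store: dict, key: str, corrupt=None) -> bool:
    """Process a task twice and return True when a fault should be raised."""
    first = process(task)
    store[key] = first                      # write the first output to the data store
    if corrupt is not None:
        store[key] = corrupt(store[key])    # simulate a fault in the data store
    read_back = store[key]                  # read the first output back
    first_signature = zlib.crc32(read_back) # signature of the data as read back
    second = process(task)                  # second processing pass
    second_signature = zlib.crc32(second)   # signature of the second output
    return first_signature != second_signature

def flip_last_bit(data: bytes) -> bytes:
    return data[:-1] + bytes([data[-1] ^ 1])

clean = run_with_check(b"\x01\x02\x03", {}, "task0")
faulty = run_with_check(b"\x01\x02\x03", {}, "task0", corrupt=flip_last_bit)
```

Forming the first signature from the read-back data rather than from the output directly means the comparison covers the data store as well as the processing units.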
A graphics processing system for performing tile-based rendering of a scene that comprises safety-related primitives. The system comprises a plurality of graphics processing units (GPUs), each configured to i) receive tile data identifying one or more protected tiles comprising at least part of a safety-related primitive, ii) process two respective sets of protected tiles, and iii) based on said processing, generate two respective checksums for each respective set of protected tiles. The two respective sets of protected tiles are mutually exclusive, with each respective set, and each protected tile, being processed by two different GPUs. The system comprises a comparison unit configured to compare one or more pairs of checksums, each pair comprising a respective checksum generated based on a same respective set of protected tiles and generated by different GPUs. The graphics processing system is configured to perform one or more actions based on an outcome of said comparison.
G06F 11/10 - Error detection or correction by redundancy in data representation, e.g. by using checking codes, adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
G06T 1/20 - Processor architectures; Processor configuration, e.g. pipelining
A computer-implemented method of compressing one or more data values uses a first number of bits to determine a second number of bits, wherein the first number of bits is for representing a maximum difference value of one or more difference values representing one or more differences between the one or more data values and an origin value; and forms compressed data, wherein the compressed data comprises one or more representations of the one or more difference values, wherein each of the one or more representations of the one or more difference values uses said determined second number of bits.
G06F 7/72 - Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, using residue arithmetic
G06T 3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
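The compression abstract above derives a stored width from the width needed for the largest difference. The sketch below makes several assumptions that the abstract does not fix: the origin is the minimum data value, the second number of bits is the first number capped at a hypothetical budget `max_bits`, and any excess is removed by dropping low-order bits.

```python
def compress(values, max_bits=4):
    """Return (origin, first_bits, second_bits, representations)."""
    origin = min(values)                       # assumed choice of origin value
    diffs = [v - origin for v in values]       # non-negative differences
    first_bits = max(diffs).bit_length()       # bits needed for the largest difference
    second_bits = min(first_bits, max_bits)    # assumed rule for the stored width
    shift = first_bits - second_bits
    reps = [d >> shift for d in diffs]         # drop low bits when over budget
    return origin, first_bits, second_bits, reps

lossless = compress([100, 103, 107])
capped = compress([10, 50, 265], max_bits=4)
```

When the differences already fit in the budget, nothing is lost and the representations are the differences themselves; otherwise each stored value keeps only its most significant `second_bits` bits.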
63.
CPU FOR IMPLEMENTING A GRAPHICS PROCESSING PIPELINE
A central processing unit for implementing a graphics processing pipeline which comprises a plurality of graphics processing tasks, the central processing unit comprising: one or more distinct graphics processing modules configured in dedicated hardware, wherein each of the one or more distinct graphics processing modules is configured to perform one of the graphics processing tasks of the graphics processing pipeline; and an execution unit configured to execute instructions of an instruction set for implementing the graphics processing pipeline, wherein the execution unit is configured to call each of the one or more distinct graphics processing modules using a respective instruction of the instruction set to perform its respective graphics processing task of the graphics processing pipeline.
A hierarchical acceleration structure for use in a ray tracing system. When generating a node for the hierarchical acceleration structure, the primitives in a particular portion of the 3D scene may be alternatively bounded by different shaped volumes. These bounding volumes or ‘bounding regions’ can be Axis Aligned Bounding Boxes (AABBs), although other bounding volumes can be used. The ray tracing system may use sets of two or more bounding volumes in a 3D scene to bound all the primitives within that portion. The choice of how to create sets of multiple bounding volumes within a portion of the 3D scene may be done by using a binary space partition (BSP). Different sets of bounding regions may present different amounts of surface area for a hypothetical ray entering the portion of the 3D scene dependent upon the expected ray direction or angle.
A graphics processing system is configured to render primitives using a rendering space that is sub-divided into sections, wherein the graphics processing system includes assessment logic configured to make an assessment regarding the presence of primitive edges in a section, and determination logic configured to determine an anti-aliasing setting for the section based on the assessment.
A computer-implemented method of training a neural network configured to combine a set of coefficients with respective input data values. So as to train a test implementation of the neural network, sparsity is applied to one or more of the coefficients according to a sparsity parameter, the sparsity parameter indicating a level of sparsity to be applied to the set of coefficients; the test implementation of the neural network is operated on training input data using the coefficients so as to form training output data; the accuracy of the neural network is assessed in dependence on the training output data; the sparsity parameter is updated in dependence on the accuracy of the neural network; and a runtime implementation of the neural network is configured in dependence on the updated sparsity parameter.
G06N 3/084 - Backpropagation, e.g. using gradient descent
G06F 18/2136 - Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods; based on sparsity criteria, e.g. with an overcomplete basis
G06N 3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
67.
Unified Rasterization and Ray Tracing Rendering Environments
A graphics processor architecture provides for scan conversion and ray tracing approaches to visible surface determination as concurrent and separate processes. Surfaces can be identified for shading by scan conversion and ray tracing. Data produced by each can be normalized, so that instances of shaders, being executed on a unified shading computation resource, can shade surfaces originating from both ray tracing and rasterization. Such resource also may execute geometry shaders. The shaders can emit rays to be tested for intersection by the ray tracing process. Such shaders can complete, without waiting for those emitted rays to complete. Where scan conversion operates on tiles of 2-D screen pixels, the ray tracing can be tile aware, and controlled to prioritize testing of rays based on scan conversion status. Ray population can be controlled by feedback to any of scan conversion, and shading.
A hardware-implemented method of indexing data elements in a source array in a memory generates a number of shifted copy arrays based on the source array, each shifted copy array comprising the data elements of the source array at a respective shifted position. A plurality of indices for indexing the source array are received, each index of the plurality of indices indicating a target location in the source array, and for each index of the plurality of indices, a data element is retrieved from each of the shifted copy arrays. The retrieved elements are gated based on the index, to thereby select a data element.
G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 17/11 - Complex mathematical operations for solving equations
69.
PROCESSOR HAVING FIRST AND SECOND PIPELINES AND BLOCKING CIRCUIT ENABLING SECOND PIPELINE TO PROCESS TASKS DURING DEALLOCATION OF MEMORY ALLOCATED TO FIRST PIPELINE
A processor includes a first processing pipeline, a second processing pipeline and a memory management circuit that allocates memory regions from memory for the first processing pipeline to write the data of each of a first of a sequence of tasks, and deallocates each of the memory regions after the data therein has been processed by the second processing pipeline. A blocking circuit enables the second processing pipeline to start processing a second sequence of tasks while the memory management circuit is still deallocating some of the memory regions allocated to the data portions of the first of said sequence of tasks, the blocking circuit preventing identifiers of the data portions of the second task being passed to the memory management circuit until the memory management circuit indicates that it has completed deallocating the memory regions allocated to all the data portions of the first task.
Methods and primitive block generators for generating primitive blocks in a graphics processing system. The methods include receiving transformed position data for a current primitive, the transformed position data indicating a position of the current primitive in rendering space; determining a distance between the position of the current primitive and a position of a current primitive block based on the transformed position data for the current primitive; determining whether to add the current primitive to the current primitive block based on the distance and a fullness of the current primitive block; in response to determining that the current primitive is to be added to the current primitive block, adding the current primitive to the current primitive block; and in response to determining that the current primitive is not to be added to the current primitive block, flushing the current primitive block and adding the current primitive to a new current primitive block.
A ray tracing system determines whether a ray intersects a three-dimensional axis-aligned box by determining whether a minimum distance condition and a maximum distance condition are satisfied, wherein the determining comprises determining whether a single distance condition is satisfied. The determination is used to determine whether the ray intersects the axis-aligned box. A point on the ray is at a position O+Dt where O is a vector which represents an origin of the ray, and t represents a distance along the ray from the origin of the ray, and wherein D is a 3D vector defining a direction vector of the ray.
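For orientation, the conventional "slab" test for a ray O+Dt against an axis-aligned box is sketched below. This is the textbook approach that tracks a minimum and a maximum distance separately; it is not the single-condition formulation the abstract refers to, and the function name and parameters are hypothetical.

```python
import math


def ray_intersects_aabb(origin, direction, box_min, box_max,
                        t_min=0.0, t_max=math.inf):
    """Return True if the ray O + D*t hits the box for some t in [t_min, t_max]."""
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0 = (lo - o) / d          # distance to the first plane of the slab
        t1 = (hi - o) / d          # distance to the second plane of the slab
        if t0 > t1:
            t0, t1 = t1, t0        # order as entry / exit distances
        t_min = max(t_min, t0)     # latest entry over all axes (minimum distance condition)
        t_max = min(t_max, t1)     # earliest exit over all axes (maximum distance condition)
        if t_min > t_max:
            return False           # the entry/exit intervals do not overlap
    return True
```

The ray hits the box only if the latest entry distance does not exceed the earliest exit distance, which is the pair of conditions the abstract describes collapsing into one.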
A method for generating an augmented reality image from first and second images, wherein at least a portion of at least one of the first and the second image is captured from a real scene, identifies a confidence region in which a confident determination as to which of the first and second image to render in that region of the augmented reality image can be made, and identifies an uncertainty region in which it is uncertain as to which of the first and second image to render in that region of the augmented reality image. At least one blending factor value in the uncertainty region is determined based upon a similarity between a first colour value in the uncertainty region and a second colour value in the confidence region, and an augmented reality image is generated by combining, in the uncertainty region, the first and second images using the at least one blending factor value.
A ray tracing unit and method for processing a ray in a ray tracing system performs intersection testing for the ray by performing one or more intersection testing iterations. Each intersection testing iteration includes: (i) traversing an acceleration structure to identify the nearest intersection of the ray with a primitive that has not been identified as the nearest intersection in any previous intersection testing iterations for the ray; and (ii) if, based on a characteristic of the primitive, a traverse shader is to be executed in respect of the identified intersection: executing the traverse shader in respect of the identified intersection; and if the execution of the traverse shader determines that the ray does not intersect the primitive at the identified intersection, causing another intersection testing iteration to be performed. When the intersection testing for the ray is complete, an output shader is executed to process a result of the intersection testing for the ray.
A method of performing intersection testing in a ray tracing system, for a ray with respect to a set of two or more primitives. Each primitive is defined by an ordered set of edges, and each edge is defined by a respective pair of vertices. A set of distinct edges is determined for the set of primitives, each distinct edge being part of at least one primitive of the set of primitives and being defined by a different pair of vertices to the other distinct edges in the set, wherein every edge in the ordered sets of edges that define the set of primitives is represented by a distinct edge of the set of distinct edges. For each distinct edge in the set of distinct edges, an edge test is performed to determine which side of the distinct edge the ray passes on. For each primitive in the set of primitives, a result of the edge test is used for each distinct edge that defines that primitive to determine whether or not the ray intersects that primitive.
A method of processing a primitive as part of intersection testing in a ray tracing system, the primitive being defined by an ordered set of vertices. Data defining a direction and an origin of a ray to be tested against the primitive, and coordinate data for a set of vertices are received. The coordinate data for the set of vertices is projected into ray space using the ray data, wherein the ray space has two non-parallel axes that are transverse to the direction of the ray, wherein a ray-space frame of reference associated with the axes is centered at a point on the ray such that the ray is represented as that point on the axes in the ray space, and wherein the point is an origin of the ray space. Then, the signs of the coordinate data for the set of vertices are analysed to determine whether a non-intersection condition is fulfilled, wherein fulfilment of the non-intersection condition indicates that the ray does not intersect the primitive. In response to determining that the non-intersection condition is fulfilled, it is determined that the ray does not intersect the primitive.
An image of a 3-D scene is rendered by first rendering a noisy image at a first resolution. One or more guide channels at the first resolution and one or more corresponding guide channels at a second resolution are obtained. When the two resolutions are the same, the guide channels at the first resolution and the corresponding guide channels at the second resolution may be provided by a single set of guide channels. For each of a plurality of local neighborhoods, the parameters of a model that approximates the noisy image as a function of the one or more guide channels (at the first resolution) are calculated, and the calculated parameters are applied to the one or more guide channels (at the second resolution), to produce a denoised image at the second resolution. The one or more guide channels include at least one guide channel characterizing a spatial dependency of incident light on global lighting over the surface of one or more 3-D models in the scene.
A processing unit configured to perform parallel processing includes a parallel processing engine, the parallel processing engine including a plurality of processing instances configured to process instructions in parallel. Test instruction insertion logic identifies an idle cycle of the parallel processing engine and inserts a test instruction for processing during the idle cycle by each of the plurality of processing instances so as to generate a respective plurality of test outputs. Check logic compares a test output generated during the idle cycle by a first processing instance of the plurality of processing instances with a test output generated during the idle cycle by a second processing instance of the plurality of processing instances, and raises a fault signal if the compared test outputs do not match.
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 11/22 - Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
A first pixel in a first group of neighbouring pixels is classified as acceptable or not acceptable with respect to a second pixel in a second group of neighbouring pixels. It is determined whether a difference between the first pixel and the second pixel is greater than a threshold difference, and in response to determining that the difference between the first pixel and the second pixel is greater than the threshold difference, the pixels in at least one of the first and second groups of neighbouring pixels are analysed to determine whether the difference is indicative of the first pixel being erroneous. The first pixel is classified as acceptable or not acceptable based on whether the difference is determined to be indicative of the first pixel being erroneous, and the classification of the first pixel is outputted.
A decoder for decoding texels of a p by q sub-block from a block of ASTC encoded texture data representing an n by m block of texels, where p≤n and q≤m. The decoder determines a position of a first texel of the sub-block with respect to the rows and/or columns of a weight grid comprising a first plurality of weights in a first plane; determines a position of a second texel of the sub-block with respect to the rows and/or columns of the weight grid; compares the positions of the first and second texels; extracts fewer than (p+1)(q+1) weights in response to determining that the positions of the first and second texels are between the upper-most and lower-most rows of a predetermined number of adjacent rows of the weight grid and/or the left-most and right-most columns of a predetermined number of adjacent columns of the weight grid; and decodes the texels in dependence on the extracted weights.
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 19/167 - Position within a video image, e.g. region of interest [ROI]
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding; characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding; the unit being an image region, e.g. an object; the region being a block, e.g. a macroblock
80.
Reduced Acceleration Structures for Ray Tracing Systems
Ray tracing units, processing modules and methods are described for generating one or more reduced acceleration structures to be used for intersection testing in a ray tracing system for processing a 3D scene. Nodes of the reduced acceleration structure(s) are determined, wherein a reduced acceleration structure represents a subset of the 3D scene. The reduced acceleration structure(s) are stored for use in intersection testing. Since the reduced acceleration structures represent a subset of the scene (rather than the whole scene) the memory usage for storing the acceleration structure is reduced, and the latency in the traversal of the acceleration structure is reduced.
A computer-implemented method and a processing module for identifying that a fault has occurred in a finite-state machine (FSM). It is determined whether a set of one or more transitions that have occurred between states of the FSM is allowable. In response to determining that the set of one or more transitions is not allowable, it is identified that a fault has occurred in the FSM.
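The allowable-transition check in the FSM abstract above can be sketched in a few lines. The state names and the `ALLOWED` table are hypothetical, and the sketch assumes the observed transitions are available as a sequence of visited states.

```python
# Hypothetical FSM: the only legal transitions are IDLE -> LOAD -> RUN -> IDLE.
ALLOWED = {("IDLE", "LOAD"), ("LOAD", "RUN"), ("RUN", "IDLE")}


def fault_detected(observed_states, allowed=ALLOWED):
    """Return True if any consecutive pair of observed states is not an allowed transition."""
    transitions = zip(observed_states, observed_states[1:])
    return any(t not in allowed for t in transitions)
```

For example, `fault_detected(["IDLE", "RUN"])` reports a fault because the machine skipped the LOAD state.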
A method of processing instructions at a processing unit having a parallel processing engine. During a mission cycle, a first set of mission operand values is processed in accordance with a mission instruction at a first processing instance to generate a first mission output. In parallel, a second set of mission operand values is processed in accordance with the mission instruction at a second processing instance to generate a second mission output. During a test cycle, a first set of test operand values is processed in accordance with a test instruction at the first processing instance to generate a first test output, and in parallel, a second set of test operand values is processed in accordance with the test instruction at the second processing instance to generate a second test output, where the first set of test operand values is the same as the second set of test operand values. The first test output and the second test output are compared and a fault signal is raised if the compared test outputs do not match.
G06F 11/277 - Tester hardware, i.e. output processing circuits; with comparison between actual response and known fault-free response
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
G06F 11/22 - Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
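The test-cycle comparison described above can be sketched as follows, under the assumption that each processing instance is modelled as a plain Python callable. The names `alu`, `faulty_alu` and `run_test_cycle` are hypothetical, and the stuck-at fault is only an illustration.

```python
def alu(instruction, operands):
    """A healthy processing instance supporting two toy instructions."""
    ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    return ops[instruction](*operands)


def faulty_alu(instruction, operands):
    """Same instance with bit 2 of its output stuck at 1 (illustrative fault)."""
    return alu(instruction, operands) | 0x4


def run_test_cycle(instances, test_instruction, test_operands):
    """Feed identical test operands to every instance; return True if a fault signal is raised."""
    outputs = [inst(test_instruction, test_operands) for inst in instances]
    return any(out != outputs[0] for out in outputs[1:])
```

Because every instance receives the same test instruction and operands, any disagreement between outputs can only come from the hardware itself, which is why a mismatch is treated as a fault.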
A method of rounding a floating-point number in an Extended Exponent Range (EER) that would be a denormal floating-point number if represented in an Unextended Exponent Range (UER) includes the steps of: receiving, at an arithmetic unit, a plurality of input numbers in the EER representation, each input number comprising a sign bit (si), exponent bits (ei) and mantissa bits (mi); performing an arithmetic operation to produce an output number in the EER representation comprising a sign bit (sa), exponent bits (ea) and mantissa bits (ma); constructing a rounding mask based on the exponent bits (ea) computed by the arithmetic operation; and applying the rounding mask to the output number in the EER representation to round the output number to the correct position as if rounding in the UER representation.
A graphics processing unit for graphics data in a tile-based rendering space includes a plurality of processing cores, cost indication logic which obtains a cost indication parameter from a set of tiles, similarity indication logic which obtains similarity indications between sets of tiles of the rendering space, and scheduling logic that assigns the sets of tiles to the processing cores for rendering in dependence on the cost indications and the similarity indications. The similarity indication logic is configured to assign a single similarity indication to each of a plurality of sets of one or more tiles, the similarity indication assigned to each set of one or more tiles being indicative of a level of similarity between that set of one or more tiles and another set of one or more tiles specified according to a spatial order of the tiles within the rendering space.
A method of scheduling tasks in a processor comprises receiving a plurality of tasks that are ready to be executed, i.e. all their dependencies have been met and all the resources required to execute the task are available, and adding the received tasks to a task queue (or “task pool”). The number of tasks that are executing is monitored and in response to determining that an additional task can be executed by the processor, a task is selected from the task pool based at least in part on a comparison of indications of resources used by tasks being executed and indications of resources used by individual tasks in the task pool and the selected task is then sent for execution.
A graphics processing system for generating a rendering output includes geometry processing logic having first transformation logic configured to transform a plurality of untransformed primitives into a plurality of transformed primitives, the first transformation logic configured to implement one or more expansion transformation stages which generate one or more sub-primitives; a primitive block generator configured to divide the plurality of transformed primitives into a plurality of groups; and generate an untransformed primitive block for each group comprising (i) information identifying the untransformed primitives related to the transformed primitives in the group; and (ii) an expansion transformation stage mask for at least one or more expansion transformation stages that indicates the sub-primitives generated for the untransformed primitives in that untransformed primitive block used in generating the rendering output. Rasterization logic includes second transformation logic configured to re-transform the plurality of untransformed primitives into the plurality of transformed primitives on an untransformed primitive block-basis in accordance with the expansion transformation stage mask for the one or more expansion transformation stages; and logic configured to render the transformed primitives to generate the rendering output.
A control stream decoder decodes a control stream for a tile group comprising at least two tiles of a rendering space. A primitive block entry analyser receives a primitive block entry of the control stream and identifies a location in memory of a control data block for a corresponding primitive block. For the received primitive block entry, in response to determining that a current tile is a valid tile for the corresponding primitive block, the control data block for the corresponding primitive block is retrieved from the identified location in memory. An address of the corresponding primitive block in memory is identified from the control data block and primitives of that primitive block relevant for rendering the current tile, and information identifying the address of the corresponding primitive block and the primitives of that primitive block relevant for rendering the current tile is outputted.
An image processing method and an image processing unit for performing image processing determine a set of one or more filtered pixel values, wherein the one or more filtered pixel values represent a result of processing image data using a set of one or more filtering functions. A total covariance of the set of one or more filtering functions is identified. A refinement filtering function is applied to the set of one or more filtered pixel values to determine a set of one or more refined pixel values, wherein the refinement filtering function has a covariance that is determined based on the total covariance of the set of one or more filtering functions.
In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation, which can include a LookUp Table (LUT). The LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited-precision multiplier goes to a full-precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the limited-precision multiplier and the initial approximation circuitry complete within the first number of clock cycles.
G06F 7/552 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation; using non-contact-making devices, e.g. tube, solid state device; using unspecified devices; for evaluating functions by calculation; powers or roots
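The refinement scheme in the abstract above (LUT seed followed by Newton-Raphson iterations) can be illustrated in software for the reciprocal case. This is a sketch only: it works in double-precision floating point and does not model the limited-precision and full-precision multipliers or their cycle counts. The table size `LUT_BITS`, the midpoint seeding and the function name `reciprocal` are assumptions.

```python
import math

LUT_BITS = 4  # assumption: 16-entry table indexed by the top mantissa bits

# Entry i approximates 1/m for m at the midpoint of the i-th sub-interval of [1, 2).
RECIP_LUT = [1.0 / (1.0 + (i + 0.5) / (1 << LUT_BITS)) for i in range(1 << LUT_BITS)]


def reciprocal(d, iterations=3):
    """Approximate 1/d with a LUT seed refined by Newton-Raphson: x <- x * (2 - m * x)."""
    if d == 0:
        raise ZeroDivisionError("reciprocal of zero")
    mantissa, exponent = math.frexp(abs(d))   # abs(d) = mantissa * 2**exponent, mantissa in [0.5, 1)
    m = mantissa * 2.0                        # normalise the mantissa to [1, 2)
    e = exponent - 1                          # so that abs(d) = m * 2**e
    index = int((m - 1.0) * (1 << LUT_BITS))  # top LUT_BITS fractional bits of m
    x = RECIP_LUT[index]                      # initial approximation of 1/m
    for _ in range(iterations):
        x = x * (2.0 - m * x)                 # each step roughly doubles the correct bits
    return math.copysign(math.ldexp(x, -e), d)
```

A division a/b can then be formed as `a * reciprocal(b)`, which matches the abstract's remark that the quantity refined for division is the reciprocal of the divisor.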
Aspects relate to tracing rays in 3-D scenes that comprise objects that are defined by or with implicit geometry. In an example, a trapping element defines a portion of 3-D space in which implicit geometry exists. When a ray is found to intersect a trapping element, a trapping element procedure is executed. The trapping element procedure may comprise marching a ray through a 3-D volume and evaluating a function that defines the implicit geometry for each current 3-D position of the ray. An intersection detected with the implicit geometry may be found concurrently with intersections for the same ray with explicitly-defined geometry, and data describing these intersections may be stored with the ray and resolved.
3-D rendering systems include a rasterization section that can fetch untransformed geometry, transform geometry and cache data for transformed geometry in a memory. As an example, the rasterization section can transform the geometry into screen space. The geometry can include one or more of static geometry and dynamic geometry. The rasterization section can query the cache for presence of data pertaining to a specific element or elements of geometry, and use that data from the cache, if present, and otherwise perform the transformation again, for actions such as hidden surface removal. The rasterization section can receive, from a geometry processing section, tiled geometry lists and perform the hidden surface removal for pixels within respective tiles to which those lists pertain.
A graphics processing system is configured to perform ray tracing. Rays are bundled together and processed together. When differential data is needed by a shader, the data of a true ray in the bundle can be used rather than processing separate tracker rays.
A binary logic circuit for determining y = x mod (2^m − 1), where x is an n-bit integer, y is an m-bit integer, and n>m, includes reduction logic configured to reduce x to a sum of a first m-bit integer β and a second m-bit integer γ; and addition logic configured to calculate an addition output represented by the m least significant bits of the following sum right-shifted by m: a first binary value of length 2m, the m most significant bits and the m least significant bits each being the string of bit values represented by β; a second binary value of length 2m, the m most significant bits and the m least significant bits each being the string of bit values represented by γ; and the binary value 1.
G06F 7/72 - Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations; using residue arithmetic
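The addition step in the abstract above can be checked in software. The sketch below is an assumption-laden model, not the circuit: the reduction of x to β and γ is done by summing alternate m-bit chunks and folding carries (a real design would likely use a carry-save tree), and the final mapping of an all-ones result to zero is an added assumption to cover the case where β and γ are both all ones, which the abstract does not discuss. The names `fold` and `mod_mersenne` are hypothetical.

```python
def fold(value, m):
    """Reduce value to at most m bits, preserving its residue mod (2**m - 1)."""
    mask = (1 << m) - 1
    while value >> m:
        value = (value & mask) + (value >> m)   # 2**m is congruent to 1 mod (2**m - 1)
    return value


def mod_mersenne(x, m):
    """Compute x mod (2**m - 1) for non-negative x and m >= 2 using the doubled-operand addition."""
    mask = (1 << m) - 1
    chunks = []
    while x:
        chunks.append(x & mask)                 # split x into m-bit chunks
        x >>= m
    beta = fold(sum(chunks[0::2]), m)           # assumption: even chunks reduce to beta
    gamma = fold(sum(chunks[1::2]), m)          # assumption: odd chunks reduce to gamma
    doubled_beta = (beta << m) | beta           # 2m-bit value: beta repeated twice
    doubled_gamma = (gamma << m) | gamma        # 2m-bit value: gamma repeated twice
    total = doubled_beta + doubled_gamma + 1    # the sum named in the abstract
    y = (total >> m) & mask                     # m least significant bits after shifting right by m
    return 0 if y == mask else y                # assumption: all ones also represents zero
```

Repeating each operand in both halves lets the carry out of the lower half act as the end-around carry for the upper half, so a single addition yields the reduced result.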
94.
Synchronising Devices Using Clock Signal Delay Comparison
A time difference between an occurrence of a first event and an occurrence of a second event is estimated. A first time marker indicating the occurrence of the first event and a second time marker indicating the occurrence of the second event are received, wherein at least one event is one of playing a media frame or receiving a beacon. A plurality of delayed versions of the first time marker are provided, each being delayed by a different amount of delay to the other delayed versions. Each of the delayed versions of the first time marker are compared with the second time marker to identify which of the delayed versions of the first time marker is the closest temporally matching time marker to the second time marker. The time difference between the first and second time markers is estimated in dependence on the identified delayed version.
H04L 7/033 - Speed or phase control by the received code signals, the signals containing no special synchronisation information; using the transitions of the received signal to control the phase of the synchronising-signal-generating means, e.g. using a phase-locked loop
H03K 5/15 - Arrangements in which pulses are delivered at different times at several outputs, i.e. pulse distributors
H04L 7/00 - Arrangements for synchronising receiver with transmitter
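The delayed-marker comparison above can be modelled in software, assuming each time marker is reduced to a single timestamp and the candidate delays are a fixed list. In hardware the markers would be signal edges passing through a delay line; the function name and units here are hypothetical.

```python
def estimate_time_difference(first_marker, second_marker, delays):
    """Return the delay whose delayed copy of first_marker lands closest to second_marker."""
    # Build every delayed version of the first marker, one per candidate delay.
    delayed_versions = [(delay, first_marker + delay) for delay in delays]
    # Pick the version that is the closest temporal match to the second marker.
    best_delay, _ = min(delayed_versions,
                        key=lambda pair: abs(pair[1] - second_marker))
    return best_delay
```

The resolution of the estimate is set by the spacing of the delays: with a step of 10 time units, markers at 1000 and 1037 are estimated to be 40 units apart.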
95.
IMMERSIVE VIRTUAL REALITY SYSTEM USING RAY TRACING
Ray tracing, and more generally, graphics operations taking place in a 3-D scene, involve a plurality of constituent graphics operations. Responsibility for executing these operations can be distributed among different sets of computation units. The sets of computation units each can execute a set of instructions on a parallelized set of input data elements and produce results. These results can be that the data elements can be categorized into different subsets, where each subset requires different processing as a next step. The data elements of these different subsets can be coalesced so that they are contiguous in a results set. The results set can be used to schedule additional computation, and if there are empty locations of a scheduling vector (after accounting for the members of a given subset), then those empty locations can be filled with other data elements that require the same further processing as that subset.
Aspects include a multistage collector to receive outputs from plural processing elements. Processing elements may comprise (each or collectively) a plurality of clusters, with one or more ALUs that may perform SIMD operations on a data vector and produce outputs according to the instruction stream being used to configure the ALU(s). The multistage collector includes substituent components each with at least one input queue, a memory, a packing unit, and an output queue; these components can be sized to process groups of input elements of a given size, and can have multiple input queues and a single output queue. Some components couple to receive outputs from the ALUs and others receive outputs from other components. Ultimately, the multistage collector can output groupings of input elements. Each grouping of elements (e.g., at input queues, or stored in the memories of components) can be formed based on matching of index elements.
An apparatus identifies a set of M output memory addresses from a larger set of N input memory addresses containing a non-unique memory address. A comparator block performs comparisons of memory addresses from a set of N input memory addresses to generate a binary classification dataset that identifies a subset of addresses, where each address in the subset identified by the binary classification dataset is unique within that subset. Combination logic units each receive a selection of bits of the binary classification dataset and sort the received selection of bits into an intermediary binary string in which the bits are ordered into a first group identifying addresses belonging to the identified subset, and a second group identifying addresses not belonging to the identified subset. Output generating logic selects between bits belonging to different intermediary binary strings to generate a binary output identifying a set of output memory addresses containing at least one address in the identified subset.
G06F 12/06 - Addressing a physical block of locations, e.g. base addressing, module addressing, address space extension, memory dedication
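The address-selection abstract above can be modelled sequentially in software. This is only a sketch under assumptions: the hardware performs its comparisons and bit sorting in parallel logic, whereas the hypothetical helpers `first_occurrence_mask` and `select_outputs` below use plain loops, and treating "first occurrence" as the uniqueness rule is one possible choice that the abstract does not mandate.

```python
def first_occurrence_mask(addresses):
    """Binary classification: True where an address has not appeared earlier.

    The addresses flagged True are unique among themselves.
    """
    return [
        all(address != addresses[j] for j in range(i))
        for i, address in enumerate(addresses)
    ]


def select_outputs(addresses, m):
    """Pick M output addresses, preferring those in the unique subset.

    Indices are ordered into a first group (in the subset) and a second
    group (not in the subset), mirroring the sorted intermediary strings.
    """
    mask = first_occurrence_mask(addresses)
    in_subset = [i for i, bit in enumerate(mask) if bit]
    not_in_subset = [i for i, bit in enumerate(mask) if not bit]
    ordered = in_subset + not_in_subset
    return [addresses[i] for i in ordered[:m]]


# Usage: N = 4 inputs with one repeated address, M = 2 outputs.
inputs = [0x10, 0x20, 0x10, 0x30]
mask = first_occurrence_mask(inputs)
outputs = select_outputs(inputs, 2)
```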
A memory interface for interfacing between a memory bus and a cache memory. A plurality of bus interfaces are configured to transfer data between the memory bus and the cache memory, and a plurality of snoop processors are configured to receive snoop requests from the memory bus. Each snoop processor is associated with a respective bus interface and each snoop processor is configured, on receiving a snoop request, to determine whether the snoop request relates to the bus interface associated with that snoop processor and to process the snoop request in dependence on that determination.
G06F 12/1018 - Address translation using page tables, e.g. page table structures involving hashing techniques, e.g. inverted page tables
G06F 13/16 - Handling requests for interconnection or transfer for access to memory bus
Graphics processing systems can include lighting effects when rendering images. “Light probes” are directional representations of lighting at particular probe positions in the space of a scene which is being rendered. Light probes can be determined iteratively, which can allow them to be determined dynamically, in real time, over a sequence of frames. Once the light probes have been determined for a frame, the lighting at a pixel can be determined based on the lighting at the nearby light probe positions. Pixels can then be shaded based on the lighting determined for the pixel positions.
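To make the two ideas above concrete (iterative probe updates across frames, and per-pixel lighting derived from nearby probes), here is a minimal sketch. It makes several simplifying assumptions that are not in the abstract: each probe stores a single RGB value rather than a directional representation, the per-frame update is an exponential moving average with a hypothetical blend factor `alpha`, and nearby probes are combined by inverse-distance weighting.

```python
import math


def update_probe(previous_rgb, sample_rgb, alpha=0.1):
    """Blend a new lighting sample into a probe (assumed moving average)."""
    return tuple((1 - alpha) * p + alpha * s
                 for p, s in zip(previous_rgb, sample_rgb))


def light_at(position, probes, eps=1e-6):
    """Estimate lighting at `position` from (probe_position, rgb) pairs.

    Uses inverse-distance weights; `eps` avoids division by zero when the
    position coincides with a probe.
    """
    total_weight = 0.0
    accum = [0.0, 0.0, 0.0]
    for probe_position, rgb in probes:
        weight = 1.0 / (math.dist(position, probe_position) + eps)
        total_weight += weight
        for channel in range(3):
            accum[channel] += weight * rgb[channel]
    return tuple(value / total_weight for value in accum)


# Usage: a point midway between a red probe and a blue probe.
probes = [((1.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
          ((-1.0, 0.0, 0.0), (0.0, 0.0, 1.0))]
midpoint_light = light_at((0.0, 0.0, 0.0), probes)
blended = update_probe((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), alpha=0.5)
```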
Input data for a convolutional neural network (CNN) is stored in a buffer comprising a plurality of banks, by receiving input data comprising input data values to be processed in the CNN, determining addresses in the buffer in which the received input data values are to be stored, keeping a cursor for one or more salient positions to reduce arithmetic performed to determine the addresses in the buffer in which the received input data values are to be stored, and storing the received input data values at the determined addresses in the buffer.
G06F 12/06 - Addressing a physical block of locations, e.g. base addressing, module addressing, address space extension, memory dedication
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
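The benefit of keeping a cursor, as described in the CNN buffer abstract above, can be shown by comparing two ways of computing a banked buffer address. The layout is an assumption made for this sketch: values are stored in row-major order with channels innermost and interleaved word by word across `num_banks` banks. The function names are hypothetical.

```python
def naive_address(x, y, c, width, channels, num_banks):
    """Compute (bank, offset) from scratch with multiplies and a divide."""
    linear = (y * width + x) * channels + c
    return linear % num_banks, linear // num_banks


def cursor_addresses(width, height, channels, num_banks):
    """Yield (x, y, c, bank, offset) using only increments and a compare.

    The cursor (bank, offset) tracks the next write position, so no
    multiplication or division is needed per input value.
    """
    bank, offset = 0, 0
    for y in range(height):
        for x in range(width):
            for c in range(channels):
                yield x, y, c, bank, offset
                bank += 1
                if bank == num_banks:
                    bank = 0
                    offset += 1


# Usage: both methods should agree for every input value.
agree = all(
    naive_address(x, y, c, 5, 3, 4) == (bank, offset)
    for x, y, c, bank, offset in cursor_addresses(5, 2, 3, 4)
)
```

A real design might keep several cursors, for example one per row start or per window position; this sketch keeps a single running cursor for clarity.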