INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Zhang, Wenli
Wan, Wenkai
Chen, Mingyu
Abstract
A content lock firewall method based on a white list includes performing semantic parsing on the payload of a data packet received by a website to obtain parsed texts of the received data packet, and matching the parsed texts against a text pattern library to decide whether to forward or intercept the received data packet, the text pattern library comprising a plurality of text patterns, each of which includes a sequence of keywords and a value range for each keyword. For a website with relatively fixed functions, deploying the firewall effectively defends against both known and new network attacks, and the website may keep running even with vulnerabilities while its normal functions are preserved, without expensive upgrades.
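As an illustration of the matching step above, the following minimal Python sketch treats each text pattern as a keyword sequence plus a permitted value range per keyword. The payload format, names and thresholds are illustrative assumptions, not the patent's implementation.

```python
# Whitelist matching sketch: forward only payloads whose parsed text matches
# some pattern (keyword sequence + per-keyword value range) in the library.
from dataclasses import dataclass

@dataclass
class TextPattern:
    keywords: list[str]                   # required keyword sequence, in order
    ranges: dict[str, tuple[int, int]]    # inclusive numeric range per keyword

def parse_payload(payload: str) -> dict[str, int]:
    """Toy semantic parse: split 'k1=v1&k2=v2' into {keyword: int value}."""
    pairs = (field.split("=", 1) for field in payload.split("&") if "=" in field)
    return {k: int(v) for k, v in pairs}

def matches(parsed: dict[str, int], pattern: TextPattern) -> bool:
    if list(parsed) != pattern.keywords:  # keyword sequence must match exactly
        return False
    return all(lo <= parsed[k] <= hi for k, (lo, hi) in pattern.ranges.items())

def decide(payload: str, library: list[TextPattern]) -> str:
    parsed = parse_payload(payload)
    return "forward" if any(matches(parsed, p) for p in library) else "intercept"

library = [TextPattern(["page", "size"], {"page": (1, 999), "size": (1, 100)})]
print(decide("page=3&size=20", library))    # forward
print(decide("page=3&size=5000", library))  # intercept (value out of range)
```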
The embodiments of the present disclosure provide an information processing method, an information processing apparatus, an electronic device, and a storage medium. The method is applied to a key-value storage system with key-value separation, in which a storage unit includes a key partition for storing LSMT information and a plurality of storage partitions for storing key-value information. The method includes: selecting, according to the LSMT information in the key partition, a target storage partition with the highest invalid-information rate from the storage partitions; detecting validity information corresponding to each piece of key-value information in the target storage partition, and screening out valid key-value information according to that validity information; and transferring the valid key-value information to a first storage partition other than the target storage partition, and erasing the key-value information stored in the target storage partition.
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
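The following is a minimal Python sketch of the garbage-collection flow described in the abstract, with in-memory stand-ins assumed for the LSMT and the storage partitions; the names and the destination-selection rule are illustrative.

```python
# GC sketch: pick the partition with the highest invalid-information rate,
# keep only valid key-value pairs, move them to another partition, erase the
# source. The LSMT is simplified to a dict mapping key -> live partition id.

def invalid_rate(partition, lsmt):
    live = sum(1 for k, _ in partition["records"] if lsmt.get(k) == partition["id"])
    total = len(partition["records"])
    return 1.0 if total == 0 else 1.0 - live / total

def gc_once(partitions, lsmt):
    target = max(partitions, key=lambda p: invalid_rate(p, lsmt))
    dest = next(p for p in partitions if p is not target)  # any other partition
    for key, value in target["records"]:
        if lsmt.get(key) == target["id"]:      # validity check against the LSMT
            dest["records"].append((key, value))
            lsmt[key] = dest["id"]             # update the live location
    target["records"].clear()                  # erase the reclaimed partition
    return target["id"]

parts = [{"id": 0, "records": [("a", 1), ("b", 2)]},
         {"id": 1, "records": [("a", 9)]}]     # 'a' in partition 0 is stale
lsmt = {"a": 1, "b": 0}                        # key -> partition holding live copy
print("erased partition", gc_once(parts, lsmt), parts, lsmt)
```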
3.
CET MECHANISM-BASED METHOD FOR PROTECTING INTEGRITY OF GENERAL-PURPOSE MEMORY
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Wang, Zhe
Wu, Chenggang
Xie, Mengyao
Abstract
A CET mechanism-based method for protecting the integrity of a general-purpose memory. In the method, the integrity of the general-purpose memory is protected on the basis of a CET mechanism. A dedicated shadow stack page, independent of the shadow stack page maintained by the CET mechanism itself, is provided, and content that is written to the dedicated shadow stack page and requires write-overhead reduction undergoes overhead-reduction processing adapted to it, so as to reduce the number of WRSS instructions used. The integrity of sensitive data and/or sensitive code is thus protected at lower overhead, and the performance overhead the processor incurs in protecting the integrity of the general-purpose memory is reduced, improving the efficiency with which the processor handles other tasks.
G06F 21/54 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by adding security routines or objects to programs
G06F 21/55 - Detecting local intrusion or implementing counter-measures
4.
FFT EXECUTION METHOD, PROCESSOR, AND COMPUTING DEVICE
INSTITUTE OF COMPUTING TECHNOLOGY OF CHINESE ACADEMY OF SCIENCES (China)
Inventor
Ma, Penghao
Li, Yinghao
Zhang, Ruge
Yan, Baicheng
Wang, Zhe
Wang, Long
Jia, Haipeng
Abstract
Embodiments of the present application relate to the technical field of computing, and in particular to an FFT execution method, a processor, and a computing device. The method comprises: a processor responding to a request from an application program to execute a fast Fourier transform (FFT) computation; decomposing the FFT computation into a plurality of computation stages; the processor sequentially executing the plurality of computation stages, wherein, when a target computation stage is executed, the twiddle factor computation is performed on a vector operation unit and the DFT computation is performed on a matrix operation unit; and, after all computation stages have been executed, determining the execution result of the FFT computation from the execution result of the last computation stage and returning it to the application program. In this way, a processor can implement an FFT computation using both a vector operation unit and a matrix operation unit, improving the efficiency with which it executes the FFT computation.
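The following numpy sketch illustrates one way a stage decomposition can split the work as the abstract describes, assuming a two-stage Cooley-Tukey factorization N = N1*N2: the twiddle-factor multiply is elementwise (vector-unit-style) and the small DFTs are dense matrix multiplies (matrix-unit-style). It is a sketch of the general technique, not the patent's scheduling.

```python
import numpy as np

def dft_matrix(n):
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)   # n-point DFT as a matrix

def fft_two_stage(x, n1, n2):
    assert len(x) == n1 * n2
    a = x.reshape(n1, n2)
    a = dft_matrix(n1) @ a          # stage 1: n2 DFTs of size n1 (matrix unit)
    twiddle = np.exp(-2j * np.pi * np.outer(np.arange(n1), np.arange(n2)) / (n1 * n2))
    a = a * twiddle                 # twiddle factors: elementwise (vector unit)
    a = a @ dft_matrix(n2)          # stage 2: n1 DFTs of size n2 (matrix unit)
    return a.T.reshape(-1)          # transpose yields natural output order

x = np.random.rand(12) + 0j
print(np.allclose(fft_two_stage(x, 3, 4), np.fft.fft(x)))  # True
```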
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Shi, Ningzhe
Liu, Ling
Zhou, Yiqing
Shi, Jinglin
Abstract
Provided in the present invention is a method for constructing a wireless network resource allocation system. The wireless network resource allocation system is used for obtaining a wireless network resource allocation policy according to the state of a wireless network. The method comprises: S1, acquiring a non-convex optimization objective which corresponds to a wireless communication requirement in an imperfect global channel state information (CSI) environment and has an outage probability constraint; S2, converting the acquired objective into a non-convex optimization objective without the outage probability constraint; S3, acquiring imperfect global CSI of a wireless network; and S4, taking the converted objective as the training objective and the imperfect global CSI from step S3 as input, training an initial resource allocation system by means of reinforcement learning until it converges. By means of the present invention, more realistic CSI (imperfect global channel state information) is used to train the learning-based initial allocation system, such that the convergence rate of the wireless network resource allocation system is increased and its performance on the optimization objective is improved.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Chen, Yan
Cao, Huan
Zhou, Yiqing
Zhao, Jiawei
Shi, Jinglin
Chen, Daojin
Liu, Zifan
Wang, Longhe
Abstract
The present invention provides a topology prediction model training method applied to a satellite network. A satellite network comprises a plurality of satellites, a plurality of terminals, and a plurality of ground stations. A topology prediction model comprises a spatial-temporal learning layer and a fully connected layer. The method comprises the following steps: S1, acquiring historical satellite network topological graph data in a target area, wherein the data comprises consecutive satellite network topological graphs at a plurality of moments, the graph at each moment comprises a plurality of nodes and a plurality of edges, the nodes represent terminals, satellites or ground stations, and an edge between any two nodes indicates that a link exists between them; and S2, training the topology prediction model on the basis of the data acquired in step S1 until the model converges. According to the solution provided in the present invention, weight memory parameters greatly reduce storage occupation while ensuring the integrity of the network topology information.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Chang, Yisong
Peng, Fanzhang
Shan, Liuhao
Zhang, Ke
Bao, Yungang
Abstract
Provided in the present invention are an implementation method and system for an FPGA bare-metal server. They address the problems that existing abstract descriptions of cloud FPGA resources consider only I/O peripherals, so the abstraction level is relatively low and the flexibility of cloud FPGA resource management and of tenants' usage modes is limited. Because the method gives cloud FPGA resources a usage mode similar to that of an x86 cloud system among general-purpose cloud computing resources, no new cloud FPGA resource management component needs to be custom-developed, and cloud FPGA resource management is simplified, reducing its complexity. A cloud tenant directly applies for, deploys, and uses an FPGA system on demand, without also having to apply for matching general-purpose computing resources such as x86, making the method a feasible way to improve the flexibility of cloud FPGA management and usage while reducing dependence on general-purpose x86 computing resources.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Chang, Yisong
Chen, Yuxiao
Zhang, Ke
Chen, Mingyu
Bao, Yungang
Abstract
Provided in the present invention are a cloud-native hardware logic simulation FPGA acceleration method and system. The method comprises: constructing a hardware logic simulation acceleration platform on the basis of a loosely coupled FPGA cluster, and dividing each FPGA node into a static logic area that carries the functions provided by the acceleration platform and a plurality of dynamic logic areas of equal logic resource scale that carry the target logic circuits to be simulated, wherein a matched customization tool acquires each tenant's current hardware design to be simulated, inserts a simulation control circuit, and generates FPGA configuration files deployable in several dynamic logic areas; simulation software running on the tightly coupled integrated processor of each FPGA node controls the operation of the hardware designs on that node, generates simulation data in each of its dynamic logic areas, and returns the state data inside the node's circuits to the tenant as the simulation result; and simulation data is exchanged with other FPGA nodes through each node's static logic area, so as to support large-scale logic circuit simulation.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Wang, Huizhe
Huang, Bowen
Zhang, Chuanqi
Wang, Sa
Tang, Dan
Bao, Yungang
Abstract
A cache dynamic division method and system that consider both quality of service and the utilization rate. Grouped sampling and a hardware sorting network are used to count dead-block information in real time: grouped sampling makes a hardware implementation feasible, and sorting the counted information with the hardware sorting network exposes the number of dead blocks to the maximum extent. The method also generates way masks directly from quality-of-service parameters. Thus, merely by setting quality-of-service target parameters, a system user can drive automatic dead-block counting, generate the corresponding way masks, and divide the cache. Quality of service is guaranteed while the cache utilization rate is improved.
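A minimal sketch of the "quality-of-service parameter to way mask" step follows, assuming a set-associative cache partitioned by way bitmasks; the sizing rule is illustrative, and dead-block statistics would refine the split at run time.

```python
# Derive contiguous way masks from per-tenant QoS targets (cache fractions).
def way_masks(qos_share, total_ways=16):
    """qos_share: {tenant: fraction of cache promised}. Returns {tenant: mask}."""
    assert sum(qos_share.values()) <= 1.0
    masks, next_way = {}, 0
    for tenant, share in qos_share.items():
        n = max(1, round(share * total_ways))       # at least one way per tenant
        masks[tenant] = ((1 << n) - 1) << next_way  # contiguous run of way bits
        next_way += n
    return masks

for tenant, mask in way_masks({"latency_critical": 0.5, "batch": 0.25}).items():
    print(f"{tenant}: {mask:016b}")
# remaining ways (the top quarter here) stay shared / reclaimable at run time
```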
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Wang, Zhe
Wu, Chenggang
Xie, Mengyao
Abstract
A CET mechanism-based method for protecting the integrity of a general-purpose memory. In the method, the integrity of the general-purpose memory is protected on the basis of a CET mechanism. In order to be compatible with the CET mechanism without conflicting with the shadow stack page it maintains itself, the method provides a dedicated shadow stack page that is independent of the CET mechanism's own shadow stack page, and content that is written to the dedicated shadow stack page and requires write-overhead reduction undergoes overhead-reduction processing adapted to it, so as to reduce the number of WRSS instructions used. The integrity of sensitive data and/or sensitive code is thus protected at lower cost, and the performance cost the processor incurs in protecting the integrity of the general-purpose memory is reduced, improving the efficiency with which the processor handles other tasks.
G06F 21/52 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure
11.
TRAINING METHOD, SPEECH TRANSLATION METHOD, DEVICE AND COMPUTER-READABLE MEDIUM
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Ye, Rong
Fang, Qingkai
Wang, Mingxuan
Feng, Yang
Abstract
A training method for a speech translation model, a speech translation method, a device and a computer-readable medium. The training method comprises: obtaining a source speech representation sequence corresponding to source speech data and a source text representation sequence corresponding to the source speech data (S102); obtaining a mixed sequence according to the source speech representation sequence and the source text representation sequence (S104); processing the source speech representation sequence by using a speech translation model, so as to output a first probability distribution of first target text, and processing the mixed sequence by using the speech translation model, so as to output a second probability distribution of second target text (S106); calculating a total loss function according to the first probability distribution and the second probability distribution (S108); and training the speech translation model according to the total loss function (S110).
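A toy numpy sketch of steps S102-S110 follows, under heavy simplification: a linear stand-in replaces the translation model, and the mixing ratio and the summed cross-entropy form of the total loss are assumptions.

```python
import numpy as np
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 50))         # shared "translator" weights (toy)

def translate(seq):                      # stand-in for the shared model
    logits = seq.mean(axis=0) @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()                   # probability distribution over vocab

def cross_entropy(dist, target_id):
    return -float(np.log(dist[target_id] + 1e-9))

speech = rng.standard_normal((40, 8))    # source speech representations (S102)
text = rng.standard_normal((10, 8))      # source text representations (S102)
mask = rng.random(40) < 0.3              # mix: swap ~30% of frames (S104)
mixed = speech.copy()
mixed[mask] = text[rng.integers(0, 10, mask.sum())]

p1 = translate(speech)                   # first probability distribution (S106)
p2 = translate(mixed)                    # second probability distribution (S106)
total_loss = cross_entropy(p1, 7) + cross_entropy(p2, 7)   # total loss (S108)
print(f"total loss: {total_loss:.3f}")   # a gradient step on this trains the model (S110)
```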
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Cheng, Xueqi
Guo, Jiafeng
Li, Bing
Qiu, Qiang
Zhang, Zhibin
Abstract
A stream data processing method and system based on column-oriented data, comprising: obtaining column-oriented stream data to be processed and its corresponding processing task, dividing the stream data into batch-type data blocks along the time dimension, and allocating a window serial number to each data item in a batch-type data block according to a preset window mode; dividing the batch-type data block into a plurality of intermediate data blocks, wherein each intermediate data block comprises only data sharing the same window serial number, and performing pre-aggregation calculation on the data of each intermediate data block to generate a pre-aggregated intermediate state; and, according to a preset stream data time processing mode, extracting from memory the pre-aggregated intermediate state of the window serial number corresponding to a window, executing the processing task on that intermediate state, and outputting the task execution result as the stream data processing result. By combining column-oriented storage and a compute engine with a pre-aggregation technique, the method improves throughput in data analysis scenarios while maintaining low latency.
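A minimal Python sketch of the windowing and pre-aggregation steps follows, assuming tumbling windows and a count/sum intermediate state; the columnar layout is reduced to one array per column.

```python
from collections import defaultdict

WINDOW_MS = 1000  # assumed tumbling-window width

def assign_windows(timestamps):
    return [t // WINDOW_MS for t in timestamps]   # window serial number per record

def pre_aggregate(timestamps, values):
    state = defaultdict(lambda: [0, 0.0])         # window -> [count, sum]
    for win, v in zip(assign_windows(timestamps), values):
        state[win][0] += 1
        state[win][1] += v
    return state                                  # pre-aggregated intermediate states

# columnar batch: one array per column rather than one object per row
ts = [100, 950, 1010, 1900, 2500]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
state = pre_aggregate(ts, vals)
for win, (count, total) in sorted(state.items()): # task: per-window average
    print(f"window {win}: avg = {total / count:.2f}")
```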
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Lu, Hang
Li, Xiaowei
Abstract
The present invention provides a deep learning convolution acceleration method using bit-level sparsity, and a processor. The method comprises: obtaining a plurality of groups of data pairs to be convolved, summing the exponents of the activation value and the original weight in each group to obtain each group's exponent sum, and selecting the maximum exponent sum among all the data pairs as the maximum exponent; arranging the mantissas of the original weights in calculation order to form a weight matrix, and aligning each row of the weight matrix to the maximum exponent to obtain an alignment matrix; removing slack bits in the alignment matrix to obtain a simplified matrix, filling the vacancies with the basic bits of each column of the simplified matrix in calculation order to form an intermediate matrix, and, after removing vacant rows of the intermediate matrix, setting the remaining vacant positions to 0 to obtain an interleaved weight matrix; and sending the weight segments in each row of the interleaved weight matrix, together with the mantissas of the corresponding activation values, to an addition tree for summation, and performing shift addition on the result to obtain an output feature map as the convolution result of the plurality of groups of data pairs.
G06F 7/483 - Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Ma, Liuying
Liu, Zhenqing
Xiong, Jin
Jiang, Dejun
Abstract
The present invention proposes a dynamic resource allocation method and system for guaranteeing the tail-latency SLO of latency-sensitive applications. A plurality of request queues are created in a storage server node of a distributed storage system, with different types of requests placed in different queues; thread groups are allocated to the request queues according to the logical thread resources of the service node and the target tail-latency requirements, thread resources are dynamically allocated in real time, and the thread group of each request queue is bound to physical CPU resources of the storage server node. The client sends an application's requests to the storage server node; the storage server node stores each request in the queue corresponding to its type, uses the thread group allocated to that queue to process the requests, and sends responses to the client.
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Zhang, Ke
Wang, Yazhou
Chen, Mingyu
Chang, Yisong
Zhao, Ran
Bao, Yungang
Abstract
A method and system for realizing an FPGA server, in which a motherboard centrally monitors and manages all SoC FPGA compute nodes within the server. The motherboard comprises: a plurality of self-defined management interfaces for connecting the SoC FPGA compute nodes and supplying them with power and data switching; a management network switch module for interconnecting the SoC FPGA compute nodes and supplying a management network; and a core control unit for managing the SoC FPGA compute nodes through the self-defined management interfaces and a self-defined management interface protocol, and for acquiring operating parameters of the SoC FPGA compute nodes on the basis of the management interface protocol so as to manage and monitor them.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Lu, Hang
Li, Hongyan
Li, Xiaowei
Abstract
A hardware-based real-time pruning method and system for a neural network, and a neural network accelerator. The method comprises: acquiring, from a neural network model, a bit matrix to be subjected to matrix multiplication, and taking the Euclidean distance product of each bit row and each bit column of the bit matrix as the significance of that bit row for the matrix multiplication operation; and classifying each bit row of the bit matrix as a significant row or an insignificant row according to its significance, and taking the matrix obtained after the 1-bits in the insignificant rows are set to 0 as the pruning result of the bit matrix. The method prunes on the basis of valid bits; by determining the validity of bits, pruning is performed without help from the software level, is independent of existing software pruning methods, and supports DNNs of multiple precisions.
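The following sketch shows the shape of the bit-row pruning decision with a deliberately simplified significance score (each row's 1-bit count weighted by its bit position); the patent's Euclidean-distance-based score is abstracted away.

```python
import numpy as np

def prune_bit_rows(bits, keep_fraction=0.5):
    """bits: (rows, cols) 0/1 matrix of weight bits, row i worth 2**i."""
    weight_of_row = 2.0 ** np.arange(bits.shape[0])
    significance = bits.sum(axis=1) * weight_of_row   # crude contribution score
    order = np.argsort(significance)[::-1]
    keep = order[: max(1, int(keep_fraction * len(order)))]
    pruned = np.zeros_like(bits)
    pruned[keep] = bits[keep]                         # insignificant rows -> all 0
    return pruned

bits = np.array([[1, 0, 1, 1],     # bit row 0 (LSBs)
                 [0, 1, 1, 0],
                 [1, 1, 0, 1],
                 [1, 0, 0, 1]])    # bit row 3 (MSBs)
print(prune_bit_rows(bits))        # rows 2 and 3 survive; rows 0 and 1 zeroed
```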
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Xu, Yinan
Zhou, Yaoyang
Wang, Sa
Tang, Dan
Sun, Ninghui
Bao, Yungang
Abstract
Provided in the present invention are a memory resource dynamic regulation and control method and system based on memory access and performance modeling. By guaranteeing the quality of service of key applications on real-time multi-core hardware through dynamic partitioning of memory bandwidth resources, a non-invasive solution with fine granularity, high precision and quick response is provided. The present invention designs an overall architecture for an automatic process performance regulation mechanism; through a label mechanism, the hardware directly obtains the priority of an upper-layer application, so as to provide differentiated hardware resource allocation for processes of different priorities. On the basis of a machine learning method, latency modeling is performed on the bank structure of a dynamic random access memory. For the problem of guaranteeing the quality of service of key applications in a real-time multi-core environment, the memory access interference of other processes with a key process can be effectively reduced by dynamically adjusting the memory bandwidth allocation, such that the quality of service of a high-priority process is accurately guaranteed.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Xie, Gaogang
Zhang, Xinyi
Zhang, Penghao
Abstract
The disclosure provides a data packet classification method and system based on a convolutional neural network, comprising: merging each rule set in a training rule set to form a plurality of merging schemes, and determining an optimal merging scheme for each rule set in the training rule set on the basis of performance evaluation; converting the prefix combination distribution of each rule set in the training rule set and of a target rule set into an image, and training a convolutional neural network model taking the images and the corresponding optimal merging schemes as features; and classifying the target rule set on the basis of image similarity, and constructing a corresponding hash table for data packet classification.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Jin, Hao
Yang, Hongzhang
Tu, Yaofeng
Jiang, Dejun
Han, Yinjun
Guo, Bin
Chen, Fengfeng
Abstract
A high-concurrency protocol stack offloading method and device based on a host-side large-capacity memory, and a medium. The method comprises: obtaining data to be sent, and determining first data volume information of the data (S110); buffering the data to a sending buffer area of a shared memory (S120); and sending the first data volume information to TOE hardware, so that the TOE hardware obtains the data from the shared memory according to the first data volume information, and executes a TOE offload according to the data (S130).
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Ma, Liuying
Liu, Zhenqing
Xiong, Jin
Jiang, Dejun
Abstract
On the basis of the embodiments of the present invention, the bandwidth of a BE tenant can be maximized, and CPU resources are added quickly and accurately, so as to prevent a target SLO from going unsatisfied due to an abnormality. Abnormality detection is performed at an appropriate time, and CPU resources are recalculated and re-allocated.
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Ren, Fei
Liu, Zhiyong
Liu, Yudong
Abstract
The invention relates to a TMB classification method and system, and a TMB analysis device, based on a pathological image, comprising: performing TMB classification, marking and pre-processing on known pathological images to construct a training set; training a convolutional neural network with the training set to construct a classification model; pre-processing a target pathological image of a target case to obtain a plurality of target image blocks; classifying the target image blocks with the classification model to acquire image-block TMB classification results for the target case; and acquiring the image TMB classification result of the target case by classification voting over all the image-block results. The invention further relates to a TMB analysis device based on a pathological image. The TMB classification method of the invention has the advantages of being accurate, low-cost and rapid.
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
A61B 5/00 - Measuring for diagnostic purposes; Identification of persons
22.
FUSION NEURON MODEL, NEURAL NETWORK STRUCTURE AND TRAINING AND INFERENCE METHODS THEREFOR, STORAGE MEDIUM, AND DEVICE
XI'AN INSTITUTE OF OPTICS AND PRECISION MECHANICS CHINESE ACADEMY OF SCIENCES (China)
INSTITUTE OF COMPUTING TECHNOLOGY CHINESE ACADEMY OF SCIENCES (China)
Inventor
Zhao, Wei
Zang, Dawei
Cheng, Dong
Du, Bingzheng
Xie, Xiaoping
Zhang, Peiheng
Tan, Guangming
Yao, Hongpeng
Abstract
The present invention relates to artificial neurons and neural networks, and in particular to a fusion neuron model, a neural network structure and inference and training methods therefor, a computer-readable storage medium, and a computer device. Each synaptic connection weight of the fusion neuron model is an arbitrary continuously differentiable nonlinear function, so a linear-to-nonlinear mapping is implemented on the synaptic weights. The fusion neuron model is used as the basic constituent unit of the neural network structure so as to form a hierarchical structure. The inference method comprises: substituting input data into the connected nonlinear weight functions to calculate the connection-weighted results, summing all the weighted results of a neuron and passing the sum directly to the next-level neurons, propagating forward in sequence, and finally obtaining a recognition result. The training method comprises: optimizing the parameters of the neuron model by means of a backpropagation algorithm and a gradient descent algorithm. The computer-readable storage medium and the computer device can implement the specific steps of the inference method and the training method.
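A minimal Python sketch of the fusion neuron follows: each synapse applies its own continuously differentiable nonlinear function to its input, and the neuron passes the plain sum onward with no separate activation. The particular functions and parameters are illustrative.

```python
import math

class FusionNeuron:
    def __init__(self, weight_fns):
        self.weight_fns = weight_fns        # one nonlinear weight function per synapse

    def forward(self, inputs):
        # linear-to-nonlinear mapping on the synaptic weight: w_i(x_i), then sum
        return sum(fn(x) for fn, x in zip(self.weight_fns, inputs))

neuron = FusionNeuron([
    lambda x: math.tanh(1.5 * x),           # smooth, differentiable everywhere
    lambda x: 0.3 * x ** 3,
    lambda x: 0.8 * math.sin(x),
])
print(neuron.forward([0.2, -1.0, 0.5]))
# Training would backpropagate through each weight function's parameters
# (1.5, 0.3, 0.8 above) with gradient descent, as the abstract states.
```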
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Zhang, Ke
Yu, Lei
Wang, Yazhou
Chang, Yisong
Zhao, Ran
Chen, Mingyu
Abstract
Provided is a drawer-type high-density FPGA cloud platform case, comprising: an exchange module located at the bottom of the case, a power supply module located on the exchange module, and a drawer structure located on the power supply module, wherein a control board card and FPGA node board cards are arranged in the drawer structure, and the FPGA node board cards are inserted into the control board card by means of preset interfaces; a power transmission end of the power supply module is electrically connected to the exchange module and to a power input interface of the control board card; and a network exchange interface of the exchange module is connected to the network interfaces of the FPGA node board cards and used for carrying out data interaction between the FPGA node board cards. By means of the present invention, the deployment density of the FPGA node board cards in the FPGA cloud platform case is greatly improved, thereby reducing wiring cost, mounting and dismounting complexity, and maintenance difficulty within the case; a comprehensive and convenient development environment is provided to users by an independently developed control and management system; and the case and the state of the board cards are monitored in real time, and the number of manually connected wires is reduced by means of the preset interfaces, thereby improving the reliability of the FPGA cloud platform case.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Wang, Qi
Liu, Jianmin
Xu, Yongjun
Wang, Yongqing
Abstract
The present invention provides a network coding method based on deep reinforcement learning. The method comprises: a source node dividing information to be sent into K pieces, determining a coding coefficient of each piece according to a source node coding model, and generating a coded packet and sending same to a next-hop node; and an intermediate node receiving a coded packet sent by a previous node, re-coding the received coded packet, determining a coding coefficient according to an intermediate node coding model, and generating a coded packet and sending same to a next-hop node, wherein the source node coding model and the intermediate node coding model are obtained by means of training a DQN network. By means of the present invention, a coding coefficient can be self-adaptively adjusted according to a dynamic network change, so as to increase the decoding efficiency. The present invention has a good model generalization capability, and can be generalized in networks having different network scales and different link qualities. According to the present invention, respective coding coefficient optimization models are respectively executed on a source node and an intermediate node in a distributed manner, thereby simplifying the implementation of coding coefficient optimization, and improving the stability of DQN training.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Yuan, Chunjing
Liu, Shuzheng
Wang, Yuanyuan
Tian, Lin
Shi, Jinglin
Abstract
Triggering the exit of resources for the slices in the slice list whose occupied resources exceed the protection resources, and, after the resource exit succeeds, establishing the new service m and the required DRBs, and configuring a protocol stack. On the basis of the embodiments of the present invention, multiple slices share the base station resources, isolation between the slices is ensured so that they do not affect each other, and deployment and management costs are reduced.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Zhang, Wenli
Wan, Wenkai
Chen, Mingyu
Abstract
Provided is a white-list-based content lock firewall method, comprising: step 200, performing semantic parsing on a data packet load received by a website to obtain a parsed text of the received data packet; and step 300, matching the parsed text of the data packet received by the website against a text pattern library so as to determine whether to forward or intercept the received data packet, the text pattern library comprising the value fields and structural features of a plurality of keywords. On the basis of the embodiments of the present invention, for websites with relatively fixed functions, known and new network attacks can be effectively defended against by deploying the present firewall, and the website can keep running even with vulnerabilities while its normal functions are ensured, without high upgrade costs.
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Li, Xiaowei
Wei, Xin
Lu, Hang
Abstract
Disclosed embodiments relate to a split accumulator for a convolutional neural network accelerator, comprising: arranging original weights in a computation sequence and aligning them by bit to obtain a weight matrix; removing slack bits in the weight matrix and allowing the essential bits in each column to fill the vacancies according to the computation sequence to obtain an intermediate matrix; removing null rows in the intermediate matrix to obtain a kneading matrix, wherein each row of the kneading matrix serves as a kneading weight; obtaining positional information of the activation corresponding to each bit of the kneading weight; dividing the kneading weight by bit into multiple weight segments; processing the summation of the weight segments and the corresponding activations according to the positional information; and sending the processing result to an adder tree to obtain an output feature map by executing shift-and-add on the processing result.
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Li, Xiaowei
Wei, Xin
Lu, Hang
Abstract
Disclosed embodiments relate to a convolutional neural network accelerator, comprising: arranging original weights in a computation sequence and aligning them by bit to obtain a weight matrix; removing slack bits in the weight matrix and allowing the essential bits in each column to fill the vacancies according to the computation sequence to obtain an intermediate matrix; removing null rows in the intermediate matrix to obtain a kneading matrix, wherein each row of the kneading matrix serves as a kneading weight; obtaining positional information of the activation corresponding to each bit of the kneading weight; dividing the kneading weight by bit into multiple weight segments; processing the summation of the weight segments and the corresponding activations according to the positional information; and sending the processing result to an adder tree to obtain an output feature map by executing shift-and-add on the processing result.
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Li, Xiaowei
Wei, Xin
Lu, Hang
Abstract
Disclosed embodiments relate to a convolutional neural network computing method and system based on weight kneading, comprising: arranging original weights in a computation sequence and aligning them by bit to obtain a weight matrix; removing slack bits in the weight matrix and allowing the essential bits in each column to fill the vacancies according to the computation sequence to obtain an intermediate matrix; removing null rows in the intermediate matrix to obtain a kneading matrix, wherein each row of the kneading matrix serves as a kneading weight; obtaining positional information of the activation corresponding to each bit of the kneading weight; dividing the kneading weight by bit into multiple weight segments; processing the summation of the weight segments and the corresponding activations according to the positional information; and sending the processing result to an adder tree to obtain an output feature map by executing shift-and-add on the processing result.
G06F 5/01 - Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
G06F 7/544 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using unspecified devices for evaluating functions by calculation
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Ma, Liuying
Liu, Zhenqing
Xiong, Jin
Jiang, Dejun
Abstract
A dynamic resource scheduling method for guaranteeing a latency SLO of a latency-sensitive application, and a system. The method comprises: creating multiple request queues in a service node of a distributed storage system, allocating, according to logic thread resources of the service node and a target tail latency requirement, thread groups to the request queues, and dynamically scheduling the thread resources in real time, wherein different types of requests are located in different queues, and the respective thread groups of the request queues are bound to physical CPU resources; and a client sending an application access request to the service node, and the service node saving, according to the type of the application access request, the application access request to the request queue corresponding to that type, using the queue as a current queue, processing the application access request by means of the thread group allocated to the current queue, and sending a processing result to the client. The manner described above ensures that latency-sensitive applications have a tail latency meeting a target requirement while maintaining a high bandwidth for batch processing applications.
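A minimal Python sketch of the queue and thread-group layout follows, assuming one queue per request type and an illustrative sizing rule (tighter tail-latency targets get more threads); this is not the patent's allocation algorithm, and CPU pinning is elided as platform-specific.

```python
import queue, threading

TOTAL_THREADS = 8

def allocate_groups(slo_ms, total=TOTAL_THREADS):
    """Illustrative rule: thread count inversely proportional to the SLO."""
    inv = {t: 1.0 / ms for t, ms in slo_ms.items()}
    scale = total / sum(inv.values())
    return {t: max(1, round(v * scale)) for t, v in inv.items()}

slo = {"latency_sensitive": 2.0, "batch": 50.0}   # target tail latency (ms) per type
groups = allocate_groups(slo)                     # e.g. {'latency_sensitive': 8, 'batch': 1}
queues = {t: queue.Queue() for t in slo}          # one request queue per request type

def worker(q):
    while (req := q.get()) is not None:           # handle only this queue's type
        pass                                      # ... process req; pin to CPUs here

for t, n in groups.items():
    for _ in range(n):
        threading.Thread(target=worker, args=(queues[t],), daemon=True).start()

queues["latency_sensitive"].put({"op": "read", "key": "k1"})  # client sends a typed request
print(groups)
```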
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Zhang, Ke
Yu, Lei
Wang, Yazhou
Chen, Mingyu
Chang, Yisong
Zhao, Ran
Bao, Yungang
Abstract
A method and system for realizing an FPGA server. Centralized monitoring and management of all SoC FPGA computing node resources in a server are realized by means of a main control base board, and the main control base board comprises: a plurality of custom management interfaces, each for connecting to SoC FPGA computing nodes and providing them with a power supply and data exchange; a management network exchange module for interconnecting the SoC FPGA computing nodes and providing a management network; and a core control unit for managing the SoC FPGA computing nodes by means of the custom management interfaces and a custom management interface protocol, and for acquiring operating parameters of the SoC FPGA computing nodes on the basis of the management interface protocol, so as to manage and monitor them. The SoC FPGA computing nodes are controlled and supervised more comprehensively and flexibly by means of the custom management interface protocol, and the management-plane network is independent of the user data-plane network, such that bandwidth performance is improved and data security is enhanced.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Han, Yinhe
Min, Feng
Xu, Haobo
Wang, Ying
Abstract
Disclosed are a weight data storage method and a convolution computation method that may be implemented in a neural network. The weight data storage method comprises searching for effective weights in a weight convolution kernel matrix and acquiring an index of effective weights. The effective weights are non-zero weights, and the index of effective weights is used to mark the position of the effective weights in the weight convolution kernel matrix. The weight data storage method further comprises storing the effective weights and the index of effective weights. According to the weight data storage method and the convolution computation method of the present disclosure, storage space can be saved, and computation efficiency can be improved.
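A minimal numpy sketch of the storage scheme follows: only non-zero ("effective") weights are kept, together with an index of their positions, and a convolution step multiply-accumulates over effective weights only. Flat row-major indices are an assumption.

```python
import numpy as np

def store_effective(kernel):
    idx = np.flatnonzero(kernel)          # index of effective weights
    return kernel.ravel()[idx], idx, kernel.shape

def convolve_at(patch, weights, idx):
    """One output pixel: multiply-accumulate only over effective weights."""
    return float(np.dot(patch.ravel()[idx], weights))

kernel = np.array([[0, 1, 0],
                   [2, 0, 0],
                   [0, 0, 3]], dtype=float)
weights, idx, shape = store_effective(kernel)
print(weights, idx)                       # [1. 2. 3.] [1 3 8] -- 3 of 9 values stored

patch = np.arange(9, dtype=float).reshape(3, 3)
print(convolve_at(patch, weights, idx))   # 31.0, equal to (kernel * patch).sum()
```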
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Pan, Heng
Zhang, Penghao
Li, Zhenyu
Xie, Gaogang
Abstract
A distributed information retrieval method based on in-network computing, comprising: according to the retrieval requirements of a user, an agent server sends a retrieval instruction to a retrieval server via a network; the retrieval server performs retrieval to obtain preliminary retrieval results and sends them into the network; the network aggregates the preliminary retrieval results to obtain aggregated retrieval results and sends them to the agent server; and the agent server selects a final retrieval result from the aggregated retrieval results and feeds it back to the user. A programmable switch in the network performs in-network aggregation on the preliminary retrieval results obtained by the retrieval server to reduce the volume of retrieval data transmitted in the network, thereby effectively reducing network communication overhead without affecting normal high-speed data forwarding.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Xie, Gaogang
Zhang, Xinyi
Zhang, Penghao
Abstract
A data packet classification method and system based on a convolutional neural network. The method comprises: merging each rule set in a training rule set to form a plurality of merging schemes, and determining an optimal merging scheme for each rule set in the training rule set on the basis of performance evaluation; converting a prefix combination distribution of each rule set in the training rule set and a target rule set into an image, and training a convolutional neural network model by means of taking the image and the corresponding optimal merging scheme as features; and classifying the target rule set on the basis of image similarity, and constructing a corresponding hash table for data packet classification. The method can improve the data packet search performance, increase the data packet search speed, and increase the rule update speed. According to the system, by means of the cooperation of an on-line system and an off-line system, it can be guaranteed that the on-line system realizes the efficient search of a data packet and the rapid updating of a rule set, and the updating of the rule set can be monitored, thereby reflecting the latest state of a network at all times.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Ren, Fei
Liu, Zhiyong
Liu, Yudong
Abstract
The present invention relates to a TMB classification method and system based on a pathological image, and a TMB analysis device based on a pathological image. The method comprises: performing TMB classification and marking and pre-processing on a known pathological image to construct a training set; training a convolutional neural network by means of the training set to construct a classification model; pre-processing a target pathological image of a target case to obtain a plurality of target image blocks; classifying the target image blocks by means of the classification model to obtain image block TMB classification results of the target case; and by means of all the image block TMB classification results, acquiring an image TMB classification result of the target case by means of classification voting. The present invention further relates to a TMB analysis device based on a pathological image. The TMB classification method of the present invention does not depend on samples other than a pathological image, has the advantages of being accurate, having a low cost and being rapid, and is of great value to tumor research.
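A minimal sketch of the final voting step follows, assuming a plain majority vote over the block-level classes; the class labels are illustrative.

```python
from collections import Counter

def image_tmb_class(block_predictions):
    """block_predictions: per-block TMB classes from the classification model."""
    votes = Counter(block_predictions)
    return votes.most_common(1)[0][0]       # majority class wins

blocks = ["high", "low", "high", "high", "low", "high"]
print(image_tmb_class(blocks))              # "high" (4 of 6 block votes)
```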
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Han, Yinhe
Xu, Haobo
Wang, Ying
Abstract
The present invention provides a processing system for a binary weight convolutional neural network. The system comprises: at least one storage unit for storing data and instructions; at least one control unit for acquiring the instructions stored in the storage unit and sending out a control signal; and at least one calculation unit for acquiring, from the storage unit, node values of a layer in a convolutional neural network and corresponding binary weight value data and obtaining node values of a next layer by performing addition and subtraction operations. With the system of the present invention, the data bit width during the calculation process of a convolutional neural network is reduced, the convolutional operation speed is improved, and the storage capacity and operational energy consumption are reduced.
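A minimal numpy sketch of the add/subtract evaluation follows: with weights constrained to +1/-1, a layer's pre-activation needs only additions and subtractions of the previous layer's node values (the matrix form at the end is a cross-check that does use multiplies).

```python
import numpy as np

def binary_layer(nodes, binary_weights):
    """binary_weights: (out, in) matrix of +1/-1. No multiplies needed:
    add inputs where the weight is +1, subtract where it is -1."""
    plus = np.where(binary_weights > 0, 1, 0) @ nodes
    minus = np.where(binary_weights < 0, 1, 0) @ nodes
    return plus - minus

nodes = np.array([0.5, -1.0, 2.0, 0.25])
w = np.array([[1, -1, 1, 1],
              [-1, -1, 1, -1]])
print(binary_layer(nodes, w))   # [3.75 2.25]
print(w @ nodes)                # same result, computed with multiplies
```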
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Tao, Jinhua
Luo, Tao
Liu, Shaoli
Zhang, Shijin
Chen, Yunji
Abstract
The present invention provides a fractal tree structure-based data transmission device and method, a control device, and an intelligent chip. The device comprises: a central node that serves as the communication data center of a network-on-chip and is used for broadcasting or multicasting communication data to a plurality of leaf nodes; the plurality of leaf nodes, which serve as communication data nodes of the network-on-chip and transmit communication data to the central node; and forwarder modules for connecting the central node with the plurality of leaf nodes and forwarding the communication data. The central node, the forwarder modules and the plurality of leaf nodes are connected in the fractal tree network structure; the central node is directly connected to M forwarder modules and/or leaf nodes, and any forwarder module is directly connected to M next-level forwarder modules and/or leaf nodes.
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Han, Dong
Luo, Tao
Liu, Shaoli
Zhang, Shijin
Chen, Yunji
Abstract
An example device comprises a central node for receiving vector data returned by leaf nodes, a plurality of leaf nodes for calculating and shifting the vector data, and forwarder modules comprising a local cache structure and a data processing component, wherein the plurality of leaf nodes are divided into N groups, each group having the same number of leaf nodes; the central node is individually in communication connection with each group of leaf nodes by means of the forwarder modules; a communication structure constituted by each group of leaf nodes has self-similarity; the plurality of leaf nodes are in communication connection with the central node in a complete M-way tree approach by means of the forwarder modules of multiple levels; each of the leaf nodes comprises a setting bit.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Tang, Sheng
Wu, Tianyi
Li, Jintao
Abstract
Disclosed is a scene segmentation method based on contextual information guidance. The method comprises: constructing contextual information-based guidance modules by using a residual structure network; taking an original image as input, outputting a primary feature map via a plurality of 3 * 3 convolutional layers; taking the primary feature map as input, outputting an intermediate feature map via a plurality of said guidance modules; taking the intermediate feature map as input, outputting an advanced feature map via a plurality of said guidance modules; and taking the advanced feature map as input, obtaining the scene segmentation result of the original image via a scene segmentation sub-network. The segmentation network designed by the method is small in parameter quantity, and uses, during feature extraction, a global feature extractor to further correct joint features formed by combining local features and the corresponding surrounding context features, which is more beneficial for the model to learn the features of segmentation, thereby greatly improving the performance of the scene segmentation network for existing mobile terminals.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Tang, Sheng
Wu, Tianyi
Li, Jintao
Abstract
A Kronecker convolution-based scene segmentation method, comprising: constructing a Kronecker convolution layer having a residual structure; constructing a feature extraction sub-network on the basis of the Kronecker convolution layer and a standard convolution layer, taking an original image as an input, and outputting an abstract feature map by means of the feature extraction sub-network; constructing a tree-structured feature aggregation module on the basis of the Kronecker convolution layer, taking the abstract feature map as an input, and outputting an aggregation feature map by means of the tree-structured feature aggregation module; and taking said aggregation feature map as an input, and outputting a scene segmentation result of the original image by means of the scene segmentation sub-network.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Xiao, Hang
Han, Yinhe
Wang, Ying
Lian, Shiqi
Abstract
An inverse kinematics solution system for use with a robot, which obtains, from an inputted target pose value and the degrees of freedom of a robot, the joint angle values corresponding to the target pose, and which comprises: a parameter initialization module, an inverse kinematics scheduler, a Jacobian calculating unit, a pose updating unit and a parameter selector. The system is implemented in hardware and can quickly obtain the motion parameters used for controlling a robot while reducing power consumption.
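A minimal numpy sketch of the iterative loop the hardware modules implement follows, with a planar two-link arm standing in for the general robot; the link lengths, the pseudo-inverse step and the stopping rule are illustrative assumptions.

```python
import numpy as np

L1, L2 = 1.0, 1.0                                  # assumed link lengths

def pose(q):                                       # forward kinematics (pose updating unit)
    return np.array([L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
                     L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])])

def jacobian(q):                                   # Jacobian calculating unit
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def solve_ik(target, q=np.array([0.3, 0.3]), tol=1e-6, max_iter=100):
    for _ in range(max_iter):                      # inverse kinematics scheduler
        err = target - pose(q)
        if np.linalg.norm(err) < tol:
            break
        q = q + np.linalg.pinv(jacobian(q)) @ err  # damped step omitted for brevity
    return q

q = solve_ik(np.array([1.2, 0.8]))
print(q, pose(q))                                  # joint angles reaching the target
```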
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Li, Xiaowei
Wei, Xin
Lu, Hang
Abstract
The present invention relates to a convolutional neural network calculation method and system based on weight kneading. The method comprises: arranging original weights in a calculation order and performing bitwise alignment on same to obtain a weight matrix; removing slack bits in the weight matrix to obtain a reduced matrix with vacancies, and making basic bits in each column of the reduced matrix fill the vacancies according to the calculation order to obtain an intermediate matrix; removing empty rows in the intermediate matrix, and performing zero setting on the vacancies of the intermediate matrix to obtain a kneading matrix, wherein each row of the kneading matrix serves as a kneading weight; obtaining, according to a correlation between activation values and the basic bits in the original weights, position information of an activation value corresponding to each bit in the kneading weight; sending the kneading weight to a splitting accumulator, and the splitting accumulator performing bitwise segmentation on the kneading weight to form multiple weight segments, performing summation processing on the weight segments and the corresponding activation values according to the position information, and sending a processing result to an adder tree; and obtaining an output feature map by means of executing shift and addition on the processing result.
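A minimal numpy sketch of the kneading transform itself follows, assuming 8-bit weights: essential bits in each column are packed upward in calculation order, empty rows are dropped, and a per-bit origin index stands in for the activation position information.

```python
import numpy as np

def knead(weights, bits=8):
    # bit matrix: one row per weight, one column per bit position (MSB first)
    W = np.array([[(w >> b) & 1 for b in range(bits - 1, -1, -1)] for w in weights])
    kneaded = np.zeros_like(W)
    origin = -np.ones_like(W)               # which weight each kept bit came from
    for col in range(bits):
        src = np.flatnonzero(W[:, col])     # essential bits in this column
        kneaded[: len(src), col] = 1        # fill vacancies in calculation order
        origin[: len(src), col] = src       # position info for the activations
    keep = kneaded.any(axis=1)              # remove empty rows
    return kneaded[keep], origin[keep]

kneaded, origin = knead([0b10010001, 0b00000011, 0b01000010])
print(kneaded)      # 3 weights kneaded into 2 rows of bit segments
print(origin)       # -1 marks a zero-filled vacancy
```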
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Li, Xiaowei
Wei, Xin
Lu, Hang
Abstract
The present invention relates to a split accumulator for a convolutional neural network accelerator, configured to sort and bitwise align original weights in a calculation order to obtain a weight matrix, remove slack bits in the weight matrix to obtain a reduced matrix having vacant positions, make basic bits in each column of the reduced matrix fill the vacant positions in the calculation order to obtain an intermediate matrix, remove vacant rows in the intermediate matrix, and zero the vacant positions of the intermediate matrix to obtain a kneading matrix, each row of the kneading matrix serving as a kneading weight; obtain position information of each bit in the kneading weight corresponding to an activation value according to a correspondence between the activation value and the basic bit in the original weight; and send the kneading weight to a split accumulator, bitwise segment the kneading weight into a plurality of weight segments by the split accumulator, perform summation processing on the weight segments and the corresponding activation values according to the position information, send the processing result to an adder tree, and perform shift addition on the processing result to obtain an output feature map.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Li, Xiaowei
Wei, Xin
Lu, Hang
Abstract
A convolutional neural network accelerator, comprising: arranging original weights in calculation order and aligning same by bit to obtain a weight matrix, removing a slack bit in the weight matrix to obtain a reduced matrix with vacancies, filling the vacancies in calculation order with basic bits in each column of the reduced matrix to obtain an intermediate matrix, and removing empty rows in the intermediate matrix and setting vacancies of the intermediate matrix to zero so as to obtain a kneading matrix, wherein each row of the kneading matrix serves as a kneading weight; according to a correlation between an activation value and the basic bits in the original weight, obtaining position information of the activation value corresponding to each bit in the kneading weight; and sending the kneading weight to a split accumulator, the split accumulator dividing the kneading weight into multiple weight segments by bit, finding, according to the position information, the sum of the weight segment and the corresponding activation value and sending the processing result to an addition tree, and obtaining an output feature map by performing bit shift addition on a processing result.
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Tao, Jinhua
Luo, Tao
Liu, Shaoli
Zhang, Shijin
Chen, Yunji
Abstract
One example of a device comprises: a central node that serves as the communication data center of a network-on-chip; a plurality of leaf nodes that serve as communication data nodes of the network-on-chip and transmit communication data to the central node; and forwarder modules for connecting the central node with the plurality of leaf nodes and forwarding the communication data, wherein the plurality of leaf nodes are divided into N groups, each group having the same number of leaf nodes, the central node is individually in communication connection with each group of leaf nodes by means of the forwarder modules, the communication structure constituted by each group of leaf nodes has self-similarity, and the plurality of leaf nodes are in communication connection with the central node in a complete multi-way tree approach by means of multiple levels of forwarder modules.
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Han, Yinhe
Xu, Haobo
Wang, Ying
Abstract
The present invention relates to a weight management method and system for neural network processing. The method comprises two stages, an off-chip encryption stage and an on-chip decryption stage: trained neural network weight data are encrypted in advance, the encrypted weights are input into a neural network processor chip, and a decryption unit inside the chip decrypts the weights in real time to perform the related neural network calculations. The method and system realize the protection of weight data without affecting the normal operation of a neural network processor.
H04L 9/06 - Arrangements for secret or secure communications; Network security protocols; the encryption apparatus using shift registers or memories for blockwise coding, e.g. D.E.S. systems
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Han, Yinhe
Min, Feng
Xu, Haobo
Wang, Ying
Abstract
A neural network processor and a method for performing bit conversion on data of a neural network using the neural network processor. The neural network processor comprises a bit conversion device (101); the bit conversion device (101) comprises an input interface (102), a control unit (105), a data conversion unit (103), and an output interface (104); the control unit (105) is configured to generate a control signal for the data conversion unit (103); the input interface (102) is configured to receive original data; the data conversion unit (103) is configured to perform bit conversion on the original data according to the control signal to convert the original data into a bit conversion result expressed by fewer bits; the output interface (104) is configured to output the bit conversion result from the bit conversion device (101). By means of the method, the number of bits used for expressing data can be reduced, the hardware costs and power consumption required for calculation can be reduced, and the calculation speed can be increased.
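A minimal numpy sketch of one plausible instance of the bit conversion follows: linear quantization of 32-bit floats to 8-bit integers, with the scale acting as the control signal. This is an illustrative scheme, not the patent's exact conversion.

```python
import numpy as np

def bit_convert(original, out_bits=8):
    qmax = 2 ** (out_bits - 1) - 1                 # e.g. 127 for int8
    scale = np.abs(original).max() / qmax          # control signal for the unit
    converted = np.clip(np.round(original / scale), -qmax - 1, qmax).astype(np.int8)
    return converted, scale

def restore(converted, scale):
    return converted.astype(np.float32) * scale

x = np.random.default_rng(1).standard_normal(5).astype(np.float32)
q, s = bit_convert(x)
print(x)            # 32-bit originals
print(q, s)         # 8-bit result + scale: 4x fewer bits per value
print(restore(q, s))
```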
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Han, Yinhe
Min, Feng
Xu, Haobo
Wang, Ying
Abstract
A weight data storage method and a convolutional calculation method in a neural network. The weight data storage method comprises: searching for an effective weight in a weight convolutional kernel matrix and acquiring an effective weight index, wherein the effective weight is a non-zero weight, and the effective weight index is used for marking the position of the effective weight in the weight convolutional kernel matrix; and storing the effective weight and the effective weight index. By means of the weight data storage method and the convolutional calculation method, storage space can be saved, and the calculation efficiency can be improved.
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Du, Zidong
Guo, Qi
Chen, Tianshi
Chen, Yunji
Abstract
The present disclosure provides a neural network processing system that comprises a multi-core processing module, composed of a plurality of core processing modules, for executing vector multiplication and addition operations in a neural network operation; an on-chip storage medium; an on-chip address index module; and an ALU module for executing, on input data acquired from the multi-core processing module or the on-chip storage medium, non-linear operations that the multi-core processing module cannot complete, wherein the plurality of core processing modules either share an on-chip storage medium and an ALU module or each have an independent on-chip storage medium and ALU module. The present disclosure improves the operating speed of the neural network processing system, making its performance higher and more efficient.
G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
G06F 7/57 - Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups or for performing logical operations
G06F 15/78 - Architectures of general purpose stored program computers comprising a single central processing unit
G06N 3/04 - Architecture, e.g. interconnection topology
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
50.
NONVOLATILE MEMORY BASED COMPUTING DEVICE AND USE METHOD THEREFOR
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Han, Yinhe
Xu, Haobo
Wang, Ying
Abstract
Provided in the present invention is a nonvolatile memory based computing device, comprising: a processor; an on-chip memory and/or a memory integrated on the processor; and an energy storing device used to store electric energy when powered on and, when power is lost, to supply that energy so that unsaved data on the processor can be stored to the on-chip memory and/or the memory. The on-chip memory and/or the memory use(s) a nonvolatile memory with a read-write speed measured in nanoseconds to provide the processor with memory access to the data being operated on.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
51.
COMPRESSION APPARATUS USED FOR DEEP NEURAL NETWORK
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Weng, Kaiheng
Han, Yinhe
Wang, Ying
Abstract
Provided is an acceleration system used for a deep neural network. The system comprises: a 3D memory, a deep neural network calculation unit connected to a memory controller on a logic layer of a vault of the 3D memory, a router connected to the memory controller, and a compressor and a decompressor, wherein the memory controller of each vault carries out data transmission via the router connected to the memory controller and by means of a network-on-chip; and the compressor is used for compressing data to be compressed which needs to be transmitted in the network-on-chip and is used for the deep neural network, and the decompressor is used for decompressing data to be decompressed which comes from the network-on-chip and is used for the deep neural network.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Chen, Tianshi
Du, Zidong
Guo, Qi
Chen, Yunji
Abstract
The present invention relates to the technical field of storage and discloses an on-chip data partitioning read-write method. The method comprises: a data partitioning step for storing on-chip data in different areas, storing the on-chip data in an on-chip storage medium and an off-chip storage medium respectively, based on a data partitioning strategy; a pre-operation step for performing operational processing of an on-chip address index of the on-chip storage data in advance when implementing data splicing; and a data splicing step for splicing the on-chip storage data and the off-chip input data to obtain a representation of the original data, based on a data splicing strategy. Also provided are a corresponding on-chip data partitioning read-write system and device. Thus, reads and writes of repeated data can be efficiently realized, reducing memory access bandwidth requirements while providing good flexibility and reducing on-chip storage overhead.
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Du, Zidong
Guo, Qi
Chen, Tianshi
Chen, Yunji
Abstract
A neural network accelerator and an operation method thereof applicable in the field of neural network algorithms are disclosed. The neural network accelerator comprises an on-chip storage medium for storing data externally transmitted or for storing data generated during computing; an on-chip address index module for mapping to a correct storage address on the basis of an input index when an operation is performed; a core computing module for performing a neural network operation; and a multi-ALU device for obtaining input data from the core computing module or the on-chip storage medium to perform a nonlinear operation which cannot be completed by the core computing module. By introducing a multi -ALU design into the neural network accelerator, an operation speed of the nonlinear operation is increased, such that the neural network accelerator is more efficient.
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06F 7/575 - Basic arithmetic logic units, i.e. devices selectable to perform either addition, subtraction or one of several logical operations, using, at least partially, the same circuitry
54.
Method and device for on-chip repetitive addressing
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Guo, Qi
Chen, Tianshi
Chen, Yunji
Abstract
A method may include: a data partitioning step for partitioning data on an on-chip and/or an off-chip storage medium into different data blocks according to a pre-determined data partitioning principle, wherein data with a reuse distance less than a pre-determined distance threshold value is partitioned into the same data block; and a data indexing step for successively loading different data blocks to at least one on-chip processing unit according to a pre-determined ordinal relation of a replacement policy, wherein the repeated data in a loaded data block is subjected to on-chip repetitive addressing. Because data with a reuse distance less than the pre-determined distance threshold value is partitioned into the same data block, such data can be loaded on chip once for storage and then used as many times as possible, so that memory access is more efficient.
G06F 12/06 - Addressing a physical block of locations, e.g. base addressing, module addressing, address space extension, memory dedication
G06F 12/123 - Replacement control using replacement algorithms with age lists, e.g. queue, most recently used [MRU] list or least recently used [LRU] list
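For illustration, the partitioning principle can be prototyped from an access trace; the trace, the threshold, and the min-over-reuses rule below are assumptions made for the sketch.

    # Reuse distance of a datum = number of distinct data accessed between
    # two consecutive uses of it. Data whose distance is under the
    # threshold go into the same (on-chip-resident) block.
    def min_reuse_distance(trace):
        last_pos, dist = {}, {}
        for i, a in enumerate(trace):
            if a in last_pos:
                between = len(set(trace[last_pos[a] + 1 : i]))
                dist[a] = min(dist.get(a, between), between)
            last_pos[a] = i
        return dist                      # data never reused get no entry

    trace = ["w0", "x0", "w0", "x1", "w0", "x2", "y0"]
    THRESHOLD = 2                        # assumed distance threshold
    hot = {a for a, d in min_reuse_distance(trace).items() if d < THRESHOLD}
    print(hot)                           # {'w0'}: load once, reuse on chip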
55.
Fractal-tree communication structure and method, control apparatus and intelligent chip
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Lan, Huiying
Luo, Tao
Liu, Shaoli
Zhang, Shijin
Chen, Yunji
Abstract
A communication structure comprises: a central node that is a communication data center of a network-on-chip and used for broadcasting or multicasting communication data to a plurality of leaf nodes; a plurality of leaf nodes that are communication data nodes of the network-on-chip and used for transmitting the communication data to the central node; and forwarder modules for connecting the central node with the plurality of leaf nodes and forwarding the communication data, wherein the plurality of leaf nodes are divided into N groups, each group having the same number of leaf nodes, the central node is individually in communication connection with each group of leaf nodes by means of the forwarder modules, the communication structure is a fractal-tree structure, the communication structure constituted by each group of leaf nodes has self-similarity, and the forwarder modules comprise a central forwarder module, leaf forwarder modules, and intermediate forwarder modules.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Xiao, Hang
Han, Yinhe
Wang, Ying
Lian, Shiqi
Abstract
An inverse kinematics solution system for use with a robot, which obtains the joint angle values corresponding to an inputted target pose value on the basis of the target pose value and the degrees of freedom of the robot, and which comprises: a parameters initialization module, an inverse kinematics scheduler, a Jacobian calculating unit, a pose updating unit and a parameters selector. The system is implemented in hardware and can quickly obtain the motion parameters used for controlling a robot while reducing power consumption.
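A compact software sketch of the initialize / Jacobian / pose-update loop this pipeline hardwires, for a planar two-link arm; the link lengths, pseudo-inverse update, and tolerance are illustrative assumptions, not the patent's design.

    import numpy as np

    L1, L2 = 1.0, 1.0                     # assumed link lengths

    def forward(theta):                   # end-effector position
        t1, t2 = theta
        return np.array([L1 * np.cos(t1) + L2 * np.cos(t1 + t2),
                         L1 * np.sin(t1) + L2 * np.sin(t1 + t2)])

    def jacobian(theta):                  # Jacobian calculating unit
        t1, t2 = theta
        return np.array([[-L1*np.sin(t1) - L2*np.sin(t1+t2), -L2*np.sin(t1+t2)],
                         [ L1*np.cos(t1) + L2*np.cos(t1+t2),  L2*np.cos(t1+t2)]])

    theta = np.array([0.3, 0.3])          # parameters initialization
    target = np.array([1.2, 0.8])         # target pose value
    for _ in range(100):                  # inverse kinematics scheduler
        err = target - forward(theta)
        if np.linalg.norm(err) < 1e-6:
            break
        theta = theta + np.linalg.pinv(jacobian(theta)) @ err  # pose update
    print(theta, forward(theta))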
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Han, Yinhe
Xu, Haobo
Wang, Ying
Abstract
The invention provides a processing system for a binary weight convolutional network. The system comprises: at least one storage unit for storing data and instructions; at least one control unit for obtaining the instructions stored in the storage unit and sending out control signals; and at least one calculation unit for obtaining the node values of a layer in the convolutional network and the corresponding binary weight value data from the storage unit and performing addition and subtraction operations to obtain the node values of the next layer. With the system of the present invention, the data bit width during the convolutional network calculation can be reduced, the convolutional operation speed can be increased, and the storage capacity and working energy consumption can be reduced.
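A one-function sketch of why addition and subtraction suffice here: with weights constrained to +1/-1 (the binarization step itself is outside this sketch), a dot product needs no multiplications.

    import numpy as np

    def binary_dot(x: np.ndarray, w_sign: np.ndarray) -> float:
        # w_sign holds only +1/-1, so multiply-accumulate reduces to
        # adding inputs under +1 weights and subtracting those under -1.
        return float(x[w_sign > 0].sum() - x[w_sign < 0].sum())

    x = np.array([0.2, -1.0, 0.5, 3.0])
    w = np.array([+1, -1, -1, +1])
    assert np.isclose(binary_dot(x, w), np.dot(x, w))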
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Zhang, Shijin
Luo, Tao
Liu, Shaoli
Chen, Yunji
Abstract
The present disclosure provides a quick operation device for a nonlinear function, and a method therefor. The device comprises: a domain conversion part for converting an input independent variable into a corresponding value in the table lookup range; a table lookup part for looking up the slope and intercept of the corresponding piecewise linear fit based on the input independent variable or the independent variable processed by the domain conversion part; and a linear fitting part for obtaining a final result by linear fitting based on the slope and intercept obtained by the table lookup part. The present disclosure solves the problems of slow operation speed, large area of the operation device, and high power consumption caused by the traditional method.
G06F 5/01 - Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
G06F 7/57 - Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups or for performing logical operations
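For illustration, the clamp / lookup / fit pipeline above reduces to a few lines in software; the sigmoid target, the 64 segments, and the [-8, 8] range are assumptions.

    import numpy as np

    LO, HI, SEGS = -8.0, 8.0, 64
    edges = np.linspace(LO, HI, SEGS + 1)
    ys = 1.0 / (1.0 + np.exp(-edges))                   # true values at edges
    k = (ys[1:] - ys[:-1]) / (edges[1:] - edges[:-1])   # slope table
    b = ys[:-1] - k * edges[:-1]                        # intercept table

    def sigmoid_pwl(x: float) -> float:
        x = min(max(x, LO), HI)                              # domain conversion
        i = min(int((x - LO) / (HI - LO) * SEGS), SEGS - 1)  # table lookup
        return k[i] * x + b[i]                               # linear fitting

    print(sigmoid_pwl(0.7), 1.0 / (1.0 + np.exp(-0.7)))      # close agreement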
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Li, Zhen
Liu, Shaoli
Zhang, Shijin
Luo, Tao
Qian, Cheng
Chen, Yunji
Chen, Tianshi
Abstract
The present disclosure provides a data accumulation device and method, and a digital signal processing device. The device comprises: an accumulation tree module for accumulating input data in the form of a binary tree structure and outputting accumulated result data; a register module including a plurality of groups of registers and used for registering the accumulated result data and the intermediate data generated by the accumulation tree module during an accumulation process; and a control circuit for generating a data gating signal to control the accumulation tree module to filter out the input data not required to be accumulated, and generating a flag signal to perform the following control: selecting as output data either the result obtained after adding one or more pieces of intermediate data stored in the registers to the accumulated result, or the accumulated result directly. Thus, a plurality of groups of input data can be rapidly accumulated to a group of sums within a clock cycle. At the same time, the accumulation device can, by means of a control signal, flexibly choose to accumulate only some of the input data.
G06F 7/509 - Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination for multiple operands, e.g. digital integrators
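A software model of the gated accumulation tree, assuming zero-substitution as the filtering mechanism (one plausible reading of the data gating signal):

    def tree_accumulate(values, gate):
        # Data gating: filtered-out inputs contribute zero.
        level = [v if g else 0 for v, g in zip(values, gate)]
        while len(level) > 1:            # one tree level per iteration
            if len(level) % 2:
                level.append(0)          # pad odd-sized levels
            level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        return level[0]

    vals = [3, 5, 2, 7, 1, 4, 6, 8]
    mask = [1, 1, 0, 1, 1, 0, 1, 1]      # skip the 3rd and 6th inputs
    assert tree_accumulate(vals, mask) == sum(v for v, g in zip(vals, mask) if g)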
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Li, Zhen
Liu, Shaoli
Zhang, Shijin
Luo, Tao
Qian, Cheng
Chen, Yunji
Chen, Tianshi
Abstract
The present disclosure discloses an adder device, a data accumulation method and a data processing device. The adder device comprises: a first adder module provided with an adder tree unit, composed of a multi-stage adder array, and a first control unit, wherein the adder tree unit accumulates data by means of step-by-step accumulation based on a control signal of the first control unit; a second adder module comprising a two-input addition/subtraction operation unit and a second control unit, and used for performing an addition or subtraction operation on input data; a shift operation module for performing a left shift operation on output data of the first adder module; an AND operation module for performing an AND operation on output data of the shift operation module and output data of the second adder module; and a controller module.
G06F 7/509 - Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination for multiple operands, e.g. digital integrators
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Han, Yinhe
Xu, Haobo
Wang, Ying
Abstract
The invention relates to a dual-camera-based image processing device, comprising first and second cameras and a control module. The first camera is used for capturing an overall image. The control module is used for sending a shooting instruction to the second camera, and the second camera captures a local image according to the received shooting instruction. The device is characterized in that the shooting instruction includes the shooting information of the first camera and the image information of the overall image. With the present invention, the imaging effect of a local object can be improved, and the representation of details of the whole image can be enhanced.
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Li, Zhen
Liu, Shaoli
Zhang, Shijin
Luo, Tao
Qian, Cheng
Chen, Yunji
Chen, Tianshi
Abstract
The present disclosure provides an operation apparatus and method for an acceleration chip for accelerating a deep neural network algorithm. The apparatus comprises a vector addition processor module, a vector function value arithmetic unit, and a vector multiplier-adder module, wherein the three modules execute programmable instructions and interact with each other to calculate the values of neurons, the network output result of a neural network, and the variation amount of a synaptic weight representing the interaction strength of the neurons on an input layer with the neurons on an output layer; and the three modules are all provided with an intermediate value storage region and perform read and write operations on a primary memory.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Han, Yinhe
Xu, Haobo
Wang, Ying
Abstract
The present invention relates to a weight management method and system for neural network processing. The method is divided into two stages: off-chip encryption and on-chip decryption, and comprises: encrypting trained neural network weight data in advance, then inputting an encrypted weight into a neural network processor chip, and decrypting, in real time, the weight by means of a decryption unit located inside the neural network processor chip, so as to carry out relevant neural network computing. The method and the system realize the protection of weight data without affecting the normal computing of a neural network processor.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Han, Yinhe
Xu, Haobo
Wang, Ying
Abstract
Disclosed are an automated design method and system applicable to a neural network processor. The method comprises: acquiring a topological structure configuration file of a neural network model and a hardware resource constraint file of a target hardware circuit; constructing, according to these files, a hardware architecture of a neural network processor corresponding to the neural network model together with its descriptor file, and a control descriptor file for controlling data scheduling, storage and computing of the neural network processor; and then, based on the hardware architecture descriptor file and the control descriptor file, generating a hardware description code for the neural network processor, so as to realize a hardware circuit of the neural network processor on the target hardware circuit. The system and method realize an automated design of a neural network processor, shorten its design cycle, and adapt to the rapid update of network models in neural network technology and the requirement of high operating speed.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Han, Yinhe
Xu, Haobo
Wang, Ying
Abstract
Disclosed are an automated design method and system for a neural network processor, by which a neural network model is realized by means of a hardware circuit. The method comprises: acquiring a descriptor file of the neural network model and a hardware resource constraint parameter of a target hardware circuit; establishing a mapping between the structure of the neural network model and the target hardware circuit, together with a control instruction stream and an address access stream corresponding to the neural network model; and then generating, according to the mapping, the control instruction stream and the address access stream, a hardware description language code of a neural network processor corresponding to the neural network model, so as to realize the hardware circuit of the neural network processor on the target hardware circuit. The system and method realize an automated design of a neural network processor, shorten its design cycle, and adapt to the rapid update of network models in neural network technology and the requirement of high operating speed.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Tian, Lin
Zhuo, Ruilian
Zhou, Yiqing
Shi, Jinglin
Abstract
Disclosed is a method for constructing a task processing path in a plurality of processing resource pools. The method comprises: determining a task processing path based on a task requirement, wherein the task processing path includes a processing unit selected from a plurality of resource pools; searching for a destination address corresponding to the processing unit based on the task processing path; and sending a notification message to the processing unit, wherein the notification message includes a destination address to be sent by a data flow of the processing unit. The task processing path can be effectively constructed as needed, so that the configuration of a switching path for transmitting a data flow is completed.
TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED (China)
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Zhou, Ganbin
Lin, Fen
Lu, Yanxiong
Cao, Rongyu
Luo, Ping
Chen, Bo
Abstract
Embodiments of the present application provide a session processing method and apparatus. In the embodiments of the present application, session content is obtained, session texts corresponding to the session content are obtained, the session texts are classified into a plurality of session groups according to different intentions and/or subjects, and the session texts of each session group are analyzed to generate corresponding session abstracts.
G06F 17/27 - Automatic analysis, e.g. parsing, orthograph correction
G06F 17/30 - Information retrieval; Database structures therefor
G06F 3/023 - Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Du, Zidong
Guo, Qi
Chen, Tianshi
Chen, Yunji
Abstract
A method and system for neural network processing, applied to the technical field of computers. The neural network processing system (100, 200) comprises a multi-core processing module (30) composed of a plurality of core processing modules (31, 203), an on-chip storage medium (10, 201), an on-chip address index module (20, 202) and an ALU module (40, 204), wherein the multi-core processing module (30) is used for executing vector multiplication and addition operations in neural network computing, the ALU module (40, 204) is used for acquiring input data from the multi-core processing module (30) or the on-chip storage medium (10, 201) to execute non-linear operations that the multi-core processing module cannot complete, and the plurality of core processing modules (31) share an on-chip storage medium (10) and an ALU module (40), or the plurality of core processing modules (203) have independent on-chip storage media (201) and ALU modules (204). The method and system introduce a multi-core design into neural network processing, improving the computing speed of the neural network processing system and making its performance higher and more efficient.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Guo, Qi
Chen, Tianshi
Chen, Yunji
Abstract
A method for on-chip repetitive addressing and a corresponding device. The method comprises: a data partitioning step: partitioning data on an on-chip storage medium and/or an off-chip storage medium into different data blocks according to a pre-determined data partitioning principle, wherein the data partitioning principle comprises partitioning data with a reuse distance less than a pre-determined distance threshold value into the same data block; and a data indexing step: according to a pre-determined ordinal relation of a replacement policy, successively loading different data blocks to at least one on-chip processing unit, wherein the repeated data in the loaded data block is subjected to on-chip repetitive addressing. By means of the method and the corresponding device, data with a reuse distance less than a pre-determined distance threshold value is partitioned into the same data block, and the data partitioned into the same data block can be loaded on a chip once for storage and then used as many times as possible, so that memory access is more efficient.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Chen, Tianshi
Du, Zidong
Guo, Qi
Chen, Yunji
Abstract
An on-chip data partitioning read-write method, comprising: a data partitioning step: on the basis of a data partitioning policy, storing the on-chip data in different areas, respectively storing same in an on-chip storage medium and an off-chip storage medium (S701); a pre-operation step: when implementing data splicing, first implementing operation processing of the on-chip address index of the data stored on-chip (S702); and a data splicing step: on the basis of a data splicing policy, splicing the data stored on-chip and the off-chip input data to obtain a representation of the original data (S703). Also provided are a corresponding on-chip data partitioning read-write system (100) and apparatus. Thus, reading and writing of repeated data can be efficiently implemented, reducing memory access bandwidth requirements and simultaneously providing good flexibility, thus reducing on-chip storage overhead.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Du, Zidong
Guo, Qi
Chen, Tianshi
Chen, Yunji
Abstract
A neural network accelerator (100) and an operation method thereof applicable in the field of neural network algorithms. The neural network accelerator (100) comprises an on-chip storage medium (10), an on-chip address index module (20), a core computing module (30) and a multi-ALU apparatus (40), wherein the on-chip storage medium (10) is used for storing externally provided data or for storing data generated during a calculation; the on-chip data index module (20) is used for mapping to a correct storage address on the basis of an input index when an operation is performed; the core computing module (30) is used for performing the neural network operation; and the multi-ALU apparatus (40) is used for obtaining input data from the core computing module (30) or the on-chip storage medium (10) to perform a nonlinear operation which cannot be completed by the core computing module (30). By introducing the multi-ALU design into the neural network accelerator (100), the operation speed of the nonlinear operation is increased, thereby making the neural network accelerator (100) more efficient.
G06F 7/575 - Basic arithmetic logic units, i.e. devices selectable to perform either addition, subtraction or one of several logical operations, using, at least partially, the same circuitry
72.
NEURAL NETWORK COMPUTING METHOD, SYSTEM AND DEVICE THEREFOR
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Du, Zidong
Guo, Qi
Chen, Tianshi
Chen, Yunji
Abstract
A neural network computing method, system and device therefor, to be applied in the technical field of computers. The computing method comprises the following steps: A. dividing a neural network into a plurality of subnetworks having consistent internal data characteristics (S901); B. computing each of the subnetworks to obtain a first computation result for each subnetwork (S902); and C. computing the total computation result of the neural network on the basis of the first computation results of each subnetwork (S903). By means of the method, the computing efficiency of the neural network is improved.
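A toy sketch of steps A-C under one possible reading (layer-wise subnetworks whose first results chain into the total result); the layer shapes and the ReLU nonlinearity are assumptions for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    layers = [rng.standard_normal((8, 8)) for _ in range(6)]   # toy weights

    # A. divide the network into subnetworks
    subnets = [layers[0:2], layers[2:4], layers[4:6]]

    def run_subnet(subnet, x):
        for w in subnet:
            x = np.maximum(w @ x, 0.0)
        return x                          # B. first computation result

    x = rng.standard_normal(8)
    for sn in subnets:                    # C. combine into the total result
        x = run_subnet(sn, x)
    print(x)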
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
SHENZHEN VCYBER TECHNOLOGY CO., LTD (China)
Inventor
Chen, Yiqiang
Yang, Xiaodong
Yu, Hanchao
Zeng, Hui
Huang, Weicheng
Zhong, Xi
Hu, Ziang
Abstract
An ultrasonic wave-based air gesture recognition method and system. The method comprises: using a pre-trained palm movement trend model to recognize palm movement trends on the basis of collected ultrasonic wave signals reflected back by a human hand to obtain a palm movement trend timing sequence comprising a series of palm movement trends, wherein the palm movement trend model is a model obtained by training according to acoustic characteristics of the ultrasonic wave signals reflected back by the human hand and used for recognizing the palm movement trends; and performing gesture recognition on the obtained palm movement trend timing sequence by using a pre-trained gesture recognition model, wherein the gesture recognition model is a model obtained by training according to a training data set consisting of the palm movement trend timing sequence and used for recognizing a gesture. The method and system are applicable to a smart mobile terminal, and can achieve both high precision and high robustness of gesture recognition.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Tao, Jinhua
Luo, Tao
Liu, Shaoli
Zhang, Shijin
Chen, Yunji
Abstract
The present invention provides a fractal tree structure-based data release device and method, a control apparatus, and an intelligent chip. The device comprises: a central node that is the communication data center of a network on chip and used for broadcasting or multicasting communication data to multiple leaf nodes; multiple leaf nodes that are communication data nodes of the network on chip and used for transmitting the communication data to the central node; and a forwarder module used for connecting the central node with the multiple leaf nodes and forwarding the communication data, wherein the multiple leaf nodes are grouped into N groups having the same number of leaf nodes; the central node is individually in communication connection with each group of leaf nodes by means of the forwarder module; the communication structure constituted by each group of leaf nodes has self-similarity; and the multiple leaf nodes are in communication connection with the central node in a complete multi-way tree approach by means of multiple layers of forwarder modules.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Han, Dong
Luo, Tao
Liu, Shaoli
Zhang, Shijin
Chen, Yunji
Abstract
The present invention provides a device for a vector data returning processing unit in a fractal tree, a method using the device, a control apparatus, and an intelligent chip. The device comprises a central node for receiving vector data returned by leaf nodes, multiple leaf nodes for calculating and shifting the vector data, and a forwarder module comprising a local cache structure and a data processing component, wherein the multiple leaf nodes are grouped into N groups having the same number of leaf nodes; the central node is individually in communication connection with each group of leaf nodes by means of the forwarder module; a communication structure constituted by each group of leaf nodes has self-similarity; the multiple leaf nodes are in communication connection with the central node in a complete M-way tree approach by means of multiple layers of forwarder modules; each leaf node comprises a setting bit; if the setting bits require that the vector data in the leaf nodes are shifted, the leaf nodes shift the vector data of preset bandwidth bits to corresponding positions, and otherwise, the leaf nodes return the vector data to the central node.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Lan, Huiying
Luo, Tao
Liu, Shaoli
Zhang, Shijin
Chen, Yunji
Abstract
A fractal-tree communication structure and method, a control device and an intelligent chip. The communication structure comprises: a central node that is the communication data center of a network on chip and used for broadcasting or multicasting communication data to multiple leaf nodes; multiple leaf nodes that are communication data nodes of the network on chip and used for transmitting the communication data to the central node; and a forwarder module used for connecting the central node with the multiple leaf nodes and forwarding the communication data, wherein the multiple leaf nodes are divided into N groups having the same number of leaf nodes; the central node is individually in communication connection with each group of leaf nodes by means of the forwarder module; the communication structure is a fractal tree structure; the communication structure constituted by each group of leaf nodes has self-similarity; and the forwarder module comprises a central forwarder module, a leaf forwarder module, and an intermediate forwarder module.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Zhang, Shijin
Luo, Tao
Liu, Shaoli
Chen, Yunji
Abstract
A device and a method for automatically correcting accessed data of a storage device, relating to the technical fields of data storage and data correction. The device comprises: a storage device module for storing data, the storage device module comprising an area for storing data and an area for storing supervision bits; an encoder module for acquiring data and generating the corresponding supervision bits according to the data; and a decoder module for checking, when the storage device module reads the data, the correctness of the data according to the supervision bits, sending an error signal when incorrect data are found while correcting the incorrect data, and sending the corrected data to a read-write unit, the read-write unit rewriting the corrected data back into the storage device, so as to prevent data errors from accumulating.
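The abstract leaves the code unspecified; as a hedged stand-in, the sketch below runs the same encode / check / correct / write-back cycle with a Hamming(7,4) code, which corrects any single-bit error.

    def encode(d):                      # d: 4 data bits -> 7-bit codeword
        d1, d2, d3, d4 = d
        p1 = d1 ^ d2 ^ d4               # supervision (parity) bits
        p2 = d1 ^ d3 ^ d4
        p3 = d2 ^ d3 ^ d4
        return [p1, p2, d1, p3, d2, d3, d4]   # bit positions 1..7

    def correct(c):                     # check, fix, and extract data bits
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # covers positions 1,3,5,7
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # covers positions 2,3,6,7
        s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # covers positions 4,5,6,7
        syndrome = s1 + 2 * s2 + 4 * s3
        if syndrome:                    # non-zero syndrome locates the error
            c[syndrome - 1] ^= 1        # correct it (then rewrite to storage)
        return c, [c[2], c[4], c[5], c[6]]

    word = encode([1, 0, 1, 1])
    word[4] ^= 1                        # inject a single-bit storage error
    fixed, data = correct(word)
    assert data == [1, 0, 1, 1]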
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Zhang, Shijin
Luo, Tao
Liu, Shaoli
Chen, Yunji
Abstract
Disclosed are a device and a method for refreshing a DRAM or an eDRAM, the method comprising: step one, a storage control device receives a reading/writing request and determines, according to the output of a refresh control device, whether to send the reading/writing request or a refreshing request to a storage device; step two, the refresh control device controls the generation of a refreshing signal and records, according to the output of the storage control device, whether the refresh is delayed. The present invention is able to reduce the conflict between reading/writing and refreshing, thereby improving the performance of the DRAM or eDRAM.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Li, Zhen
Liu, Shaoli
Zhang, Shijin
Luo, Tao
Qian, Cheng
Chen, Yunji
Chen, Tianshi
Abstract
A data accumulation apparatus and method, and a digital signal processing device. The apparatus comprises: an accumulation tree module for accumulating input data in the form of a binary tree structure and outputting accumulated result data; a registration module containing a plurality of groups of registers and used for registering the accumulated result data and the intermediate data generated by the accumulation tree module during an accumulation process; and a control circuit for generating a data strobe signal to control the accumulation tree module to filter out the input data not required to be accumulated, and generating a flag signal to perform the following control: selecting as output data either the result obtained after adding one or more pieces of intermediate data stored in the registers to the accumulated result, or the accumulated result directly. Thus, a plurality of groups of input data can be rapidly accumulated to a group of sum values within a clock cycle. At the same time, the accumulation apparatus can, by means of a control signal, flexibly choose to accumulate only part of the input data.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Li, Zhen
Liu, Shaoli
Zhang, Shijin
Luo, Tao
Qian, Cheng
Chen, Yunji
Chen, Tianshi
Abstract
An adder device, a data accumulation method and a data processing device. The adder device (100) comprises: a first adder module (110) provided with an adder tree unit, composed of a multi-stage adder array, and a first control unit, wherein the adder tree unit accumulates data by means of step-by-step accumulation based on a control signal of the first control unit; a second adder module (120) comprising a two-input addition and subtraction operation unit and a second control unit, and used for performing an addition or subtraction operation on input data; a shift operation module (130) used for performing a left shift operation on output data of the first adder module; an AND operation module (140) used for performing an AND operation on output data of the shift operation module and output data of the second adder module; and a controller module (150) used for controlling the data input of the first adder module (110) and the second adder module (120), controlling a shift operation of the shift operation module (130), and controlling the transmission of control signals of the first control unit and the second control unit. Thus, quick accumulation of data is achieved.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Li, Zhen
Liu, Shaoli
Zhang, Shijin
Luo, Tao
Qian, Cheng
Chen, Yunji
Chen, Tianshi
Abstract
A pipeline data synchronization apparatus and method for a multi-input multi-output processor. The apparatus comprises: a multi-input multi-output function unit (6) having multiple operating pipeline levels for executing an operation on input operands in response to an instruction; and a pipeline controller (5) for receiving an instruction, parsing the input operands required by the instruction, determining the validity of the input operands, sending the instruction to the multi-input multi-output function unit if the input operands are all valid, and sending a dummy instruction to the multi-input multi-output function unit if at least one input operand is invalid. The pipeline controller (5) receives an output request of the multi-input multi-output function unit (6) and determines its feasibility; if the output request is feasible, it accepts the output request within one clock period of the chip and forwards the output request to the memories (1, 2, 3, 4) within a certain time, and if the output request is infeasible, it stops the output of the multi-input multi-output function unit (6). Thus, the invention not only solves the pipeline synchronization problem of the multi-input multi-output function unit (6), but also greatly reduces the memory access costs of a processor and improves its memory access efficiency.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Li, Zhen
Liu, Shaoli
Zhang, Shijin
Luo, Tao
Qian, Cheng
Chen, Yunji
Chen, Tianshi
Abstract
Disclosed are a pipeline-level computation apparatus, a data processing method and a network-on-chip chip. The pipeline-level computation apparatus comprises three pipeline-level modules: a first pipeline-level module performs a vector addition or subtraction computation on data from a first input cache register module and the maximum value index thereof, a second pipeline-level module computes a derivative value for the input data and evaluates an activation function, and a third pipeline-level module performs multiplication and addition operations on the input data. According to the computation operation designated by a program instruction, the apparatus selectively executes any one of the computation processes of the first, second and third pipeline-level modules, or any combination of two or three of them, and a third cache register outputs the final computation result. Thus, the working efficiency of a chip is improved and data throughput is high, so that the chip achieves the best computation performance.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Liu, Daofu
Zhou, Shengyuan
Chen, Yunji
Abstract
A data ranking apparatus and method implemented in hardware, and a data processing chip comprising the data ranking apparatus, which can be applied to an accelerator. The data ranking apparatus comprises: a register group (11) for saving the K temporarily ranked maximum or minimum data during a data ranking process, wherein the register group comprises a plurality of registers (102, 103, 104, 105) connected in parallel and two adjacent registers unidirectionally transmit data from a low level to a high level; a comparator group (12), which comprises a plurality of comparators (106, 107, 108, 109) connected to the registers on a one-to-one basis, compares the magnitudes of a plurality of pieces of input data, and outputs the larger or smaller data to the corresponding registers; and control circuits (110, 111, 112, 113) provided with a plurality of flag bits acting on the registers, wherein the flag bits determine whether the registers receive data transmitted from the corresponding comparators or lower-level registers, and whether the registers transmit data to higher-level registers.
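A software model of the register/comparator chain's externally visible behavior (keep the K largest values, ordered low to high); the bisect-based bookkeeping is an illustration, not the hardware's mechanism.

    import bisect

    class TopK:
        def __init__(self, k: int):
            self.k, self.regs = k, []       # regs kept sorted ascending

        def push(self, x) -> None:
            if len(self.regs) < self.k:
                bisect.insort(self.regs, x)
            elif x > self.regs[0]:          # compare against lowest register
                self.regs.pop(0)            # lowest value is displaced
                bisect.insort(self.regs, x) # new value shifts up to its level

    topk = TopK(3)
    for v in [5, 1, 9, 3, 7, 8]:
        topk.push(v)
    print(topk.regs)                        # [7, 8, 9]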
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Zhang, Shijin
Luo, Tao
Liu, Shaoli
Chen, Yunji
Abstract
A quick operation device for a nonlinear function, and a method therefor. The device comprises: a definitional domain conversion part (10), used for converting an inputted independent variable into a corresponding value in the table lookup range; a table lookup part (3), used for looking up the linearly fitted slope and intercept of the corresponding section according to the inputted independent variable or the independent variable processed by the definitional domain conversion part (10); and a linear fitting part (20), used for obtaining a final result by means of linear fitting according to the slope and the intercept obtained by the table lookup part (3).
G06F 7/57 - Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups or for performing logical operations
85.
OPERATION APPARATUS AND METHOD FOR ACCELERATION CHIP FOR ACCELERATING DEEP NEURAL NETWORK ALGORITHM
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Li, Zhen
Liu, Shaoli
Zhang, Shijin
Luo, Tao
Qian, Cheng
Chen, Yunji
Chen, Tianshi
Abstract
Disclosed are an operation apparatus and method for an acceleration chip for accelerating a deep neural network algorithm. The apparatus comprises: a vector addition processor module (1) for performing addition or subtraction of a vector and/or a vectorized operation of a pooling layer algorithm in a deep neural network algorithm (S1); a vector function value arithmetic unit module (2) for performing a vectorized operation of a non-linear evaluation in the deep neural network algorithm (S2); and a vector multiplier-adder module (3) for performing a multiply-add operation on the vector (S3), wherein the three modules execute a programmable instruction, and interact with each other to calculate a neuronal value and a network output result of a neural network, and a variation amount of a synaptic weight representing the interaction strength of an input layer neuron on an output layer neuron; and the three modules are all provided with an intermediate value storage area (6, 7, 8) and perform read and write operations on a primary memory (5). Thus, the present invention can reduce the number of times of reading and writing an intermediate value of a primary memory (5), reduce the energy consumption of an accelerator chip, and avoid the problems of data missing and replacement during a data processing process.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Liu, Yuchen
Luo, Tao
Liu, Shaoli
Zhang, Shijin
Chen, Yunji
Abstract
Provided are a data processing apparatus and method for an interconnection circuit, which relate to an interconnection node used for connecting one or more transaction data sources to one or more transaction data destinations in an interconnection circuit. The data processing apparatus comprises: at least one input end and at least one output end, the input end including a plurality of input ports, an output port, at least two multiplexers and at least one buffer memory; a buffer storage distribution circuit, which controls the multiplexers to allocate a temporary storage location for input transaction data according to the current state of the buffer memory; a routing selection circuit, which selects an output end for the transaction data of a buffer queue; an arbitration circuit, which arbitrates which buffer queue has transmission priority and, according to a pre-set arbitration policy, enables multiple transaction data transmissions contending for the same output end to successively obtain the right to occupy the output path; and a multi-path selector circuit, which is connected to the output port and the output end, and transfers data transmission in the interconnection circuit.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Zhou, Yiqing
Liu, Ling
Tian, Lin
Shi, Jinglin
Abstract
The present invention provides a method and a system for accessing a cognitive spectrum by an LTE system. The method comprises: a UE implementing a start-up process on an LTE authorized spectrum; and, when a load rate of the LTE authorized spectrum exceeds a threshold, dispatching the UE needing random access to work on a cognitive spectrum; otherwise, the UE works on the LTE authorized spectrum. By optimizing the protocol, the present invention provides a solution for using cognitive radio technology in an LTE system that is backwards compatible with the LTE system; communication quality is ensured and the working spectral range of the LTE system is expanded, thereby avoiding the complex operations of frequent quitting and re-access by a user and achieving the objective of saving energy.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Luo, Layong
Xie, Gaogang
Xie, Yingke
Abstract
The present invention provides an IP lookup device, which includes a TCAM-based IP lookup engine and an SRAM-based IP lookup pipeline, allowing IP lookup to be performed simultaneously in the two lookup engines. The TCAM-based IP lookup engine stores all the prefixes corresponding to the leaf nodes of a one-bit Trie tree constructed according to the forwarding information base of a router, and the SRAM-based IP lookup pipeline stores all the prefixes corresponding to the intermediate nodes of the Trie tree. The device can support rapid incremental route updates while realizing rapid IP lookup. The TCAM utilization rate is thereby improved, and the shortage of storage space for an SRAM-based IP lookup pipeline within an FPGA chip can be relieved.
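An illustrative sketch of the prefix split (the FIB entries and names below are made up): build the one-bit trie, then send leaf-node prefixes to the TCAM list and intermediate-node prefixes to the SRAM pipeline list.

    class Node:
        def __init__(self):
            self.children = {}
            self.prefix = None           # set if this node holds a FIB entry

    def insert(root, bits, name):
        n = root
        for b in bits:
            n = n.children.setdefault(b, Node())
        n.prefix = name

    def split(node, tcam, sram):
        if node.prefix is not None:
            (tcam if not node.children else sram).append(node.prefix)
        for child in node.children.values():
            split(child, tcam, sram)

    fib = {"0": "P1", "01": "P2", "011": "P3", "10": "P4"}   # toy FIB
    root = Node()
    for bits, name in fib.items():
        insert(root, bits, name)
    tcam, sram = [], []
    split(root, tcam, sram)
    print("TCAM (leaf prefixes):", tcam)       # ['P3', 'P4']
    print("SRAM pipeline (internal):", sram)   # ['P1', 'P2']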
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Huang, Kun
Xie, Gaogang
Zhang, Dafang
Abstract
A hash method is provided for implementing hash insertion operations on a hash table comprising a plurality of buckets. Each bucket comprises an insertion counter, used to record the number of elements inserted into it, and a deletion counter, used to record the number of elements deleted from it. The method comprises the steps of: mapping the element to be operated on, according to hash functions, to at least one bucket in the hash table, referred to as a candidate bucket; finding the target bucket among the candidate buckets on the basis of a target bucket selection principle; inserting the element into the target bucket; and judging whether the newly inserted element affects the storage locations of previously stored elements in the candidate buckets, re-adjusting the storage location of a stored element if it no longer satisfies the target bucket selection principle. Also included is the step of accumulating the values of the insertion counters of the candidate buckets.
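A minimal sketch of the bucket bookkeeping; the two hash functions, the bucket count, and the least-loaded selection rule are assumptions standing in for the patent's selection principle.

    import hashlib

    class Bucket:
        def __init__(self):
            self.items = []
            self.inserted = 0            # insertion counter
            self.deleted = 0             # deletion counter

    class MultiHashTable:
        def __init__(self, nbuckets: int = 16, d: int = 2):
            self.buckets = [Bucket() for _ in range(nbuckets)]
            self.d = d                   # number of candidate buckets

        def candidates(self, key: str):
            return [int(hashlib.sha256(f"{i}:{key}".encode()).hexdigest(), 16)
                    % len(self.buckets) for i in range(self.d)]

        def insert(self, key: str):
            target = min((self.buckets[i] for i in self.candidates(key)),
                         key=lambda b: len(b.items))   # assumed selection rule
            target.items.append(key)
            target.inserted += 1         # accumulate the insertion counter

    table = MultiHashTable()
    for k in ["alpha", "beta", "gamma"]:
        table.insert(k)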
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Sun, Yi
Ge, Yuming
Lu, Shan
Li, Jun
Xie, Gaogang
Abstract
A method for automatic allocation of service flows includes: matching a service flow running on a multi-mode mobile terminal with the available access networks able to transmit the service flow; when the number of available access networks is more than one, calculating the weight of each attribute of the available access networks for service allocation decisions and ordering the available access networks by the weights; selecting an available access network according to the order and transmitting an access negotiation request to the network access device corresponding to the selected network; the network access device deciding whether to accept the service flow on the basis of network resources; and the multi-mode mobile terminal transmitting the service flow to the selected access network on the basis of the decision, or transmitting a new negotiation request after selecting the next available access network in the order. The invention enables multi-mode mobile terminals located in heterogeneous networks to automatically allocate different service flows to different access networks, thereby reducing communication costs and capacity use while ensuring the quality of service of every service.
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Ji, Xiangyang
Gao, Wen
Ma, Siwei
Zhao, Debin
Lu, Yan
Abstract
A “rounding toward zero” method maintains the exact motion vector and can also be implemented without division, so as to improve the precision of motion vector calculation, represent the motion of objects in video more faithfully, and obtain more accurate motion vector prediction. Combining forward prediction coding and backward prediction coding, the present invention realizes a new prediction coding mode which guarantees high coding efficiency in direct mode, is convenient for hardware realization, and achieves the same effect as conventional B frame coding.
H04N 7/12 - Systems in which the television signal is transmitted via one channel or a plurality of parallel channels, the bandwidth of each channel being less than the bandwidth of the television signal
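One line of context for the rounding point above: in Python, integer division rounds toward negative infinity, so rounding toward zero must be spelled out; this is a generic illustration, not the patent's division-free formula.

    def div_round_to_zero(a: int, b: int) -> int:
        q = abs(a) // abs(b)
        return q if (a >= 0) == (b >= 0) else -q

    print(-7 // 2, div_round_to_zero(-7, 2))   # -4 (floor) vs -3 (to zero)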
92.
METHOD FOR SYSTEM TERMINAL DEVICE ESTABLISHING NAT TRAVERSING CHANNEL
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Fu, Chuan
Wang, Di
Zhang, Guoqing
Yang, Qingfeng
Qin, Zhuoqiong
Abstract
The present invention provides a method for a system terminal device to establish a Network Address Translator (NAT) traversing channel. The method includes: a calling party system terminal device respectively obtains, through the transmitting service node, a data receiving port and network address on a transmitting service node, a data receiving port and network address on a NAT device, and a data receiving port and network address on a system terminal device, and sends them to a called party system terminal device; a data sending port of the called party system terminal device performs a direct test, a forward test, and a traversing test in turn for the calling party system terminal device and obtains the network address and port for receiving data; and the network address and port of the called party system terminal device for receiving data are sent to the calling party system terminal device. The present invention considers the diversity of end-to-end communication under multi-level NAT, uses the network resources in the private network formed by the multi-level NAT, and can implement NAT traversal in a variety of scenarios.
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Ma, Siwei
Lu, Yan
Gao, Wen
He, Yun
Yu, Lu
Lou, Jian
Abstract
The present invention discloses a method for encoding a flag of an image while encoding an I Frame: firstly, setting a start code of the I Frame picture to be coded, for marking the start of the I Frame; setting a flag for indicating whether to encode an identification field; and judging the set flag: if the flag indicates to encode the identification field of time and control code of a video tape recorder, encoding that identification field; otherwise, not encoding it. In the present invention, the start code is added into the prediction picture header to mark the start of one frame of picture data, and the flag information of the time_code identification field identifies whether the time_code identification field is present in the picture. This realizes the objective of identifying the time_code identification field while avoiding the encoding of additional identification information; coding efficiency is therefore improved, and the method can be applied to all kinds of video/audio technical standards.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Fu, Chuan
Zhang, Guoqing
Wang, Di
Yang, Qingfeng
Abstract
An application-oriented name registration system for use in a multi-layer network address translator (NAT) environment, a logon method and a query method are provided. Said multi-layer NAT environment includes a public network and at least one private network, and said private network accesses the public network and other private networks via a network address translation unit; said name registration system includes a system terminal device, a calling agent server and a registration service device which accesses the public network and at least one private network. Application of the present invention enables applications, services and users to be located by their IDs in the multi-layer NAT network environment.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Fu, Chuan
Wang, Di
Zhang, Guoqing
Qin, Zhuoqiong
Abstract
An application-oriented name registration system in multi-layer NAT environment is disclosed, including system terminal device and registration service unit which comprises top-layer registration service unit connected to public network and basic registration service unit connected to private network; the registration service unit is used to receive login request message from the system terminal device and record the user login information of the system terminal device, and the user login information includes at least the user identification, the user login point of the system terminal device used by the user in the network to which the registration service unit is connected, and the system terminal device access point of the system terminal device used by the user in the network to which the registration service unit is connected. The invention can locate the position through the identification in the multi-layer NAT network environment, and make the service in the private network visible to the external network and irrelevant to specific applications. A system and a method that provide more proper access route for the external network nodes are also disclosed.
INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES (China)
Inventor
Wang, Xiangdong
Zhao, Dan
Qian, Yueliang
Lin, Shouxun
Liu, Qun
Abstract
A method for audio matching is disclosed. The method is used for detecting, in an audio stream segment to be detected, audio sub-segments matching the audio samples in a standard audio database. The method comprises the steps of: generating energy envelope unit figures for the audio stream segment to be detected and for the audio samples in the standard audio database; selecting a candidate matching sub-segment in the audio stream segment to be detected, the candidate matching sub-segment having the same length as the audio sample to be compared; comparing the segment points of the energy envelope units in the candidate matching sub-segment, and their probabilities, with those of the audio sample to be compared, to obtain matching points and their matching probabilities; computing a similarity value between the candidate matching sub-segment and the audio sample to be compared with a similarity measuring function according to the matching probabilities of the matching points; and judging, according to the similarity value, whether the candidate matching sub-segment and the audio sample to be compared match.
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Gao, Wen
Zheng, Junhao
Ma, Siwei
Ji, Xiangyang
Zhang, Peng
Lu, Yan
Abstract
An encoding method for skipped macroblocks in a video image includes the steps of: adding one indication bit into a picture header for indicating the coding mode for skipped macroblocks in the current image; selecting the coding mode for the macroblock type in the current image according to the number of skipped macroblocks: if run_length coding is selected, setting the indication bit of the picture header to the status indicating run_length coding and encoding the macroblock type in the image by the run_length coding mode; if joint coding is selected, setting the indication bit of the picture header to the status indicating joint coding and encoding the macroblock type by jointly coding the number of skipped macroblocks and the macroblock type; and finally, encoding the other data in the current macroblock and writing the data into a code stream.
H04N 7/12 - Systems in which the television signal is transmitted via one channel or a plurality of parallel channels, the bandwidth of each channel being less than the bandwidth of the television signal
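The mode decision in the abstract above can be sketched as follows; the 50% threshold and the symbol tuples are illustrative stand-ins, since the patent only requires that the mode be chosen from the number of skipped macroblocks and signalled by one indication bit in the picture header.

def encode_mb_types(mb_types):
    # mb_types: per-macroblock type strings, e.g. ["SKIP", "SKIP", "INTER", ...]
    n_skip = mb_types.count("SKIP")
    run_length_mode = 1 if n_skip > len(mb_types) // 2 else 0   # indication bit
    symbols, run = [], 0
    for t in mb_types:
        if t == "SKIP":
            run += 1
            continue
        if run_length_mode:
            # Run-length mode: the skip run and the macroblock type
            # are coded as two separate symbols.
            symbols.extend([("skip_run", run), ("mb_type", t)])
        else:
            # Joint mode: one codeword jointly represents the number of
            # skipped macroblocks and the following macroblock type.
            symbols.append(("joint", run, t))
        run = 0
    if run:   # trailing skipped macroblocks at the end of the image
        symbols.append(("skip_run", run) if run_length_mode else ("joint", run, None))
    return run_length_mode, symbols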
98.
Entropy coding method for coding video prediction residual coefficients
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Gao, Wen
Zhao, Debin
Wang, Qiang
Ma, Siwei
Lu, Yan
Abstract
The present invention provides an entropy coding method for coding video prediction residual coefficients, comprising the steps of: first, zig-zag scanning the coefficients of a block to be coded to form a sequence of (level, run) pairs; second, selecting the type of code table for the current image block according to the macroblock type; then coding each (level, run) pair in the obtained sequence with multiple tables, switching among them, with the pairs coded in reverse zig-zag scanning order; and finally, coding an End-of-Block (EOB) flag with the current code table. By designing different tables for different block types and different ranges of level, the method fully exploits the context information and the conditional probability distribution of the symbols. Coding efficiency is improved without increasing computational or implementation complexity.
H04N 7/12 - Systems in which the television signal is transmitted via one channel or a plurality of parallel channels, the bandwidth of each channel being less than the bandwidth of the television signal
H04N 11/02 - Colour television systems with bandwidth reduction
H04N 11/04 - Colour television systems using pulse code modulation
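A toy Python rendering of the scan-and-switch procedure in entry 98; the 4x4 zig-zag order is the standard one, but the table thresholds and the string "codewords" are placeholders for the patent's per-block-type VLC tables.

import numpy as np

ZIGZAG_4x4 = [(0,0),(0,1),(1,0),(2,0),(1,1),(0,2),(0,3),(1,2),
              (2,1),(3,0),(3,1),(2,2),(1,3),(2,3),(3,2),(3,3)]

def to_level_run(block):
    # Zig-zag scan a quantized residual block into (level, run) pairs:
    # each nonzero coefficient with the count of zeros preceding it.
    pairs, run = [], 0
    for r, c in ZIGZAG_4x4:
        v = int(block[r, c])
        if v == 0:
            run += 1
        else:
            pairs.append((v, run))
            run = 0
    return pairs

def code_pairs(pairs, num_tables=3):
    # Code the pairs in reverse zig-zag order, switching to a higher
    # table once |level| exceeds that table's threshold.  The thresholds
    # below are illustrative; the patent selects the initial table by
    # macroblock type and defines the real switch points and codebooks.
    thresholds = [1, 2]
    bits, t = [], 0
    for level, run in reversed(pairs):
        bits.append(f"T{t}:({level},{run})")     # stand-in for a VLC lookup
        while t < num_tables - 1 and abs(level) > thresholds[t]:
            t += 1                               # larger levels select richer tables
    bits.append(f"T{t}:EOB")                     # End-of-Block flag, current table
    return bits

blk = np.zeros((4, 4), dtype=int)
blk[0, 0], blk[1, 1] = 6, -1
print(code_pairs(to_level_run(blk)))   # ['T0:(-1,3)', 'T0:(6,0)', 'T2:EOB']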
99.
Method for obtaining image reference block in code mode of fixed reference frame number
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Gao, Wen
Ji, Xiangyang
Ma, Siwei
Zhao, Debin
Lu, Yan
Abstract
A method for obtaining an image reference block in a code mode with a fixed number of reference frames includes the steps of: performing motion estimation for each block of the current B frame and obtaining the motion vector MV of the corresponding block in the backward reference frame; determining whether the motion vector points beyond the furthest forward reference frame that the B frame may reference; if not, calculating the forward and backward motion vectors in the normal way; if so, replacing the motion vector of the corresponding block in the backward reference with the motion vector of a forward reference frame, in the same direction, that the B frame can obtain, and then calculating the forward and backward motion vectors of the B frame; finally, taking the two image blocks pointed to by the resulting forward and backward motion vectors as the image reference blocks corresponding to the macroblock. The present invention solves the potential problem of mismatched motion vectors and preserves coding efficiency to the largest extent.
H04N 7/12 - Systems in which the television signal is transmitted via one channel or a plurality of parallel channels, the bandwidth of each channel being less than the bandwidth of the television signal
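The motion-vector substitution in the abstract above can be sketched like this; linear rescaling by temporal distance and the temporal-direct-style split are common assumptions rather than necessarily the patent's exact rule, and all names here are illustrative.

def derive_reference_mv(mv_col, ref_dist, max_forward_dist):
    # If the co-located backward-reference block's motion vector points
    # ref_dist frames back and that exceeds the furthest forward
    # reference the B frame may use, rescale the vector onto the nearest
    # obtainable forward reference frame in the same direction.
    if ref_dist <= max_forward_dist:
        return mv_col, ref_dist                    # normal case
    s = max_forward_dist / ref_dist                # shrink onto a legal frame
    return (round(mv_col[0] * s), round(mv_col[1] * s)), max_forward_dist

def direct_mode_mvs(mv_ref, dist_fwd, dist_bwd):
    # Split a (possibly rescaled) reference vector into the B frame's
    # forward and backward vectors by temporal distances, in the style
    # of common temporal-direct derivations.
    total = dist_fwd + dist_bwd
    fwd = (mv_ref[0] * dist_fwd // total, mv_ref[1] * dist_fwd // total)
    bwd = (fwd[0] - mv_ref[0], fwd[1] - mv_ref[1])
    return fwd, bwd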
100.
Bi-directional predicting method for video coding/decoding
Institute of Computing Technology, Chinese Academy of Sciences (China)
Inventor
Ji, Xiangyang
Gao, Wen
Zhao, Debin
Lu, Yan
Ma, Siwei
Qi, Honggang
Abstract
The invention discloses a bi-directional prediction method for video coding/decoding. During bi-directional prediction coding at the encoder, for every image block of the current B frame, a given forward candidate motion vector of the current image block is first obtained; the backward candidate motion vector is then derived by calculation, and a candidate bi-directional prediction reference block is obtained by the bi-directional prediction method; matching is computed within the given search range and/or against the given matching threshold; finally, the optimal matching block is selected to determine the final forward motion vector, the backward motion vector, and the block residual. The present invention achieves bi-directional prediction while coding only a single motion vector; moreover, it does not increase the complexity of searching for a matching block at the encoder, saves the bits spent on coding motion vectors, and represents the motion of objects in the video more faithfully. The present invention realizes a new prediction coding type by combining forward prediction coding with backward prediction coding.
H04N 7/12 - Systems in which the television signal is transmitted via one channel or a plurality of parallel channels, the bandwidth of each channel being less than the bandwidth of the television signal
H04N 11/02 - Colour television systems with bandwidth reduction
H04N 11/04 - Colour television systems using pulse code modulation
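A compact Python sketch of the single-vector bi-directional search in entry 100; mirroring the backward vector from the forward candidate assumes equal temporal distances to both references, and SAD as the matching cost is an illustrative choice, not the patent's prescribed metric.

import numpy as np

def symmetric_search(cur_block, fwd_ref, bwd_ref, pos, search_range=8):
    # For each forward candidate vector, derive the backward vector by
    # mirroring, average the two prediction blocks, and keep the
    # candidate with the lowest SAD.  Only the winning forward vector
    # would then be coded into the stream.
    y, x = pos
    h, w = cur_block.shape
    best = (None, None, np.inf)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            fy, fx = y + dy, x + dx              # forward candidate position
            by, bx = y - dy, x - dx              # mirrored backward position
            if not (0 <= fy <= fwd_ref.shape[0] - h and
                    0 <= fx <= fwd_ref.shape[1] - w and
                    0 <= by <= bwd_ref.shape[0] - h and
                    0 <= bx <= bwd_ref.shape[1] - w):
                continue
            pred = (fwd_ref[fy:fy+h, fx:fx+w].astype(int) +
                    bwd_ref[by:by+h, bx:bx+w].astype(int) + 1) // 2
            sad = np.abs(cur_block.astype(int) - pred).sum()
            if sad < best[2]:
                best = ((dy, dx), (-dy, -dx), sad)
    return best   # forward MV, derived backward MV, residual cost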