Provided is a model processing method, a voice interaction method, an electronic device and a storage medium, relating to fields of artificial intelligence, big data and voice technologies. The model processing method includes: obtaining a candidate question set of each initial sample data in M initial sample data, wherein the initial sample data includes m rounds of question-and-answer between an object and an agent, the candidate question set includes a next round of question set corresponding to the mth round of question in the m rounds of question-and-answer; obtaining M training sample data based on the M initial sample data, the candidate question set and label data of each initial sample data, wherein the label data includes a target question to be generated by the agent in the (m+1)th round; and training a model to be trained by using the M training sample data to obtain a target model.
Provided are a method for perceiving a road environment, a vehicle control method, a training method, an electronic device, an autonomous driving vehicle, and a storage medium, which relate to the fields of artificial intelligence, computer vision, deep learning and large model technologies, and may be applied to scenarios such as autonomous driving and unmanned driving. The method for perceiving a road environment includes: acquiring an associated-region lane attribute and information to be detected, where the information to be detected is collected by an onboard sensor and represents a target region where a vehicle is traveling, the associated-region lane attribute corresponds to an associated region, and the associated region and the target region meet a predetermined similarity condition; and processing the associated-region lane attribute and the information to be detected by using an onboard perception model to obtain road perception information of the target region.
G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
B60W 60/00 - Drive control systems specially adapted for autonomous road vehicles
G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
3.
METHOD FOR DISTRIBUTED OPERATION BASED ON NEURAL NETWORK MODEL AND RELATED APPARATUS
A method for distributed operation based on a neural network model and a related apparatus are provided, relating to the field of computer technology and in particular to the fields of artificial intelligence, deep learning, machine learning, distributed training and other technologies. The method includes: parsing code of the neural network model to construct an operator topology graph corresponding to the neural network model; generating a distributed operation strategy of the neural network model based on the operator topology graph and a preset resource constraint; and modifying the code of the neural network model based on the distributed operation strategy to obtain target code; where the target code is used to operate the neural network model based on the distributed operation strategy on a computing device corresponding to the resource constraint.
A data processing method and device are provided, relating to the field of artificial intelligence technology, specifically to the fields of intelligent cloud, network communication, and large language models. The data processing method is applied to a single-layer switch, where the single-layer switch is configured to complete a target operation, and the target operation includes multiple stage operations. The method includes: receiving multiple in-network computation requests sent by a current GPU, where the multiple in-network computation requests correspond one-to-one to the multiple stage operations; and executing, in parallel and based on the multiple in-network computation requests, the multiple stage operations for multiple GPUs in a target group where the current GPU is located.
Baidu.com Times Technology (Beijing) Co., Ltd. (China)
Inventor
Cao, Biao
Zhou, Jingwei
Yin, Zhihui
Duan, Liguo
Zheng, Ran
Abstract
A method and an apparatus for processing metadata of a distributed file system are provided. An implementation of the method includes: in response to an amount of metadata of a distributed file system being less than a preset threshold, storing a metadata storage layer table and a path resolution acceleration layer table of the distributed file system on a given original shard; in response to the amount of metadata of the distributed file system being not less than the preset threshold, splitting the original shard into a metadata storage layer shard and a path resolution acceleration layer shard, where the metadata storage layer shard is used to store the metadata storage layer table, and the path resolution acceleration layer shard is used to store the path resolution acceleration layer table; and scheduling the metadata storage layer table on the metadata storage layer shard to different data shards.
The present disclosure provides a human-computer interaction method and apparatus based on a historical conversation, a device, and a storage medium, and relates to the field of artificial intelligence, in particular to the field of human-computer interaction. The specific implementation solution is: acquiring current conversation information of a user and a historical conversation database, where the historical conversation database includes a plurality of historical memory objects, each historical memory object represents historical conversation information of the user and an analysis result of the historical conversation information, and the analysis result includes a timestamp and semantic information of the historical conversation information; determining, from the historical conversation database, a historical memory object associated with the current conversation information as a target memory object; and determining, according to the target memory object, response information for the current conversation information.
H04L 51/02 - User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages
G06F 16/335 - Filtering based on additional data, e.g. user or group profiles
A method for controlling quality of service of a virtual machine, an electronic device and a storage medium are provided, relating to the fields of cloud computing, virtualization, big data and other technologies. The method includes: storing an operation request in a queue corresponding to a layer of a virtual machine cluster when an available token quantity in a token bucket of the layer is unable to meet a target token quantity required for the operation request, tokens of the layer being periodically generated according to an upper limit of quality of service of the layer; deducting the target token quantity from an updated available token quantity when the available token quantity in the token bucket is updated to meet the target token quantity required for the operation request; and sending the operation request from the queue to the layer for processing when deducting the target token quantity successfully.
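The queue-and-token flow in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: the class name `LayerTokenBucket`, its fields, the FIFO ordering, the immediate send when tokens are already available, and the choice to cap the bucket at the QoS upper limit are all assumptions made for the example.

```python
from collections import deque


class LayerTokenBucket:
    """Token bucket plus wait queue for one layer of a VM cluster.

    All names here are hypothetical; the abstract does not define an API.
    """

    def __init__(self, qos_upper_limit: int):
        self.qos_upper_limit = qos_upper_limit  # tokens generated per period
        self.available = 0                      # tokens currently in the bucket
        self.queue = deque()                    # waiting (request_id, target_tokens)
        self.sent = []                          # requests forwarded to the layer

    def submit(self, request_id: str, target_tokens: int) -> None:
        # Assumption: a request is sent immediately only when nothing is
        # already waiting and the bucket can cover it; otherwise it queues.
        if not self.queue and self.available >= target_tokens:
            self.available -= target_tokens
            self.sent.append(request_id)
        else:
            self.queue.append((request_id, target_tokens))

    def refill(self) -> None:
        # Periodic token generation. Capping at the upper limit is an
        # assumption; a request larger than the cap would wait forever here.
        self.available = min(self.available + self.qos_upper_limit,
                             self.qos_upper_limit)
        # Deduct first, then send, in FIFO order, while tokens suffice.
        while self.queue and self.available >= self.queue[0][1]:
            request_id, target_tokens = self.queue.popleft()
            self.available -= target_tokens
            self.sent.append(request_id)


bucket = LayerTokenBucket(qos_upper_limit=10)
bucket.submit("read-1", 4)   # bucket is empty, so the request waits
bucket.refill()              # tokens arrive; "read-1" is deducted and sent
print(bucket.sent, bucket.available)
```

Keeping one bucket per layer lets each layer enforce its own upper limit independently, which matches the per-layer wording of the abstract.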
A method of performing a task based on a large model and an electronic device are provided, which relate to artificial intelligence technology, and in particular to fields of voice interaction, deep learning, large model, etc. The method includes: acquiring a demand feature characterizing a demand intention; performing a task by using the large model according to the demand feature, to obtain a response text, in which a target response word is determined based on: determining a query feature for each attention subtask in the task based on an associated response word feature; and performing, based on the demand feature read from a storage unit as a value feature and a key feature shared by the plurality of attention subtasks, the plurality of attention subtasks by using a computing unit according to a plurality of query.
G06V 10/50 - Extraction of image or video features by performing operations within image blocks; Extraction of image or video features by using histograms, e.g. histogram of oriented gradients [HoG]; Extraction of image or video features by summing image-intensity values; Projection analysis
G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
9.
INFORMATION PROCESSING METHOD, APPARATUS, DEVICE, MEDIUM, AND PRODUCT BASED ON LARGE MODEL
An information processing method based on a large model is disclosed, relating to the field of artificial intelligence, specifically the technical fields of intelligent cloud, deep learning, and large models. The information processing method includes: receiving, through a unified entry, problem information sent by a user; generating a prompt sentence based on the problem information; invoking the large model based on the prompt sentence to obtain a target type of the problem information output by the large model; creating a target task based on the problem information and the target type; performing problem processing based on the target task using a problem processing object corresponding to the target type, to obtain a problem processing result, wherein different target types correspond to different problem processing objects; and feeding back the problem processing result to the user.
A method for parallel processing of a model is provided, which relates to the field of artificial intelligence technologies such as deep learning, natural language processing, image processing, and large language models. The method is applied to a first computing device among N computing devices and includes: obtaining a target first data submatrix in N first data submatrices and a target second data submatrix in N second data submatrices; initiating a matrix multiplication operation process to process the target first data submatrix and the target second data submatrix, and, in parallel with the processing, copying a first candidate data submatrix from the other N-1 computing devices; in response to obtaining a first processing result between the target first data submatrix and the target second data submatrix, processing the copied first candidate data submatrix and a target data submatrix corresponding to the first candidate data submatrix; and in response to obtaining a second processing result between the first candidate data submatrix and the target data submatrix corresponding to the first candidate data submatrix, obtaining a target processing result of the first computing device based on the first processing result and the second processing result.
A large model-based text generation method, electronic device, and storage medium in the field of artificial intelligence technologies such as large models and natural language processing are provided. The specific implementation includes: obtaining a matching prefix, where the matching prefix includes at least one consecutive token; obtaining a draft token sequence based on the matching prefix according to a pre-configured draft token sequence length, where the draft token sequence includes at least one token; performing validity verification on the draft token sequence using a pre-trained large model based on a speculative decoding algorithm; and in response to passing the verification, using the draft token sequence as generated text.
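As a rough illustration of the draft-then-verify flow in the abstract above, the sketch below builds a draft by looking up the matching prefix in earlier tokens and then checks it against a model. Several things are assumptions made for the example: the draft source (an earlier occurrence of the prefix in the token history), the helper names `lookup_draft` and `verify_draft`, greedy token-by-token verification, and the toy `next_token` function standing in for the large model. Real speculative decoding typically verifies the whole draft in one batched forward pass; the loop here is sequential only for readability.

```python
from typing import Callable, List, Sequence


def lookup_draft(history: Sequence[int], prefix: Sequence[int],
                 draft_len: int) -> List[int]:
    """Return up to draft_len tokens that followed an earlier copy of prefix."""
    n = len(prefix)
    # Scan backwards, skipping the copy of the prefix at the very end.
    for start in range(len(history) - n - 1, -1, -1):
        if list(history[start:start + n]) == list(prefix):
            return list(history[start + n:start + n + draft_len])
    return []


def verify_draft(next_token: Callable[[List[int]], int],
                 context: Sequence[int], draft: Sequence[int]) -> List[int]:
    """Keep the longest draft prefix the model itself would have produced."""
    accepted: List[int] = []
    ctx = list(context)
    for tok in draft:
        if next_token(ctx) != tok:
            break
        accepted.append(tok)
        ctx.append(tok)
    return accepted


def toy_model(ctx: List[int]) -> int:
    # Stand-in for the large model: cycles 1 -> 2 -> 3 -> 4 -> 5 -> 1.
    return ctx[-1] % 5 + 1


history = [1, 2, 3, 4, 5, 1, 2]
draft = lookup_draft(history, prefix=[1, 2], draft_len=3)
generated = verify_draft(toy_model, history, draft)
print(draft, generated)
```

In this sketch a partially correct draft is truncated at the first mismatch rather than rejected outright; the abstract only states that a draft passing verification is used as generated text.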
A method is provided. The method comprises: determining picture evaluation information of a to-be-encoded current block, and determining a plurality of picture evaluation information of a plurality of encoded reference blocks, wherein the pixel positions of the plurality of reference blocks are adjacent to the pixel position of the current block; calculating, based on the picture evaluation information of the current block and the picture evaluation information of the plurality of reference blocks, the correlation between the current block and each of the plurality of reference blocks, respectively, and determining a target reference block in the plurality of reference blocks based on the results of the correlation calculations; performing a prediction search based on a first intra prediction mode of the target reference block, and determining a second intra prediction mode corresponding to the current block based on the result of the prediction search.
H04N 19/159 - Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
H04N 19/147 - Data rate or code amount at the encoder output according to rate distortion criteria
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
13.
INTERLEAVED PIPELINE SCHEDULING METHOD AND APPARATUS, DEVICE, AND STORAGE MEDIUM
The present disclosure provides an interleaved pipeline scheduling method and apparatus, a device, and a storage medium, relating to the technical field of computers, and in particular to the technical field of artificial intelligence such as deep learning, neural networks, and parallel computing. The specific implementation scheme comprises: on the basis of a pipeline segmentation dimension, an interleaving dimension, and an accumulation count, obtaining micro-batch scheduling parameters; on the basis of the micro-batch scheduling parameters, determining a transmission interval for micro-batches requiring buffered scheduling; and on the basis of the transmission interval and a buffer unit, performing buffered scheduling on computation results of micro-batches requiring delayed transmission. According to the present disclosure, a buffer unit, such as a buffer queue, can be created to buffer computation results from some computation units in the buffer unit, and to delay the transmission of the computation results. In this way, the transmission of computation results for some micro-batches can be delayed, thereby improving the applicability of interleaved pipeline scheduling.
A sparse processing method and apparatus for a sparse attention network, and an electronic device. The method specifically comprises: in a memory cell, using constraint conditions to construct a masked sparse representation (S301), wherein the constraint conditions are: the dimension of a mask matrix is [B,a,S], B represents the batch size, a represents the number of heads, S represents the sequence length, and each element in the S dimension represents a masked starting row of each column in the mask matrix; and a calculation unit acquiring the masked sparse representation from the memory cell, using the masked sparse representation to perform sparse processing on input data, and storing the sparse processing result into the memory cell (S302). According to the method, the constraint conditions are used to construct the masked sparse representation, so that memory consumption can be reduced from a quadratic order of the sequence length to a linear order of the sequence length, thereby remarkably reducing memory requirements during large model training, and improving the training efficiency.
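The memory saving described above comes from storing one start row per column instead of a full S x S mask. The sketch below shows one way to read that representation. It assumes that, for column `c`, every row at or after `start_rows[c]` is masked and that a start value equal to S means the column is never masked; the function names and the use of negative infinity for masked scores are likewise illustrative choices, not details taken from the abstract.

```python
NEG_INF = float("-inf")


def is_masked(start_rows, row, col):
    """True when (row, col) is masked under the per-column start-row encoding."""
    return row >= start_rows[col]


def apply_sparse_mask(scores, start_rows):
    """Mask an S x S score matrix without materializing a dense mask."""
    size = len(start_rows)
    return [
        [NEG_INF if is_masked(start_rows, r, c) else scores[r][c]
         for c in range(size)]
        for r in range(len(scores))
    ]


def compact_size(batch, heads, seq_len):
    # Elements stored by the [B, a, S] representation: linear in S.
    return batch * heads * seq_len


def dense_size(batch, heads, seq_len):
    # Elements stored by a dense [B, a, S, S] mask: quadratic in S.
    return batch * heads * seq_len * seq_len


# Two packed documents of length 2: keys 0-1 are hidden from queries 2-3.
start_rows = [2, 2, 4, 4]
scores = [[1.0] * 4 for _ in range(4)]
for row in apply_sparse_mask(scores, start_rows):
    print(row)
print(compact_size(2, 8, 1024), dense_size(2, 8, 1024))
```

For a single (batch, head) pair the vector `start_rows` is one slice along the S dimension of the [B, a, S] mask matrix named in the abstract.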
The present disclosure relates to the technical field of computers, in particular to the technical fields such as data processing, difference detection, and software development, and provides a document difference detection method and apparatus, a device, and a storage medium. The specific implementation comprises: acquiring at least two documents to be detected, said at least two documents including documents at different stages in a collaborative design process; determining a plurality of pieces of content in each of said documents; determining binding relationships between content in different documents among said documents; and using a preset detection rule and the binding relationships to detect whether there is a difference between said at least two documents. According to the present disclosure, a difference between documents at different stages of collaborative design can be detected.
The present disclosure provides a text inference acceleration method applied to a large language model and a related device, relating to the technical field of data processing, and in particular to the technical fields of large language models, deep learning, and long text inference. The specific implementation scheme is as follows: selecting, from a target token set stored in a video memory, a core token set that needs to be retained, wherein the core token set at least comprises a first token subset, the first token subset is determined on the basis of attention scores obtained by performing a global query operation on the target token set using a plurality of proxy tokens, and each proxy token is selected from the target token set; and on the basis of the core token set, performing an eviction operation on the target token set in the video memory. In this embodiment of the present disclosure, a core token set that needs to be retained is selected by means of a plurality of proxy tokens, such that the processing burden caused by unnecessary tokens is reduced, thereby effectively freeing video memory space and improving the inference speed of the large language model.
An image annotation data generation method and apparatus, a model training method and apparatus, a device and a medium. The image annotation data generation method comprises: acquiring a sample image, a target recognition object in the sample image, and visual description information of the target recognition object (S101); verifying words in the visual description information on the basis of the sample image and the target recognition object to obtain a verification result of the visual description information (S102); adjusting the words in the visual description information on the basis of the verification result of the visual description information to obtain target description information of the target recognition object (S103); and taking the target description information of the target recognition object as annotation data of the target recognition object (S104).
Provided are a training method for an image generation model, an image generation method, apparatus, and a device. The training method includes extracting reference keypoints of a character from a sample reference image; based on a model to be trained, performing motion estimation using sample audio data and the reference keypoints to obtain predicted keypoints that match the sample audio data; performing parameter estimation using the reference keypoints and the predicted keypoints to obtain motion parameters of the predicted keypoints, and performing prior motion estimation using the motion parameters of the predicted keypoints to obtain optical flow of non-key pixel points; performing image prediction using the sample reference image and dense optical flow to obtain predicted image data that matches the sample audio data; performing model training using the predicted image data and annotated image data to obtain the image generation model.
G06T 7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
G06V 30/18 - Extraction of features or characteristics of the image
G10L 25/57 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for processing of video signals
19.
METHOD FOR CONSTRUCTING MAP BASED ON LARGE MODEL, VEHICLE CONTROL METHOD, ELECTRONIC DEVICE, AND STORAGE MEDIUM
A method of constructing a map based on a large model, a vehicle control method, an electronic device, and a storage medium are provided, which relate to the field of artificial intelligence technology, in particular to the fields of computer vision, deep learning, large model and generative model technologies. The method includes: acquiring an associated-region lane attribute and an image to be detected, which is collected by an onboard sensor and represents a road region to be detected; constructing target prompt information based on the associated-region lane attribute; and processing the target prompt information and the image to be detected by using the large model to obtain a regional road map for the road region to be detected. The associated-region lane attribute corresponds to an associated road region, and the associated road region and the road region to be detected meet a predetermined similarity condition.
G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
20.
INTERACTIVE METHOD BASED ON LARGE MODEL, TRAINING METHOD, INTELLIGENT AGENT, DEVICE, AND MEDIUM
An interactive method based on a large model, a training method, and an intelligent agent are provided, which relate to the fields of artificial intelligence, speech recognition, speech interaction, deep learning, and large models, and to application scenarios such as knowledge search, autonomous driving, intelligent customer service, intelligent speech control, smart e-commerce, and AI healthcare. The interactive method includes: acquiring a request speech; performing speech recognition on the request speech to obtain a speech recognition feature representing request semantics; and processing the speech recognition feature using the large model to obtain a response text, where the response text includes response words arranged in sequence, a target response word among the response words is determined by processing the speech recognition feature and an associated response word feature using an attention fusion layer of the large model, and the associated response word feature is related to an associated response word arranged before the target response word.
A task execution method and apparatus for a large model, an electronic device, and a storage medium are provided, which relate to the field of artificial intelligence, and in particular to the fields of deep learning and large model technologies. The method includes: executing, according to a target feature to be processed, a collaborative computing task using a target computing unit, where the collaborative computing task includes a first collaborative task and a second collaborative task, the first collaborative task is used to process the target feature to be processed and a first collaborative sub-weight to obtain an intermediate collaborative feature, and the second collaborative task is used to process the intermediate collaborative feature and a second collaborative sub-weight to obtain a target collaborative feature; and fusing a target basic feature and the target collaborative feature to obtain a next target feature to be processed.
A method, an electronic device and a computer-readable storage medium for extracting entity relationships are provided, relating to artificial intelligence technologies such as natural language processing, knowledge graphs, deep learning, and large language models. The method for extracting entity relationships includes: inputting a target long text into a target large language model to obtain a target keyword list based on an output result of the target large language model; inputting the target keyword list into multiple target relationship agents respectively to obtain multiple target regular expressions corresponding to different entity relationships based on output results of the multiple target relationship agents; and processing texts in a preset text set using the multiple target regular expressions to obtain entity relationship extraction results.
Provided is a method for editing an online document, an electronic device, and a storage medium, relating to the fields of computer technology, document processing, document editing, and artificial intelligence. The method includes: a client sends a data acquisition request for a target file to a server in response to an open request for the target file; receives a data entity of the target file returned by the server, where the data entity comprises multiple pieces of log data; loads the data entity in a preset web view container, and displays a web view with the data entity loaded; and receives an editing instruction for the web view, displays response result data in the web view in response to the editing instruction, and sends the response result data to the server so that the server updates the data entity of the target file according to the response result data.
The present disclosure provides an acquisition method and apparatus for attribute information of a directory of a distributed file system, and a device, and relates to the field of computer technologies and, in particular, to the field of big data and a distributed file system. The specific implementation is as follows: when an acquisition instruction is acquired, judging whether a log index of a directory indicated by the acquisition instruction is less than a transaction index of the directory, and whether a queue index of the directory is greater than or equal to the transaction index; if both of the above conditions are met, reading attribute modification information of a directory from a memory queue; and determining the attribute information of the current directory according to the attribute modification information and an attribute file of the directory.
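The index comparison in the abstract above can be sketched as a small function. This is only one reading of it: the shape of the memory queue (a list of `(index, delta)` pairs), the rule that entries with an index above the log index and up to the transaction index are the ones still missing from the attribute file, the additive deltas, and the fallback of returning the attribute file unchanged when the conditions are not met are all assumptions made for the example.

```python
def get_directory_attributes(log_index, transaction_index, queue_index,
                             memory_queue, attribute_file):
    """Combine the persisted attribute file with pending in-memory changes.

    memory_queue: list of (index, {attribute: delta}) pairs (hypothetical).
    attribute_file: dict of persisted attribute values (hypothetical).
    """
    attributes = dict(attribute_file)
    # Both conditions from the abstract: the log lags behind the
    # transaction, and the memory queue already covers the transaction.
    if log_index < transaction_index and queue_index >= transaction_index:
        for index, delta in memory_queue:
            # Assumption: only entries not yet reflected in the log, and not
            # beyond the transaction, belong to the current view.
            if log_index < index <= transaction_index:
                for name, change in delta.items():
                    attributes[name] = attributes.get(name, 0) + change
    return attributes


attribute_file = {"size": 100, "children": 3}
memory_queue = [
    (5, {"size": 10}),      # already persisted (index == log_index)
    (6, {"children": 1}),
    (7, {"size": -4}),
    (8, {"size": 50}),      # beyond the transaction index
]
print(get_directory_attributes(5, 7, 8, memory_queue, attribute_file))
```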
A communication method, apparatus, electronic device and storage medium for a computing power cluster are provided. An implementation of the method includes: during a process of communicating with a communication receiver using Remote Direct Memory Access (RDMA) protocol, obtaining a first packet loss rate corresponding to the RDMA protocol; in response to the first packet loss rate being higher than a first preset packet loss rate, initiating a first handshake request to the communication receiver for requesting to switch to Transmission Control Protocol (TCP) for communication; receiving a first handshake response returned by the communication receiver for the first handshake request, and determining a first starting transmission position of data according to a last data receiving position in the first handshake response; and communicating, by using the TCP, with the communication receiver starting from data corresponding to the first starting transmission position.
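The fallback decision described above can be sketched without any real networking. In the example below, `HandshakeResponse`, its `bytes_received` field, the `handshake` callable and the request string are hypothetical stand-ins, and the sketch assumes that the last data receiving position is a byte count, so that TCP transmission resumes at exactly that offset.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class HandshakeResponse:
    # Hypothetical field: how many bytes the receiver has already accepted.
    bytes_received: int


def plan_transmission(
    data: bytes,
    loss_rate: float,
    max_loss_rate: float,
    handshake: Callable[[str], HandshakeResponse],
) -> Tuple[str, Optional[bytes]]:
    """Decide whether to stay on RDMA or resume over TCP.

    Returns ("rdma", None) when no switch is needed, otherwise
    ("tcp", remaining_bytes) starting at the receiver's reported position.
    """
    if loss_rate <= max_loss_rate:
        return "rdma", None
    # Packet loss is too high: ask the receiver to switch protocols.
    response = handshake("switch-to-tcp")
    start = response.bytes_received
    return "tcp", data[start:]


def fake_handshake(request: str) -> HandshakeResponse:
    # Stand-in receiver that reports six bytes already received.
    return HandshakeResponse(bytes_received=6)


print(plan_transmission(b"abcdefghij", 0.05, 0.01, fake_handshake))
```

Resuming from the receiver's reported position, rather than from the sender's own send counter, avoids resending data that arrived before the switch and avoids skipping data that was lost in flight.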
An image search method, an intelligent agent, an electronic device, and a storage medium are provided, which relate to the field of artificial intelligence technology. The method includes: acquiring at least one first candidate image matched with input text information; performing a semantic analysis on the input text information by using a first large model to generate at least one question-answer pair which includes question information and first answer information; performing an image-text analysis on the question information and the at least one first candidate image by using a second large model to generate second answer information for answering each question information; and determining at least one target image matched with an image search requirement from the at least one first candidate image according to a comparison result between the at least one first answer information and the at least one second answer information.
The present disclosure provides a method and an apparatus for intent recognition based on a large language model (LLM), an electronic device, and a storage medium, relating to a field of computer technology, specifically to a field of artificial intelligence technology, such as natural language processing and an LLM. A specific implementation solution is as follows: obtaining a query statement, a preset intent, and descriptive information of the preset intent; obtaining a first candidate intent corresponding to the query statement by matching the query statement with the preset intent and the descriptive information of the preset intent; generating first prompt information based on the query statement, the first candidate intent, and descriptive information of the first candidate intent; and determining a first target intent corresponding to the query statement from the first candidate intent by inputting the first prompt information into the LLM.
A large model-based video processing method, device and storage medium are disclosed, relating to the field of artificial intelligence technology, particularly the fields of deep learning and large models. The specific solution includes: collecting an imitation video made by a user based on a target video; extracting three-dimensional postures of the imitation video using a pre-trained large model based on the imitation video; and performing posture assessment on the imitation video using the pre-trained large model based on the three-dimensional postures of the imitation video and pre-obtained three-dimensional postures of the target video to obtain an assessment result.
Provided is a method for controlling a virtual machine, an electronic device and a storage medium, relating to the field of computer technology, and in particular to application fields of cloud service, cloud computing and big data processing. The method includes: obtaining a virtual machine control request; generating a control operation instruction for a target virtual machine based on the virtual machine control request by using a computing node service deployed on the smart card; and sending the control operation instruction to a virtualization control program deployed on the physical machine; where the virtualization control program is used to perform a control operation on the target virtual machine based on the control operation instruction.
A method and apparatus for cross-computing power cluster communication, an electronic device, and a computer readable storage medium are provided. An implementation of the method includes: in response to a communication initiator and a communication receiver respectively belonging to different computing power clusters, increasing an RDMA connection count on the basis of an initial connection count until a detected actual bandwidth value no longer increases, to obtain a target connection count; increasing, with the RDMA connection count being maintained at the target connection count, a buffer size on the basis of an initial size until a detected actual bandwidth value no longer increases, to obtain a target size; determining cross-cluster transmission parameters based on the target connection count and the target size; and communicating, using the cross-cluster transmission parameters, with the communication receiver belonging to a different computing power cluster.
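The two-stage tuning in the abstract above amounts to growing one parameter at a time until the measured bandwidth stops improving. The sketch below shows that search under several assumptions: fixed step sizes, an upper `limit` added as a safety bound, a deterministic `probe_bandwidth` function standing in for real measurements, and helper names that are purely illustrative. Real bandwidth measurements are noisy, so a practical version would likely need a tolerance rather than a strict comparison.

```python
def grow_until_flat(initial, step, limit, measure):
    """Increase a parameter until the measured bandwidth no longer increases."""
    best, best_bandwidth = initial, measure(initial)
    candidate = initial + step
    while candidate <= limit:
        bandwidth = measure(candidate)
        if bandwidth <= best_bandwidth:
            break  # no further gain: keep the last value that helped
        best, best_bandwidth = candidate, bandwidth
        candidate += step
    return best


def tune_cross_cluster(probe, initial_connections, initial_buffer_kb):
    # Stage 1: tune the connection count with the buffer at its initial size.
    connections = grow_until_flat(
        initial_connections, 1, 64,
        lambda count: probe(count, initial_buffer_kb))
    # Stage 2: hold the connection count fixed and tune the buffer size.
    buffer_kb = grow_until_flat(
        initial_buffer_kb, 64, 4096,
        lambda size: probe(connections, size))
    return {"connections": connections, "buffer_kb": buffer_kb}


def probe_bandwidth(connections, buffer_kb):
    # Toy model: gains saturate at 4 connections and 256 KB buffers.
    return min(connections, 4) * min(buffer_kb, 256)


print(tune_cross_cluster(probe_bandwidth, initial_connections=1,
                         initial_buffer_kb=64))
```

Tuning the two parameters sequentially keeps the number of probes linear in the search range, at the cost of possibly missing a better joint setting.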
H04L 67/1097 - Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
H04L 49/9005 - Buffering arrangements using dynamic buffer space allocation
31.
METHOD AND APPARATUS FOR PROCESSING DIRECTORY METADATA IN A DISTRIBUTED FILE SYSTEM, AND DEVICE
Baidu International Technology (Shenzhen) Co., Ltd. (China)
Inventor
Cao, Biao
Chen, Qiushi
Jian, Jielong
Duan, Liguo
Zheng, Ran
Abstract
The present disclosure provides a method for processing directory metadata in a distributed file system, an apparatus and a device, which are related to the field of computer technologies, in particular to the field of big data and distributed file systems. The specific solution is as follows: when a processing request is acquired, a first index shard corresponding to directory metadata indicated by the processing request may be determined according to the processing request; the first index shard may be loaded, and a metadata shard corresponding to the directory metadata indicated by the processing request is determined according to the first index shard. Information in the determined metadata shard is adjusted. A second index shard related to the directory metadata indicated by the processing request is determined according to the processing request, and index information corresponding to the directory metadata in the second index shard is adjusted.
Provided are an inference method and apparatus for a large language model, a device, and a storage medium. The inference method for the large language model includes: performing encryption on a target input text to obtain a target input ciphertext; sending the target input ciphertext to a server so that an encrypted model is used for performing inference on the target input ciphertext by the server to obtain a target result ciphertext; receiving the target result ciphertext sent by the server; and performing decryption on the target result ciphertext to obtain a target result plaintext.
H04L 9/06 - Arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for blockwise coding, e.g. D.E.S. systems
An image labeling method and apparatus, a device, and a storage medium. The method comprises: using a preset labeling model to perform initial labeling processing on a target image to be labeled, to obtain target position information of a target object in the target image, and description information to be verified (S101); inputting the target image and target prompt information into a generative pre-trained transformer (GPT) model, to obtain an object search result, the target prompt information being used for prompting the GPT model to search, on the basis of the target description information, the target image for the target object and/or a virtual object associated with the target object, and the target description information being determined on the basis of the description information to be verified (S102); and when it is determined on the basis of the object search result that the verification of the description information to be verified is passed, determining labeling information of the target image on the basis of the description information to be verified and the target position information (S103).
The present application provides a method and apparatus for transmitting audio and video data, a device, and a medium. The method for transmitting audio and video data comprises: determining a plurality of candidate transmission sequences for data frames in audio and video data to be transmitted, the data frames comprising audio frames and video frames; on the basis of frame attributes of the data frames, determining a loss parameter of each candidate transmission sequence; and, on the basis of the loss parameters, determining a target transmission sequence from the plurality of candidate transmission sequences, and transmitting the data frames to a client by using the target transmission sequence.
H04N 21/262 - Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission or generating play-lists
H04N 21/8547 - Content authoring involving timestamps for synchronizing content
36.
DIRECTORY METADATA OPERATION METHOD AND APPARATUS, ELECTRONIC DEVICE, AND READABLE STORAGE MEDIUM
A directory metadata operation method, including: in response to receiving a directory query request, obtaining a query path carried therein; obtaining a prebuilt prefix directory data table and a prebuilt full load directory data table, where a key of the prefix directory data table is a prefix path of the directory metadata, which is a subpath corresponding to a hierarchy other than a default hierarchy counting from end in a path corresponding to the directory metadata, a value of the prefix directory data table is the directory metadata, a key of the full load directory data table is identification information of the directory metadata and a value thereof is the directory metadata; determining a matching result between the query path and the key of the prefix directory data table; and determining a query result based on the matching result and the full load directory data table.
An information prediction method, a method of training an autonomous driving model, a device, a medium, and an autonomous driving vehicle are provided, which relate to the field of artificial intelligence technology, in particular to the fields of computer vision technology and deep learning technology, and may be applied to scenarios such as autonomous driving. A specific implementation scheme of the information prediction method is: acquiring perception data including image data acquired by a sensor in a vehicle and driving data of the vehicle; encoding the image data to obtain an image token sequence corresponding to the image data; encoding the driving data to obtain a driving feature corresponding to the driving data; and generating, using a generative model, a predicted token sequence corresponding to the image token sequence and control information for the vehicle based on the driving feature and the image token sequence.
B60W 60/00 - Drive control systems specially adapted for autonomous road vehicles
G05B 13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
G06V 10/40 - Extraction of image or video features
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
38.
METHOD AND APPARATUS FOR CONFIGURING SCREEN, SYSTEM, ELECTRONIC DEVICE, AND STORAGE MEDIUM
The present disclosure provides a method and apparatus for configuring a screen, a system, an electronic device, a storage medium, and a program product, relating to the technical field of data processing, in particular to the technical field of display control and the technical field of large models. The specific implementation scheme comprises: on the basis of a received screen configuration statement, determining a display area to be configured and corresponding operation information; on the basis of layout information and the operation information of the display area, generating a screen configuration instruction; and, on the basis of the screen configuration instruction, determining display content and outputting the display content to the display area.
The disclosure provides an audio and video synchronization detection method, an audio and video synchronization detection device, electronic equipment and a terminal, and relates to the field of image processing, in particular to the technical fields of computer vision, artificial intelligence and the like. The method includes: extracting image data and audio data of a video segment of a target length; obtaining a plurality of face image lists by performing face detection and tracking based on the extracted image data; extracting mouth features corresponding to each face image list based on a traversal result of the face image list, in which the mouth features are used for representing changes in lip shape; and determining a synchronization result of the video segment based on the audio data and the mouth features.
H04N 21/439 - Processing of audio elementary streams
G06T 7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronizing decoder's clock; Client middleware
40.
IMAGE ENCODING METHOD, DEVICE AND ELECTRONIC EQUIPMENT
The disclosure provides a method for coding an image. The method includes: obtaining a contour mask image of an original image; obtaining an enhancement image by performing enhancement on the original image based on the contour mask image; and obtaining a coding stream and a reconstructed image of the original image based on the contour mask image and the enhancement image.
The present disclosure relates to the technical field of data processing, in particular to the technical fields such as map technology, indexing technology, trajectory positioning, and Internet of things technology, and provides a data processing method and apparatus, an electronic device and a computer-readable storage medium. The specific implementation solution comprises: acquiring index data, wherein the index data is obtained by clustering a plurality of area of interest spatial surfaces, the index data comprises a global index and a plurality of local indexes, the local indexes have one-to-one correspondence to clustering results and are relational indexes of one or more area of interest spatial surfaces comprised in the clustering results, and the global index is a relational index of the plurality of clustering results; and on the basis of the index data, acquiring from among full trajectory data the trajectory data corresponding to at least one area of interest spatial surface.
The present disclosure relates to the technical field of computers, and particularly relates to the technical fields of autonomous driving and artificial intelligence. Provided are an autonomous driving model based on temporal recursive autoregressive inference, and a method, an apparatus and a vehicle. In the autonomous driving model, a coding layer is configured to code sensor information of a current moment, so as to obtain a current scenario representation; a trajectory planning layer is configured to determine a driving trajectory from the current moment to a future moment on the basis of a historical scenario representation of the current moment; and an inference layer is configured to determine a predicted scenario representation of at least one future moment and a historical scenario representation of the future moment on the basis of the current scenario representation, the historical scenario representation and current prompt information, wherein the prompt information at least comprises the driving trajectory. Therefore, the autonomous driving model can use an integrated inference layer to realize the learning of historical information and the prediction of future information, so that the model can perform future prediction while learning the history, and thus the prediction effect of the model is improved.
The present disclosure relates to the technical field of computers, in particular to the technical field of autonomous driving and artificial intelligence, and provides a generative diffusion model-based autonomous driving model, a method, an apparatus, and a vehicle. An encoding layer in an autonomous driving model is configured to encode current perception information of an autonomous driving vehicle, to obtain a discrete spatial representation of a current scene, a prediction layer is configured to perform discrete diffusion on the basis of at least one scene discrete spatial representation comprising the discrete spatial representation of the current scene, to determine a prediction spatial representation at a future moment, and a decoding layer is configured to decode the prediction spatial representation, to obtain autonomous driving decision information at the future moment. Thus, according to the autonomous driving model, a generative diffusion model-based output can be used to determine the autonomous driving decision of the autonomous driving model. By improving the accuracy of future prediction, autonomous driving decision and prediction effects are further improved.
Provided are a map updating method and apparatus, a model training method and apparatus, an electronic device, and a medium, relating to the technical field of artificial intelligence, and in particular to the fields such as automatic driving, intelligent traffic, computer vision, and image processing. A specific implementation solution is: for a same piece of position information, determining associated historical map data and M pieces of collected image data based on a time sequence, wherein M is an integer greater than or equal to 1; respectively encoding the historical map data and the M pieces of collected image data to obtain a historical map feature and M collected image features; determining position information and category information of a target instance on the basis of the historical map feature and the M collected image features, wherein the target instance represents lane information; and updating the map data on the basis of the position information and the category information of the target instance.
G06F 16/56 - Information retrieval; Database structures therefor; File system structures therefor of still image data having vectorial format
G06F 16/587 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
G01C 21/00 - Navigation; Navigational instruments not provided for in groups
45.
VIDEO RENDERING METHOD FOR LIVE BROADCAST SCENE, ELECTRONIC DEVICE AND STORAGE MEDIUM
Provided is a video rendering method for a live broadcast scene, relating to the fields of live broadcasting and large models. The method includes: recording a live broadcast of an anchor to obtain a first video stream; performing speech recognition on live speech in the first video stream to obtain first text information; determining topic popularity of the live broadcast based on audience response information in a process of recording the live broadcast and the first text information; determining corresponding reply text information based on the first text information when the topic popularity of the live broadcast meets a first set condition; rendering virtual characters based on the reply text information to obtain a second video stream; and generating a third video stream of the anchor chatting with the virtual characters based on the first video stream and the second video stream.
Provided are a resource processing method and apparatus based on a hybrid content delivery network (CDN) system and a device. The method is performed by a CDN edge node in the hybrid CDN system and includes: in response to a user request, requesting a resource from a CDN parent node, and determining, by the CDN parent node, whether the user request satisfies a hybrid pull condition for pulling the resource from a CDN hybrid node; and when the user request satisfies the hybrid pull condition, acquiring a to-be-accessed first resource from the CDN hybrid node.
A method for training a multimodal large model includes: obtaining first training data and second training data; obtaining an initial multimodal large model, wherein the multimodal large model comprises a backbone network and multiple codec networks corresponding to the multiple non-textual modalities; and the multiple codec networks perform encoding and decoding based on a same multimodal word list; performing a joint training on the multiple codec networks and the multimodal word list based on the data under the multiple non-textual modalities; and training the backbone network based on the multimodal sample reference data and the sample generation data under the target task in the second training data. The multiple codec networks perform the encoding and decoding based on the same multimodal word list, which reduces the difficulty and the cost of the model training.
G06V 10/77 - Processing image or video features in feature spaces; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
48.
AUTONOMOUS DRIVING MODEL, METHOD, APPARATUS AND VEHICLE CAPABLE OF ACHIEVING MULTI-MODAL INTERACTION
Provided are an autonomous driving model, method, apparatus and vehicle capable of achieving multi-modal interaction. An input layer (210) in an autonomous driving model (200) is configured to receive historical decision information (201), sensing information (202), traffic information (203) and interaction information (204) at a current moment; an encoding layer (220) is configured to encode input information; an autoregressive inference layer (230) is configured to obtain a hidden state for a next moment; and a decoding layer (240) is configured to perform decoding so as to obtain interaction information for the next moment and autonomous driving decision-making information for the next moment. The autonomous driving model (200) may understand a current driving environment by inferring the sensing information (202), the traffic information (203) and the interaction information (204), and gains a better understanding of the influence of historical operations on the autonomous driving process by inferring historical decision-making data, thereby making an output result of the autonomous driving model achieve better interpretability and controllability.
The present disclosure relates to the field of computer technology and particularly relates to the technical fields of autonomous driving and artificial intelligence. Provided are an autonomous driving method, apparatus and vehicle capable of following instructions for self-recovery. The autonomous driving method comprises: acquiring input information; encoding the input information to obtain an input tensor corresponding to the input information; performing autoregressive inference on the input tensor to obtain a hidden state for a first moment after a current moment; and performing decoding on the basis of the hidden state of the first moment to obtain interaction information for the first moment and autonomous driving decision-making information for the first moment, wherein the interaction information for the first moment comprises a signal used for indicating that assistance is required during autonomous driving. In this way, prompt information for natural language interaction can be used to instruct an autonomous driving model to control a vehicle, thereby achieving the rapid generation of a self-recovery scheme. In addition, the vehicle can perform autonomous self-recovery on the basis of self-recovery scheme instructions.
The present disclosure relates to the technical field of artificial intelligence such as intelligent transportation and computer vision, and provides a method and apparatus for detecting failure to yield to pedestrians of a vehicle, an electronic device, and a readable storage medium. The method for detecting failure to yield to pedestrians of a vehicle comprises: acquiring a marked crosswalk area on the basis of surveillance images in a surveillance video stream; for each surveillance image frame in the surveillance video stream, acquiring the intersection over union between a target vehicle and the marked crosswalk area in the surveillance image frame, and selecting multiple surveillance image frames having intersection over union greater than a preset intersection over union threshold as target surveillance images corresponding to the target vehicle; and sequentially performing human detection on each target surveillance image frame in descending order of intersection over union, and when it is determined that a human detection result corresponding to a current target surveillance image satisfies a preset requirement, determining that the target vehicle is a regulation-violating vehicle which does not yield to pedestrians. The present disclosure can expand the detection scenario, reduce the computing resources required for detection, and improve the detection efficiency while detecting whether vehicles yield to pedestrians.
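The frame-selection logic in the failure-to-yield abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the crosswalk area and the vehicle are both modeled as axis-aligned boxes `(x1, y1, x2, y2)`, and `pedestrian_present` is a hypothetical callable standing in for the human detector and its "preset requirement"; none of these names come from the source.

```python
from typing import Callable, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def fails_to_yield(vehicle_boxes: Sequence[Box], crosswalk: Box,
                   iou_threshold: float,
                   pedestrian_present: Callable[[int], bool]) -> bool:
    """Return True if any selected frame satisfies the pedestrian check."""
    scored = [(iou(box, crosswalk), idx) for idx, box in enumerate(vehicle_boxes)]
    # Keep only frames whose IoU exceeds the threshold, highest IoU first.
    targets = sorted((s for s in scored if s[0] > iou_threshold), reverse=True)
    # any() stops at the first hit, so lower-IoU frames are never sent
    # to the (expensive) human detector once a violation is found.
    return any(pedestrian_present(idx) for _, idx in targets)
```

Checking frames in descending IoU order and stopping early is what limits how many frames reach the human detector, which is the resource saving the abstract refers to.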
In the field of cloud networks and network security, which may be applied to intelligent cloud scenarios, a cloud network message processing method includes: obtaining a cloud network message; determining, from at least one type of pre-configured candidate security device, a target type of candidate security device corresponding to the cloud network message; in the case that there are multiple candidate security devices of the target type, determining a target security device from the multiple candidate security devices of the target type based on session information included in the cloud network message, where cloud network messages with same session information correspond to a same target security device; sending the cloud network message to the target security device for security processing, and sending the cloud network message having been security processed by the target security device to a destination.
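The requirement above that messages with the same session information reach the same security device can be sketched with a deterministic hash over the session fields. This is only one way to obtain that property and is an assumption, as are the five-tuple session layout, the `message_type` tag used to pick the device type, and the device names.

```python
import hashlib
from typing import Dict, List, Tuple

# Assumed session layout: source IP, source port, destination IP,
# destination port, protocol.
Session = Tuple[str, int, str, int, str]


def pick_security_device(message_type: str, session: Session,
                         devices_by_type: Dict[str, List[str]]) -> str:
    """Choose a device of the required type; same session maps to same device."""
    candidates = devices_by_type[message_type]
    if len(candidates) == 1:
        return candidates[0]
    # A stable digest (unlike the built-in hash()) gives the same answer
    # across processes and restarts.
    digest = hashlib.sha256(repr(session).encode("utf-8")).digest()
    return candidates[int.from_bytes(digest[:8], "big") % len(candidates)]
```

A plain modulo mapping reshuffles sessions whenever the candidate list changes; a consistent-hashing ring would be a natural refinement if devices are added or removed often.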
The present disclosure relates to the technical fields of artificial intelligence, such as cloud services, big data and large language models, and provides an algorithm service deployment method and apparatus, an electronic device, and a readable storage medium. The algorithm service deployment method comprises: a first server acquiring service operation data and service information of a target algorithm service, wherein the service operation data comprises one of an application program interface, a mirror image file and a model file; determining a service access type on the basis of the service property of the target algorithm service, and acquiring a target access method corresponding to the service access type, wherein the service access type comprises one of application program interface access, mirror image access and model access; and deploying the target algorithm service at the first server on the basis of the target access method, the service operation data and the service information. According to the present disclosure, the first server supports deployment of algorithm services corresponding to different service access types, so that the first server can have stronger service deployment performance.
A motion estimation method, apparatus, electronic device, storage medium, and computer program product are disclosed, which relate to the field of artificial intelligence, specifically cloud storage, cloud computing, and video encoding. A method for motion estimation comprises: determining candidate search spaces and candidate search starting points based on a lookahead motion vector and a predicted search starting point of a current block; determining a target search starting point from the candidate search starting points and determining a target search space from the candidate search spaces; performing a search based on the target search starting point and the target search space to obtain an initial motion estimation result for the current block; and obtaining a target motion estimation result for the current block based on the initial motion estimation result.
The present disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, deep learning and the like, and provides a target identification method and device. The specific implementation comprises: obtaining a video stream collected by a rotatable camera device; determining a preset position of the camera device on the basis of a target in an image frame corresponding to the video stream; determining a still image frame and a fixed still object in the still image frame on the basis of the preset position and the video stream; and determining a position change result of the target in the video stream on the basis of the still image frame and the fixed still object. The implementation improves the accuracy of position change identification of targets.
A method of training a deep learning model and a method of synthesizing a speech are provided, which relate to a field of artificial intelligence technology, in particular to fields of large model, large language model, generative model, deep learning, and speech processing technologies. The method of training a deep learning model includes: determining a reference speech feature of a sample speech, the reference speech feature being associated with a prosodic feature of the sample speech; retrieving a speech library using a sample text corresponding to the sample speech, so as to obtain a pronunciation expression feature of the sample text; inputting the pronunciation expression feature into the deep learning model to obtain an output speech feature; determining a loss of the deep learning model according to the reference speech feature and the output speech feature; and adjusting a parameter of the deep learning model according to the loss.
Provided are a query processing method based on a large language model, an electronic device, and a storage medium. The query processing method based on a large language model includes acquiring a to-be-processed target query; generating a prompt based on a to-be-used target data model, target format information of a specified data format, and the target query; inputting the prompt into the large language model to obtain a target parsing result of the specified data format outputted by the large language model; and modifying the target parsing result based on the target data model.
The present disclosure relates to the technical field of artificial intelligence, and specifically to technical fields such as computer vision, deep learning and big data. Provided are a rainfall identification method and apparatus, a model training method and apparatus, and a device and a storage medium, which may be applied in scenarios such as smart cities and emergency management. The rainfall identification method comprises: processing a target video collected by a target camera, so as to obtain an initial rainfall identification result; determining a target rainfall amount station, the distance between which and the target camera meets a preset condition, and acquiring target rainfall amount data of the target rainfall amount station; and on the basis of the initial rainfall identification result and the target rainfall amount data, determining a target rainfall identification result. The present disclosure may improve the accuracy of rainfall identification.
Provided are a query processing method based on a large language model, a prompt construction method, an electronic device, and a storage medium. The query processing method includes acquiring a to-be-processed target query; acquiring a data field in a target data model and acquiring target format information of a specified data format; constructing a prompt based on the data field in the target data model, the target format information, and the target query; and inputting the prompt into the large language model to obtain a target format result outputted by the large language model.
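The prompt construction step described above combines three inputs: the data fields of the target data model, the target format information, and the query. A minimal sketch follows; the prompt wording, the function name, and the example field names are assumptions for illustration, and the call to the large language model itself is left out.

```python
from typing import Dict


def build_prompt(data_fields: Dict[str, str], format_spec: str, query: str) -> str:
    """Assemble a prompt from data-model fields, a format description, and a query."""
    field_lines = "\n".join(
        f"- {name}: {description}" for name, description in data_fields.items()
    )
    return (
        "Available data fields:\n"
        f"{field_lines}\n\n"
        "Answer strictly in this format:\n"
        f"{format_spec}\n\n"
        f"Query: {query}"
    )


# Hypothetical inputs, not taken from the source.
example_prompt = build_prompt(
    {"region": "sales region", "revenue": "monthly revenue in USD"},
    '{"metric": "<field>", "filters": {"<field>": "<value>"}}',
    "What was the revenue in the north region?",
)
```

Listing the fields explicitly constrains the model to names that exist in the data model, which makes the structured output easier to validate afterwards.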
A method for predicting a structure of a protein complex includes: obtaining an initial coordinate of each amino acid residue in a target protein complex, and obtaining a target residue pair feature, a first multiple sequence alignment (MSA) feature and a second MSA feature of each protein monomer in the target protein complex; and inputting the initial coordinate of each amino acid residue, and the target residue pair feature, the first MSA feature and the second MSA feature of each protein monomer into an N-level fold iteration network layer, and obtaining a target coordinate of each amino acid residue by predicting a torsion angle, a position transformation at residue level and a position transformation at monomer chain level of each amino acid residue via the N level fold iteration network layer, to obtain a predicted structure of the protein complex.
G16B 40/00 - ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
60.
HUMAN-COMPUTER INTERACTION METHOD AND APPARATUS, ELECTRONIC DEVICE, AND STORAGE MEDIUM
Provided are a human-computer interaction method and apparatus, an electronic device and a storage medium, relating to the technical field of artificial intelligence, in particular to the technical fields of deep learning, natural language processing and large models. The specific implementation solution comprises: in response to a human-computer interaction request and on the basis of a first dialogue text comprised in the human-computer interaction request, determining from among a plurality of plug-ins registered in a large language model a first target plug-in related to the first dialogue text; obtaining a second dialogue text on the basis of the first dialogue text and a description text of the first target plug-in; and inputting the second dialogue text into the large language model to obtain a reply text.
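The plug-in selection and second-dialogue-text steps above can be sketched as follows. The abstract does not say how relatedness is measured, so the word-overlap score used here is purely an assumption, as are the function names, the output layout, and the example plug-ins.

```python
from typing import Dict, Optional


def select_plugin(dialogue: str, plugins: Dict[str, str]) -> Optional[str]:
    """Pick the registered plug-in whose description shares the most words."""
    words = set(dialogue.lower().split())
    best, best_score = None, 0
    for name, description in plugins.items():
        score = len(words & set(description.lower().split()))
        if score > best_score:
            best, best_score = name, score
    return best


def build_second_dialogue(dialogue: str, plugins: Dict[str, str]) -> str:
    """Combine the first dialogue text with the chosen plug-in's description."""
    name = select_plugin(dialogue, plugins)
    if name is None:
        return dialogue  # no related plug-in: pass the text through unchanged
    return f"[plug-in: {name}] {plugins[name]}\nUser: {dialogue}"
```

A production system would more plausibly rank plug-ins with embeddings or with the language model itself; word overlap is used here only to keep the sketch dependency-free.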
A method for processing a model operator includes: determining an operator set for model networking, wherein the operator set comprises a plurality of operators; determining a storage amount occupied by an output tensor of each operator in the operator set and a computation time period consumed in a forward computation of each operator in the operator set; and determining a first operator participating in recomputation in a model from the operator set, based on the storage amounts and the computation time periods of the plurality of operators.
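One way to combine the two measurements named above (storage occupied by each operator's output tensor and its forward computation time) is to rank operators by bytes freed per millisecond of recomputation. The abstract does not state the selection rule, so this ratio-based greedy choice, the `bytes_to_free` budget, and all names are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    name: str
    output_bytes: int   # storage occupied by the output tensor
    forward_ms: float   # forward computation time; assumed to be > 0


def choose_recompute_ops(ops: List[Op], bytes_to_free: int) -> List[str]:
    """Greedily pick operators that free the most memory per unit of recompute time."""
    ranked = sorted(ops, key=lambda o: o.output_bytes / o.forward_ms, reverse=True)
    chosen: List[str] = []
    freed = 0
    for op in ranked:
        if freed >= bytes_to_free:
            break
        chosen.append(op.name)
        freed += op.output_bytes
    return chosen
```

Cheap, memory-heavy operators (activations, normalization layers) rank first under this rule, while expensive ones such as large matrix multiplications keep their outputs stored.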
A method is provided that includes: obtaining first urban data of a first sample urban region; inputting the first urban data into a multi-modal foundation model to obtain respective predicted vector representations of a plurality of first data segments; obtaining a plurality of general-purpose foundation models that are pre-trained; for each general-purpose foundation model: generating a vector representation label of a first data segment of a corresponding data modality by using the general-purpose foundation model; and determining a knowledge distillation loss of the general-purpose foundation model based on the vector representation label and a predicted vector representation of the first data segment; and adjusting parameters of the multi-modal foundation model based on at least respective knowledge distillation losses of the plurality of general-purpose foundation models.
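The per-teacher distillation losses described above can be sketched as a distance between the multi-modal model's predicted vector and each general-purpose model's vector label, summed over data modalities. The abstract does not name the distance, so cosine distance and the unweighted sum are assumptions, as are the modality keys.

```python
import math
from typing import Dict, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; both vectors are assumed to be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def total_distillation_loss(predicted: Dict[str, Sequence[float]],
                            teacher_labels: Dict[str, Sequence[float]]) -> float:
    """Sum the per-modality losses, one teacher (foundation model) per modality."""
    return sum(cosine_distance(predicted[modality], label)
               for modality, label in teacher_labels.items())
```

In training, this scalar would be one of the terms used to adjust the multi-modal foundation model's parameters; the abstract's "at least" leaves room for additional loss terms.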
A method for reference frame selection, an apparatus for reference frame selection, an electronic device and a storage medium are provided, which relate to the field of data processing technology, in particular to the fields of video coding technology and unsupervised learning technology. The method includes: acquiring a current frame to be processed and determining attribute information of the current frame; selecting candidate reference frames from a reference frame set according to the attribute information; clustering the selected candidate reference frames to obtain at least one cluster; and selecting one candidate reference frame from each of the at least one cluster and adding the one selected candidate reference frame from each of the at least one cluster to a reference frame list associated with the current frame. The technical solution provided herein can improve the accuracy of reference frame selection.
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
64.
TARGET DETECTION METHOD BASED ON MULTI-TASK AI LARGE MODEL, AND MODEL TRAINING METHOD BASED ON MULTI-TASK AI LARGE MODEL
The present disclosure relates to the technical field of computers, and in particular to the technical fields of artificial intelligence (AI), neural network models, and smart city, and provides a target detection method based on a multi-task AI large model, and a model training method based on a multi-task AI large model. The specific implementation scheme of the target detection method comprises: recognizing a target object of an image under test to obtain a first recognition result; on the basis of the confidence level of the first recognition result and a first threshold corresponding to a first precision, determining a first alarm object from the first recognition result as a detection result; when a trigger condition is met, performing target detection on an image under supplementary test corresponding to the first recognition result to obtain a second recognition result; on the basis of the confidence level of the second recognition result and a second threshold corresponding to a second precision, determining a second alarm object from the second recognition result; and updating the detection result on the basis of the second alarm object. The present disclosure can ensure high precision of target detection and reduce missed detections.
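The two-threshold flow described above can be illustrated with a small sketch. The abstract does not define the trigger condition or the threshold values, so the trigger used here (some first-pass recognition fell below the first threshold), the default thresholds, and the function names are assumptions for illustration only.

```python
def alarms(recognitions, threshold):
    """Keep the labels whose confidence reaches the threshold."""
    return {label for label, conf in recognitions if conf >= threshold}


def detect(first_pass, supplementary_pass, first_threshold=0.9, second_threshold=0.6):
    # First stage: strict threshold, so only high-confidence objects raise alarms.
    result = alarms(first_pass, first_threshold)
    # Assumed trigger condition: at least one first-pass recognition was rejected.
    if any(conf < first_threshold for _, conf in first_pass):
        # Second stage: re-detect on the supplementary image with its own threshold
        # and merge the additional alarm objects into the detection result.
        result |= alarms(supplementary_pass(), second_threshold)
    return result
```

Passing the supplementary detector as a callable means the second, presumably more expensive, pass only runs when the trigger fires.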
The present application relates to the technical field of computers, and particularly relates to the field of source codes. Provided are a method and apparatus for generating simulated data, and an electronic device and a storage medium. The specific implementation solution comprises: acquiring a database table of a project to be processed, and determining data simulation configuration information corresponding to a field in the database table, wherein the database table comprises at least one field, which represents the type of simulated data required by said project, and the data simulation configuration information represents a generation means for the simulated data and the format of the simulated data; and on the basis of the data simulation configuration information, generating the simulated data under the field in the database table. Different pieces of configuration information are set for different fields, and when it is necessary to simulate data, corresponding configuration information is searched for to automatically generate simulated data, thereby improving the simulation efficiency of the data.
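As a rough sketch of per-field data simulation configuration, the example below looks up a configuration entry for each field and generates values accordingly. The configuration schema (`kind`, `start`, `low`, `high`, `options`) and the field names are hypothetical; the abstract only states that the configuration describes the generation means and the format of the simulated data.

```python
import random

# Hypothetical per-field configuration: "kind" is the generation means,
# the remaining keys describe the format of the generated values.
FIELD_CONFIG = {
    "user_id": {"kind": "sequence", "start": 1000},
    "age": {"kind": "int_range", "low": 18, "high": 65},
    "status": {"kind": "choice", "options": ["active", "disabled"]},
}


def generate_rows(fields, config, count, seed=0):
    rng = random.Random(seed)  # seeded so repeated runs produce the same rows
    rows = []
    for i in range(count):
        row = {}
        for name in fields:
            cfg = config[name]  # look up the configuration for this field
            if cfg["kind"] == "sequence":
                row[name] = cfg["start"] + i
            elif cfg["kind"] == "int_range":
                row[name] = rng.randint(cfg["low"], cfg["high"])
            elif cfg["kind"] == "choice":
                row[name] = rng.choice(cfg["options"])
            else:
                raise ValueError(f"unknown generator kind: {cfg['kind']}")
        rows.append(row)
    return rows
```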
Disclosed are a sentiment analysis method and apparatus, a large language model training method and apparatus, an electronic device, a storage medium, a computer program product and a computer program. The sentiment analysis method comprises: acquiring first target text; extracting from the first target text an object to be analyzed; generating second target text on the basis of the first target text and said object, wherein the second target text comprises task prompt text, and the task prompt text is used for prompting a large language model to execute a sentiment analysis task on said object on the basis of the first target text; and inputting the second target text into the large language model to obtain the sentiment polarity of said object.
The present application provides a large language model-based event processing method and apparatus, a device and a medium. The large language model-based event processing method comprises: acquiring a question input by a user; acquiring pre-generated event information; fusing the question and the event information to obtain a fusion result; and inputting the fusion result into a pre-trained large language model to obtain reply content corresponding to the question.
The present disclosure relates to the technical field of artificial intelligence, and specifically to the technical fields such as large models and natural language understanding, and provides a chart generation method and apparatus, a device, and a storage medium. The chart generation method comprises: acquiring target text content and target prompt information; on the basis of the target text content and the target prompt information, using a first pre-trained language model to generate structured information, wherein the structured information is used for generating a target chart; on the basis of the structured information, using a second pre-trained language model to generate the target chart; and displaying the target chart. The present disclosure can improve the chart generation efficiency and accuracy.
The present disclosure relates to the technical field of video processing, in particular to the technical field of monitoring video processing, and provides a monitoring video processing method, a monitoring video processing apparatus, an electronic device, a storage medium, and a program product. The specific implementation solution is: acquiring a monitoring video stream to be processed; performing semantic segmentation on video frames in said monitoring video stream to obtain semantic tags of the video frames; on the basis of the semantic tags of the video frames and scenario determination rules, determining service scenarios to which the video frames are applicable; and determining a scenario tag of said monitoring video stream on the basis of the service scenarios to which the plurality of video frames in said monitoring video stream are applicable.
The present disclosure relates to the technical field of code generation, and provides a code generation method and apparatus, a device and a storage medium. The method comprises: in response to receiving operation information of a user, requesting a node backend to create a corresponding project and a project file; in response to determining that the project and the project file are created, requesting a java backend to create a data source; calling the java backend to perform project engineering initialization and code assembly; and in response to determining that the code assembly is completed, performing project verification and code export.
Disclosed are a student model generation method and apparatus based on a large model, and an electronic device, a storage medium, a computer program product and a computer program. The method comprises: acquiring a sample data set; inputting input data and prompt information into a large model, so as to acquire first content generated by the large model; converting the first content into a first prediction result which has the same type as a labeling result; inputting the input data into an initial student model, so as to acquire a second prediction result that is output by the initial student model; determining a correction gradient on the basis of respective differences between the second prediction result and the first prediction result and between the second prediction result and the labeling result; and on the basis of the correction gradient, correcting the initial student model, so as to acquire a target student model.
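The correction gradient in the abstract above combines two differences: student versus large-model prediction, and student versus labeling result. A minimal sketch, assuming a one-parameter linear student, squared-error differences, and a mixing weight `alpha` (none of which the abstract specifies):

```python
def correction_gradient(w, x, teacher_pred, label, alpha=0.5):
    """Gradient w.r.t. w of alpha*(s - teacher)^2 + (1 - alpha)*(s - label)^2, with s = w*x."""
    student_pred = w * x
    return 2 * x * (
        alpha * (student_pred - teacher_pred)
        + (1 - alpha) * (student_pred - label)
    )


def train_student(samples, w=0.0, lr=0.05, steps=200):
    # Plain gradient descent: correct the student with the combined gradient.
    for _ in range(steps):
        for x, teacher_pred, label in samples:
            w -= lr * correction_gradient(w, x, teacher_pred, label)
    return w
```

Here `teacher_pred` stands in for the first prediction result, that is, the large model's generated content after conversion to the same type as the label.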
The present disclosure relates to the technical field of artificial intelligence, and specifically relates to the fields of deep learning and computer vision. Provided are a multi-objective optimization method and apparatus, and a device and a storage medium. The method comprises: acquiring a set of values to be quantized and a set of objectives to be optimized; for each value to be quantized in said set of values, acquiring a quantization coefficient corresponding to said value, and determining a set of adjacent quantization coefficients of the quantization coefficient; on the basis of a set of reconstruction values corresponding to the set of adjacent quantization coefficients, determining a set of reconstruction distortion values; and on the basis of the set of reconstruction distortion values and said set of objectives, determining a target quantization coefficient. The multi-objective optimization method provided in the present disclosure improves the performance of other optimization objectives while ensuring the performance of traditional optimization objectives.
G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
G06V 10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
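For the multi-objective quantization abstract above, the selection among adjacent quantization coefficients can be sketched as a small cost comparison. The uniform quantizer, the neighbourhood of plus or minus one, and the use of the coefficient magnitude as a second objective are assumptions made for illustration; the abstract does not name the objectives or how they are weighted.

```python
def choose_coefficient(value, step, rate_weight=0.0):
    base = round(value / step)               # quantization coefficient of the value
    candidates = [base - 1, base, base + 1]  # adjacent quantization coefficients

    def cost(q):
        distortion = (value - q * step) ** 2  # reconstruction distortion
        rate = abs(q)  # hypothetical second objective: magnitude as a bit-cost proxy
        return distortion + rate_weight * rate

    return min(candidates, key=cost)
```

With `rate_weight=0.0` the function reduces to picking the lowest-distortion neighbour; a positive weight lets the second objective pull the choice toward a smaller coefficient.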
The present disclosure provides a data operation method, apparatus, device, and storage medium, which relate to the technical field of distributed file systems, in particular to the technical fields of multi-version concurrency control and log-structured merge trees. The specific implementation scheme is as follows: obtaining a plurality of operation records on at least one piece of data in a file system; determining a target operation record of target data in the plurality of operation records, where the target operation record is a deletion record, and the target operation record is a latest operation record of the target data; and deleting at least one version of the target data in the file system according to the target operation record.
G06F 16/16 - File or folder operations, e.g. details of user interfaces specifically adapted to file systems
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
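The deletion rule in the data operation abstract above (act only when the latest operation record of a piece of data is a deletion record) can be sketched as follows. The record layout `(key, version, op)` and the in-memory `store` are hypothetical, and this sketch drops every stored version, which is only one way to satisfy "at least one version".

```python
def purge_deleted(records, store):
    # Find the latest operation record for each key.
    latest = {}
    for key, version, op in records:
        if key not in latest or version > latest[key][0]:
            latest[key] = (version, op)
    # Delete data only when its latest record is a deletion record.
    for key, (_, op) in latest.items():
        if op == "delete":
            store.pop(key, None)  # drop all stored versions of this key
    return store
```

A key that was deleted and then written again keeps its data, because its latest record is no longer a deletion.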
74.
METHOD AND APPARATUS FOR GENERATING IMAGE ADVERSARIAL SAMPLE, METHOD AND APPARATUS FOR TRAINING IMAGE PROCESSING MODEL, IMAGE PROCESSING METHOD AND APPARATUS, AND DEVICE AND MEDIUM
Provided in the present application are a method and apparatus for generating an image adversarial sample, a method and apparatus for training an image processing model, an image processing method and apparatus, and a device and a medium. The method for generating an image adversarial sample comprises: acquiring an original image sample, and acquiring a feature vector map corresponding to the original image sample (S110); performing image scaling processing on the feature vector map according to an image scale of the original image sample, so as to obtain a standard-scale feature map (S120); using an attention mechanism network of a target type to process the standard-scale feature map, so as to obtain an attention influence map (S130); and on the basis of the attention influence map, adding a disturbance to the original image sample, so as to obtain an image adversarial sample corresponding to the original image sample (S140).
A method for image processing, including: obtaining an image to be processed; determining a portrait area in the image to be processed, and cropping, based on the portrait area, a target background image from the image to be processed; obtaining a portrait by performing portrait matting on the image to be processed and obtaining an enlarged portrait by enlarging the portrait, wherein a height of the enlarged portrait is greater than a height of the target background image; and generating, based on the target background image and the enlarged portrait, a target image.
The present disclosure provides a method and apparatus for generating a 3D scene based on a large language model (LLM), an electronic device, and a storage medium, which relate to the field of artificial intelligence technologies, particularly the fields of three-dimensional modeling technologies, large language model technologies, or the like. The three-dimensional scene generating method based on a large language model includes: processing description information of a target three-dimensional scene to obtain label information in the description information; generating a query operation prompt for the LLM based on the label information, and acquiring, by the LLM based on the query operation prompt, a target asset set matched with the label information, the target asset set including a target asset in the target three-dimensional scene, target material information of the target asset and target scene attribute information of the target asset; and generating the target three-dimensional scene based on the target asset set.
The present disclosure provides a method and apparatus for transferring a facial expression of a digital human, an electronic device, and a storage medium, which relate to the fields of augmented reality technologies, virtual reality technologies, computer vision technologies, deep learning technologies, or the like, and can be applied to scenarios such as metaverse, a virtual digital human, or the like. An implementation includes: screening an identification of a target reference model matched with an object model from a preset reference model library, the reference model library including a plurality of reference models; acquiring an expression library of the target reference model based on the identification of the target reference model; and transferring a last frame of an expression in the expression library of the target reference model into the object model to obtain a last frame of an expression of the object model.
Provided is a method for processing video coding. The method includes: according to domain image blocks of a target image block in a video frame, determining whether the target image block belongs to a candidate caption region; in response to determining that the target image block belongs to the candidate caption region, generating a pixel histogram of the target image block; according to the pixel histogram of the target image block, determining a region type to which the target image block belongs, where the region type is a caption region or a non-caption region; and according to the region type to which the target image block belongs, determining a target coding mode for the target image block.
H04N 19/132 - Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/593 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
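The histogram-based region typing in the video coding abstract above might look like the sketch below. The abstract does not say how the pixel histogram is interpreted, so the heuristic here (caption blocks concentrate most pixels in two well-separated intensity bins, such as text and background) and the coding-mode names are assumptions, not the disclosed method.

```python
from collections import Counter


def classify_block(pixels, bin_width=32, coverage=0.9, min_contrast=3):
    # Build a coarse pixel histogram over 8-bit luma values.
    hist = Counter(p // bin_width for p in pixels)
    top = hist.most_common(2)
    if len(top) < 2:
        return "non-caption"  # a flat block has no text/background contrast
    (bin_a, n_a), (bin_b, n_b) = top
    concentrated = (n_a + n_b) / len(pixels) >= coverage
    contrasted = abs(bin_a - bin_b) >= min_contrast
    return "caption" if concentrated and contrasted else "non-caption"


def coding_mode(region_type):
    # Hypothetical mapping from region type to a target coding mode.
    return "low_qp" if region_type == "caption" else "default"
```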
79.
DATA QUERY OPTIMIZATION METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM
Provided is a data query optimization method, an electronic device and a storage medium, relating to the field of data processing technology and in particular to the technical fields of distributed database, big data, cloud computing and others. The method includes: determining a plurality of candidate execution plans for a target query request; determining execution costs of the plurality of candidate execution plans; updating the execution costs of the plurality of candidate execution plans based on monitoring data of data nodes involved in the plurality of candidate execution plans, to obtain final costs of the plurality of candidate execution plans; and screening out a final execution plan for the target query request from the plurality of candidate execution plans based on the final costs of the plurality of candidate execution plans.
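The cost-update step above can be sketched in a few lines. The plan structure, the use of a per-node load factor as the monitoring data, and the multiplicative penalty are all hypothetical; the abstract only says that execution costs are updated based on monitoring data of the data nodes involved.

```python
def final_cost(plan, node_load):
    # Assumed update rule: scale the estimated cost by the busiest node involved.
    penalty = max(node_load.get(n, 0.0) for n in plan["nodes"])
    return plan["cost"] * (1.0 + penalty)


def pick_plan(plans, node_load):
    # Screen out the plan with the lowest final cost.
    return min(plans, key=lambda p: final_cost(p, node_load))["name"]
```

A plan that is cheapest on paper can lose to a nominally costlier plan when its data node is heavily loaded, which is the situation the monitoring-based update is meant to catch.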
Provided is a method for processing an oracle region cache, an electronic device and a storage medium, relating to the field of data processing technology, and in particular to the fields of big data, cloud computing, distributed database, intelligent search and other technologies. The method includes: obtaining a benefit parameter of a region to be processed, wherein the benefit parameter is used to represent a difference between benefit and cost of setting the region to be processed in the oracle region cache; and selecting the region to be processed to update the oracle region cache when the benefit parameter of the region to be processed meets a target condition.
A digital human generation method, an electronic device and a storage medium are disclosed. The solution relates to the fields of augmented reality technologies, virtual reality technologies, computer vision technologies, deep learning technologies, or the like, and can be applied to scenarios, such as metaverse, a virtual digital human, or the like. An implementation includes: acquiring a corresponding target object model based on a picture of a to-be-generated digital human; acquiring a corresponding point cloud of a head key feature in the picture from a pre-configured feature library based on the head key feature; and fusing the point cloud of the head key feature in the target object model to obtain a digital human figure.
The disclosure provides a code completion method based on a big model. The method includes: determining a first code element where a position to be completed is located in a first code file to be completed; determining a second code file having a dependency relationship with the first code file from a development project to which the first code file belongs; determining, according to the first code element, a second code element whose correlation with the first code element meets a preset condition, in which the second code element belongs to at least one of the first code file or the second code file; and generating a target code corresponding to the position to be completed through a big model based on a signature of the second code element.
The present disclosure relates to the technical field of artificial intelligence, in particular to the technical fields such as intelligent office, cloud computing, generative dialogue systems, and large language models (LLMs), and provides an LLM-based data query method and apparatus, a device, and a storage medium. The LLM-based data query method comprises: determining a target data table from among candidate data tables on the basis of a query question, wherein the target data table comprises candidate attributes; determining a target attribute from among the candidate attributes on the basis of the query question; generating query instruction prompt information of an LLM on the basis of the query question, table information of the target data table, and attribute information of the target attribute, and using the LLM to generate a query instruction on the basis of the query instruction prompt information; and on the basis of the query instruction, querying from within the target data table to obtain a query answer corresponding to the query question. The present disclosure can improve the data query efficiency and accuracy.
A method and apparatus for processing an access request, and a computer readable storage medium are provided. The method includes: acquiring identification information and an IP address of an access account from an authentication message; determining permission configuration information matching the identification information; generating an access control entry based on the permission configuration information and the IP address; and processing an access request of the access account based on the access control entry.
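A minimal sketch of the access control flow above, using Python's standard `ipaddress` module. The message fields (`account_id`, `ip`), the permission schema, and the default-deny behaviour are assumptions for illustration; the abstract does not describe the entry format.

```python
import ipaddress

# Hypothetical permission configuration keyed by account identification.
PERMISSIONS = {"alice": {"action": "permit", "ports": [22, 443]}}


def build_ace(auth_message, permissions):
    # Bind the account's permission configuration to the authenticated IP address.
    cfg = permissions[auth_message["account_id"]]
    return {
        "src": ipaddress.ip_address(auth_message["ip"]),
        "action": cfg["action"],
        "ports": set(cfg["ports"]),
    }


def handle_request(request, ace):
    # Apply the entry only to traffic from the bound address; deny everything else.
    same_src = ipaddress.ip_address(request["src_ip"]) == ace["src"]
    if same_src and request["port"] in ace["ports"]:
        return ace["action"]
    return "deny"
```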
Provided are a method of deploying a multimodal large model, an electronic device and a storage medium, relating to the field of artificial intelligence technology, and in particular, to the fields of deep learning and model deployment. The method includes: splitting a first multimodal large model into a visual part and a linguistic part; determining a first static graph model corresponding to the visual part and a second static graph model corresponding to the linguistic part; and deploying the first multimodal large model based on the first static graph model and the second static graph model.
A method for obtaining a cover image includes: obtaining a plurality of first cropped images of an original image corresponding to a candidate resource; obtaining an aesthetic score of each of the plurality of first cropped images; and determining a target cover image of the candidate resource from the plurality of first cropped images based on the aesthetic score of each first cropped image.
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/77 - Processing image or video features in feature spaces; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 30/18 - Extraction of features or characteristics of the image
G06V 30/262 - Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
87.
METHOD AND APPARATUS FOR GENERATING COMMENT INFORMATION BASED ON LARGE MODEL, ELECTRONIC DEVICE AND STORAGE MEDIUM
The disclosure provides a method and an apparatus for generating comment information based on a large model, an electronic device and a storage medium, relates to a technical field of artificial intelligence, and in particular to the technical fields of deep learning, large model, and natural language processing, and the like. The specific technical solution includes: obtaining description information of a resource to be commented on by understanding, based on the large model, the resource to be commented on; obtaining, based on the description information, comment information of the resource to be commented on, in which the comment information includes at least a comment video of the resource to be commented on; and displaying the comment video in a comment section. The intelligent generation of comment videos and texts is realized, improving the accuracy of the comment information, simplifying the comment generation process, and improving the speed of generating comments. Further, by introducing a video comment format, more diverse comment formats are provided for users to select from, greatly enhancing the user experience.
H04N 21/4788 - Supplemental services, e.g. displaying phone caller identification or shopping application communicating with other users, e.g. chatting
G11B 27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals
H04L 47/125 - Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
88.
TRAFFIC LIGHT PREDICTION METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM
A traffic light prediction method, an apparatus, and an autonomous vehicle are provided. The method includes: determining lane line information of a lane where a vehicle is located and information of a target traffic light corresponding to the lane based on current position information of the vehicle, and recording the lane line information and the information of the target traffic light as element information; recognizing an obstacle in an image acquired by the vehicle to obtain obstacle information; associating the element information with the obstacle information to generate topology information, where the topology information is used to represent a binding relationship among the target traffic light, a lane line, and the obstacle; and generating a prediction result of the target traffic light based on the element information, the obstacle information, and the topology information.
G06V 20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
G06V 10/77 - Processing image or video features in feature spaces; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
89.
METHOD FOR TRAINING IMAGE CROPPING MODEL, METHOD FOR PROCESSING IMAGE, ELECTRONIC DEVICE AND STORAGE
Provided is a method for training an image cropping model, a method for processing an image, an electronic device and a storage medium, relating to the field of deep learning and image processing technology. The training method includes: obtaining sample data, wherein the sample data at least includes: a sample image, a first cropped image obtained by cropping the sample image in a first manner, and a second cropped image obtained by cropping the sample image in a second manner; determining a target loss function; and using at least the sample data and the target loss function to perform model training on a preset image cropping model to obtain a target image cropping model.
G06V 10/32 - Normalisation of the pattern dimensions
G06V 10/42 - Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
G06V 10/766 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
G06V 10/77 - Processing image or video features in feature spaces; Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
90.
METHOD FOR INFORMATION PROCESSING, ELECTRONIC DEVICE, AND STORAGE MEDIUM
A computer-implemented method for information processing includes: obtaining text information, in which the text information includes first text information of a resource to be commented on and second text information of a candidate prompt; selecting a target prompt from the candidate prompts based on the text information; and generating comment information of the resource to be commented on, based on the resource to be commented on and the target prompt.
A method for model training based on a large model includes: determining a first large model as a teacher model of a language model, and performing distillation learning on the language model based on the first large model; inputting a first prompt text into the language model, and obtaining a plurality of first response texts for the first prompt text output by the language model; determining a reference response text for the first prompt text from the plurality of first response texts; and training the language model based on the reference response text for the first prompt text.
A large model-based recommendation method includes: determining description information of interested content corresponding to a target user; inputting a content to be recommended, the description information of interested content and current popular search sentences into a large model to generate at least one recommendation card corresponding to the content to be recommended, in which the recommendation card contains a recommendation word associated with the content to be recommended; obtaining a current behavior characteristic of the target user; and in response to the current behavior characteristic satisfying a display condition of the recommendation card, displaying the recommendation card corresponding to at least one content to be recommended.
There is provided a method for video processing, an electronic device, and a storage medium, which relates to the technical field of image processing, specifically to technical fields such as digital video and image display, which may be used in intelligent cloud and cloud computing scenarios. A specific implementation solution involves: acquiring ambient brightness data of a display device, the display device adopting a standard dynamic range (SDR) technology; obtaining screen brightness data of the display device according to video brightness data of to-be-displayed high dynamic range (HDR) video, metadata of the HDR video, and the ambient brightness data; wherein the video brightness data is obtained by tone mapping according to the metadata; and controlling, by using the screen brightness data, the display device to display the HDR video.
G06T 5/92 - Dynamic range modification of images or parts thereof based on global image properties
G06T 5/90 - Dynamic range modification of images or parts thereof
G06V 10/60 - Extraction of image or video features relating to illumination properties, e.g. using a reflectance or lighting model
G09G 3/22 - Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes for presentation of an assembly of a number of characters, e.g. a page, by composing the assembly by combination of individual elements arranged in a matrix using controlled light sources
94.
METHOD OF DETERMINING METEOROLOGICAL INFORMATION, ELECTRONIC DEVICE AND STORAGE MEDIUM
A method of determining meteorological information, an electronic device and a storage medium are provided, which relate to a field of artificial intelligence technology, and in particular to fields of deep learning and large models. The method includes performing a feature extraction on meteorological raster data of a target region within a target time period to obtain a meteorological feature vector; inputting to-be-processed meteorological data of the target region within the target time period into a large language model to obtain a text summary including a meteorological information determination manner; performing an information enhancement processing on the meteorological feature vector by using the text summary to obtain an information enhancement result; and performing a self-attention processing on the information enhancement result to obtain a meteorological information determination result output for the to-be-processed meteorological data.
A method of generating a content based on a large model, an electronic device, and a storage medium are provided, which relate to a field of artificial intelligence technology, and in particular to fields of deep learning, natural language processing, computer vision, large models, etc. The method includes performing an intention recognition on an input information in response to receiving the input information; generating a painting knowledge text by invoking a multimodal large model based on an intention for painting knowledge acquisition in response to recognizing the intention for painting knowledge acquisition from the input information; generating a first driving voice and a first action instruction for driving a virtual character according to the painting knowledge text; and broadcasting the painting knowledge text by driving the virtual character according to the first driving voice and the first action instruction.
Provided are a performance optimization method for a model training device, an electronic device, and a storage medium, relating to the fields of deep learning, large model training, and distributed parallel strategies. The method includes: determining a communication timing of a current model training device with respect to a target model block at a target sorting position, such that collective communication with respect to model blocks at the target sorting position can be performed synchronously with other model training devices of a plurality of model training devices; and performing the collective communication on a backward gradient of the target model block at the communication timing.
A method for processing a query-response information is provided, which relates to a field of artificial intelligence technology, and in particular to fields of deep learning, large models, intelligent query and response, etc. The method for processing a query-response information includes: generating at least one initial response information according to a query information provided by an object; acquiring at least one feedback information corresponding to the at least one initial response information, wherein the feedback information indicates a preference degree of the object for the initial response information; and generating a training sample according to the query information, the at least one initial response information and the at least one feedback information. The present disclosure further provides a method for training a conversational model, an electronic device, and a storage medium.
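As a minimal sketch of how a training sample might be assembled from the query information, the initial response information and the feedback information, the Python below pairs responses by preference degree. The pairwise "chosen/rejected" layout, the numeric feedback scale, and the names `PreferencePair` and `build_preference_pairs` are assumptions for illustration; the abstract does not specify the sample format.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass
class PreferencePair:
    """One hypothetical training sample: a query with a preferred and a less preferred response."""
    query: str
    chosen: str
    rejected: str


def build_preference_pairs(query: str, responses: List[str],
                           feedback: List[float]) -> List[PreferencePair]:
    """Pair up responses so the one with the higher preference degree is 'chosen'.

    `feedback[i]` is assumed to be a numeric preference degree for `responses[i]`.
    Responses with equal feedback carry no preference signal and are skipped.
    """
    if len(responses) != len(feedback):
        raise ValueError("each response needs exactly one feedback value")
    pairs = []
    for (r1, f1), (r2, f2) in combinations(zip(responses, feedback), 2):
        if f1 == f2:
            continue
        chosen, rejected = (r1, r2) if f1 > f2 else (r2, r1)
        pairs.append(PreferencePair(query, chosen, rejected))
    return pairs
```

Samples in this shape could feed a preference-based training step for a conversational model, although the disclosure may use a different sample structure.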
A method for information processing is performed by an electronic device. The method includes: obtaining a residue sequence A_T that does not carry amino acid information and a first protein backbone structure B_T generated from pure noise; and performing iterative denoising on the residue sequence A_T and the first protein backbone structure B_T, where, for a t-th denoising, coevolution information of a residue sequence A_{T+1−t} is obtained and, based on the coevolution information and a first protein backbone structure B_{T+1−t}, a residue sequence A_{T−t} and a first protein backbone structure B_{T−t} after the t-th denoising are obtained, until the denoising is completed and a target amino acid sequence and a second protein backbone structure are obtained, where t is a positive integer, 1≤t≤T, and T is the number of denoising iterations.
G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
G16B 30/00 - ICT specially adapted for sequence analysis involving nucleotides or amino acids
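The index bookkeeping of the iterative denoising in the protein-design abstract above can be sketched as a plain loop: the t-th step consumes the sequence and backbone at index T+1−t and produces those at index T−t, so T steps run from index T down to index 0. The callables `coevolution_fn` and `denoise_fn` are hypothetical placeholders for the models, which the abstract does not describe.

```python
from typing import Any, Callable, List, Tuple


def iterative_denoise(a_T: Any, b_T: Any, T: int,
                      coevolution_fn: Callable[[Any], Any],
                      denoise_fn: Callable[[Any, Any, Any], Tuple[Any, Any]]):
    """Run T denoising steps and return the final pair plus the index trace.

    coevolution_fn and denoise_fn are stand-ins (assumptions) for the
    coevolution extractor and the denoising model.
    """
    if T < 1:
        raise ValueError("T must be a positive integer")
    a, b = a_T, b_T
    trace: List[Tuple[int, int]] = []
    for t in range(1, T + 1):
        index_in = T + 1 - t           # current state is (A, B) at this index
        coevo = coevolution_fn(a)      # coevolution information of the current sequence
        a, b = denoise_fn(a, b, coevo) # state at index T - t
        trace.append((index_in, index_in - 1))
    # After T steps the index is 0: target sequence and second backbone structure.
    return a, b, trace
```

With T = 3 the trace is (3, 2), (2, 1), (1, 0), which matches the constraint 1≤t≤T in the abstract.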
A data query method and apparatus based on a large model, an electronic device, and a storage medium are disclosed, which relate to the field of artificial intelligence, specifically to natural language processing, deep learning, and large model technologies, and are applicable to scenarios such as dialogue systems and information retrieval. The method includes: performing entity recognition on a query to obtain a target entity in the query; obtaining a first related content associated with the target entity from internal information, and performing data analysis on the first related content using a large language model (LLM) to obtain a data analysis result; obtaining a second related content associated with the target entity from external information, and performing data generation on the second related content using the LLM to obtain a data generation result; and obtaining a query result corresponding to the query based on the data analysis result and the data generation result.
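The data query flow above can be sketched as a small pipeline, under stated assumptions: `recognize_entity` and `llm` are hypothetical callables supplied by the caller, related content is found by a naive substring match, and the query result is simply a dictionary that combines both LLM outputs. None of these choices come from the disclosure.

```python
from typing import Callable, Dict, List


def answer_query(query: str,
                 recognize_entity: Callable[[str], str],
                 internal_docs: List[str],
                 external_docs: List[str],
                 llm: Callable[[str], str]) -> Dict[str, str]:
    """Entity recognition, then analysis of internal content and generation from external content."""
    entity = recognize_entity(query)
    # Naive retrieval (assumption): keep documents that mention the entity.
    first_related = [d for d in internal_docs if entity in d]
    second_related = [d for d in external_docs if entity in d]
    analysis = llm(f"Analyze the data about {entity}:\n" + "\n".join(first_related))
    generation = llm(f"Write an overview of {entity} from:\n" + "\n".join(second_related))
    # The combination rule is an assumption; here both results are returned side by side.
    return {"entity": entity, "analysis": analysis, "generation": generation}
```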
The disclosure provides a method for optimizing content generated by a large model, an apparatus for optimizing content generated by a large model, an electronic device and a storage medium, and relates to the technical field of artificial intelligence, especially to the technical fields of text processing, large language models and the like. It can be applied to official document processing, automatic contract generation, legal document writing, enterprise internal system management and so on. The method includes: obtaining a question entered by a user, wherein the question is used to instruct generation of a text of a target type; obtaining a set of target rules corresponding to the target type from a plurality of preset sets of rules, wherein the set of target rules includes a plurality of target rules, and the target rules are rules followed by text of the target type; and according to a sequence of the target rules, inputting the plurality of target rules into a large language model sequentially to obtain a target text of the target type generated by the large language model. In this way, the accuracy with which the large language model generates text that follows given rules is improved.
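One possible reading of the sequential rule input is sketched below: the rule set for the target type is looked up, and each rule is sent to the model in order together with the draft produced so far. The `llm` callable, the prompt wording, and the draft-refinement loop are assumptions for illustration; the abstract only states that the rules are input sequentially.

```python
from typing import Callable, Dict, List


def generate_with_rules(question: str, target_type: str,
                        rule_sets: Dict[str, List[str]],
                        llm: Callable[[str], str]) -> str:
    """Feed the target rules to the model one at a time, in their stored order."""
    rules = rule_sets[target_type]  # raises KeyError for an unknown target type
    draft = ""
    for rule in rules:
        prompt = (f"Task: {question}\n"
                  f"Rule to follow: {rule}\n"
                  f"Current draft:\n{draft}")
        draft = llm(prompt)         # hypothetical model call returning the revised text
    return draft
```

Keeping the rules in an ordered list per target type makes the sequence explicit, so reordering a rule set changes only the data and not the generation loop.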