Systems, methods, and apparatus for obtaining representations of proteins and small molecules for manufacture, using a herein-disclosed dynamic Context Load Update Engine (CLUE) during output generation by reasoning models. Pre-trained neural networks equipped with retrieval augmentation and trained on chain-of-thought data for reasoning capacity are used. The pre-trained models are further equipped with an indicator mechanism. During output generation, the indicator mechanism signals when the context needs to be updated, wherein the context is the combination of the input query and the output generated up to that point. Output generation resumes after each context update and continues until completion. In one embodiment of the invention, transfer learning is used to train the pre-trained neural network together with its associated indicator and retrieval mechanisms. The trained system is used to generate representations of proteins or small molecule drugs in response to queries that specify the desired output. The generated representations are then manufactured.
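The context-update loop described above can be sketched in Python. This is a minimal sketch under stated assumptions. `step` is a hypothetical stand-in for the pre-trained model: it performs one decoding step and also returns the indicator flag. `retrieve` is a hypothetical stand-in for the retrieval mechanism. Plain strings stand in for token sequences. None of these names come from the disclosed CLUE system.

```python
from typing import Callable, List, Tuple

END = "<eos>"  # hypothetical end-of-output marker

# Hypothetical interfaces, not the disclosed CLUE API:
#   step(prompt) -> (next_token, needs_update)  one decoding step plus the indicator flag
#   retrieve(context) -> documents relevant to the current context
StepFn = Callable[[str], Tuple[str, bool]]
RetrieveFn = Callable[[str], List[str]]


def generate_with_context_updates(query: str, step: StepFn, retrieve: RetrieveFn,
                                  max_steps: int = 256) -> Tuple[str, int]:
    """Decode until END, refreshing retrieved documents whenever the indicator fires.

    The context is the input query plus the output generated so far. Returns the
    generated output and the number of context updates performed.
    """
    output: List[str] = []
    docs = retrieve(query)  # initial retrieval uses the query alone
    updates = 0
    for _ in range(max_steps):
        context = f"{query} {' '.join(output)}".strip()
        token, needs_update = step("\n".join(docs + [context]))
        if needs_update:
            # Indicator fired: re-retrieve with query + output so far, then retry this step.
            docs = retrieve(context)
            updates += 1
            continue
        if token == END:
            break
        output.append(token)
    return " ".join(output), updates
```

In this sketch, retrieved documents replace the previous set on each update instead of accumulating, which keeps the prompt bounded. `max_steps` also caps how many times the indicator can fire.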
G16B 40/00 - ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
G16B 50/30 - Data warehousing; Computing architectures
2.
CAM-guided transformers for AI-based protein and drug design
Systems, methods, and apparatus for peptide ligand and small molecule drug design given a target protein sequence and structure are presented. The methods use class activation mapping (CAM)-guided transformers to generate the ligand. Given a target protein structure, a CAM-guided structure refinement process optimizes the structure towards the desired ligand effect classification. The embedding of the target protein's refined structure, along with its residue embeddings, forms the input array to a transformer architecture.
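Consider a classifier that applies global average pooling over per-residue features, followed by a linear layer. For such a classifier, the class activation map for a class is the per-residue weighted sum of features, and the mean of that map equals the class logit. The following is a minimal sketch that assumes such a pooled linear classifier. The feature matrix and weights are toy placeholders, not the disclosed model.

```python
from typing import List, Sequence


def class_activation_map(features: Sequence[Sequence[float]],
                         class_weights: Sequence[float]) -> List[float]:
    """CAM_c(i) = sum_k w_{c,k} * A_{i,k} for each residue i.

    features: L x K per-residue feature map A from the last feature layer.
    class_weights: the K linear-layer weights w_c for the desired ligand-effect class.
    """
    return [sum(w * a for w, a in zip(class_weights, row)) for row in features]


def class_logit(features: Sequence[Sequence[float]],
                class_weights: Sequence[float]) -> float:
    """Logit after global average pooling over residues, then the linear layer."""
    n = len(features)
    pooled = [sum(row[k] for row in features) / n for k in range(len(class_weights))]
    return sum(w * p for w, p in zip(class_weights, pooled))
```

Residues with large map values contribute most to the desired class. A CAM-guided refinement loop would plausibly concentrate structural updates on those residues.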
G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
Methods and apparatus for determining protein and ligand sequence, structure, and docking site given a target protein sequence and structure are presented. A multicapitate transformer architecture with multiple heads, including a sequence head and a structure head, is introduced. Given a target protein sequence and structure, the transformer generates a candidate ligand: the sequence head yields the ligand sequence, and the structure head yields the ligand structure and docking site. Non-capitate weights are shared between the output heads. In one embodiment, a discriminative feature localization method is used to optimize the target protein's input structure representation towards the desired ligand effect class. The methods and apparatus presented enable the design and synthesis of both peptide ligands and small molecule drugs, each with a specified ligand effect category.
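The shared-body, multi-head layout can be sketched as a forward pass. This toy rests on several assumptions. A per-position ReLU layer stands in for the transformer body, so there is no attention. The weights are random. The structure head is assumed to emit per-residue 3D coordinates plus one docking-site logit. `MulticapitateModel` and its fields are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Matrix = List[List[float]]


def linear(x: Sequence[float], w: Matrix, b: Sequence[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


class MulticapitateModel:
    def __init__(self, d_in: int, d_hidden: int, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        # Shared (non-capitate) body weights, used by every head.
        self.body = (init(d_hidden, d_in), [0.0] * d_hidden)
        # Sequence head: one logit per amino acid.
        self.seq_head = (init(len(AMINO_ACIDS), d_hidden), [0.0] * len(AMINO_ACIDS))
        # Structure head: x, y, z coordinates plus one docking-site logit.
        self.struct_head = (init(4, d_hidden), [0.0] * 4)

    def forward(self, positions: Sequence[Sequence[float]]
                ) -> Tuple[str, List[Tuple[float, float, float]], List[float]]:
        seq, coords, dock_probs = [], [], []
        for x in positions:
            h = [max(0.0, v) for v in linear(x, *self.body)]  # shared representation
            logits = linear(h, *self.seq_head)
            seq.append(AMINO_ACIDS[max(range(len(logits)), key=logits.__getitem__)])
            cx, cy, cz, dock_logit = linear(h, *self.struct_head)
            coords.append((cx, cy, cz))
            dock_probs.append(1.0 / (1.0 + math.exp(-dock_logit)))
        return "".join(seq), coords, dock_probs
```

Both heads read the same hidden vector `h`. This is what sharing the non-capitate weights means in this sketch.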
Methods and apparatus for protein and drug design using multicapitate (“two or more headed”) neural networks, wherein one head, a sequence head, is trained to generate the sequence of a protein, and another head, a structure head, is trained to generate the structure of the protein; and wherein the neural network is configured to accept a representation of a specified condition as input and to output a representation of a protein's sequence and structure. The structure head and sequence head each have their own loss function, while the weights of the neural network body are shared and jointly updated during training. Non-limiting examples of specified input conditions include representations of associated proteins and/or sets of properties of the desired output protein. Some embodiments of the invention include the design and synthesis of effective peptide drug ligands, synthetic biologic antibody drugs, antibody drug conjugates, and monoclonal antibody (mAb) drugs.
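Joint training with per-head losses and a shared body can be illustrated with scalar weights and hand-derived gradients. This minimal sketch uses a one-weight body, a binary "sequence" head trained with cross-entropy, and a regression "structure" head trained with squared error. The toy data and the weighting `lam` are illustrative only. The sketch shows that the shared body weight receives gradient from both head losses in the same update.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (input x, sequence label in {0, 1}, structure target)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(data: List[Sample], lam: float = 1.0, lr: float = 0.05,
          epochs: int = 200) -> Tuple[List[float], float, float]:
    w_body, v_seq, v_struct = 0.5, 0.5, 0.5  # shared body weight, then one weight per head

    def total_loss() -> float:
        loss = 0.0
        for x, y_seq, y_struct in data:
            h = w_body * x
            p = sigmoid(v_seq * h)
            seq_loss = -(y_seq * math.log(p) + (1 - y_seq) * math.log(1 - p))
            struct_loss = (v_struct * h - y_struct) ** 2
            loss += seq_loss + lam * struct_loss
        return loss / len(data)

    start = total_loss()
    for _ in range(epochs):
        g_body = g_seq = g_struct = 0.0
        for x, y_seq, y_struct in data:
            h = w_body * x
            d_logit = sigmoid(v_seq * h) - y_seq     # d(seq_loss)/d(logit)
            d_pred = 2.0 * (v_struct * h - y_struct)  # d(struct_loss)/d(prediction)
            g_seq += d_logit * h
            g_struct += lam * d_pred * h
            # The shared body collects gradient from BOTH heads.
            g_body += (d_logit * v_seq + lam * d_pred * v_struct) * x
        n = len(data)
        w_body -= lr * g_body / n
        v_seq -= lr * g_seq / n
        v_struct -= lr * g_struct / n
    return [w_body, v_seq, v_struct], start, total_loss()
```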
Methods and apparatus for obtaining representations of proteins and small molecule drugs for synthesis; wherein input queries into trained mixed modality protein and natural language models are augmented with relevant query-related documents. In one embodiment, the relevant query-related documents are obtained by maximum inner product search of an embedding latent vector space into which the query and the documents are projected. The top-k most relevant documents to the query are then combined with the query as input into the trained mixed modality language model. In one embodiment, the mixed modality model is an autoregressive multicapitate transformer whose decoder output heads correspond to the represented modalities. The method returns mixed modality output representations of proteins or small molecule drugs for synthesis or manufacture.
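The retrieval step can be sketched as exact, brute-force maximum inner product search over precomputed embeddings, followed by combining the top-k documents with the query. This minimal sketch assumes that some embedding model (not shown) has already projected the query and documents into the same latent space. At scale, an approximate MIPS index would typically replace the linear scan. The prompt layout in `augment_query` is a hypothetical choice.

```python
import heapq
from typing import List, Sequence


def top_k_mips(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]],
               k: int) -> List[int]:
    """Indices of the k documents with the largest inner product <query, doc>."""
    scored = ((sum(q * d for q, d in zip(query_vec, vec)), i)
              for i, vec in enumerate(doc_vecs))
    return [i for _, i in heapq.nlargest(k, scored)]


def augment_query(query: str, docs: Sequence[str], indices: Sequence[int]) -> str:
    """Combine the retrieved documents with the query as model input."""
    context = "\n".join(docs[i] for i in indices)
    return f"{context}\n\nQuery: {query}"
```

Inner product, unlike cosine similarity, rewards embeddings with larger norms. Whether to normalize is therefore a design decision for the embedding space.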
Methods and apparatus that use a mixture of representation modalities, including natural language, protein sequence, protein structure, property-vector, and small molecule drug representations, to jointly train a neural network that accepts mixed modality queries as input and produces mixed modality output responses, including representations of proteins for synthesis and of small molecule drugs for manufacture. In one embodiment of the invention, multicapitate transformers are used in which each decoder head represents a distinct modality and has its own loss function. Modality-specific embeddings are implemented for the mixed modality input query, and an autoregressive process yields the output protein for synthesis or the small molecule drug for manufacture.
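Modality-specific input embeddings can be sketched as one lookup table per modality plus a learned modality-type vector. With this scheme, the same symbol means different things in different modalities: for example, "C" is cysteine in a protein sequence and carbon in SMILES. This is a minimal sketch under stated assumptions. The vocabularies, the additive type vector, and the class name are hypothetical. Continuous modalities such as property vectors would need a projection layer, which is not shown.

```python
import random
from typing import Dict, List, Sequence, Tuple

Token = Tuple[str, str]  # (modality name, symbol)


class ModalityEmbedder:
    """Separate embedding table per modality, plus a learned modality-type vector."""

    def __init__(self, vocabularies: Dict[str, Sequence[str]], dim: int, seed: int = 0):
        rng = random.Random(seed)

        def vec() -> List[float]:
            return [rng.gauss(0.0, 0.02) for _ in range(dim)]

        self.dim = dim
        self.tables = {m: {s: vec() for s in vocab} for m, vocab in vocabularies.items()}
        self.type_vectors = {m: vec() for m in vocabularies}

    def embed(self, tokens: Sequence[Token]) -> List[List[float]]:
        out = []
        for modality, symbol in tokens:
            token_vec = self.tables[modality][symbol]  # modality-specific lookup
            type_vec = self.type_vectors[modality]     # marks which modality this is
            out.append([a + b for a, b in zip(token_vec, type_vec)])
        return out
```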
Methods and apparatus for obtaining representations of proteins and small molecule drugs for synthesis; wherein pre-trained mixed modality protein and natural language fusion models are further trained by supervised fine-tuning on pairs of reasoning-oriented queries and chain-of-thought (CoT) responses. The resulting reasoning-oriented neural network is then used to obtain representations of output proteins or small molecule drugs in response to mixed modality reasoning-oriented input queries that specify conditions on the output. In one embodiment, the neural network is an autoregressive multicapitate transformer whose decoder output heads correspond to the represented modalities. The method returns mixed modality output representations of proteins or small molecule drugs for synthesis or manufacture.
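One common way to run supervised fine-tuning on query/CoT pairs is to concatenate prompt and response and mask the loss so that only response tokens are scored. This is a minimal sketch of that technique under stated assumptions. Whitespace tokenization and the `<query>`, `<think>`, and `<answer>` tags are hypothetical, and the per-token log-probabilities would come from the model.

```python
from typing import List, Sequence, Tuple


def build_sft_example(query: str, chain_of_thought: str,
                      answer: str) -> Tuple[List[str], List[int]]:
    """Tokens for one query/CoT pair and a mask that selects the response tokens."""
    prompt = ["<query>", *query.split(), "</query>"]
    response = ["<think>", *chain_of_thought.split(), "</think>",
                "<answer>", *answer.split(), "</answer>"]
    return prompt + response, [0] * len(prompt) + [1] * len(response)


def masked_nll(token_logprobs: Sequence[float], mask: Sequence[int]) -> float:
    """Mean negative log-likelihood over response tokens only."""
    picked = [-lp for lp, m in zip(token_logprobs, mask) if m]
    return sum(picked) / max(1, len(picked))
```

Masking the prompt trains the model to produce the reasoning and answer rather than to reproduce the query.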
42 - Scientific, technological and industrial services, research and design
45 - Legal and security services; personal services for individuals.
Goods & Services
Pharmaceutical products development; Design and development of artificial intelligence (AI) software; Pharmaceutical research services; Pharmaceutical product evaluation; Laboratory research in the field of pharmaceuticals; Laboratory services for the production of pharmaceuticals; Development of pharmaceuticals for cell and gene therapies; Research and development in the field of pharmaceuticals; Research and development services in the field of pharmaceutical preparations; Research and development of pharmaceutical preparations for treating malignant tumours; Conducting early evaluations in the field of new pharmaceuticals; Providing medical research information in the field of pharmaceuticals; Testing, inspection, research, or development of pharmaceutical preparations for gene therapy; Research and development of new products in the field of pharmaceuticals; Providing medical and scientific research information in the field of pharmaceuticals and clinical trials; Development of pharmaceutical preparations and medicines; Artificial intelligence as a service (AIAAS) services featuring software using artificial intelligence for data exploration; Artificial intelligence as a service (AIAAS) services featuring software using artificial intelligence for data analytics; Research in the field of artificial intelligence; Artificial intelligence as a service (AIAAS) services featuring software using artificial intelligence for business analytics; Artificial intelligence as a service (AIAAS) services featuring software using artificial intelligence for developing data science models; Artificial intelligence as a service (AIAAS) featuring software using artificial intelligence for analyzing data and interacting with humans; Artificial intelligence as a service (AIAAS) services featuring software using artificial intelligence for creating complex data science analyses; Technology consultation in the field of artificial intelligence; Research in the field of artificial intelligence technology; Consultancy in the field of artificial intelligence (AI) technology; Research in the field of artificial intelligence (AI) software; Research, design and development of software using artificial intelligence; Advanced product research in the field of artificial intelligence (AI); Software authoring; Customizing computer software; Computer software design; Developing computer software; Consultation services relating to computer software; Computer programming and software design; Licensing of software; Licensing of intellectual property rights; Licensing of intellectual property; Granting of licences for intellectual property rights
42 - Scientific, technological and industrial services, research and design
45 - Legal and security services; personal services for individuals.
Goods & Services
Software authoring; Computer software design; Developing computer software; Software design and development; Computer software design services; Computer software design and development; Design, development and implementation of software; Pharmaceutical product evaluation; Pharmaceutical research services; Pharmaceutical products development; Laboratory research services relating to pharmaceuticals; Laboratory research in the field of pharmaceuticals; Laboratory services for the production of pharmaceuticals; Development of pharmaceuticals for cell and gene therapies; Research and development of pharmaceutical preparations for treating malignant tumours; Research and development services in the field of pharmaceutical preparations; Development of pharmaceutical preparations and medicines; Artificial intelligence as a service (AIAAS) services featuring software using artificial intelligence for business analytics; Artificial intelligence as a service (AIAAS) services featuring software using artificial intelligence for data exploration; Artificial intelligence as a service (AIAAS) services featuring software using artificial intelligence for data assessment; Artificial intelligence as a service (AIAAS) services featuring software using artificial intelligence for data analytics; Research in the field of artificial intelligence; Artificial intelligence as a service (AIAAS) services featuring software using artificial intelligence for developing data science models; Artificial intelligence as a service (AIAAS) featuring software using artificial intelligence for analyzing data and interacting with humans; Artificial intelligence as a service (AIAAS) services featuring software using artificial intelligence for creating complex data science analyses; Technology consultation in the field of artificial intelligence; Research in the field of artificial intelligence technology; Design and development of artificial intelligence (AI) software; Research, design and development of software using artificial intelligence; Technology consultation in the field of artificial intelligence (AI); Research in the field of artificial intelligence (AI) technology; Advanced product research in the field of artificial intelligence (AI); Software licensing; Computer software licensing; Licensing of intellectual property; Licensing of intellectual property rights; Granting of licenses for intellectual property rights
10.
Recursive transformers for AI-based protein-protein interaction and drug design
Methods and apparatus are presented for determining a representation of a protein-protein complex given a constituent target complex, that is, some subset of the protein-protein complex. A recursive transformer neural network is devised in which, at each iteration of the recursion, a representation of the output constituent protein complexed with the input constituent target complex is passed into the transformer as input for the next iteration. Some embodiments of the invention include the design and manufacture of effective synthetic biologic drugs, monoclonal antibody (mAb) drugs, antibody drug conjugates (ADCs), peptide ligand drugs, and small molecule drugs (SMDs).
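The recursion can be sketched as a loop. At each iteration, the current complex is passed to the model, and the predicted constituent, complexed with that input, becomes the next input. This is a minimal sketch under stated assumptions. A complex is represented simply as a list of chain sequences, and `predict_partner` is a hypothetical stand-in for the trained transformer that returns `None` when the complex is complete.

```python
from typing import Callable, List, Optional

Complex = List[str]  # hypothetical: a complex as a list of constituent chain sequences
Predictor = Callable[[Complex], Optional[str]]


def assemble_complex(target: Complex, predict_partner: Predictor,
                     max_iters: int = 10) -> Complex:
    """Grow a complex recursively from a constituent target complex.

    Each iteration feeds the current complex to the model. The predicted constituent,
    complexed with its input, becomes the input of the next iteration. The loop stops
    when the model returns None or after max_iters iterations.
    """
    current = list(target)
    for _ in range(max_iters):
        partner = predict_partner(current)
        if partner is None:
            break
        current = current + [partner]
    return current
```

The recursion is unrolled into a loop here. `max_iters` guards against a model that never signals completion.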
Methods and apparatus are presented for determining protein and ligand structure, for identifying ligand docking sites, and for obtaining both peptide and non-peptide drug ligand candidates for target proteins. Methods include receiving a plurality of protein-ligand complex structures at a processor, converting them to volumetric probability representations, and generating a training dataset by sequentially transforming the voxel-wise probability distributions. The transformations satisfy three conditions: a discrepancy measure between consecutive transformations is bounded; the discrepancy measure between each state and the final diffused state progressively decreases; and the localization probability of each residue, summed over the diffusion volume, is constant. A neural network is trained to learn protein and ligand residue localization given a diffused representation. The methods serve to generate a protein structure given its sequence; to generate a candidate ligand structure for a given target protein given only the ligand residue composition; or to determine promising candidate peptide and non-peptide drug ligands for synthesis.
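One family of transformations that satisfies the three stated conditions is repeated mixing with the uniform distribution over the diffusion volume. This is a minimal sketch under stated assumptions: a flattened voxel grid, an L1 discrepancy, and a fixed mixing rate `beta`. The source does not specify the transformation, so this illustrates the conditions rather than the disclosed procedure.

```python
from typing import List, Sequence


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def diffusion_trajectory(p0: Sequence[float], beta: float, steps: int) -> List[List[float]]:
    """Sequential transforms p_{t+1} = (1 - beta) * p_t + beta * u.

    p0 is one residue's localization probability over a flattened voxel grid, and u
    (the final diffused state) spreads the same mass uniformly over the volume. Then:
      * sum(p_t) is constant, because both terms carry the same mass;
      * L1(p_t, u) = (1 - beta)**t * L1(p0, u) decreases progressively;
      * L1(p_{t+1}, p_t) = beta * L1(p_t, u) <= beta * L1(p0, u) is bounded.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    mass = sum(p0)
    u = [mass / len(p0)] * len(p0)
    traj = [list(p0)]
    for _ in range(steps):
        prev = traj[-1]
        traj.append([(1.0 - beta) * a + beta * b for a, b in zip(prev, u)])
    return traj
```

Pairs of (`traj[t]`, `p0`) could then serve as (diffused input, localization target) training examples.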
Methods, systems, and apparatus for determining a conformational structure of a protein by using discriminative feature localization to iteratively and locally update the protein structure, optimizing with respect to a physical or biological property of the structure representation. In one aspect, a method comprises initializing a plurality of structure parameters; selecting a physical or biological property of interest; training a neural network to score protein structural conformations on the selected property; using the neural network to perform inference that yields both a classification score and a discriminative feature localization map; and iteratively updating the structure parameters over the discriminative feature map, optimizing with respect to the physical or biological property of interest.
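The iterative, map-guided update can be sketched with an analytic stand-in for the trained scorer. This is a minimal sketch under stated assumptions. The structure parameters are per-residue, torsion-like angles. A hypothetical property score `sum(amp_i * cos(theta_i - phase_i))` replaces the neural network. The localization map is the normalized gradient magnitude, thresholded so that only the most discriminative positions move. A step size of `1 / max|amp|` bounds each update so the score cannot decrease.

```python
import math
from typing import List, Sequence, Tuple


def property_score(theta: Sequence[float], amp: Sequence[float],
                   phase: Sequence[float]) -> float:
    """Hypothetical stand-in for the trained network's property score."""
    return sum(a * math.cos(t - p) for t, a, p in zip(theta, amp, phase))


def score_and_map(theta: Sequence[float], amp: Sequence[float], phase: Sequence[float],
                  threshold: float = 0.5) -> Tuple[float, List[float], List[float]]:
    """Return the score, its gradient, and a [0, 1] localization map over positions."""
    grad = [-a * math.sin(t - p) for t, a, p in zip(theta, amp, phase)]
    peak = max(abs(g) for g in grad) or 1.0
    loc = [abs(g) / peak if abs(g) / peak >= threshold else 0.0 for g in grad]
    return property_score(theta, amp, phase), grad, loc


def refine(theta: Sequence[float], amp: Sequence[float], phase: Sequence[float],
           iters: int = 100) -> Tuple[List[float], List[float]]:
    """Iteratively update parameters only where the localization map is non-zero."""
    lr = 1.0 / max(abs(a) for a in amp)  # 1/L for this L-smooth score: each step ascends
    theta = list(theta)
    history = [property_score(theta, amp, phase)]
    for _ in range(iters):
        _, grad, loc = score_and_map(theta, amp, phase)
        theta = [t + lr * m * g for t, m, g in zip(theta, loc, grad)]
        history.append(property_score(theta, amp, phase))
    return theta, history
```

Because each map value lies in [0, 1], the masked update is a gradient-ascent step with a per-position step size of at most 1/L. For an L-smooth score, such a step never lowers the score.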