Genome-wide association studies may allow for detection of variants that are statistically significantly associated with disease risk. However, inferring which genes underlie these variant associations may be difficult. The presently disclosed approaches use machine learning techniques to predict genes from genome-wide association study summary statistics, which substantially improves causal gene identification in terms of both precision and recall compared to other techniques.
G16H 50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
2.
METHOD FOR DIAGNOSING RESPIRATORY PATHOGENS AND PREDICTING COVID-19 RELATED OUTCOMES
THE REGENTS OF THE UNIVERSITY OF COLORADO, A BODY CORPORATE (USA)
ILLUMINA SOFTWARE, INC. (USA)
Inventor
Barnes, Kathleen
Yang, Ivana
Gignoux, Christopher
Mathias, Rasika
Norman, Paul
Taye, Alem
Porecha, Rishi
Barnes, Bret
Peterson, Brett
Abstract
Provided by the inventive concept is a DNA methylation-based platform, and machine learning algorithms, for diagnosing respiratory pathogens including SARS-CoV-2 and predicting COVID-19 related outcomes, and methods of using the same, such as in identifying the presence of a viral infection, such as a SARS-CoV-2 infection, determining whether a subject has COVID-19, and/or whether a subject with COVID-19 is likely to develop acute respiratory distress syndrome or multisystem inflammatory syndrome in children.
C12Q 1/70 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving virus or bacteriophage
CYP21A2CYP21A1P CYP21A1P gene, the copy numbers of the RCCX region, and candidate haplotypes. Also disclosed herein are systems, devices, and methods for detecting one or more single-nucleotide variants or indels in an RCCX region in a nucleic acid sample.
G16B 20/20 - Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
C12Q 1/6883 - Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
5.
GENERATING AND IMPLEMENTING A STRUCTURAL VARIATION GRAPH GENOME
This disclosure describes methods, non-transitory computer readable media, and systems that can generate a structural variation graph genome with alternate contiguous sequences representing structural variant haplotypes. For instance, the disclosed systems can identify candidate structural variants that satisfy an occurrence threshold within a genomic sample database. From among the candidate structural variants, the systems select structural variant haplotypes based on one or both of the structural variant haplotypes satisfying a relative haplotype frequency and finding flanking variants adjacent to particular structural variant haplotypes. The systems can likewise select reference haplotypes corresponding to the selected structural variant haplotypes from a reference genome. Based on the selected haplotypes, the disclosed systems generate a structural variation graph genome comprising both alternate contiguous sequences representing the structural variant haplotypes and reference sequences representing the reference haplotypes.
This disclosure describes methods, non-transitory computer readable media, and systems that can accurately genotype one or more human leukocyte antigen (HLA) alleles from a genomic sample by using alignment-score-based filtering and read-support-equivalence grouping of reads for genotype inference. To genotype HLA alleles, the disclosed systems extract a genomic sample's reads corresponding to an HLA genomic region and align the extracted reads with HLA-allele-reference sequences. The disclosed systems further select a subset of read alignments for the extracted reads based on alignment scores for alignments between the extracted reads and the HLA-allele-reference sequences. Based on the selected subset of read alignments, the disclosed systems group individual reads into HLA equivalence classes and determine candidate HLA alleles for the genomic sample at one or more HLA loci. From among the candidate HLA alleles, the disclosed systems determine genotype calls that a genomic sample includes particular HLA alleles at one or more HLA loci.
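The score-based filtering and read grouping in the abstract above can be illustrated with a small sketch. It assumes, purely for illustration, that each read keeps only the alignments within a fixed margin of its best score, and that reads supporting the same set of alleles form one equivalence class. The function name, the `margin` parameter, and the toy scores are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict

def group_reads(alignments, margin=0):
    """Group reads into equivalence classes keyed by the alleles they support.

    alignments: dict mapping read id -> {allele name: alignment score}.
    margin: hypothetical tolerance; alignments scoring within `margin` of a
            read's best score are kept, the rest are filtered out.
    """
    classes = defaultdict(list)
    for read_id, scores in alignments.items():
        best = max(scores.values())
        kept = frozenset(a for a, s in scores.items() if s >= best - margin)
        classes[kept].append(read_id)
    return dict(classes)

# Toy data with made-up alignment scores.
alignments = {
    "read1": {"A*01:01": 100, "A*02:01": 90},
    "read2": {"A*01:01": 100, "A*02:01": 100},
    "read3": {"A*01:01": 98, "A*02:01": 80},
}
classes = group_reads(alignments, margin=5)
```

Here `read1` and `read3` fall into the class that supports only `A*01:01`, while `read2` is ambiguous between both alleles and forms its own class.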
The present disclosure relates to systems, non-transitory computer-readable media, and methods for efficiently identifying and selecting split groups corresponding to one or more nucleotide reads. Generally, split groups comprise chains of fragments forming split-alignments of one read. The disclosed system utilizes dynamic programming to generate and evaluate candidate split groups. The disclosed system can generate split group scores for each of the candidate split groups. To generate the split group scores, the disclosed system considers fragment alignment scores and geometries of fragment alignments within the candidate split groups. The disclosed systems select a predicted split group from the candidate split groups based on the split group scores.
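As a rough illustration of scoring candidate split groups with dynamic programming, the sketch below chains non-overlapping fragment alignments along a read and charges a fixed penalty per junction. This is a generic weighted-chaining scheme under stated assumptions, not the scoring used in the disclosure: the tuple layout, the `junction_penalty` value, and the use of read coordinates alone to stand in for alignment geometry are all hypothetical. The input is assumed to be non-empty.

```python
def best_split_group(fragments, junction_penalty=5):
    """Pick the highest-scoring chain of non-overlapping fragment alignments.

    fragments: list of (read_start, read_end, score), half-open on the read.
    junction_penalty: hypothetical cost charged for each split between fragments.
    Returns (total score, chosen fragments in read order).
    """
    frags = sorted(fragments, key=lambda f: f[1])
    best = []  # best[i] = (score, previous index) of the best chain ending at frags[i]
    for i, (start, end, score) in enumerate(frags):
        total, prev = score, None
        for j in range(i):
            if frags[j][1] <= start:  # fragment j ends before fragment i starts
                cand = best[j][0] + score - junction_penalty
                if cand > total:
                    total, prev = cand, j
        best.append((total, prev))
    # Trace back from the best-scoring chain end.
    i = max(range(len(frags)), key=lambda k: best[k][0])
    total, chain = best[i][0], []
    while i is not None:
        chain.append(frags[i])
        i = best[i][1]
    return total, chain[::-1]

# Toy candidates: one full-length alignment and several partial ones.
fragments = [(0, 50, 40), (40, 100, 45), (50, 100, 42), (0, 100, 60)]
score, chain = best_split_group(fragments)
```

With these toy numbers, the two-fragment split `(0, 50)` plus `(50, 100)` scores 40 + 42 - 5 = 77 and beats the single full-length alignment at 60.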
Some embodiments of the methods and compositions provided herein relate to obtaining long read information from short reads of a target nucleic acid. Some embodiments include steps to selectively generate, mark, and amplify long nucleic acid fragments. Some embodiments include enriching for certain sequences in the long fragments with selection probes directed to certain genes throughout the genome and expressed regions with low mappability. Some embodiments also include fragmenting the long nucleic acid fragments into shorter fragments for sequencing, and informatically reconstructing a sequence of the target nucleic acid.
This disclosure describes methods, non-transitory computer readable media, and systems that can configure a field programmable gate array (FPGA) or other configurable processor to implement a neural network and train the neural network using the configurable processor by modifying certain network parameters of a subset of the neural network's layers. For instance, the disclosed systems can configure a configurable processor on a computing device to implement a base-calling-neural network (or other neural network) that includes different sets of layers. Based on a set of images of oligonucleotide clusters or other datasets, the neural network generates predicted classes, such as by generating nucleobase calls for oligonucleotide clusters. Based on the predicted classes, the disclosed systems subsequently modify certain network parameters for a subset of the neural network's layers, such as by modifying parameters for a set of top layers.
This disclosure describes methods, non-transitory computer readable media, and systems that can use a machine-learning model to classify or predict a probability of an oligonucleotide probe yielding an accurate genotype call or hybridizing with a target oligonucleotide—based on the oligonucleotide probe's nucleotide-sequence composition. To intelligently identify oligonucleotide probes that are more likely to yield accurate downstream genotyping—or more likely to successfully hybridize with target oligonucleotides—some embodiments of the disclosed machine-learning model include customized layers trained to detect motifs or other nucleotide-sequence patterns that correlate with favorable or unfavorable probe accuracy. By intelligently processing the nucleotide sequences of candidate oligonucleotide probes before implementing a microarray for a particular target oligonucleotide, the disclosed system can identify oligonucleotide probes with better genotyping accuracy (or better binding accuracy) than existing microarray systems for use in a microarray.
Systems, methods, and apparatus are described herein for training machine learning models to predict probe intensity values using sample-specific image data and/or applying the predicted probe intensity values. As described herein, sample-specific image data may include a signal associated with a sample for a process probe in a microarray relating to a single individual. The machine learning model may be trained, using the sample-specific image data, to predict a probe intensity value. The probe intensity value may be a raw probe intensity value or a normalized probe intensity value. After being trained, the machine learning model may receive as input a probe sequence or probe features. The machine learning model may be used to predict a total probe intensity value based on the probe sequence or the one or more probe features.
We disclose a computer-implemented method of base calling. The technology disclosed accesses a time series sequence of a read. Respective time series elements in the time series sequence represent respective bases in the read. Then, a composite sequence for the read is generated based on respective aggregate transformations of respective sliding windows of time series elements in the time series sequence. A subject composite element in the composite sequence is generated based on an aggregate transformation of a corresponding window of time series elements in the time series sequence. Then, the composite sequence is processed as an aggregate and generates a base call sequence that has respective base calls for the respective bases in the read.
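The sliding-window step can be pictured with a minimal sketch. It assumes the aggregate transformation is a plain mean over a centered window; the abstract does not say which aggregate is used, so the mean, the window size, and the function name are illustrative assumptions only.

```python
def composite_sequence(series, window=3):
    """Return one aggregate (here, the mean) per sliding window.

    Each output element summarizes the window centered on the matching
    input element; windows are truncated at the ends of the series.
    """
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        chunk = series[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out

composite = composite_sequence([1.0, 2.0, 3.0, 4.0], window=3)
```

The composite sequence has the same length as the input, so each composite element still corresponds to one base position in the read.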
The invention relates to methods and kits for use in nucleic acid sequencing, in particular methods for use in concurrent sequencing, and in particular concurrent sequencing of tandem insert libraries.
The invention relates to methods and associated products for preparing polynucleotide sequences for detection of modified cytosines and sequencing said polynucleotides to detect modified cytosines. The methods comprise treatment of the target polynucleotide with a conversion reagent that is configured to convert a modified cytosine to thymine or a nucleobase which is read as thymine/uracil, and/or configured to convert an unmodified cytosine to uracil or a nucleobase which is read as thymine/uracil. In particular embodiments, portions of both strands of the treated target are sequenced concurrently.
A method of base calling nucleobases of two or more polynucleotide sequence portions, wherein said polynucleotide sequence portions have been selectively processed such that an intensity of the signals obtained based upon the respective first nucleobase is greater than an intensity of the signals obtained based upon the respective second nucleobase.
Disclosed herein are systems, machines, devices, and methods for single-pass methylation mapping. C-to-T converted sequence reads and G-to-A converted sequence reads generated from a sample subjected to a methylation assay can be mapped to a mapping reference sequence comprising a C-to-T converted reference sequence and a G-to-A converted reference sequence generated from a reference genome sequence. The counts of Cs and Ts of sequence reads mapped to each of one or more positions with Cs in the reference genome sequence can be used to determine whether the position is a methylated C or an unmethylated C in the sample.
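A minimal sketch of the two ideas in this abstract follows: building the converted references in silico, and classifying a reference C position from its C and T read counts. It assumes a bisulfite-style assay in which methylated Cs are still read as C and unmethylated Cs are read as T; the `min_depth` and `threshold` values and both function names are hypothetical.

```python
def convert(seq, src, dst):
    """In-silico conversion of every `src` base to `dst` (e.g. C-to-T)."""
    return seq.replace(src, dst)

def call_methylation(c_count, t_count, min_depth=5, threshold=0.5):
    """Classify a reference C position from counts of C and T reads.

    Assumes methylated Cs are read as C and unmethylated Cs as T.
    min_depth and threshold are illustrative values, not from the disclosure.
    """
    depth = c_count + t_count
    if depth < min_depth:
        return "no_call"
    return "methylated" if c_count / depth >= threshold else "unmethylated"

reference = "ACGTCCGA"
# Both converted references together form the mapping reference.
mapping_reference = {
    "C_to_T": convert(reference, "C", "T"),
    "G_to_A": convert(reference, "G", "A"),
}
```

For an assay with the opposite chemistry (methylated Cs read as T), the comparison in `call_methylation` would need to be inverted.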
Systems and methods of identifying nucleobases in a template polynucleotide are disclosed. In one embodiment, such a method may include providing a substrate comprising a plurality of double stranded template polynucleotides in a cluster. Each double stranded template polynucleotide may comprise a first strand and a second strand. The method may further include contacting the plurality of double stranded template polynucleotides with first primers which bind to the first strand and second primers which bind to the second strand. The method may further include extending the first primers and the second primers by contacting the cluster with labeled nucleobases to form first labeled primers and second labeled primers. The method may further include stimulating light emissions from the first and second labeled primers, wherein an amplitude of the signal generated by the first labeled primers is greater than an amplitude of the signal generated by the second labeled primers. The method may further include identifying the labeled nucleobases added to the first primers and the second primers based on the amplitude of the signal generated by the labeled nucleobases.
Systems and methods of identifying nucleobases in a template polynucleotide are disclosed. In one embodiment, such a method may include providing a substrate comprising a plurality of the template polynucleotides in a cluster. The method may further include generating light to stimulate fluorescent emissions from the cluster. The method may further include receiving a first signal emitted at a first intensity from a first plurality of nucleotide analogs hybridized to the plurality of template polynucleotides at a first site. The method may further include receiving a second signal emitted at a second intensity from a second plurality of nucleotide analogs hybridized to the plurality of template polynucleotides at a second site. The method may further include identifying the nucleobases hybridized at the first and second sites of the template polynucleotide based on a combination of the first and second signals.
This disclosure describes methods, non-transitory computer readable media, and systems that can introduce short calibration sequences into a sequencing device and run calibration cycles to adjust or otherwise determine a sequencing parameter corresponding to the sequencing device. For instance, the disclosed systems can detect a flow cell (or other sample-nucleotide slide) with calibration sequences incorporated into samples' library fragments or into a surface of the sample-nucleotide slide. By running one or more calibration cycles to incorporate nucleobases on oligonucleotides corresponding to calibration sequences and capture corresponding images for calibration sequences—separate from genomic sequencing cycles for sample genomic sequences—the disclosed systems can determine a sequencing parameter corresponding to the sequencing device.
Artificial intelligence driven signal enhancement of sequencing images enables enhanced sequencing by synthesis that determines a sequence of bases in genetic material with any one or more of: improved performance, improved accuracy, and/or reduced cost. A training set of images taken at unreduced and reduced power levels used to excite fluorescence during sequencing by synthesis is used to train a neural network to enable the neural network to recover enhanced images, as if taken at the unreduced power level, from unenhanced images taken at the reduced power level.
Artificial intelligence driven enhancement of motion blurred sequencing images enables enhanced sequencing that determines a sequence of bases in genetic material with any one or more of: improved performance, improved accuracy, and/or reduced cost. A training set of images taken after unreduced and reduced movement settling times during sequencing is used to train a neural network to enable the neural network to recover enhanced images, as if taken after the unreduced movement settling time, from unenhanced images taken after the reduced movement settling time.
Described herein are technologies for converting context of an ANN or context of another type of computing system that is trainable through machine learning. In some implementations, the technologies convert a first context of a computing system (such as an ANN), which is to provide pathogenicity of variants of genomes of a population, to a second context of the computing system, which is to provide pathogenicity of indels of the genomes of the population.
This disclosure describes methods, non-transitory computer readable media, and systems that can flexibly and efficiently change versions of a variant analysis model for different genomic analysis applications. For example, the disclosed systems can determine a particular version of a variant analysis model indicated by a genomic analysis application and can update a genomic analysis device (e.g., FPGA, CPU) by installing the indicated version of the variant analysis model. The disclosed systems can further execute a genomic analysis application to analyze nucleotide base calls utilizing the version of variant analysis model indicated by the genomic analysis application.
G16B 20/20 - Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
G16B 5/00 - ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
30.
MACHINE LEARNING MODEL FOR RECALIBRATING NUCLEOTIDE BASE CALLS CORRESPONDING TO TARGET VARIANTS
This disclosure describes methods, non-transitory computer readable media, and systems that can utilize a machine learning model to recalibrate nucleotide base calls (e.g., variant calls) of a call generation model. For instance, the disclosed systems can train and utilize a call recalibration machine learning model to generate a set of predicted variant call classifications based on sequencing metrics associated with a sample nucleotide sequence. Leveraging the set of variant call classifications, the disclosed systems can further update or modify nucleotide base calls (e.g., variant calls) corresponding to genomic coordinates, such as multiallelic genomic coordinates, haploid genomic coordinates, and genomic coordinates indicated (by the call generation model) to exhibit homozygous reference genotypes.
This disclosure describes methods, non-transitory computer readable media, and systems that can query the status of various stages in an end-to-end sequencing process and generate a graphical status summary for the sequencing process that depicts icons indicating statuses of the various stages. For instance, the disclosed systems can generate a graphical status summary for a nucleotide sequencing taskset that includes icons depicting statuses of a sequencing run, a data transfer of base-call data to a device for variant analysis, and the variant analysis—each part of the same nucleotide sequencing taskset. By exchanging data with a sequencing device for read data and one or more servers for variant analysis, the disclosed system can quickly provide a graphical status summary of an end-to-end sequencing process marked by various tasks within a nucleotide sequencing taskset.
This disclosure describes methods, non-transitory computer readable media, and systems that can facilitate execution of external workflows for diagnostic analysis of nucleotide sequencing data utilizing a container orchestration engine. For example, the disclosed systems can utilize a container orchestration engine to allow external systems (e.g., third-party systems) to generate and implement workflows for analyzing sequencing data. In executing individual workflow containers of a sequencing diagnostic workflow, the disclosed systems can isolate the workflow containers to prevent access to, or corruption of, other data while also orchestrating allocation of computing resources available at a genomic sequence processing device to execute the workflow containers.
An iterative process may be implemented for incrementally aggregating available batches of sample data with previously available batches to perform sequencing analysis. Genomic variant call files associated with one or more samples may be received in batches from sequencing devices and aggregated for performing sequencing analysis. The aggregated genomic variant call files may be used to generate cohort files and census files that comprise summary information related to the genomic variant call files in each batch. The census data in census files may be aggregated into a global census file that includes summary genome variant data. Multi-sample variant call files may be generated based on the global census file, cohort files, and census files. The genomic variant call files may be processed using parallel processing at multiple compute nodes. The files may be further compressed and overlapping data may be efficiently stored in buffer positions.
This disclosure describes embodiments of methods, systems, and non-transitory computer readable media that accurately and efficiently estimate the effects of phasing and pre-phasing for a particular cluster of oligonucleotides and determine a cluster-specific-phasing correction for the cluster. For instance, the disclosed systems can dynamically identify clusters of oligonucleotides exhibiting error-inducing sequences that frequently cause phasing or pre-phasing. When the disclosed systems detect signals during cycles at read positions following such an error-inducing sequence, the disclosed systems can generate cluster-specific-phasing coefficients and correct the signals according to such cluster-specific-phasing coefficients. For instance, the disclosed system can utilize a linear equalizer, decision feedback equalizer, or a maximum likelihood sequence estimator to generate cluster-specific-phasing coefficients.
The technology disclosed relates to state-based base calling. In particular, the technology disclosed relates to incorporating state information about data from previous sequencing cycles into the analysis of data from a current sequencing cycle when generating a base call for the current sequencing cycle. For example, when generating a base call for an Nth sequencing cycle, the technology disclosed can incorporate into the base calling logic state information about data from sequencing cycles 1 to N-1.
The technology disclosed comprises a system. The system comprises a spatial convolutional neural network configured to process sequencing images of clusters, and produce spatially convolved features, a filtering logic configured to select, from the spatially convolved features, a subset of spatially convolved features that contain centers of the clusters, a compression logic configured to compress the subset of spatially convolved features into a set of compressed features, a contextualization logic configured to access state information for compressed features in the set of compressed features, a temporal convolutional neural network configured to process the set of stateful compressed features, and produce temporally convolved stateful features, and a base calling logic configured to generate base calls for the clusters based on the temporally convolved stateful features.
Methods, systems, and non-transitory computer readable media are disclosed for accurately and efficiently identifying base-call-error scars or patterns from sequencing data to determine failure sources that contribute to the base-call-error scars or patterns. For example, the disclosed system can utilize a reference genome to determine nucleotide-specific errors within a run of a sequencing pipeline. Based on the co-occurrence of different nucleotide-specific errors, the disclosed system can determine a base-call-error scar. The disclosed system can further determine one or more sample error scars from sample sequencing runs that correlate to the base-call-error scar. Based on the correlation and by utilizing a statistical model, the disclosed system can identify failure sources contributing to the nucleotide-specific errors within the base-call-error scar.
The disclosed technology relates to systems and methods for nucleic acid sequencing utilizing a single light source and a single detector. The disclosed technology may use the light source to stimulate a fluorescence emission from a polynucleotide and identify a nucleobase in the polynucleotide based on the intensity of the fluorescence emission received by the detector. The disclosed technology may utilize four types of nucleotide analogs which emit light at four distinguishable levels when excited by the light source. In various embodiments, the four types of nucleotide analogs may be coupled to different fluorophores or the same fluorophore with different probabilities or copy numbers.
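With four distinguishable emission levels on one detector, base identification reduces to finding the nearest expected level. The sketch below assumes normalized intensities and an arbitrary assignment of bases to levels; the level values, the base assignment, and the function name are hypothetical.

```python
# Hypothetical normalized intensity level for each nucleotide analog.
LEVELS = {"A": 1.0, "C": 0.66, "G": 0.33, "T": 0.0}

def call_from_level(intensity):
    """Return the base whose expected level is closest to the measurement."""
    return min(LEVELS, key=lambda base: abs(LEVELS[base] - intensity))

# One measured intensity per sequencing cycle (made-up values).
level_read = "".join(call_from_level(x) for x in [0.95, 0.05, 0.70, 0.30])
```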
Detecting analytes using proximity-induced tagmentation, strand invasion, restriction, or ligation is provided herein. In some examples, detecting an analyte includes coupling a donor recognition probe to a first portion of the analyte. The donor recognition probe includes a first recognition element specific to the first portion of the analyte, a first oligonucleotide corresponding to the first portion, and a transposase coupled to the first recognition element and the first oligonucleotide. An acceptor recognition probe is coupled to a second portion of the analyte. The acceptor recognition probe includes a second recognition element specific to the second portion of the analyte and a second oligonucleotide coupled to the second recognition element and corresponding to the second portion. The transposase is used to generate a reporter polynucleotide including the first and second oligonucleotides. The analyte is detected based on the reporter polynucleotide comprising the first and second oligonucleotides.
A method of base calling using at least two base callers is disclosed. The method includes executing at least a first base caller and a second base caller on sensor data generated for sensing cycles in a series of sensing cycles; generating, by the first base caller, first classification information associated with the sensor data, based on executing the first base caller on the sensor data; and generating, by the second base caller, second classification information associated with the sensor data, based on executing the second base caller on the sensor data. In an example, based on the first classification information and the second classification information, a final classification information is generated, where the final classification information includes one or more base calls for the sensor data.
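One simple way to merge the classification information from two base callers is a weighted average of their per-cycle base probabilities. The abstract does not state how the final classification is derived, so the blending rule, the `weight_a` parameter, and the toy probabilities below are assumptions for illustration.

```python
BASES = "ACGT"

def combine_calls(probs_a, probs_b, weight_a=0.5):
    """Blend per-cycle base probabilities from two callers and call the top base.

    probs_a, probs_b: one [P(A), P(C), P(G), P(T)] list per sensing cycle.
    weight_a: hypothetical weight given to the first caller.
    """
    calls = []
    for pa, pb in zip(probs_a, probs_b):
        blended = [weight_a * x + (1 - weight_a) * y for x, y in zip(pa, pb)]
        calls.append(BASES[blended.index(max(blended))])
    return "".join(calls)

first = [[0.7, 0.1, 0.1, 0.1], [0.4, 0.5, 0.05, 0.05]]
second = [[0.6, 0.2, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1]]
final = combine_calls(first, second)
```

In the second cycle the callers disagree (C versus A); with equal weights the blended probabilities favor A, while `weight_a=1.0` reproduces the first caller's output.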
A method of generating base calls by a base caller is disclosed. The method includes receiving a plurality of sensor data from a flow cell, wherein the plurality of sensor data is within a first range and identifying a second range, such that at least a threshold percentage of the plurality of sensor data are within the second range. At least a subset of the plurality of sensor data, that are within the second range, are mapped to a third range, thereby generating a plurality of normalized sensor data. The plurality of normalized sensor data is processed in a base caller, to call, for the plurality of normalized sensor data, one or more corresponding bases.
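The three ranges in this abstract can be sketched as a percentile trim followed by a linear rescale. The version below picks the second range by trimming an equal number of values from each end and clips outliers to its boundaries before mapping to the third range; the symmetric trim, the clipping, and the `keep_percent` and `target` parameters are assumptions made for illustration. The sketch assumes the second range has non-zero width.

```python
def normalize(values, keep_percent=98, target=(0.0, 1.0)):
    """Rescale sensor values from a trimmed range onto a target range.

    keep_percent: hypothetical share of values the second range must contain.
    target: the third range that the second range is mapped onto.
    """
    ordered = sorted(values)
    n = len(ordered)
    tail = n * (100 - keep_percent) // 200          # values trimmed from each end
    lo, hi = ordered[tail], ordered[n - 1 - tail]   # the second range
    t_lo, t_hi = target
    # Clip to [lo, hi], then map linearly onto [t_lo, t_hi].
    return [t_lo + (min(max(v, lo), hi) - lo) * (t_hi - t_lo) / (hi - lo)
            for v in values]

readings = [0, 10, 20, 30, 40, 50, 60, 70, 80, 1000]
normalized = normalize(readings, keep_percent=80)
```

Here the outlier 1000 no longer stretches the scale: the second range is 10 to 80, and every value lands between 0.0 and 1.0.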
The technology disclosed attenuates spatial crosstalk from sequencing images for base calling. The technology disclosed accesses a section of an image output by a biosensor, where the section of the image includes a plurality of pixels depicting intensity emission values from a plurality of clusters within the biosensor and from locations within the biosensor that are adjacent to the plurality of clusters. The plurality of clusters includes a target cluster. The section of the image is convolved with a convolution kernel, to generate a feature map comprising a plurality of features having a corresponding plurality of feature values. A weighted feature value is assigned to the target cluster, where the weighted feature value is based on one or more feature values of the plurality of feature values of the feature map. The weighted feature value assigned to the target cluster is processed, to base call the target cluster.
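To make the convolution step concrete, the sketch below applies a small hand-written kernel to a toy image patch. The kernel values are hypothetical (in practice such coefficients would be learned or calibrated), and the patch is sized so that the feature map holds a single value for the target cluster at its center.

```python
def convolve(image, kernel):
    """Valid-mode 2D sliding-window filter (cross-correlation, as in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(cols)] for r in range(rows)]

# Hypothetical kernel: keep the center pixel, subtract a share of each neighbor.
kernel = [[0, -0.25, 0], [-0.25, 1, -0.25], [0, -0.25, 0]]
# Toy patch: target cluster in the center (100), a bright neighbor to its right (40).
patch = [[0, 0, 0], [0, 100, 40], [0, 0, 0]]
feature_map = convolve(patch, kernel)
target_value = feature_map[0][0]
```

The target's value drops from 100 to 90 because a quarter of the neighbor's intensity is subtracted as estimated crosstalk.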
G06V 20/69 - Microscopic objects, e.g. biological cells or cellular parts
G06V 10/778 - Active pattern-learning, e.g. online learning of image or video features
G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
G06K 9/62 - Methods or arrangements for recognition using electronic means
G06V 10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
We disclose a system. The system comprises a memory and a runtime logic. The memory stores a plurality of specialist signal profilers. Each specialist signal profiler in the plurality of specialist signal profilers is trained to maximize signal-to-noise ratio of sequenced signals in a particular signal profile detected for analytes in a particular analyte class and characterized in a particular training data set. The runtime logic, having access to the memory, is configured to execute a base calling operation by applying respective specialist signal profilers in the plurality of specialist signal profilers to sequenced signals in respective signal profiles detected for analytes in respective analyte classes during the base calling operation.
This disclosure describes methods, non-transitory computer readable media, and systems that can utilize a machine learning model to recalibrate nucleotide-base calls (e.g., variant calls) of a call-generation model. For instance, the disclosed systems can train and utilize a call-recalibration-machine-learning model to generate a set of predicted variant-call classifications based on sequencing metrics associated with a sample nucleotide sequence. Leveraging the set of variant-call classifications, the disclosed systems can further update or modify nucleotide-base calls (e.g., variant calls) corresponding to genomic coordinates. Indeed, the disclosed systems can generate an initial nucleotide-base call based on sequencing metrics for nucleotide reads of a sample sequence utilizing a call-generation model and further utilize a call-recalibration-machine-learning model to generate classification predictions for updating or recalibrating the initial nucleotide-base call from a subset of the same sequencing metrics or other sequencing metrics.
This disclosure describes methods, non-transitory computer readable media, and systems that can generate signal-to-noise-ratio metrics for clusters of oligonucleotides to which tagged nucleotide bases are added and utilize the signal-to-noise-ratio metrics to generate nucleotide-base calls and determine base-call quality. For example, the disclosed systems can generate the signal-to-noise-ratio metrics using scaling factors and noise levels associated with light signals detected from the clusters of oligonucleotides. The disclosed systems can utilize the signal-to-noise-ratio metrics to generate intensity-value boundaries for generating nucleotide-base-calls for the signals in accordance with one or more base-call-distribution models. Additionally, the disclosed systems can utilize a threshold to filter out signals detected from the clusters of oligonucleotides that have low signal-to-noise-ratio metrics. The disclosed systems can further utilize the signal-to-noise-ratio metrics to generate quality metrics for generated nucleotide-base calls.
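The thresholding step can be sketched as follows, assuming the signal-to-noise-ratio metric is simply a cluster's scaling factor divided by its noise level. The abstract does not give the formula, so this ratio, the `min_snr` cutoff, and the function names are illustrative assumptions.

```python
def snr_metric(scaling_factor, noise_level):
    """Hypothetical SNR metric: signal scale relative to noise level."""
    return scaling_factor / noise_level

def filter_clusters(clusters, min_snr=3.0):
    """Keep clusters whose SNR metric meets the threshold.

    clusters: dict mapping cluster id -> (scaling factor, noise level).
    Returns a dict mapping each kept cluster id to its SNR metric.
    """
    kept = {}
    for name, (scale, noise) in clusters.items():
        snr = snr_metric(scale, noise)
        if snr >= min_snr:
            kept[name] = snr
    return kept

# Made-up scaling factors and noise levels for two clusters.
passing = filter_clusters({"c1": (10.0, 2.0), "c2": (4.0, 2.0)})
```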
Registration of a patterned flow cell may utilize fiducials comprising sets or groupings of features (e.g., sites, sample wells, nanowells) having known locations and in which the placement of the features is not in accordance with a periodic pattern or is otherwise distinguishable from the periodic pattern of sites present in non-fiducial regions of the flow cell substrate. In certain embodiments the positioning of the sites that are part of the fiducial represent a break or discontinuity in the periodic pattern of sites that are otherwise present on the surface of a patterned flow cell.
The disclosed technology relates to the field of nucleic acid sequencing, and more particularly, to systems and methods for DNA sequencing utilizing a single optical excitation and at least three fluorescent labels. In some embodiments, the disclosed technology uses a first nucleotide coupled to a first fluorescent label which can emit light to be detectable by a first detector, a second nucleotide coupled to a second fluorescent label which can emit light to be detectable by a second detector, a third nucleotide coupled to a third fluorescent label which can emit light to be detectable by both the first and second detectors, and a fourth nucleotide coupled to no fluorescent label. The disclosed technology may identify a nucleotide in the nucleic acid sequence based on whether the emission is received by the first detector, the second detector, both the first and second detectors, or neither the first nor second detector.
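The four detection outcomes described above form a small truth table. The sketch below encodes it with a hypothetical assignment of bases to labels (the abstract does not say which nucleotide carries which label) and an illustrative detection threshold.

```python
# Hypothetical base assignment; keys are (detector 1 fired, detector 2 fired).
CALLS = {
    (True, False): "A",   # first label: seen by the first detector only
    (False, True): "C",   # second label: seen by the second detector only
    (True, True): "T",    # third label: seen by both detectors
    (False, False): "G",  # fourth nucleotide: unlabeled, no emission
}

def call_base(signal1, signal2, threshold=0.5):
    """Identify a nucleotide from which detectors received an emission."""
    return CALLS[(signal1 > threshold, signal2 > threshold)]

# One (detector 1, detector 2) signal pair per cycle, made-up values.
cycles = [(0.9, 0.1), (0.8, 0.7), (0.0, 0.1), (0.2, 0.9)]
read = "".join(call_base(s1, s2) for s1, s2 in cycles)
```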
Methods, systems, and non-transitory computer readable media are disclosed for accurately and efficiently detecting when bubbles impact nucleic-acid-sequencing runs based on data captured during (or derived from) base calls during sequencing runs. In particular, in one or more embodiments, the disclosed systems receive data identifying nucleobase calls and data identifying quality metrics for the nucleobase calls during sequencing cycles. Based on particular nucleobase calls and threshold markers for the quality metrics, the disclosed system utilizes a machine-learning model to detect a presence of a bubble in a nucleotide-sample slide. Beyond simply detecting the presence of a bubble, the disclosed system can also classify different detected bubbles, such as air bubbles, oil bubbles, or ghost bubbles, or other outputs during sequencing. By utilizing call data and quality metrics, the disclosed system can use readily available sequencing data in a platform-agnostic approach to detect bubbles using a uniquely trained machine-learning model.
A system for base calling includes memory storing a topology of a neural network, a plurality of weight sets, and sensor data for a series of sensing cycles. Sequencing events span temporal progression of the base calling operation through subseries of sensing cycles, and spatial progression of the base calling operation through locations on a biosensor. A configurable processor is configured to load the topology on the configurable processor, select a weight set in dependence upon a subject subseries of sensing cycles and/or a subject location on the biosensor, load subject sensor data for the subject subseries of sensing cycles and the subject location on the processing elements, configure the topology using the selected weight set, and cause the neural network to process the subject sensor data to produce base call classification data for the subject subseries and the subject location.
A method of quantizing parameters of a neural network includes grouping a plurality of parameters of a neural network in a plurality of groups. Each group of the plurality of groups includes corresponding two or more parameters of the plurality of parameters. In an example, for each group, a corresponding quantization format is selected from a plurality of available quantization formats, such that a first quantization format selected for at least a first group is different from a second quantization format selected for at least a second group. For each group, individual parameters within the corresponding group are quantized using the quantization format selected for the corresponding group. The quantized parameters of the plurality of groups are stored in a memory.
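The per-group format selection in this abstract can be sketched as follows. The abstract does not name the available formats or the selection rule, so everything specific here is an assumption: the two symmetric integer formats, the rule "pick the narrowest format whose reconstruction error stays within a tolerance", and all names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class QuantFormat:
    """Hypothetical symmetric signed-integer format."""
    name: str
    bits: int

    @property
    def qmax(self) -> int:
        return 2 ** (self.bits - 1) - 1


# Assumed set of available formats, ordered narrowest first.
FORMATS = (QuantFormat("int4", 4), QuantFormat("int8", 8))


@dataclass
class QuantizedGroup:
    fmt: QuantFormat
    scale: float
    codes: List[int]


def quantize(values: Sequence[float], fmt: QuantFormat) -> QuantizedGroup:
    """Quantize one group with a shared scale derived from its largest magnitude."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / fmt.qmax if max_abs > 0 else 1.0
    codes = [max(-fmt.qmax, min(fmt.qmax, round(v / scale))) for v in values]
    return QuantizedGroup(fmt, scale, codes)


def dequantize(group: QuantizedGroup) -> List[float]:
    return [code * group.scale for code in group.codes]


def quantize_by_group(params: Sequence[float], group_size: int = 2,
                      tolerance: float = 0.01) -> List[QuantizedGroup]:
    """Split parameters into groups and pick a format per group.

    Assumed rule: use the narrowest format whose worst-case reconstruction
    error is within `tolerance`; fall back to the widest format otherwise.
    """
    stored = []
    for start in range(0, len(params), group_size):
        group = params[start:start + group_size]
        for fmt in FORMATS:
            candidate = quantize(group, fmt)
            error = max(abs(a - b) for a, b in zip(group, dequantize(candidate)))
            if error <= tolerance:
                break
        stored.append(candidate)  # stands in for "stored in a memory"
    return stored


stored = quantize_by_group([0.5, -0.5, 1.0, 0.45])
formats_used = [g.fmt.name for g in stored]
```

In this example the first group is represented within tolerance by the 4-bit format while the second needs 8 bits, so the two groups end up with different formats, as the abstract describes. The parameter count should be a multiple of `group_size` if every group must hold two or more parameters.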
The disclosed technology relates to automated fluid handling systems and automated sequencing methods for re-analyzing a sample to achieve a more informative test result. In one embodiment, a method of processing a sample nucleic acid to identify a target mutation comprises performing a first sequencing reaction to determine sample specific properties. The method further comprises determining a statistical measure to determine if a first read coverage for the target mutation from the first sequencing reaction is above or below a threshold. If the determined first read coverage does not exceed the threshold, the method further comprises determining if a sufficient amount of sample nucleic acid is available to perform a second sequencing reaction to increase the read coverage above the threshold. If a sufficient amount of sample nucleic acid is available, the method proceeds to perform re-sequencing of the sample nucleic acid to achieve a second read coverage exceeding the threshold.
Presented are automated fluid handling systems and automated sequencing methods for re-analyzing a sample to achieve a more informative test result. In one embodiment, a method of processing a sample nucleic acid to identify a target mutation comprises performing a first sequencing reaction to determine sample specific properties. The method further comprises determining a statistical measure to determine if a first read coverage for the target mutation from the first sequencing reaction is above or below a threshold. If the determined first read coverage does not exceed the threshold, the method further comprises determining if a sufficient amount of sample nucleic acid is available to perform a second sequencing reaction to increase the read coverage above the threshold. If a sufficient amount of sample nucleic acid is available, the method proceeds to perform re-sequencing of the sample nucleic acid to achieve a second read coverage exceeding the threshold.
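The decision flow in this abstract can be sketched as a single function. The units, parameter names, and the outcome when too little sample remains are assumptions; the abstract only specifies the coverage check and the re-sequencing branch.

```python
from enum import Enum


class Action(Enum):
    REPORT = "report"                            # coverage already exceeds the threshold
    RESEQUENCE = "resequence"                    # run a second sequencing reaction
    INSUFFICIENT_SAMPLE = "insufficient_sample"  # assumed outcome, not in the abstract


def next_action(read_coverage: float, coverage_threshold: float,
                sample_available_ng: float, sample_required_ng: float) -> Action:
    """Decide what to do after the first sequencing reaction.

    Quantities in nanograms are an illustrative assumption.
    """
    if read_coverage > coverage_threshold:
        return Action.REPORT
    if sample_available_ng >= sample_required_ng:
        return Action.RESEQUENCE
    return Action.INSUFFICIENT_SAMPLE


decision = next_action(read_coverage=120, coverage_threshold=500,
                       sample_available_ng=50.0, sample_required_ng=20.0)
```

An automated fluid handling system would presumably act on the `RESEQUENCE` result without operator intervention, which is the point of checking sample availability before committing to a second run.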
Methods and systems are disclosed which can gather large data sets from nucleic acid sequencing technologies and devices, filter relevant genomic information and sequence variant information of biological samples from files of various formats, generate a custom data file having only relevant information in a standardized format, and provide the generated information to downstream analysis for personalized medicine use.
THE REGENTS OF THE UNIVERSITY OF COLORADO, A BODY CORPORATE (USA)
ILLUMINA SOFTWARE, INC. (USA)
Inventor
Barnes, Kathleen
Yang, Ivana
Gignoux, Christopher
Mathias, Rasika
Norman, Paul
Taye, Alem
Porecha, Rishi
Barnes, Bret
Peterson, Brett
Abstract
Provided by the inventive concept is a DNA methylation-based platform, and machine learning algorithms, for diagnosing respiratory pathogens including SARS-CoV-2 and predicting COVID-19 related outcomes, and methods of using the same, such as in identifying the presence of a viral infection, such as a SARS-CoV-2 infection, determining whether a subject has COVID-19, and/or whether a subject with COVID-19 is likely to develop acute respiratory distress syndrome or multisystem inflammatory syndrome in children.
A61K 39/215 - Coronaviridae, e.g. avian infectious bronchitis virus
C12Q 1/689 - Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria
C12Q 1/6895 - Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
C12Q 1/70 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving virus or bacteriophage
60.
METHODS FOR DIAGNOSING RESPIRATORY PATHOGENS AND PREDICTING COVID-19 RELATED OUTCOMES
THE REGENTS OF THE UNIVERSITY OF COLORADO, A BODY CORPORATE (USA)
ILLUMINA SOFTWARE, INC. (USA)
Inventor
Barnes, Kathleen
Yang, Ivana
Gignoux, Christopher
Mathias, Rasika
Norman, Paul
Taye, Alem
Porecha, Rishi
Barnes, Bret
Peterson, Brett
Abstract
Provided by the inventive concept is a DNA methylation-based platform, and machine learning algorithms, for diagnosing respiratory pathogens including SARS-CoV-2 and predicting COVID-19 related outcomes, and methods of using the same, such as in identifying the presence of a viral infection, such as a SARS-CoV-2 infection, determining whether a subject has COVID-19, and/or whether a subject with COVID-19 is likely to develop acute respiratory distress syndrome or multisystem inflammatory syndrome in children.
C12Q 1/70 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving virus or bacteriophage
C12Q 1/689 - Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria
C12Q 1/6895 - Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
A61K 39/215 - Coronaviridae, e.g. avian infectious bronchitis virus