1
Kurdiumov S, Papasimakis N, Ou JY, Zheludev NI. Far-field optical classification of subwavelength objects. Optics Express 2025; 33:15380-15389. [PMID: 40219450 DOI: 10.1364/oe.558631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2025] [Accepted: 03/14/2025] [Indexed: 04/14/2025]
Abstract
Object detection requires localizing and classifying the size and shape of an unknown object. Here we show that artificial-intelligence-enabled analysis of light scattered on objects that are not resolvable by conventional microscopy can be used for their shape classification. In a proof-of-principle experiment, we demonstrate classification with ∼90% accuracy for objects of unknown subwavelength dimensions in the range from λ/6 to λ/2 (where λ is the illumination wavelength) belonging to one of five shape classes. The method can be scaled to applications across the entire electromagnetic spectrum and used in a variety of tasks, such as the detection and study of biological particles, environmental sensing, and device diagnostics.
2
Zárate A, Díaz-González L, Taboada B. VirDetect-AI: a residual and convolutional neural network-based metagenomic tool for eukaryotic viral protein identification. Brief Bioinform 2024; 26:bbaf001. [PMID: 39808116 PMCID: PMC11729733 DOI: 10.1093/bib/bbaf001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 11/12/2024] [Accepted: 08/01/2025] [Indexed: 01/16/2025] Open
Abstract
This study addresses the challenging task of identifying viruses within metagenomic data, which encompasses a broad array of biological samples, including animal reservoirs, environmental sources, and the human body. Traditional methods for virus identification often face limitations due to the diversity and rapid evolution of viral genomes. In response, recent efforts have focused on leveraging artificial intelligence (AI) techniques to enhance accuracy and efficiency in virus detection. However, existing AI-based approaches are primarily binary classifiers that lack specificity in identifying viral types and rely on nucleotide sequences. To address these limitations, VirDetect-AI, a novel tool specifically designed for the identification of eukaryotic viruses within metagenomic datasets, is introduced. The VirDetect-AI model employs a combination of convolutional neural networks and residual neural networks to effectively extract hierarchical features and detailed patterns from complex amino acid genomic data. The model performed well on all reported metrics, with a sensitivity of 0.97, a precision of 0.98, and an F1-score of 0.98. VirDetect-AI improves our comprehension of viral ecology and can accurately classify metagenomic sequences into 980 viral protein classes, hence enabling the identification of new viruses. These classes encompass an extensive array of viral genera and families, as well as protein functions and hosts.
Affiliation(s)
- Alida Zárate
  - Doctorado en Ciencias, Instituto de Investigación en Ciencias Básicas Aplicadas (IICBA), Universidad Autónoma del Estado de Morelos, Cuernavaca, Morelos 62210, México
- Lorena Díaz-González
  - Centro de Investigación en Ciencias, Universidad Autónoma del Estado de Morelos, Cuernavaca, Morelos 62210, México
- Blanca Taboada
  - Departamento de Genética del Desarrollo y Fisiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos 62210, México
3
Rahimian M, Panahi B. Metagenome sequence data mining for viral interaction studies: Review on progress and prospects. Virus Res 2024; 349:199450. [PMID: 39151562 PMCID: PMC11388672 DOI: 10.1016/j.virusres.2024.199450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2024] [Revised: 08/11/2024] [Accepted: 08/13/2024] [Indexed: 08/19/2024]
Abstract
Metagenomics has been greatly accelerated by the development of next-generation sequencing (NGS) technologies, which allow scientists to discover and describe novel microorganisms without the need for conventional culture techniques. Examining integrative bioinformatics methods used in viral interaction research, this study highlights metagenomic data from various contexts. Accurate viral identification depends on high-purity genetic material extraction, appropriate NGS platform selection, and sophisticated bioinformatics tools like VirPipe and VirFinder. The efficiency and precision of metagenomic analysis are further improved with the advent of AI-based techniques. The diversity and dynamics of viral communities are demonstrated by case studies from a variety of environments, emphasizing the seasonal and geographical variations that influence viral populations. In addition to speeding up the discovery of new viruses, metagenomics offers thorough understanding of virus-host interactions and their ecological effects. This review provides a promising framework for comprehending the complexity of viral communities and their interactions with hosts, highlighting the transformational potential of metagenomics and bioinformatics in viral research.
Affiliation(s)
- Mohammadreza Rahimian
  - Department of Biology, Faculty of Basic Sciences, University of Maragheh, Maragheh, Iran
- Bahman Panahi
  - Department of Genomics, Branch for Northwest & West Region, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran
4
Elste J, Saini A, Mejia-Alvarez R, Mejía A, Millán-Pacheco C, Swanson-Mungerson M, Tiwari V. Significance of Artificial Intelligence in the Study of Virus-Host Cell Interactions. Biomolecules 2024; 14:911. [PMID: 39199298 PMCID: PMC11352483 DOI: 10.3390/biom14080911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Revised: 07/11/2024] [Accepted: 07/23/2024] [Indexed: 09/01/2024] Open
Abstract
A highly critical event in a virus's life cycle is successfully entering a given host. This process begins when a viral glycoprotein interacts with a target cell receptor, which provides the molecular basis for targeting virus-host cell interactions in novel drug discovery. Over the years, extensive research has been carried out in the field of virus-host cell interaction, generating a massive number of genetic and molecular data sources. These datasets are an asset for predicting virus-host interactions at the molecular level using machine learning (ML), a subset of artificial intelligence (AI). In this direction, ML tools are now being applied to recognize patterns in these massive datasets to predict critical interactions between virus and host cells at the protein-protein and protein-sugar levels, as well as to perform transcriptional and translational analysis. On the other hand, deep learning (DL) algorithms, a subfield of ML, can extract high-level features from very large datasets to recognize the hidden patterns within genomic sequences and images to develop models for rapid drug discovery predictions that address pathogenic viruses displaying heightened affinity for receptor docking and enhanced cell entry. ML and DL are pivotal forces, driving innovation with their ability to perform analysis of enormous datasets in a highly efficient, cost-effective, accurate, and high-throughput manner. This review focuses on the complexity of virus-host cell interactions at the molecular level in light of the current advances of ML and AI in viral pathogenesis to improve new treatments and prevention strategies.
Affiliation(s)
- James Elste
  - Department of Microbiology & Immunology, College of Graduate Studies, Midwestern University, Downers Grove, IL 60515, USA
- Akash Saini
  - Hinsdale Central High School, 5500 S Grant St, Hinsdale, IL 60521, USA
- Rafael Mejia-Alvarez
  - Department of Physiology, College of Graduate Studies, Midwestern University, Downers Grove, IL 60515, USA
- Armando Mejía
  - Departamento de Biotechnology, Universidad Autónoma Metropolitana-Iztapalapa, Ciudad de Mexico 09340, Mexico
- Cesar Millán-Pacheco
  - Facultad de Farmacia, Universidad Autónoma del Estado de Morelos, Av. Universidad No. 1001, Col Chamilpa, Cuernavaca 62209, Mexico
- Michelle Swanson-Mungerson
  - Department of Microbiology & Immunology, College of Graduate Studies, Midwestern University, Downers Grove, IL 60515, USA
- Vaibhav Tiwari
  - Department of Microbiology & Immunology, College of Graduate Studies, Midwestern University, Downers Grove, IL 60515, USA
5
Miranda FM, Azevedo VC, Ramos RJ, Renard BY, Piro VC. HiTaC: a hierarchical taxonomic classifier for fungal ITS sequences compatible with QIIME2. BMC Bioinformatics 2024; 25:228. [PMID: 38956506 PMCID: PMC11220968 DOI: 10.1186/s12859-024-05839-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 06/11/2024] [Indexed: 07/04/2024] Open
Abstract
BACKGROUND: Fungi play a key role in several important ecological functions, ranging from organic matter decomposition to symbiotic associations with plants. Moreover, fungi naturally inhabit the human body and can be beneficial when administered as probiotics. In mycology, the internal transcribed spacer (ITS) region was adopted as the universal marker for classifying fungi. Hence, an accurate and robust method for ITS classification is not only desired for the purpose of better diversity estimation, but it can also help us gain a deeper insight into the dynamics of environmental communities and ultimately comprehend whether the abundance of certain species correlates with health and disease. Although many methods have been proposed for taxonomic classification, to the best of our knowledge, none of them fully explore the taxonomic tree hierarchy when building their models. This, in turn, leads to lower generalization power and a higher risk of committing classification errors. RESULTS: Here we introduce HiTaC, a robust hierarchical machine learning model for accurate ITS classification, which requires a small amount of data for training and can handle imbalanced datasets. HiTaC was thoroughly evaluated with the established TAXXI benchmark and could correctly classify fungal ITS sequences of varying lengths and a range of identity differences between the training and test data. HiTaC outperforms state-of-the-art methods when trained over noisy data, consistently achieving higher F1-score and sensitivity across different taxonomic ranks, improving sensitivity by 6.9 percentage points over top methods in the noisiest dataset available on TAXXI. CONCLUSIONS: HiTaC is publicly available at the Python package index, BIOCONDA and Docker Hub. It is released under the new BSD license, allowing free use in academia and industry. Source code and documentation, which includes installation and usage instructions, are available at https://gitlab.com/dacs-hpi/hitac.
Affiliation(s)
- Fábio M Miranda
  - Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Vasco C Azevedo
  - Institute of Biological Sciences, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Rommel J Ramos
  - Institute of Biological Sciences, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
  - Institute of Biological Sciences, Federal University of Pará, Belém, Brazil
  - Centro de Computação de Alto Desempenho, Universidade Federal do Pará, Belém, Brazil
- Bernhard Y Renard
  - Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Vitor C Piro
  - Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
6
Gauthier NPG, Chorlton SD, Krajden M, Manges AR. Agnostic Sequencing for Detection of Viral Pathogens. Clin Microbiol Rev 2023; 36:e0011922. [PMID: 36847515 PMCID: PMC10035330 DOI: 10.1128/cmr.00119-22] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 03/01/2023] Open
Abstract
The advent of next-generation sequencing (NGS) technologies has expanded our ability to detect and analyze microbial genomes and has yielded novel molecular approaches for infectious disease diagnostics. While several targeted multiplex PCR and NGS-based assays have been widely used in public health settings in recent years, these targeted approaches are limited in that they still rely on a priori knowledge of a pathogen's genome, and an untargeted or unknown pathogen will not be detected. Recent public health crises have emphasized the need to prepare for a wide and rapid deployment of an agnostic diagnostic assay at the start of an outbreak to ensure an effective response to emerging viral pathogens. Metagenomic techniques can nonspecifically sequence all detectable nucleic acids in a sample and therefore do not rely on prior knowledge of a pathogen's genome. While this technology has been reviewed for bacterial diagnostics and adopted in research settings for the detection and characterization of viruses, viral metagenomics has yet to be widely deployed as a diagnostic tool in clinical laboratories. In this review, we highlight recent improvements to the performance of metagenomic viral sequencing, the current applications of metagenomic sequencing in clinical laboratories, as well as the challenges that impede the widespread adoption of this technology.
Affiliation(s)
- Nick P. G. Gauthier
  - Department of Microbiology and Immunology, University of British Columbia, Vancouver, British Columbia, Canada
- Mel Krajden
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
  - British Columbia Centre for Disease Control, Vancouver, British Columbia, Canada
- Amee R. Manges
  - British Columbia Centre for Disease Control, Vancouver, British Columbia, Canada
  - School of Population and Public Health, University of British Columbia, Vancouver, British Columbia, Canada
7
Viral Metagenomic Analysis of the Fecal Samples in Domestic Dogs (Canis lupus familiaris). Viruses 2023; 15:685. [PMID: 36992396 PMCID: PMC10058366 DOI: 10.3390/v15030685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 02/24/2023] [Accepted: 03/02/2023] [Indexed: 03/08/2023] Open
Abstract
Canine diarrhea is a common intestinal illness that is usually caused by viruses, bacteria, and parasites, and it may cause morbidity and mortality in domestic dogs if treated improperly. Recently, viral metagenomics has been applied to investigate the signatures of the enteric virome in mammals. In this research, the characteristics of the gut virome in healthy dogs and dogs with diarrhea were analyzed and compared using viral metagenomics. The alpha diversity analysis indicated that the richness and diversity of the gut virome in the dogs with diarrhea were much higher than in the healthy dogs, while the beta diversity analysis revealed that the gut virome of the two groups was quite different. At the family level, the predominant viruses in the canine gut virome were identified as Microviridae, Parvoviridae, Siphoviridae, Inoviridae, Podoviridae, Myoviridae, and others. At the genus level, the predominant viruses in the canine gut virome were identified as Protoparvovirus, Inovirus, Chlamydiamicrovirus, Lambdavirus, Dependoparvovirus, Lightbulbvirus, Kostyavirus, Punavirus, Lederbergvirus, Fibrovirus, Peduovirus, and others. However, the viral communities of the two groups differed significantly. The unique viral taxa identified in the healthy dogs group were Chlamydiamicrovirus and Lightbulbvirus, while the unique viral taxa identified in the dogs with diarrhea group were Inovirus, Protoparvovirus, Lambdavirus, Dependoparvovirus, Kostyavirus, Punavirus, and other viruses. Phylogenetic analysis based on the near-complete genome sequences showed that the CPV strains collected in this study together with other CPV Chinese isolates clustered into a separate branch, while the identified CAV-2 strain D5-8081 and AAV-5 strain AAV-D5 were both the first near-complete genome sequences in China. Moreover, the predicted bacterial hosts of phages were identified as Campylobacter, Escherichia, Salmonella, Pseudomonas, Acinetobacter, Moraxella, Mediterraneibacter, and other commensal microbiota. In conclusion, the enteric virome of the healthy dogs group and the dogs with diarrhea group was investigated and compared using viral metagenomics, and the viral communities might influence canine health and disease by interacting with the commensal gut microbiome.
8
Hallee L, Khomtchouk BB. Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life. Sci Rep 2023; 13:2088. [PMID: 36747072 PMCID: PMC9902438 DOI: 10.1038/s41598-023-28965-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/14/2022] [Accepted: 01/27/2023] [Indexed: 02/08/2023] Open
Abstract
In this study, we investigate how an organism's codon usage bias can serve as a predictor and classifier of various genomic and evolutionary traits across the domains of life. We perform secondary analysis of existing genetic datasets to build several AI/machine learning models. When trained on codon usage patterns of nearly 13,000 organisms, our models accurately predict the organelle of origin and taxonomic identity of nucleotide samples. We extend our analysis to identify the most influential codons for phylogenetic prediction with a custom feature ranking ensemble. Our results suggest that the genetic code can be utilized to train accurate classifiers of taxonomic and phylogenetic features. We then apply this classification framework to open reading frame (ORF) detection. Our statistical model assesses all possible ORFs in a nucleotide sample and rejects or deems them plausible based on the codon usage distribution. Our dataset and analyses are made publicly available on GitHub and the UCI ML Repository to facilitate open-source reproducibility and community engagement.
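As an illustration of the kind of input such classifiers use, the sketch below turns an in-frame coding sequence into a 64-dimensional vector of relative codon frequencies. The abstract does not describe the authors' exact feature pipeline, so the function name `codon_usage`, the codon ordering, and the handling of ambiguous bases are assumptions made for this example only.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed lexicographic order (an arbitrary choice for this sketch).
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
VALID = set(CODONS)


def codon_usage(seq: str) -> list[float]:
    """Relative frequency of each codon in an in-frame coding sequence.

    Hypothetical helper: codons containing ambiguous bases (e.g. N) are
    skipped, and a trailing partial codon is ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in VALID)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


features = codon_usage("ATGGCCATGTAA")  # codons: ATG, GCC, ATG, TAA
print(features[CODONS.index("ATG")])    # 0.5
```

A vector like this, computed per organism or per sample, could then be passed to any standard classifier; which models the authors trained is not stated in the abstract.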
Affiliation(s)
- Logan Hallee
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19713, USA
- Bohdan B Khomtchouk
  - Department of BioHealth Informatics, Center for Computational Biology and Bioinformatics, Indiana University, Indianapolis, IN, 46202, USA
9
Coutinho MG, Câmara GB, Barbosa RDM, Fernandes MA. SARS-CoV-2 virus classification based on stacked sparse autoencoder. Comput Struct Biotechnol J 2022; 21:284-298. [PMID: 36530948 PMCID: PMC9742810 DOI: 10.1016/j.csbj.2022.12.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Revised: 12/04/2022] [Accepted: 12/05/2022] [Indexed: 12/13/2022] Open
Abstract
Since December 2019, the world has been intensely affected by the COVID-19 pandemic, caused by SARS-CoV-2. When a novel virus is identified, early elucidation of the taxonomic classification and origin of its genomic sequence is essential for strategic planning, containment, and treatment. Deep learning techniques have been successfully used in many viral classification problems associated with viral infection diagnosis, metagenomics, phylogenetics, and analysis. With that motivation, we propose an efficient viral genome classifier for SARS-CoV-2 using a deep neural network based on the stacked sparse autoencoder (SSAE). To obtain the best model performance, we explored the use of image representations of the complete genome sequences as the SSAE input to classify SARS-CoV-2. For that, a dataset based on k-mers image representation was applied. We performed four experiments to provide different levels of taxonomic classification of SARS-CoV-2. The SSAE technique performed well in all experiments, achieving classification accuracy between 92% and 100% for the validation set and between 98.9% and 100% when the SARS-CoV-2 samples were applied for the test set. In this work, samples of SARS-CoV-2 were not used during the training process, only during subsequent tests, in which the model was able to infer the correct classification of the samples in the vast majority of cases. This indicates that our model can be adapted to classify other emerging viruses. Finally, the results indicated the applicability of this deep learning technique in genome classification problems.
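To make the idea of a k-mer image concrete, here is a minimal sketch that counts overlapping k-mers in a genome and lays the normalized counts out as a square grid. The abstract does not specify the layout or normalization the authors used, so the row-major lexicographic ordering, the scaling by the maximum count, and the name `kmer_image` are assumptions of this example, not a description of the published method.

```python
from itertools import product


def kmer_image(seq: str, k: int = 3) -> list[list[float]]:
    """Return a 2^k x 2^k grid of k-mer counts scaled to the range [0, 1].

    Hypothetical layout: k-mers are sorted lexicographically over A, C, G, T
    and written row by row. There are 4^k k-mers, which fills the grid exactly.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # windows with N or other symbols are skipped
        if j is not None:
            counts[j] += 1
    peak = max(counts) or 1          # avoid division by zero for empty input
    side = 2 ** k
    return [[counts[r * side + c] / peak for c in range(side)]
            for r in range(side)]


image = kmer_image("ACGTACGT", k=2)  # 4 x 4 grid of 2-mer intensities
print(image[0][1])                   # cell for "AC": 1.0
```

Each grid can be treated as a grayscale image and flattened or fed directly to a network; larger k gives a finer but sparser picture of the genome.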
Affiliation(s)
- Maria G.F. Coutinho
  - Laboratory of Machine Learning and Intelligent Instrumentation, IMD/nPITI, Federal University of Rio Grande do Norte, Natal, Brazil
- Gabriel B.M. Câmara
  - Laboratory of Machine Learning and Intelligent Instrumentation, IMD/nPITI, Federal University of Rio Grande do Norte, Natal, Brazil
- Raquel de M. Barbosa
  - Department of Pharmacy and Pharmaceutical Technology, University of Granada, 18071 Granada, Spain
- Marcelo A.C. Fernandes
  - Laboratory of Machine Learning and Intelligent Instrumentation, IMD/nPITI, Federal University of Rio Grande do Norte, Natal, Brazil
  - Department of Computer and Automation Engineering, Federal University of Rio Grande do Norte, Natal, Brazil
10
Sinwar D, Dhaka VS, Tesfaye BA, Raghuwanshi G, Kumar A, Maakar SK, Agrawal S. Artificial Intelligence and Deep Learning Assisted Rapid Diagnosis of COVID-19 from Chest Radiographical Images: A Survey. Contrast Media & Molecular Imaging 2022; 2022:1306664. [PMID: 36304775 PMCID: PMC9581633 DOI: 10.1155/2022/1306664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 09/06/2022] [Accepted: 09/27/2022] [Indexed: 01/26/2023]
Abstract
Artificial Intelligence (AI) has been applied successfully in many real-life domains for solving complex problems. With the advent of Machine Learning (ML) paradigms, it has become convenient for researchers to predict outcomes based on past data. ML is now one of the most important tools against the COVID-19 pandemic, detecting symptomatic cases at an early stage and warning people about its future effects. COVID-19 spread so widely across the globe in a short period because of the shortage of testing facilities and delays in test reports. To address this challenge, AI can be effectively applied to produce fast as well as cost-effective solutions. Many researchers have proposed AI-based solutions for preliminary diagnosis using chest CT images, respiratory sound analysis, voice analysis comparing symptomatic and asymptomatic persons, and so forth. Some AI-based applications claim good accuracy in predicting the chances of being COVID-19-positive. Within a short period, a large body of research has been published on the identification of COVID-19. This paper carefully examines and presents a comprehensive survey of more than 110 papers from various reputed sources, that is, Springer, IEEE, Elsevier, MDPI, arXiv, and medRxiv. Most of the papers selected for this survey present work on detecting and classifying COVID-19 using deep-learning-based models from chest X-rays and CT scan images. We hope that this survey covers most of the work and provides insights to the research community in proposing efficient as well as accurate solutions for fighting the pandemic.
Affiliation(s)
- Deepak Sinwar
  - Department of Computer and Communication Engineering, Manipal University Jaipur, Jaipur, India
- Vijaypal Singh Dhaka
  - Department of Computer and Communication Engineering, Manipal University Jaipur, Jaipur, India
- Biniyam Alemu Tesfaye
  - Department of Computer Science, College of Informatics, Bule Hora University, Bule Hora, Ethiopia
- Ghanshyam Raghuwanshi
  - Department of Computer and Communication Engineering, Manipal University Jaipur, Jaipur, India
- Ashish Kumar
  - Department of Mathematics and Statistics, Manipal University Jaipur, Jaipur, India
- Sunil Kr. Maakar
  - School of Computing Science & Engineering, Galgotias University, Greater Noida, India
- Sanjay Agrawal
  - Department of Electrical Engineering, Rajkiya Engineering College, Akbarpur, Ambedkar Nagar, India
11
Wani AK, Roy P, Kumar V, Mir TUG. Metagenomics and artificial intelligence in the context of human health. Infection, Genetics and Evolution: Journal of Molecular Epidemiology and Evolutionary Genetics in Infectious Diseases 2022; 100:105267. [PMID: 35278679 DOI: 10.1016/j.meegid.2022.105267] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 08/19/2021] [Revised: 03/03/2022] [Accepted: 03/04/2022] [Indexed: 12/12/2022]
Abstract
The human microbiome is a ubiquitous, dynamic, and site-specific consortium of microbial communities. The pathogenic nature of microorganisms within human tissues has led to an increase in microbial studies. Characterization of genera such as Streptococcus, Cutibacterium, Staphylococcus, Bifidobacterium, Lactococcus and Lactobacillus through culture-dependent and culture-independent techniques has been reported. However, due to the unique environment within human tissues, it is difficult to culture these microorganisms, which makes their molecular study strenuous. Metagenomics (MGs) offers a gateway to explore and characterize hidden microbial communities through a culture-independent mode by direct DNA isolation. Using function- and sequence-based MGs, scientists can explore the mechanistic details of numerous microbes and their interaction with the niche. Since the data generated from MG studies is highly complex and multi-dimensional, it requires accurate analytical tools to evaluate and interpret it. Artificial intelligence (AI) can automatically learn the dimensionality of the data and reduce its complexity, which makes disease diagnosis and the assessment of disease response easy, accurate and timely. This review provides insight into the human microbiota and its exploration and expansion through MG studies. The review elucidates the significance of MGs in studying the changing microbiota during disease conditions besides highlighting the role of AI in computational analysis of MG data.
Affiliation(s)
- Atif Khurshid Wani
  - Department of Biotechnology, School of Bioengineering and Biosciences, Lovely Professional University, Punjab 144411, India
- Priyanka Roy
  - Department of Basic and Applied Sciences, National Institute of Food Technology Entrepreneurship and Management, Sonipat 131 028, Haryana, India
- Vijay Kumar
  - Department of Basic and Applied Sciences, National Institute of Food Technology Entrepreneurship and Management, Sonipat 131 028, Haryana, India
- Tahir Ul Gani Mir
  - Department of Biotechnology, School of Bioengineering and Biosciences, Lovely Professional University, Punjab 144411, India
12
Vanderbilt CM, Bowman AS, Middha S, Petrova-Drus K, Tang YW, Chen X, Wang Y, Chang J, Rekhtman N, Busam KJ, Gupta S, Hameed M, Arcila ME, Ladanyi M, Berger MF, Dogan S, Zehir A. Defining Novel DNA Virus-Tumor Associations and Genomic Correlates Using Prospective Clinical Tumor/Normal Matched Sequencing Data. J Mol Diagn 2022; 24:515-528. [PMID: 35331965 PMCID: PMC9127461 DOI: 10.1016/j.jmoldx.2022.01.011] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/02/2021] [Revised: 12/27/2021] [Accepted: 01/31/2022] [Indexed: 12/11/2022] Open
Abstract
This study is the largest analysis of DNA viruses in solid tumors with associated genomics. To achieve this, a novel method for discovery of DNA viruses from matched tumor/normal next-generation sequencing samples was developed and validated. This method performed comparably to reference methods for the detection of high-risk (HR) human papilloma virus (HPV) (area under the receiver operating characteristic curve = 0.953). After virus identification in 48,148 consecutive samples from 42,846 unique patients, novel virus-tumor associations were established by segregating tumor types to determine whether each DNA virus was enriched in each of the tumor types compared with the remaining cohort. All firmly established solid tumor-virus associations (eg, HR HPV in cervical cancer) were confirmed, and the novel associations discovered included: human herpes virus 6 in neuroblastoma, human herpes virus 7 in esophagogastric cancer, and HPV42 in digital papillary adenocarcinoma. These associations were confirmed in an independent validation cohort. HR HPV- and Epstein-Barr virus-associated tumors showed newly discovered genomic associations, including a lower tumor mutation burden. The study demonstrated the ability to study the role of DNA viruses in human cancer from clinical genomics data and established the largest cohort that can be utilized as a validation set for future discovery efforts.
Affiliation(s)
- Chad M Vanderbilt
  - Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
- Anita S Bowman
  - Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
- Sumit Middha
  - Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
- Kseniya Petrova-Drus
  - Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
- Yi-Wei Tang
  - Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
- Xin Chen
  - Atila Biosystems Inc., Mountain View, California
- Jason Chang
  - Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
- Natasha Rekhtman
  - Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
- Klaus J Busam
  - Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
- Sounak Gupta
  - Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
  - Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota
- Meera Hameed
  - Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
- Maria E Arcila
  - Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
- Marc Ladanyi
  - Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
- Michael F Berger
  - Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
- Snjezana Dogan
  - Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
- Ahmet Zehir
  - Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
|
13
|
Sokhansanj BA, Rosen GL. Mapping Data to Deep Understanding: Making the Most of the Deluge of SARS-CoV-2 Genome Sequences. mSystems 2022; 7:e0003522. [PMID: 35311562 PMCID: PMC9040592 DOI: 10.1128/msystems.00035-22] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 02/27/2022] [Indexed: 12/22/2022] Open
Abstract
Next-generation sequencing has been essential to the global response to the COVID-19 pandemic. As of January 2022, nearly 7 million severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences are available to researchers in public databases. Sequence databases are an abundant resource from which to extract biologically relevant and clinically actionable information. As the pandemic has gone on, SARS-CoV-2 has rapidly evolved, involving complex genomic changes that challenge current approaches to classifying SARS-CoV-2 variants. Deep sequence learning could be a powerful way to build complex sequence-to-phenotype models. Unfortunately, although such models can be predictive, deep learning typically produces "black box" models that cannot directly provide biological and clinical insight. Researchers should therefore consider implementing emerging methods for visualizing and interpreting deep sequence models. Finally, researchers should address important data limitations, including (i) global sequencing disparities, (ii) insufficient sequence metadata, and (iii) screening artifacts due to poor sequence quality control.
Affiliation(s)
- Bahrad A. Sokhansanj
- Drexel University, Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical & Computer Engineering, College of Engineering, Philadelphia, Pennsylvania, USA
| | - Gail L. Rosen
- Drexel University, Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical & Computer Engineering, College of Engineering, Philadelphia, Pennsylvania, USA
| |
|
14
|
Rawson TM, Peiffer-Smadja N, Holmes A. Artificial Intelligence in Infectious Diseases. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
15
|
Rani G, Oza MG, Dhaka VS, Pradhan N, Verma S, Rodrigues JJPC. Applying deep learning-based multi-modal for detection of coronavirus. MULTIMEDIA SYSTEMS 2022; 28:1251-1262. [PMID: 34305327 PMCID: PMC8294320 DOI: 10.1007/s00530-021-00824-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/21/2020] [Accepted: 06/20/2021] [Indexed: 05/11/2023]
Abstract
Amid the global pandemic caused by COVID-19, research institutions and scientists are working to find a vaccine or medicine for the disease. The objective of this research is to design and develop a deep learning-based multi-modal model for the screening of COVID-19 using chest radiographs and genomic sequences. The model is also effective in finding the degree of genomic similarity between Severe Acute Respiratory Syndrome-Coronavirus 2 and other prevalent viruses such as Severe Acute Respiratory Syndrome-Coronavirus, Middle East Respiratory Syndrome-Coronavirus, Human Immunodeficiency Virus, and Human T-cell Leukaemia Virus. The experimental results on the datasets available at the National Centre for Biotechnology Information, GitHub, and Kaggle repositories show that it detects the genome of SARS-CoV-2 in the host genome with an accuracy of 99.27% and classifies chest radiographs as COVID-19, non-COVID pneumonia, or healthy with a sensitivity of 95.47%. Thus, it may prove a useful tool for doctors to quickly classify infected and non-infected genomes. It may also be useful in finding the most effective drug among those available for the treatment of COVID-19.
Affiliation(s)
- Geeta Rani
- Department of Computer and Communication Engineering, Manipal University Jaipur, Jaipur, Rajasthan India
| | - Meet Ganpatlal Oza
- Department of Computer and Communication Engineering, Manipal University Jaipur, Jaipur, Rajasthan India
| | - Vijaypal Singh Dhaka
- Department of Computer and Communication Engineering, Manipal University Jaipur, Jaipur, Rajasthan India
| | - Nitesh Pradhan
- Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, Rajasthan India
| | - Sahil Verma
- Department of Computer Science and Engineering, Chandigarh University, Mohali, 140413 India
| | - Joel J. P. C. Rodrigues
- Federal University of Piauí (UFPI) Teresina, Teresina, PI Brazil
- Instituto de Telecomunicações, Aveiro, Portugal
| |
|
16
|
Dasari CM, Bhukya R. Explainable deep neural networks for novel viral genome prediction. APPL INTELL 2021; 52:3002-3017. [PMID: 34764607 PMCID: PMC8232563 DOI: 10.1007/s10489-021-02572-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Accepted: 05/26/2021] [Indexed: 11/27/2022]
Abstract
Viral infection causes a wide variety of human diseases, including cancer and COVID-19. Viruses invade host cells and associate with host molecules, potentially disrupting normal host function and leading to fatal diseases. Novel viral genome prediction is crucial for understanding complex viral diseases such as AIDS and Ebola. While most existing computational techniques classify viral genomes, the efficiency of the classification depends solely on the structural features extracted. State-of-the-art DNN models achieve excellent performance by automatically extracting classification features, but their explainability is relatively poor. During model training for viral prediction, the proposed CNN and CNN-LSTM based methods (EdeepVPP, EdeepVPP-hybrid) automatically extract features. EdeepVPP also provides model interpretability, using learned filters to extract the most important patterns that characterize viral genomes. It is an interpretable CNN model that extracts vital biologically relevant patterns (features) from feature maps of viral sequences. The EdeepVPP-hybrid predictor outperforms all the existing methods by achieving 0.992 mean AUC-ROC and 0.990 AUC-PR on 19 human metagenomic contig experiment datasets using 10-fold cross-validation. We evaluate the ability of CNN filters to detect patterns across high average activation values. To further assess the robustness of the EdeepVPP model, we perform leave-one-experiment-out cross-validation. It can work as a recommendation system to further analyze the raw sequences labeled as ‘unknown’ by alignment-based methods. We show that our interpretable model can extract patterns that are considered to be the most important features for predicting virus sequences through learned filters.
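The leave-one-experiment-out cross-validation mentioned in this abstract can be sketched as a grouped split in which every contig from one metagenomic experiment is held out together. This is a minimal illustration, not the authors' code; the function name and the experiment labels are hypothetical.

```python
def leave_one_experiment_out(experiment_ids):
    """Yield (held_out_experiment, train_indices, test_indices) per fold.

    `experiment_ids[i]` is the experiment that sample i came from, so each
    fold tests on one whole experiment and trains on all the others.
    """
    for held_out in sorted(set(experiment_ids)):
        train = [i for i, e in enumerate(experiment_ids) if e != held_out]
        test = [i for i, e in enumerate(experiment_ids) if e == held_out]
        yield held_out, train, test


# Hypothetical labels: five contigs drawn from three experiments.
ids = ["exp1", "exp1", "exp2", "exp3", "exp2"]
folds = list(leave_one_experiment_out(ids))
```

Compared with random 10-fold splitting, this grouping prevents contigs from the same experiment from appearing in both the training and test sets, which is why it serves as a robustness check.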
Affiliation(s)
| | - Raju Bhukya
- National Institute of Technology, Warangal, Telangana 506004 India
| |
|
17
|
Prusty MR, Tripathi V, Dubey A. A novel data augmentation approach for mask detection using deep transfer learning. Intelligence-Based Medicine 2021; 5:100037. [PMID: 34179856 PMCID: PMC8216842 DOI: 10.1016/j.ibmed.2021.100037] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/11/2020] [Revised: 04/08/2021] [Accepted: 06/16/2021] [Indexed: 11/11/2022]
Abstract
At the onset of 2020, the world saw the rise and spread of a global pandemic named COVID-19, which caused numerous deaths and affected millions of people around the world. Due to its highly contagious nature, this disease spread across the world within a short span of time. It forced almost all nations to implement strict social distancing rules along with the use of face masks to reduce the risk of infection. While the virus is still on the loose, markets and business firms have reopened to keep the economy alive. This calls for modification of existing technological models to ensure the safety of individuals and stop the spread of the virus in public places. One way to achieve this safety would be the deployment of a mask detection model. The proposed mask detection models can serve as a vital utility in the coming years for ensuring proper enforcement of safety protocols. This research paper explores the use of the state-of-the-art YOLOv3 model, a deep transfer learning object detection technique, to develop a mask detection model. Along with the standard approach of an object detection algorithm, this paper proposes the use of a data augmentation approach for mask detection. The proposed model generates an augmented dataset from the standard dataset using image filtering techniques such as grayscale and Gaussian blur. This augmented dataset is used for training the object detection model for mask detection. The mean average precision for the Data augmentation based mask detection model is observed to be 99.8% during training. Finally, model performance is compared between the standard and the proposed augmented data approach.
The experiment showed that the average confidence level for the Standard mask detection model was 0.94, 0.93 and 0.91 for images of individuals (type A), images with groups of people (type B) and video with groups of people (type C), respectively. The average confidence levels for the Data augmentation based mask detection model for types A, B and C are 0.97, 0.96 and 0.93, respectively. This paper therefore concludes that the proposed Data augmentation based mask detection model performs better than the Standard mask detection model.
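The augmentation step described above (grayscale and Gaussian blur filters applied to the standard dataset) can be sketched in plain Python. This is an illustration under stated assumptions, not the paper's pipeline: the ITU-R BT.601 luminance weights, the 3x3 kernel, the edge clamping, and every function name here are choices made for the example.

```python
# 3x3 Gaussian kernel (weights sum to 16); the kernel size is an assumption.
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]


def to_grayscale(rgb):
    """Convert rows of (r, g, b) pixels to luminance with BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def gaussian_blur(gray):
    """Blur a grayscale image, clamping coordinates at the image border."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * gray[yy][xx]
            out[y][x] = acc / 16
    return out


def augment(rgb):
    """Return the original image plus its grayscale and blurred variants."""
    gray = to_grayscale(rgb)
    return [rgb, gray, gaussian_blur(gray)]
```

For an object detector, these filters leave the bounding-box annotations valid because neither one moves pixels, so each augmented image can reuse the labels of its source image.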
Affiliation(s)
- Manas Ranjan Prusty
- Centre for Cyber Physical Systems, Vellore Institute of Technology, Chennai, 600127, India; School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, 600127, India
| | - Vaibhav Tripathi
- School of Electronics Engineering, Vellore Institute of Technology, Chennai, 600127, India
| | - Anmol Dubey
- School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, 600127, India
| |
|
18
|
Lin Y, Wang G, Yu J, Sung JJY. Artificial intelligence and metagenomics in intestinal diseases. J Gastroenterol Hepatol 2021; 36:841-847. [PMID: 33880764 DOI: 10.1111/jgh.15501] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/07/2020] [Revised: 02/24/2021] [Accepted: 03/18/2021] [Indexed: 12/12/2022]
Abstract
Gut microbiota has been shown to be associated with the development of gastrointestinal diseases. In the last decade, developments in whole metagenome sequencing and 16S rRNA sequencing technology have dramatically accelerated gut microbiome research and revealed its association with gastrointestinal disorders. Because of the high dimensionality and complexity intrinsic to these data, traditional bioinformatics methods could only explain the most significant changes, with limited prediction accuracy. In contrast, machine learning is an application of artificial intelligence that enables computational systems to learn and improve automatically from experience (a training cohort) without being explicitly programmed. It is thus capable of untangling high-dimensional data and complicated correlations. With modern computational power, machine learning is widely used to analyze microorganisms related to disease onset and other clinical features. It could help explore and identify novel biomarkers or improve the accuracy of disease diagnosis. This review summarizes the most recent research that used machine learning to reveal the role of gut microbiota in intestinal disorders.
Affiliation(s)
- Yufeng Lin
- Institute of Digestive Disease and Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong
| | - Guoping Wang
- Institute of Digestive Disease and Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong
| | - Jun Yu
- Institute of Digestive Disease and Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong
| | - Joseph J Y Sung
- Institute of Digestive Disease and Department of Medicine and Therapeutics, State Key Laboratory of Digestive Disease, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong
| |
|
19
|
Acera Mateos P, Balboa RF, Easteal S, Eyras E, Patel HR. PACIFIC: a lightweight deep-learning classifier of SARS-CoV-2 and co-infecting RNA viruses. Sci Rep 2021; 11:3209. [PMID: 33547380 PMCID: PMC7864945 DOI: 10.1038/s41598-021-82043-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/26/2020] [Accepted: 01/12/2021] [Indexed: 01/30/2023] Open
Abstract
Viral co-infections occur in COVID-19 patients, potentially impacting disease progression and severity. However, there is currently no dedicated method to identify viral co-infections in patient RNA-seq data. We developed PACIFIC, a deep-learning algorithm that accurately detects SARS-CoV-2 and other common RNA respiratory viruses from RNA-seq data. Using in silico data, PACIFIC recovers the presence and relative concentrations of viruses with > 99% precision and recall. PACIFIC accurately detects SARS-CoV-2 and other viral infections in 63 independent in vitro cell culture and patient datasets. PACIFIC is an end-to-end tool that enables the systematic monitoring of viral infections in the current global pandemic.
Affiliation(s)
- Pablo Acera Mateos
- John Curtin School of Medical Research, Australian National University, Canberra, ACT 2600 Australia
- EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT 2600 Australia
| | - Renzo F. Balboa
- John Curtin School of Medical Research, Australian National University, Canberra, ACT 2600 Australia
- National Centre for Indigenous Genomics, Australian National University, Canberra, ACT 2600 Australia
| | - Simon Easteal
- John Curtin School of Medical Research, Australian National University, Canberra, ACT 2600 Australia
- National Centre for Indigenous Genomics, Australian National University, Canberra, ACT 2600 Australia
| | - Eduardo Eyras
- John Curtin School of Medical Research, Australian National University, Canberra, ACT 2600 Australia
- EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, ACT 2600 Australia
- IMIM - Hospital del Mar Medical Research Institute, 08003 Barcelona, Spain
- Catalan Institution for Research and Advanced Studies, 08010 Barcelona, Spain
| | - Hardip R. Patel
- John Curtin School of Medical Research, Australian National University, Canberra, ACT 2600 Australia
- National Centre for Indigenous Genomics, Australian National University, Canberra, ACT 2600 Australia
| |
|
20
|
Nguyen M, Wemheuer B, Laffy PW, Webster NS, Thomas T. Taxonomic, functional and expression analysis of viral communities associated with marine sponges. PeerJ 2021; 9:e10715. [PMID: 33604175 PMCID: PMC7863781 DOI: 10.7717/peerj.10715] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/25/2020] [Accepted: 12/15/2020] [Indexed: 12/19/2022] Open
Abstract
Viruses play an essential role in shaping the structure and function of ecological communities. Marine sponges have the capacity to filter large volumes of ‘virus-laden’ seawater through their bodies and host dense communities of microbial symbionts, which are likely accessible to viral infection. However, despite the potential of sponges and their symbionts to act as viral reservoirs, little is known about the sponge-associated virome. Here we address this knowledge gap by analysing metagenomic and (meta-) transcriptomic datasets from several sponge species to determine what viruses are present and elucidate their predicted and expressed functionality. Sponges were found to carry diverse, abundant and active bacteriophages as well as eukaryotic viruses belonging to the Megavirales and Phycodnaviridae. These viruses contain and express auxiliary metabolic genes (AMGs) for photosynthesis and vitamin synthesis as well as for the production of antimicrobials and the defence against toxins. These viral AMGs can therefore contribute to the metabolic capacities of their hosts and also potentially enhance the survival of infected cells. This suggests that viruses may play a key role in regulating the abundance and activities of members of the sponge holobiont.
Affiliation(s)
- Mary Nguyen
- Centre for Marine Science and Innovation & School of Biological & Earth and Environmental Sciences, University of New South Wales, Sydney, NSW, Australia
| | - Bernd Wemheuer
- Centre for Marine Science and Innovation & School of Biological & Earth and Environmental Sciences, University of New South Wales, Sydney, NSW, Australia
| | - Patrick W Laffy
- Australian Institute of Marine Science, Townsville, QLD, Australia
| | - Nicole S Webster
- Australian Institute of Marine Science, Townsville, QLD, Australia; Australian Centre for Ecogenomics, University of Queensland, Brisbane, QLD, Australia
| | - Torsten Thomas
- Centre for Marine Science and Innovation & School of Biological & Earth and Environmental Sciences, University of New South Wales, Sydney, NSW, Australia
| |
|
21
|
Artificial Intelligence in Infectious Diseases. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_103-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
22
|
Eldow ME. The worldwide methods of artificial intelligence for detection and diagnosis of COVID-19. LEVERAGING ARTIFICIAL INTELLIGENCE IN GLOBAL EPIDEMICS 2021. [PMCID: PMC8342405 DOI: 10.1016/b978-0-323-89777-8.00012-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
This chapter reviews early and recent initiatives, research, and applications of artificial intelligence (AI) methods for the detection and diagnosis of COVID-19. It presents the common AI methods used in these applications, including traditional work on artificial neural networks and recent approaches based on machine learning and deep learning. It surveys many examples of AI methods applied to the detection and diagnosis of COVID-19, from the early trials in China to more recent research in other regions of the world, including Asia, North America, Europe, and Africa. The chapter also compares the approaches and trials by method, type of application, and region. It closes with a brief discussion of the expected future applications and trends of AI in the detection and diagnosis of viruses, especially COVID-19.
|
23
|
Rahbar MR, Zarei M, Jahangiri A, Khalili S, Nezafat N, Negahdaripour M, Fattahian Y, Savardashtaki A, Ghasemi Y. Non-adaptive Evolution of Trimeric Autotransporters in Brucellaceae. Front Microbiol 2020; 11:560667. [PMID: 33281759 PMCID: PMC7688925 DOI: 10.3389/fmicb.2020.560667] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/09/2020] [Accepted: 10/05/2020] [Indexed: 12/14/2022] Open
Abstract
Brucella species are Gram-negative, facultative intracellular pathogens. They are the main cause of brucellosis, which has led to a global health burden. Adherence of the pathogen to the host cells is the first step in the infection process. The bacteria can adhere to various biotic and abiotic surfaces using their outer membrane proteins. Trimeric autotransporter adhesins (TAAs) are modular homotrimers of varying length and domain complexity. They are a diverse and widespread gene family constituting the type Vc secretion pathway. These adhesins have been established as virulence factors in Brucellaceae. To date, no comprehensive study has been performed on the trimeric autotransporter family in the genus. In the present study, various bioinformatics tools were used to provide a novel evolutionary insight into the sequence and structure of this protein family in Brucellaceae. To this end, a dataset of all trimeric autotransporters from the Brucella genomes was built. Analyses included but were not limited to sequence alignment, phylogenetic tree construction, codon-based tests for selection, clustering of the sequences, and structure (primary to quaternary) predictions. Batch analyses of the dataset suggested the existence of a few structural domains within the whole population. BatA from the B. abortus 2308 genome was selected as a reference to describe the features of these structural domains. Furthermore, we examined the structural basis for the observed rigidity and resiliency of the protein structure through a molecular dynamics evaluation, which led us to deduce that random drift results in the non-adaptive evolution of the trimeric autotransporter genes in the Brucella genus. Notably, the modifications have occurred across the genus without interference of gene transmission.
Affiliation(s)
- Mohammad Reza Rahbar
- Pharmaceutical Sciences Research Center, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Mahboubeh Zarei
- Pharmaceutical Sciences Research Center, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Abolfazl Jahangiri
- Applied Microbiology Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
| | - Saeed Khalili
- Department of Biology Sciences, Shahid Rajaee Teacher Training University, Tehran, Iran
| | - Navid Nezafat
- Pharmaceutical Sciences Research Center, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
- Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Manica Negahdaripour
- Pharmaceutical Sciences Research Center, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
- Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Yaser Fattahian
- Department of Biotechnology, Institute of Science and High Technology and Environmental Sciences, Graduate University of Advanced Technology, Kerman, Iran
| | - Amir Savardashtaki
- Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Younes Ghasemi
- Pharmaceutical Sciences Research Center, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
- Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
| |
|
24
|
Alam MNU, Chowdhury UF. Short k-mer abundance profiles yield robust machine learning features and accurate classifiers for RNA viruses. PLoS One 2020; 15:e0239381. [PMID: 32946529 PMCID: PMC7500682 DOI: 10.1371/journal.pone.0239381] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/21/2020] [Accepted: 09/06/2020] [Indexed: 01/20/2023] Open
Abstract
High-throughput sequencing technologies have greatly enabled the study of genomics, transcriptomics and metagenomics. Automated annotation and classification of the vast amounts of generated sequence data has become paramount for facilitating biological sciences. Genomes of viruses can be radically different from all life, both in terms of molecular structure and primary sequence. Alignment-based and profile-based searches are commonly employed for characterization of assembled viral contigs from high-throughput sequencing data. Recent attempts have highlighted the use of machine learning models for the task, but these models rely entirely on DNA genomes and, owing to the intrinsic genomic complexity of viruses, RNA viruses have been completely overlooked. Here, we present a novel short k-mer based sequence scoring method that generates robust sequence information for training machine learning classifiers. We trained 18 classifiers for the task of distinguishing viral RNA from human transcripts. We challenged our models with very stringent testing protocols across different species and evaluated performance against BLASTn, BLASTx and HMMER3 searches. For clean sequence data retrieved from curated databases, our models display near perfect accuracy, outperforming all similar attempts previously reported. On de novo assemblies of raw RNA-Seq data from cells subjected to Ebola virus, the area under the ROC curve varied from 0.6 to 0.86 depending on the software used for assembly. Our classifier was able to properly classify the majority of the false hits generated by BLAST and HMMER3 searches on the same data. The outstanding performance metrics of our model lay the groundwork for robust machine learning methods for the automated annotation of sequence data.
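A short k-mer abundance profile of the kind this abstract refers to can be sketched as a normalized count vector over all possible k-mers. The paper's actual scoring method is not given in the abstract, so the plain relative frequencies, the default `k=3`, and the function name below are assumptions for illustration only.

```python
from itertools import product


def kmer_profile(seq, k=3, alphabet="ACGU"):
    """Return relative k-mer frequencies in a fixed (lexicographic) order.

    DNA-style input is accepted by mapping T to U; windows containing
    characters outside the alphabet (for example N) are skipped.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    total = sum(counts.values())
    return [counts[km] / total if total else 0.0 for km in kmers]


# A 2-mer profile has 4**2 = 16 features regardless of sequence length.
features = kmer_profile("ACGT", k=2)
```

Because the vector length depends only on `k`, sequences of any length map to the same feature space, which is what lets a single classifier be trained on contigs of varying size.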
Affiliation(s)
- Md. Nafis Ul Alam
- Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh
| | - Umar Faruq Chowdhury
- Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh
| |
|
25
|
Peiffer-Smadja N, Dellière S, Rodriguez C, Birgand G, Lescure FX, Fourati S, Ruppé E. Machine learning in the clinical microbiology laboratory: has the time come for routine practice? Clin Microbiol Infect 2020; 26:1300-1309. [PMID: 32061795 DOI: 10.1016/j.cmi.2020.02.006] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 12/31/2019] [Revised: 02/04/2020] [Accepted: 02/06/2020] [Indexed: 12/20/2022]
Abstract
BACKGROUND Machine learning (ML) allows the analysis of complex and large data sets and has the potential to improve health care. The clinical microbiology laboratory, at the interface of clinical practice and diagnostics, is of special interest for the development of ML systems. AIMS This narrative review aims to explore the current use of ML in clinical microbiology. SOURCES References for this review were identified through searches of MEDLINE/PubMed, EMBASE, Google Scholar, bioRxiv, arXiv, ACM Digital Library and IEEE Xplore Digital Library up to November 2019. CONTENT We found 97 ML systems aiming to assist clinical microbiologists. Overall, 82 ML systems (85%) targeted bacterial infections, 11 (11%) parasitic infections, nine (9%) viral infections and three (3%) fungal infections. Forty ML systems (41%) focused on microorganism detection, identification and quantification, 36 (37%) evaluated antimicrobial susceptibility, and 21 (22%) targeted the diagnosis, disease classification and prediction of clinical outcomes. The ML systems used very diverse data sources: 21 (22%) used genomic data of microorganisms, 19 (20%) microbiota data obtained by metagenomic sequencing, 19 (20%) analysed microscopic images, 17 (18%) spectroscopy data, eight (8%) targeted gene sequencing, six (6%) volatile organic compounds, four (4%) photographs of bacterial colonies, four (4%) transcriptome data, three (3%) protein structure, and three (3%) clinical data. Most systems used data from high-income countries (n = 71, 73%) but a significant number used data from low- and middle-income countries (n = 36, 37%). Performance measures were reported for the 97 ML systems, but no article described their use in clinical practice or reported impact on processes or clinical outcomes. IMPLICATIONS In clinical microbiology, ML has been used with various data sources and diverse practical applications.
The evaluation and implementation processes represent the main gap in existing ML systems, requiring a focus on their interpretability and potential integration into real-world settings.
Affiliation(s)
- N Peiffer-Smadja
- National Institute for Health Research Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Imperial College London, London, UK; Université de Paris, IAME, INSERM, F-75018 Paris, France
| | - S Dellière
- Université de Paris, Laboratoire de Parasitologie-Mycologie, Groupe Hospitalier Saint-Louis-Lariboisière-Fernand-Widal, Assistance Publique-Hôpitaux de Paris (AP-HP), Paris, France
| | - C Rodriguez
- Department of Prevention, Diagnosis and Treatment of Infections, Henri-Mondor Hospital, APHP, Université Paris-Est Créteil, IMRB, INSERM U955, Créteil, France
| | - G Birgand
- National Institute for Health Research Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Imperial College London, London, UK
| | - F-X Lescure
- Université de Paris, IAME, INSERM, F-75018 Paris, France
| | - S Fourati
- Department of Prevention, Diagnosis and Treatment of Infections, Henri-Mondor Hospital, APHP, Université Paris-Est Créteil, IMRB, INSERM U955, Créteil, France
| | - E Ruppé
- Université de Paris, IAME, INSERM, F-75018 Paris, France.
| |
|
26
|
Tampuu A, Bzhalava Z, Dillner J, Vicente R. ViraMiner: Deep learning on raw DNA sequences for identifying viral genomes in human samples. PLoS One 2019; 14:e0222271. [PMID: 31509583 PMCID: PMC6738585 DOI: 10.1371/journal.pone.0222271] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 06/07/2019] [Accepted: 08/22/2019] [Indexed: 11/23/2022] Open
Abstract
Despite its clinical importance, detection of highly divergent or yet unknown viruses is a major challenge. When human samples are sequenced, conventional alignments classify many assembled contigs as "unknown" since many of the sequences are not similar to known genomes. In this work, we developed ViraMiner, a deep learning-based method to identify viruses in various human biospecimens. ViraMiner contains two branches of Convolutional Neural Networks designed to detect both patterns and pattern-frequencies on raw metagenomics contigs. The training dataset included sequences obtained from 19 metagenomic experiments which were analyzed and labeled by BLAST. The model achieves significantly improved accuracy compared to other machine learning methods for viral genome classification. Using 300 bp contigs ViraMiner achieves 0.923 area under the ROC curve. To our knowledge, this is the first machine learning methodology that can detect the presence of viral sequences among raw metagenomic contigs from diverse human samples. We suggest that the proposed model captures different types of information of genome composition, and can be used as a recommendation system to further investigate sequences labeled as "unknown" by conventional alignment methods. Exploring these highly-divergent viruses, in turn, can enhance our knowledge of infectious causes of diseases.
Affiliation(s)
- Ardi Tampuu
- Computational Neuroscience Lab, Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Zurab Bzhalava
- Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
| | - Joakim Dillner
- Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Karolinska University Laboratory, Karolinska University Hospital, Stockholm, Sweden
| | - Raul Vicente
- Computational Neuroscience Lab, Institute of Computer Science, University of Tartu, Tartu, Estonia
| |
|
27
|
Kallies R, Hölzer M, Brizola Toscan R, Nunes da Rocha U, Anders J, Marz M, Chatzinotas A. Evaluation of Sequencing Library Preparation Protocols for Viral Metagenomic Analysis from Pristine Aquifer Groundwaters. Viruses 2019; 11:E484. [PMID: 31141902 PMCID: PMC6631259 DOI: 10.3390/v11060484] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 03/31/2019] [Revised: 05/26/2019] [Accepted: 05/27/2019] [Indexed: 01/03/2023] Open
Abstract
Viral ecology of terrestrial habitats is yet to be extensively explored, in particular the terrestrial subsurface. One problem in obtaining viral sequences from groundwater aquifer samples is the relatively low amount of virus particles. As a result, the amount of extracted DNA may not be sufficient for direct sequencing of such samples. Here we compared three DNA amplification methods to enrich viral DNA from three pristine limestone aquifer assemblages of the Hainich Critical Zone Exploratory to evaluate potential bias created by the different amplification methods as determined by viral metagenomics. Linker amplification shotgun libraries resulted in the lowest redundancy among the sequencing reads and showed the highest diversity, while multiple displacement amplification produced the highest number of contigs with the longest average contig size, suggesting a combination of these two methods is suitable for the successful enrichment of viral DNA from pristine groundwater samples. In total, we identified 27,173, 5,886 and 32,613 viral contigs from the three samples, of which 11.92 to 18.65% could be assigned to taxonomy using BLAST. Among these, members of the Caudovirales order were the most abundant group (52.20 to 69.12%), dominated by Myoviridae and Siphoviridae. Those, and the high number of unknown viral sequences, substantially expand the known virosphere.
Affiliation(s)
- René Kallies
- Helmholtz Centre for Environmental Research - UFZ, Department of Environmental Microbiology, 04318 Leipzig, Germany.
| | - Martin Hölzer
- Friedrich Schiller University Jena, RNA Bioinformatics and High-Throughput Analysis, 07743 Jena, Germany.
- European Virus Bioinformatics Center, 07743 Jena, Germany.
| | - Rodolfo Brizola Toscan
- Helmholtz Centre for Environmental Research - UFZ, Department of Environmental Microbiology, 04318 Leipzig, Germany.
| | - Ulisses Nunes da Rocha
- Helmholtz Centre for Environmental Research - UFZ, Department of Environmental Microbiology, 04318 Leipzig, Germany.
| | - John Anders
- Helmholtz Centre for Environmental Research - UFZ, Department of Environmental Microbiology, 04318 Leipzig, Germany.
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, University Leipzig, 04081 Leipzig, Germany.
| | - Manja Marz
- Friedrich Schiller University Jena, RNA Bioinformatics and High-Throughput Analysis, 07743 Jena, Germany.
- European Virus Bioinformatics Center, 07743 Jena, Germany.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, 04103 Leipzig, Germany.
| | - Antonis Chatzinotas
- Helmholtz Centre for Environmental Research - UFZ, Department of Environmental Microbiology, 04318 Leipzig, Germany.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, 04103 Leipzig, Germany.
| |
|
28
|
Ponsero AJ, Hurwitz BL. The Promises and Pitfalls of Machine Learning for Detecting Viruses in Aquatic Metagenomes. Front Microbiol 2019; 10:806. [PMID: 31057513 PMCID: PMC6477088 DOI: 10.3389/fmicb.2019.00806] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 12/03/2018] [Accepted: 03/29/2019] [Indexed: 01/21/2023] Open
Abstract
Tools allowing for the identification of viral sequences in host-associated and environmental metagenomes allow for a better understanding of the genetics and ecology of viruses and their hosts. Recently, new approaches using machine learning methods to distinguish viral from bacterial signal using k-mer sequence signatures were published for identifying viral contigs in metagenomes. The promise of these content-based approaches is the ability to discover new viruses with few or no known relatives. In this perspective paper, we examine the use of the content-based machine learning tool VirFinder for the identification of viral sequences in aquatic metagenomes and explore the possibility of using ecosystem-focused models targeted to marine metagenomes. We discuss the impact of the training set composition on the tool performance and the current limitation for the retrieval of low-abundance viral sequences in metagenomes. We identify potential biases that could arise from machine learning approaches for viral hunting in real-world datasets and suggest possible avenues to overcome them.
Affiliation(s)
- Alise J Ponsero
- Department of Biosystems Engineering, The University of Arizona, Tucson, AZ, United States
| | - Bonnie L Hurwitz
- Department of Biosystems Engineering, The University of Arizona, Tucson, AZ, United States
- BIO5 Institute, The University of Arizona, Tucson, AZ, United States
| |
|
29
|
Ponsero AJ, Hurwitz BL. The Promises and Pitfalls of Machine Learning for Detecting Viruses in Aquatic Metagenomes. Front Microbiol 2019. [PMID: 31057513 DOI: 10.3389/fmicb.2019.00806/full] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 04/26/2023] Open
Abstract
Tools allowing for the identification of viral sequences in host-associated and environmental metagenomes allow for a better understanding of the genetics and ecology of viruses and their hosts. Recently, new approaches using machine learning methods to distinguish viral from bacterial signal using k-mer sequence signatures were published for identifying viral contigs in metagenomes. The promise of these content-based approaches is the ability to discover new viruses with few or no known relatives. In this perspective paper, we examine the use of the content-based machine learning tool VirFinder for the identification of viral sequences in aquatic metagenomes and explore the possibility of using ecosystem-focused models targeted to marine metagenomes. We discuss the impact of the training set composition on the tool performance and the current limitation for the retrieval of low-abundance viral sequences in metagenomes. We identify potential biases that could arise from machine learning approaches for viral hunting in real-world datasets and suggest possible avenues to overcome them.
Affiliation(s)
- Alise J Ponsero
- Department of Biosystems Engineering, The University of Arizona, Tucson, AZ, United States
| | - Bonnie L Hurwitz
- Department of Biosystems Engineering, The University of Arizona, Tucson, AZ, United States
- BIO5 Institute, The University of Arizona, Tucson, AZ, United States
| |
|