Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Fiannaca A, La Paglia L, La Rosa M, Lo Bosco G, Renda G, Rizzo R, Gaglio S, Urso A. Deep learning models for bacteria taxonomic classification of metagenomic data. BMC Bioinformatics 2018;19:198. [PMID: 30066629 PMCID: PMC6069770 DOI: 10.1186/s12859-018-2182-6] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Indexed: 11/10/2022] Open



Cited by Other Article(s)

Gündüz HA, Mreches R, Moosbauer J, Robertson G, To XY, Franzosa EA, Huttenhower C, Rezaei M, McHardy AC, Bischl B, Münch PC, Binder M. Optimized model architectures for deep learning on genomic data. Commun Biol 2024;7:516. [PMID: 38693292 PMCID: PMC11063068 DOI: 10.1038/s42003-024-06161-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Accepted: 04/08/2024] [Indexed: 05/03/2024] Open

Affiliation(s)

Hüseyin Anil Gündüz Department of Statistics, LMU Munich, Munich, Germany Munich Center for Machine Learning, Munich, Germany
René Mreches Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
Julia Moosbauer Department of Statistics, LMU Munich, Munich, Germany Munich Center for Machine Learning, Munich, Germany
Gary Robertson Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
Xiao-Yin To Department of Statistics, LMU Munich, Munich, Germany Munich Center for Machine Learning, Munich, Germany Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
Eric A Franzosa Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
Curtis Huttenhower Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
Mina Rezaei Department of Statistics, LMU Munich, Munich, Germany Munich Center for Machine Learning, Munich, Germany
Alice C McHardy Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany German Centre for Infection Research (DZIF), partner site Hannover Braunschweig, Braunschweig, Germany
Bernd Bischl Department of Statistics, LMU Munich, Munich, Germany Munich Center for Machine Learning, Munich, Germany
Philipp C Münch Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany. Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany. Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA. German Centre for Infection Research (DZIF), partner site Hannover Braunschweig, Braunschweig, Germany.
Martin Binder Department of Statistics, LMU Munich, Munich, Germany. Munich Center for Machine Learning, Munich, Germany.


Roy G, Prifti E, Belda E, Zucker JD. Deep learning methods in metagenomics: a review. Microb Genom 2024;10. [PMID: 38630611 DOI: 10.1099/mgen.0.001231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024] Open

Baddal B, Taner F, Uzun Ozsahin D. Harnessing of Artificial Intelligence for the Diagnosis and Prevention of Hospital-Acquired Infections: A Systematic Review. Diagnostics (Basel) 2024;14:484. [PMID: 38472956 DOI: 10.3390/diagnostics14050484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Revised: 01/23/2024] [Accepted: 02/19/2024] [Indexed: 03/14/2024] Open

Wu Z, Guo Y, Hayakawa M, Yang W, Lu Y, Ma J, Li L, Li C, Liu Y, Niu J. Artificial intelligence-driven microbiome data analysis for estimation of postmortem interval and crime location. Front Microbiol 2024;15:1334703. [PMID: 38314433 PMCID: PMC10834752 DOI: 10.3389/fmicb.2024.1334703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 01/08/2024] [Indexed: 02/06/2024] Open

Arias PM, Butler J, Randhawa GS, Soltysiak MPM, Hill KA, Kari L. Environment and taxonomy shape the genomic signature of prokaryotic extremophiles. Sci Rep 2023;13:16105. [PMID: 37752120 PMCID: PMC10522608 DOI: 10.1038/s41598-023-42518-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 09/11/2023] [Indexed: 09/28/2023] Open

Gündüz HA, Binder M, To XY, Mreches R, Bischl B, McHardy AC, Münch PC, Rezaei M. A self-supervised deep learning method for data-efficient training in genomics. Commun Biol 2023;6:928. [PMID: 37696966 PMCID: PMC10495322 DOI: 10.1038/s42003-023-05310-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Accepted: 09/01/2023] [Indexed: 09/13/2023] Open

Park H, Lim SJ, Cosme J, O'Connell K, Sandeep J, Gayanilo F, Cutter Jr. GR, Montes E, Nitikitpaiboon C, Fisher S, Moustahfid H, Thompson LR. Investigation of machine learning algorithms for taxonomic classification of marine metagenomes. Microbiol Spectr 2023;11:e0523722. [PMID: 37695074 PMCID: PMC10580933 DOI: 10.1128/spectrum.05237-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Accepted: 06/30/2023] [Indexed: 09/12/2023] Open

Abstract

Microbial communities play key roles in ocean ecosystems through regulation of biogeochemical processes such as carbon and nutrient cycling, food web dynamics, and gut microbiomes of invertebrates, fish, reptiles, and mammals. Assessments of marine microbial diversity are therefore critical to understanding spatiotemporal variations in microbial community structure and function in ocean ecosystems. With recent advances in DNA shotgun sequencing for metagenome samples and computational analysis, it is now possible to access the taxonomic and genomic content of ocean microbial communities to study their structural patterns, diversity, and functional potential. However, existing taxonomic classification tools depend upon manually curated phylogenetic trees, which can create inaccuracies in metagenomes from less well-characterized communities, such as from ocean water. Herein, we explore the utility of deep learning tools (DeepMicrobes and a novel Residual Network architecture) that leverage natural language processing and convolutional neural network architectures to map input sequence data (k-mers) to output labels (taxonomic groups) without reliance on a curated taxonomic tree. We trained both models using metagenomic reads simulated from marine microbial genomes in the MarRef database. The performance of both models (accuracy, precision, and percent microbe predicted) was compared with the standard taxonomic classification tool Kraken2 using 10 complex metagenomic data sets simulated from MarRef. Our results demonstrate that time, compute power, and microbial genomic diversity still pose challenges for machine learning (ML). Moreover, our results suggest that high genome coverage and rectification of class imbalance are prerequisites for a well-trained model, and therefore should be a major consideration in future ML work.

IMPORTANCE

Taxonomic profiling of microbial communities is essential to model microbial interactions and inform habitat conservation. This work develops approaches in constructing training/testing data sets from publicly available marine metagenomes and evaluates the performance of machine learning (ML) approaches in read-based taxonomic classification of marine metagenomes. Predictions from two models are used to test accuracy in metagenomic classification and to guide improvements in ML approaches. Our study provides insights on the methods, results, and challenges of deep learning on marine microbial metagenomic data sets. Future machine learning approaches can be improved by rectifying genome coverage and class imbalance in the training data sets, developing alternative models, and increasing the accessibility of computational resources for model training and refinement.
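The abstract above describes mapping input sequence data (k-mers) to output labels. As a rough illustration of the input side only, the following minimal sketch turns a read into a list of integer k-mer identifiers that an embedding layer could consume. The function names, the choice of k = 3, and the convention of reserving index 0 for k-mers containing ambiguous bases are assumptions made for this example; they are not taken from DeepMicrobes, the Residual Network model, or Kraken2.

```python
from itertools import product

def build_kmer_vocab(k):
    """Map every DNA k-mer over A/C/G/T to an integer index.

    Index 0 is reserved (an assumption of this sketch) for k-mers that
    contain characters outside A/C/G/T, such as N.
    """
    return {"".join(p): i + 1 for i, p in enumerate(product("ACGT", repeat=k))}

def read_to_kmer_ids(read, k, vocab):
    """Slide a window of width k over the read and look up each k-mer."""
    read = read.upper()
    return [vocab.get(read[i:i + k], 0) for i in range(len(read) - k + 1)]

# Hypothetical usage with a toy read containing one ambiguous base.
vocab = build_kmer_vocab(3)
ids = read_to_kmer_ids("ACGTNACG", 3, vocab)
print(len(vocab), ids)
```

With k = 3 the vocabulary has 4^3 = 64 entries; the vocabulary grows as 4^k, which is one reason the choice of k affects memory and training cost.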


Affiliation(s)

Helen Park Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua-Peking Center for Life Sciences, Tsinghua University, Beijing, China EPSRC/BBSRC Future Biomanufacturing Research Hub, EPSRC Synthetic Biology Research Centre SYNBIOCHEM Manchester Institute of Biotechnology and School of Chemistry, The University of Manchester, Manchester, United Kingdom
Shen Jean Lim Cooperative Institute for Marine and Atmospheric Studies, Rosenstiel School of Marine, Atmospheric, and Earth Science, University of Miami, Miami, Florida, USA Ocean Chemistry and Ecosystems Division, Atlantic Oceanographic and Meteorological Laboratory, National Oceanic and Atmospheric Administration, Miami, Florida, USA College of Marine Science, University of South Florida, St Petersburg, Florida, USA
Jonathan Cosme Run:AI, Office of the CTO, Tel Aviv, Israel
Kyle O'Connell Deloitte Consulting LLP, Biomedical Data Science Team, Arlington, Virginia, USA Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Northwest, Washington, DC, USA
Jilla Sandeep Harte Research Institute, Texas A&M University-Corpus Christi, Corpus Christi, Texas, USA
Felimon Gayanilo Harte Research Institute, Texas A&M University-Corpus Christi, Corpus Christi, Texas, USA
George R. Cutter Jr. Southwest Fisheries Science Center, Antarctic Ecosystem Research Division, National Oceanic and Atmospheric Administration, La Jolla, California, USA
Enrique Montes Cooperative Institute for Marine and Atmospheric Studies, Rosenstiel School of Marine, Atmospheric, and Earth Science, University of Miami, Miami, Florida, USA Ocean Chemistry and Ecosystems Division, Atlantic Oceanographic and Meteorological Laboratory, National Oceanic and Atmospheric Administration, Miami, Florida, USA
Chotinan Nitikitpaiboon Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
Sam Fisher Deloitte Consulting LLP, Biomedical Data Science Team, Arlington, Virginia, USA
Hassan Moustahfid NOAA/US Integrated Ocean Observing System (IOOS), Silver Spring, Maryland, USA
Luke R. Thompson Ocean Chemistry and Ecosystems Division, Atlantic Oceanographic and Meteorological Laboratory, National Oceanic and Atmospheric Administration, Miami, Florida, USA Northern Gulf Institute, Mississippi State University, Mississippi, USA


Zhao L, Walkowiak S, Fernando WGD. Artificial Intelligence: A Promising Tool in Exploring the Phytomicrobiome in Managing Disease and Promoting Plant Health. Plants (Basel) 2023;12:1852. [PMID: 37176910 PMCID: PMC10180744 DOI: 10.3390/plants12091852] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 04/25/2023] [Accepted: 04/27/2023] [Indexed: 05/15/2023]

Cres CM, Tritt A, Bouchard KE, Zhang Y. DL-TODA: A Deep Learning Tool for Omics Data Analysis. Biomolecules 2023;13:585. [PMID: 37189333 DOI: 10.3390/biom13040585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 03/07/2023] [Accepted: 03/22/2023] [Indexed: 05/17/2023] Open

CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification. Genes (Basel) 2023;14:634. [PMID: 36980906 PMCID: PMC10048311 DOI: 10.3390/genes14030634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Revised: 12/28/2022] [Accepted: 01/09/2023] [Indexed: 03/06/2023] Open

Abadi SAR, Mohammadi A, Koohi S. An automated ultra-fast, memory-efficient, and accurate method for viral genome classification. J Biomed Inform 2023;139:104316. [PMID: 36781036 DOI: 10.1016/j.jbi.2023.104316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2022] [Revised: 01/30/2023] [Accepted: 02/08/2023] [Indexed: 02/13/2023]

Madival SD, Mishra DC, Sharma A, Kumar S, Maji AK, Budhlakoti N, Sinha D, Rai A. A Deep Clustering-based Novel Approach for Binning of Metagenomics Data. Curr Genomics 2022;23:353-368. [PMID: 36778191 PMCID: PMC9878855 DOI: 10.2174/1389202923666220928150100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Revised: 08/30/2022] [Accepted: 09/02/2022] [Indexed: 11/22/2022] Open

Deciphering microbial gene function using natural language processing. Nat Commun 2022;13:5731. [PMID: 36175448 PMCID: PMC9523054 DOI: 10.1038/s41467-022-33397-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 02/18/2022] [Accepted: 09/16/2022] [Indexed: 11/08/2022] Open

Taxonomic classification of DNA sequences beyond sequence similarity using deep neural networks. Proc Natl Acad Sci U S A 2022;119:e2122636119. [PMID: 36018838 PMCID: PMC9436379 DOI: 10.1073/pnas.2122636119] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 11/18/2022] Open

Bai X, Ren J, Sun F. MLR-OOD: A Markov Chain Based Likelihood Ratio Method for Out-Of-Distribution Detection of Genomic Sequences. J Mol Biol 2022;434:167586. [PMID: 35427634 PMCID: PMC10433695 DOI: 10.1016/j.jmb.2022.167586] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/16/2022] [Revised: 04/05/2022] [Accepted: 04/05/2022] [Indexed: 12/23/2022]

Câmara GBM, Coutinho MGF, da Silva LMD, Gadelha WVDN, Torquato MF, Barbosa RDM, Fernandes MAC. Convolutional Neural Network Applied to SARS-CoV-2 Sequence Classification. Sensors (Basel) 2022;22:5730. [PMID: 35957287 PMCID: PMC9371030 DOI: 10.3390/s22155730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Revised: 07/28/2022] [Accepted: 07/28/2022] [Indexed: 06/15/2023]

Abstract

COVID-19, the illness caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a virus of the Coronaviridae family with a single-stranded positive-sense RNA genome, has been spreading around the world and has been declared a pandemic by the World Health Organization. As of 17 January 2022, there were more than 329 million cases, with more than 5.5 million deaths. Although COVID-19 has a low mortality rate, its high capacities for contamination, spread, and mutation worry the authorities, especially after the emergence of the Omicron variant, which has a high transmission capacity and can more easily infect even vaccinated people. Such outbreaks require elucidation of the taxonomic classification and origin of the virus (SARS-CoV-2) from the genomic sequence for strategic planning, containment, and treatment of the disease. Thus, this work proposes a high-accuracy technique to classify viruses and other organisms from a genome sequence using a deep learning convolutional neural network (CNN). Unlike other approaches in the literature, the proposed approach does not limit the length of the genome sequence. The results show that the novel proposal accurately distinguishes SARS-CoV-2 from the sequences of other viruses. The results were obtained from 1557 instances of SARS-CoV-2 from the National Center for Biotechnology Information (NCBI) and 14,684 different viruses from the Virus-Host DB. As a CNN has several changeable parameters, the tests were performed with forty-eight different architectures; the best of these had an accuracy of 91.94 ± 2.62% in classifying viruses into their realms correctly, in addition to 100% accuracy in classifying SARS-CoV-2 into its respective realm, Riboviria. For the subsequent classifications (family, genera, and subgenus), this accuracy increased, which shows that the proposed architecture may be viable in the classification of the virus that causes COVID-19.


Affiliation(s)

Gabriel B. M. Câmara Bioinformatics Multidisciplinary Environment (BioME), Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil; Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil
Maria G. F. Coutinho Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil
Lucileide M. D. da Silva Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil; Federal Institute of Education, Science and Technology of Rio Grande do Norte, Paraiso, Santa Cruz 59200-000, RN, Brazil
Walter V. do N. Gadelha Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil
Matheus F. Torquato Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil
Raquel de M. Barbosa Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil; Department of Pharmacy and Pharmaceutical Technology, University of Granada, 18071 Granada, Spain
Marcelo A. C. Fernandes Bioinformatics Multidisciplinary Environment (BioME), Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil; Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil; Department of Computer Engineering and Automation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil


Jiang Y, Luo J, Huang D, Liu Y, Li DD. Machine Learning Advances in Microbiology: A Review of Methods and Applications. Front Microbiol 2022;13:925454. [PMID: 35711777 PMCID: PMC9196628 DOI: 10.3389/fmicb.2022.925454] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/21/2022] [Accepted: 05/09/2022] [Indexed: 12/18/2022] Open

McElhinney JMWR, Catacutan MK, Mawart A, Hasan A, Dias J. Interfacing Machine Learning and Microbial Omics: A Promising Means to Address Environmental Challenges. Front Microbiol 2022;13:851450. [PMID: 35547145 PMCID: PMC9083327 DOI: 10.3389/fmicb.2022.851450] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/09/2022] [Accepted: 03/14/2022] [Indexed: 11/13/2022] Open

WalkIm: Compact image-based encoding for high-performance classification of biological sequences using simple tuning-free CNNs. PLoS One 2022;17:e0267106. [PMID: 35427371 PMCID: PMC9012348 DOI: 10.1371/journal.pone.0267106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/06/2021] [Accepted: 04/01/2022] [Indexed: 11/28/2022] Open

Abstract

The classification of biological sequences is an open issue for a variety of data sets, such as viral and metagenomics sequences. Many studies therefore utilize neural network tools, a well-known approach in this field, and focus on designing customized network structures. However, few works focus on more effective factors, such as the input encoding method or implementation technology, to address accuracy and efficiency issues in this area. In this work, we therefore propose an image-based encoding method, called WalkIm, whose adoption, even in a simple neural network, provides competitive accuracy and superior efficiency compared to the existing classification methods (e.g. VGDC, CASTOR, and DLM-CNN) for a variety of biological sequences. When WalkIm is used to classify various data sets (i.e. virus whole-genome data, metagenomics read data, and metabarcoding data), it achieves the same performance as the existing methods, with no enforcement of parameter initialization or network architecture adjustment for each data set. It is worth noting that even when classifying data sets with high mutation rates, such as coronaviruses, it achieves almost 100% accuracy in classifying their various types. In addition, WalkIm achieves high-speed convergence during network training, as well as a reduction of network complexity. The WalkIm method therefore enables us to run the classifying neural networks on a normal desktop system in a short time interval. Moreover, we addressed the compatibility of the WalkIm encoding method with free-space optical processing technology. Taking advantage of an optical implementation of the convolutional layers, we illustrated that the training time can be reduced by up to 500 times. In addition to all the aforementioned advantages, this encoding method preserves the structure of the generated images under various modes of sequence transformation, such as reverse complement, complement, and reverse modes.
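The abstract above names three modes of sequence transformation: reverse complement, complement, and reverse. The short sketch below only shows what those three transformations do to a DNA string. It does not implement the WalkIm image encoding itself, which the abstract does not specify, and the function names are chosen for this example.

```python
# Translation table for Watson-Crick base pairing (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse(seq):
    """Read the sequence from the opposite end."""
    return seq[::-1]

def complement(seq):
    """Swap each base for its pairing partner, keeping the order."""
    return seq.translate(COMPLEMENT)

def reverse_complement(seq):
    """Complement the sequence, then reverse it (the opposite strand, 5' to 3')."""
    return complement(seq)[::-1]

# Hypothetical usage on a toy sequence.
seq = "AACGT"
print(reverse(seq), complement(seq), reverse_complement(seq))
```

An encoding that is stable under these transformations would produce structurally comparable images for `seq` and for each of the three outputs.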


Mathieu A, Leclercq M, Sanabria M, Perin O, Droit A. Machine Learning and Deep Learning Applications in Metagenomic Taxonomy and Functional Annotation. Front Microbiol 2022;13:811495. [PMID: 35359727 PMCID: PMC8964132 DOI: 10.3389/fmicb.2022.811495] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/08/2021] [Accepted: 02/02/2022] [Indexed: 12/12/2022] Open

Efficient and Quality-Optimized Metagenomic Pipeline Designed for Taxonomic Classification in Routine Microbiological Clinical Tests. Microorganisms 2022;10:711. [PMID: 35456762 PMCID: PMC9026403 DOI: 10.3390/microorganisms10040711] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/27/2021] [Revised: 03/09/2022] [Accepted: 03/23/2022] [Indexed: 01/26/2023] Open

Herrera-García JA, Martinez M, Zamora-Tavares P, Vargas-Ponce O, Hernández-Sandoval L, Rodríguez-Zaragoza FA. Metabarcoding of the phytotelmata of Pseudalcantarea grandis (Bromeliaceae) from an arid zone. PeerJ 2022;10:e12706. [PMID: 35127281 PMCID: PMC8801176 DOI: 10.7717/peerj.12706] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/17/2021] [Accepted: 12/07/2021] [Indexed: 01/07/2023] Open

Abstract

BACKGROUND

Pseudalcantarea grandis (Schltdl.) Pinzón & Barfuss is a tank bromeliad that grows on cliffs in the southernmost portion of the Chihuahuan desert. Phytotelmata are water bodies formed by plants that function as micro-ecosystems where bacteria, algae, protists, insects, fungi, and some vertebrates can develop. We hypothesized that the bacterial diversity contained in the phytotelma formed in a bromeliad from an arid zone would differ in sites with and without surrounding vegetation. Our study aimed to characterize the bacterial composition and putative metabolic functions in P. grandis phytotelmata collected in vegetated and non-vegetated sites.

METHODS

Water from 10 individuals was sampled. Five individuals had abundant surrounding vegetation, and five had little or no vegetation. We extracted DNA and amplified seven hypervariable regions of the 16S gene (V2, V4, V8, V3-6, 7-9). Metabarcoding sequencing was performed on the Ion Torrent PGM platform. Taxonomic identity was assigned by the binning reads and coverage between hit and query from the reference database of at least 90%. Putative metabolic functions of the bacterial families were assigned mainly using the FAPROTAX database. The dominance patterns in each site were visualized with rank/abundance curves using the number of Operational Taxonomic Units (OTUs) per family. A percentage similarity analysis (SIMPER) was used to estimate dissimilarity between the sites. Relationships among bacterial families (identified by the dominance analysis and SIMPER), sites, and their respective putative functions were analyzed with shade plots.

RESULTS

A total of 1.5 million useful bacterial sequences were obtained. Sequences were clustered into OTUs, and taxonomic assignment was conducted using BLAST in the Greengenes databases. Bacterial diversity was 23 phyla, 52 classes, 98 orders, 218 families, and 297 genera. Proteobacteria (37%), Actinobacteria (19%), and Firmicutes (15%) comprised the highest percentage (71%). There was a 68.3% similarity between the two sites at family level, with 149 families shared. Aerobic chemoheterotrophy and fermentation were the main metabolic functions in both sites, followed by ureolysis, nitrate reduction, aromatic compound degradation, and nitrogen fixation. The dominant bacteria shared most of the metabolic functions between sites. Some functions were recorded for one site only and were related to families with the lowest OTUs richness. Bacterial diversity in the P. grandis tanks included dominant phyla and families present at low percentage that could be considered part of a rare biosphere. A rare biosphere can form genetic reservoirs, the local abundance of which depends on external abiotic and biotic factors, while their interactions could favor micro-ecosystem resilience and resistance.


Decoding gut microbiota by imaging analysis of fecal samples. iScience 2021;24:103481. [PMID: 34927025 PMCID: PMC8652011 DOI: 10.1016/j.isci.2021.103481] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/23/2019] [Revised: 09/21/2021] [Accepted: 11/19/2021] [Indexed: 01/09/2023] Open

Deng Z, Zhang J, Li J, Zhang X. Application of Deep Learning in Plant-Microbiota Association Analysis. Front Genet 2021;12:697090. [PMID: 34691142 PMCID: PMC8531731 DOI: 10.3389/fgene.2021.697090] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/19/2021] [Accepted: 08/31/2021] [Indexed: 01/04/2023] Open

Gupta S, Aga D, Pruden A, Zhang L, Vikesland P. Data Analytics for Environmental Science and Engineering Research. Environ Sci Technol 2021;55:10895-10907. [PMID: 34338518 DOI: 10.1021/acs.est.1c01026] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 06/13/2023]

Young RB, Marcelino VR, Chonwerawong M, Gulliver EL, Forster SC. Key Technologies for Progressing Discovery of Microbiome-Based Medicines. Front Microbiol 2021;12:685935. [PMID: 34239510 PMCID: PMC8258393 DOI: 10.3389/fmicb.2021.685935] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/26/2021] [Accepted: 05/25/2021] [Indexed: 12/22/2022] Open

Ziemski M, Wisanwanichthan T, Bokulich NA, Kaehler BD. Beating Naive Bayes at Taxonomic Classification of 16S rRNA Gene Sequences. Front Microbiol 2021;12:644487. [PMID: 34220738 PMCID: PMC8249850 DOI: 10.3389/fmicb.2021.644487] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/21/2020] [Accepted: 05/31/2021] [Indexed: 12/28/2022] Open

Survey of artificial intelligence approaches in the study of anthropogenic impacts on symbiotic organisms – a holistic view. Symbiosis 2021. [DOI: 10.1007/s13199-021-00778-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]

Karagöz MA, Nalbantoglu OU. Taxonomic classification of metagenomic sequences from Relative Abundance Index profiles using deep learning. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102539] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/18/2022]

Kaden M, Bohnsack KS, Weber M, Kudła M, Gutowska K, Blazewicz J, Villmann T. Learning vector quantization as an interpretable classifier for the detection of SARS-CoV-2 types based on their RNA sequences. Neural Comput Appl 2021;34:67-78. [PMID: 33935376 PMCID: PMC8076884 DOI: 10.1007/s00521-021-06018-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/13/2020] [Accepted: 04/07/2021] [Indexed: 02/06/2023]

Du Z, Xiao X, Uversky VN. Classification of Chromosomal DNA Sequences Using Hybrid Deep Learning Architectures. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200224095531] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/22/2022]

Abstract Background: Chromosomal DNA contains most of the genetic information of eukaryotes and plays an important role in the growth, development and reproduction of living organisms. Most chromosomal DNA sequences are known to wrap around histones, and distinguishing these DNA sequences from ordinary DNA sequences is important for understanding the genetic code of life. The main difficulty behind this problem is the feature selection process. DNA sequences have no explicit features, and common representation methods, such as one-hot coding, introduce the major drawback of high dimensionality. Recently, deep learning models have been shown to automatically extract useful features from input patterns. Objective: We aim to investigate which deep learning networks could achieve notable improvements in the field of DNA sequence classification using only sequence information. Methods: In this paper, we present four different deep learning architectures using convolutional neural networks and long short-term memory networks for the purpose of chromosomal DNA sequence classification. The natural language model Word2vec was used to generate word embeddings of the sequences, from which the deep learning models learn features. Results: The comparison of these four architectures is carried out on 10 chromosomal DNA datasets. The results show that the architecture of convolutional neural networks combined with long short-term memory networks is superior to other methods with regard to the accuracy of chromosomal DNA prediction. Conclusion: In this study, four deep learning models were compared for automatic classification of chromosomal DNA sequences with no sequence preprocessing steps. In particular, we have regarded DNA sequences as natural language and extracted word embeddings with Word2vec to represent DNA sequences. Results show the superiority of the CNN+LSTM model in the ten classification tasks.
The reason for this success is that the CNN module captures the regulatory motifs, while the following LSTM layer captures the long-term dependencies between them.

Ghannam RB, Techtmann SM. Machine learning applications in microbial ecology, human microbiome studies, and environmental monitoring. Comput Struct Biotechnol J 2021;19:1092-1107. [PMID: 33680353 PMCID: PMC7892807 DOI: 10.1016/j.csbj.2021.01.028] [Citation(s) in RCA: 74] [Impact Index Per Article: 24.7] [Received: 10/02/2020] [Revised: 01/16/2021] [Accepted: 01/18/2021] [Indexed: 01/04/2023] Open

Imchen M, Kumavath R. Metagenomic insights into the antibiotic resistome of mangrove sediments and their association to socioeconomic status. Environmental Pollution (Barking, Essex : 1987) 2021;268:115795. [PMID: 33068846 DOI: 10.1016/j.envpol.2020.115795] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/25/2020] [Revised: 09/03/2020] [Accepted: 10/06/2020] [Indexed: 06/11/2023]

Abstract

Mangrove sediments are prone to anthropogenic activities that could enrich antibiotic resistance genes (ARGs). The emergence and dissemination of ARGs are of serious concern to public health worldwide. Therefore, a comprehensive resistome analysis of global mangrove sediment is of paramount importance. In this study, we have implemented a deep machine learning approach to analyze the resistome of mangrove sediments from Brazil, China, Saudi Arabia, India, and Malaysia. Geography (R_ANOSIM = 39.26%; p < 0.005) as well as human intervention (R_ANOSIM = 16.92%; p < 0.005) influenced the ARG diversity. ARG diversity was also inversely correlated to the human development index (HDI) of the host country (R = -0.53; p < 0.05) rather than antibiotics consumption (p > 0.05). Several genes including multidrug efflux pumps were significantly (p < 0.05) enriched in the sites with human intervention. Resistome was consistently dominated by rpoB2 (19.26 ± 0.01%), multidrug ABC transporter (10.40 ± 0.23%), macB (8.84 ± 0.36%), tetA (4.13 ± 0.35%), mexF (3.26 ± 0.19%), CpxR (2.93 ± 0.2%), bcrA (2.38 ± 0.24%), acrB (2.37 ± 0.18%), mexW (2.19 ± 0.17%), and vanR (1.99 ± 0.11%). Besides, mobile ARGs such as vanA, tet(48), mcr, and tetX were also detected in the mangrove sediments. Comparative analysis against terrestrial and ocean resistomes showed that the ocean ecosystem harbored the lowest ARG diversity (Chao1 = 71.12) followed by mangroves (Chao1 = 258.07) and terrestrial ecosystem (Chao1 = 294.07). ARG subtypes such as abeS and qacG were detected exclusively in ocean datasets. Likewise, rpoB2, multidrug ABC transporter, and macB, detected in mangrove and terrestrial datasets, were not detected in the ocean datasets. This study shows that socioeconomic factors strongly determine the antibiotic resistome in mangroves. Direct anthropogenic intervention in the mangrove environment also enriches the antibiotic resistome.


Zheng D, Pang G, Liu B, Chen L, Yang J. Learning transferable deep convolutional neural networks for the classification of bacterial virulence factors. Bioinformatics 2020;36:3693-3702. [PMID: 32251507 DOI: 10.1093/bioinformatics/btaa230] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/02/2019] [Revised: 03/25/2020] [Accepted: 04/01/2020] [Indexed: 12/23/2022] Open

Power spectrum and dynamic time warping for DNA sequences classification. Evolving Systems 2020. [DOI: 10.1007/s12530-019-09306-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]

Zhao Z, Cristian A, Rosen G. Keeping up with the genomes: efficient learning of our increasing knowledge of the tree of life. BMC Bioinformatics 2020;21:412. [PMID: 32957925 PMCID: PMC7507296 DOI: 10.1186/s12859-020-03744-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/29/2020] [Accepted: 09/08/2020] [Indexed: 11/26/2022] Open

Han R, Wang S, Gao X. Novel algorithms for efficient subsequence searching and mapping in nanopore raw signals towards targeted sequencing. Bioinformatics 2020;36:1333-1343. [PMID: 31593235 DOI: 10.1093/bioinformatics/btz742] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/03/2019] [Revised: 07/24/2019] [Accepted: 10/01/2019] [Indexed: 01/31/2023] Open

Abstract

MOTIVATION

Genome diagnostics have gradually become a prevailing routine for human healthcare. With the advances in understanding the causal genes for many human diseases, targeted sequencing provides a rapid, cost-efficient and focused option for clinical applications, such as single nucleotide polymorphism (SNP) detection and haplotype classification, in a specific genomic region. Although nanopore sequencing offers a perfect tool for targeted sequencing because of its mobility, PCR-freeness and long read properties, it poses a challenging computational problem of how to efficiently and accurately search and map genomic subsequences of interest in a pool of nanopore reads (or raw signals). Due to its relatively low sequencing accuracy, there is no reliable solution to this problem, especially at low sequencing coverage.

RESULTS

Here, we propose a brand new signal-based subsequence inquiry pipeline as well as two novel algorithms to tackle this problem. The proposed algorithms follow the principle of subsequence dynamic time warping and directly operate on the electrical current signals, without loss of information in base-calling. Therefore, the proposed algorithms can serve as a tool for sequence inquiry in targeted sequencing. Two novel criteria are offered for the consequent signal quality analysis and data classification. Comprehensive experiments on real-world nanopore datasets show the efficiency and effectiveness of the proposed algorithms. We further demonstrate the potential applications of the proposed algorithms in two typical tasks in nanopore-based targeted sequencing: SNP detection under low sequencing coverage, and haplotype classification under low sequencing accuracy.

AVAILABILITY AND IMPLEMENTATION

The project is accessible at https://github.com/icthrm/cwSDTWnano.git, and the presented bench data is available upon request.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


Amato D, Bosco GL, Rizzo R. CORENup: a combination of convolutional and recurrent deep neural networks for nucleosome positioning identification. BMC Bioinformatics 2020;21:326. [PMID: 32938377 PMCID: PMC7493859 DOI: 10.1186/s12859-020-03627-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/16/2020] [Accepted: 06/22/2020] [Indexed: 12/14/2022] Open

Urso A, Fiannaca A, La Rosa M, La Paglia L, Lo Bosco G, Rizzo R. BITS2019: the sixteenth annual meeting of the Italian society of bioinformatics. BMC Bioinformatics 2020;21:363. [PMID: 32938383 PMCID: PMC7493178 DOI: 10.1186/s12859-020-03708-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022] Open

Su X, Jing G, Zhang Y, Wu S. Method development for cross-study microbiome data mining: Challenges and opportunities. Comput Struct Biotechnol J 2020;18:2075-2080. [PMID: 32802279 PMCID: PMC7419250 DOI: 10.1016/j.csbj.2020.07.020] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 06/04/2020] [Revised: 07/22/2020] [Accepted: 07/24/2020] [Indexed: 01/26/2023] Open

Deep learning model for metagenome fragment classification using spaced k-mers feature extraction. Jurnal Teknologi dan Sistem Komputer 2020. [DOI: 10.14710/jtsiskom.2020.13407] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022] Open

Vu D, Groenewald M, Verkley G. Convolutional neural networks improve fungal classification. Sci Rep 2020;10:12628. [PMID: 32724224 PMCID: PMC7387343 DOI: 10.1038/s41598-020-69245-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/28/2020] [Accepted: 07/06/2020] [Indexed: 01/30/2023] Open

Shang J, Sun Y. CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning. Methods 2020;189:95-103. [PMID: 32454212 PMCID: PMC7255349 DOI: 10.1016/j.ymeth.2020.05.018] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/03/2020] [Revised: 05/05/2020] [Accepted: 05/17/2020] [Indexed: 02/07/2023] Open

Yan H, Bombarely A, Li S. DeepTE: a computational method for de novo classification of transposons with convolutional neural network. Bioinformatics 2020;36:4269-4275. [DOI: 10.1093/bioinformatics/btaa519] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 01/09/2020] [Revised: 04/12/2020] [Accepted: 05/12/2020] [Indexed: 01/23/2023] Open

Sperlea T, Muth L, Martin R, Weigel C, Waldminghaus T, Heider D. gammaBOriS: Identification and Taxonomic Classification of Origins of Replication in Gammaproteobacteria using Motif-based Machine Learning. Sci Rep 2020;10:6727. [PMID: 32317695 PMCID: PMC7174414 DOI: 10.1038/s41598-020-63424-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/18/2019] [Accepted: 03/31/2020] [Indexed: 01/23/2023] Open

Desai HP, Parameshwaran AP, Sunderraman R, Weeks M. Comparative Study Using Neural Networks for 16S Ribosomal Gene Classification. J Comput Biol 2020. [DOI: 10.1089/cmb.2019.0436] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/13/2022] Open

Kumar H, Park W, Srikanth K, Choi BH, Cho ES, Lee KT, Kim JM, Kim K, Park J, Lim D, Park JE. Comparison of Bacterial Populations in the Ceca of Swine at Two Different Stages and their Functional Annotations. Genes (Basel) 2019;10:E382. [PMID: 31137556 PMCID: PMC6562920 DOI: 10.3390/genes10050382] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/18/2019] [Revised: 05/16/2019] [Accepted: 05/16/2019] [Indexed: 12/18/2022] Open

Abstract

The microbial composition in the cecum of pigs influences host health, immunity, nutrient digestion, and feeding requirements significantly. Advancements in metagenome sequencing technologies such as 16S rRNA sequencing have made it possible to explore the cecum microbial population. In this study, we performed a comparative analysis of cecum microbiota of crossbred Korean native pigs at two different growth stages (stage L = 10 weeks, and stage LD = 26 weeks) using 16S rRNA sequencing technology. Our results revealed remarkable differences in microbial composition, α and β diversity, and differential abundance between the two stages. Phylum composition analysis with respect to SILVA132 database showed Firmicutes to be present at 51.87% and 48.76% in stages L and LD, respectively. Similarly, Bacteroidetes were present at 37.28% and 45.98% in L and LD, respectively. The genera Prevotella, Anaerovibrio, Succinivibrio, Megasphaera were differentially enriched in stage L, whereas Clostridium, Terrisporobacter, Rikenellaceae were enriched in stage LD. Functional annotation of microbiome by level-three KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis revealed that glycine, serine, threonine, valine, leucine, isoleucine, arginine, proline, and tryptophan metabolism were differentially enriched in stage L, whereas alanine, aspartate, glutamate, cysteine, methionine, phenylalanine, tyrosine, and tryptophan biosynthesis metabolism were differentially enriched in stage LD. Through machine-learning approaches such as LEfSe (linear discriminant analysis effect size), random forest, and Pearson's correlation, we found pathways such as amino acid metabolism, transport systems, and genetic regulation of metabolism are commonly enriched in both stages. Our findings suggest that the bacterial compositions in cecum content of pigs are heavily involved in their nutrient digestion process.
This study may help to meet the demand of human food and can play significant roles in medicinal application.


Qu K, Guo F, Liu X, Lin Y, Zou Q. Application of Machine Learning in Microbiology. Front Microbiol 2019;10:827. [PMID: 31057526 PMCID: PMC6482238 DOI: 10.3389/fmicb.2019.00827] [Citation(s) in RCA: 89] [Impact Index Per Article: 17.8] [Received: 01/31/2019] [Accepted: 04/01/2019] [Indexed: 02/01/2023] Open

Di Gangi M, Lo Bosco G, Rizzo R. Deep learning architectures for prediction of nucleosome positioning from sequences data. BMC Bioinformatics 2018;19:418. [PMID: 30453896 PMCID: PMC6245688 DOI: 10.1186/s12859-018-2386-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 01/17/2023] Open