1
Solymosi N, Tóth AG, Nagy SÁ, Csabai I, Feczkó C, Reibling T, Németh T. Clinical considerations on antimicrobial resistance potential of complex microbiological samples. PeerJ 2025; 13:e18802. [PMID: 39897495 PMCID: PMC11784533 DOI: 10.7717/peerj.18802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Accepted: 12/11/2024] [Indexed: 02/04/2025]
Abstract
Antimicrobial resistance (AMR) is one of our greatest public health challenges. Targeted use of antibiotics (ABs) can reduce the occurrence and spread of AMR and boost the effectiveness of treatment. This requires knowledge of the AB susceptibility of the pathogens involved in the disease. Therapeutic recommendations derived from classical AB susceptibility testing (AST) rest on the analysis of only a fraction of the bacteria present in the disease process. Next- and third-generation sequencing technologies allow the identification of antimicrobial resistance genes (ARGs) present in a bacterial community. Using this metagenomic approach, we can map the antimicrobial resistance potential (AMRP) of a complex, multi-bacterial microbial sample. To understand the interpretability of AMRP, the concordance between phenotypic AMR properties and ARGs was investigated by analyzing data from 574 Escherichia coli strains from five different studies. The overall results show that for 44% of the studied ABs, phenotypically resistant strains are genotypically associated with a 90% probability of resistance, while for 92% of the ABs, the phenotypically susceptible strains are genotypically susceptible with a 90% probability. ARG detection predicted the phenotype with at least 90% confidence for 67% of ABs. The probability of detecting a phenotypically susceptible strain as resistant based on genotype is below 5% for 92% of ABs, while the probability of detecting a phenotypically resistant strain as susceptible based on genotype is below 5% for 44% of ABs. We can assume that these strain-by-strain concordance results are also true for bacteria in complex microbial samples, and conclude that AMRP obtained from metagenomic ARG analysis can help choose effective ABs. This is illustrated with the AMRP of a canine external otitis sample.
Affiliation(s)
- Norbert Solymosi
  - Centre for Bioinformatics, University of Veterinary Medicine, Budapest, Hungary
  - Department of Physics of Complex Systems, Eötvös Loránd University, Budapest, Hungary
- Adrienn Gréta Tóth
  - Centre for Bioinformatics, University of Veterinary Medicine, Budapest, Hungary
  - Department of Physics of Complex Systems, Eötvös Loránd University, Budapest, Hungary
- Sára Ágnes Nagy
  - Department of Physics of Complex Systems, Eötvös Loránd University, Budapest, Hungary
- István Csabai
  - Department of Physics of Complex Systems, Eötvös Loránd University, Budapest, Hungary
- Csongor Feczkó
  - Centre for Bioinformatics, University of Veterinary Medicine, Budapest, Hungary
- Tamás Reibling
  - Centre for Bioinformatics, University of Veterinary Medicine, Budapest, Hungary
- Tibor Németh
  - Department and Clinic of Surgery and Ophthalmology, University of Veterinary Medicine, Budapest, Hungary
2
Liu W, Cen H, Wu Z, Zhou H, Chen S, Yang X, Zhao G, Zhang G. Mycobacteriaceae Phenome Atlas (MPA): A Standardized Atlas for the Mycobacteriaceae Phenome Based on Heterogeneous Sources. Phenomics (Cham, Switzerland) 2023; 3:439-456. [PMID: 37881319 PMCID: PMC10593683 DOI: 10.1007/s43657-023-00101-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2021] [Revised: 02/23/2023] [Accepted: 03/03/2023] [Indexed: 10/27/2023]
Abstract
The bacterial family Mycobacteriaceae includes pathogenic and nonpathogenic bacteria, and systematic research on their genomes and phenomes can give comprehensive perspectives for exploring their disease mechanisms. In this study, the phenotypes of Mycobacteriaceae were inferred from available phenomic data, and 82 microbial phenotypic traits were recruited as data elements of the microbial phenome. This Mycobacteriaceae phenome contains five categories and 20 subcategories of polyphasic phenotypes, and three categories and eight subcategories of functional phenotypes, all of which are complementary to the existing data standards of microbial phenotypes. The phenomic data of Mycobacteriaceae strains were compiled by literature mining, third-party database integration, and bioinformatics annotation. The phenotypes are searchable and comparable on the website of the Mycobacteriaceae Phenome Atlas (MPA, https://www.biosino.org/mpa/). A topological data analysis of MPA revealed the co-evolution between Mycobacterium tuberculosis and virulence factors, and uncovered potential pathogenicity-associated phenotypes. Two hundred and sixty potential pathogen-enriched pathways were found by Fisher's exact test. The application of MPA may provide novel insights into the pathogenicity mechanism and antimicrobial targets of Mycobacteriaceae. Supplementary Information: The online version contains supplementary material available at 10.1007/s43657-023-00101-5.
Affiliation(s)
- Wan Liu
  - National Genomics Data Center & Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031 China
- Hui Cen
  - National Genomics Data Center & Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031 China
- Zhile Wu
  - National Genomics Data Center & Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031 China
  - Shanghai Southgene Technology Co., Ltd., Shanghai, 201210 China
- Haokui Zhou
  - Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055 China
- Shuo Chen
  - Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055 China
- Xilan Yang
  - Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055 China
- Guoping Zhao
  - National Genomics Data Center & Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031 China
  - Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024 China
- Guoqing Zhang
  - National Genomics Data Center & Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031 China
3
Abstract
Within the next decade, the genomes of 1.8 million eukaryotic species will be sequenced. Identifying genes in these sequences is essential to understand the biology of the species. This is challenging due to the transcriptional complexity of eukaryotic genomes, which encode hundreds of thousands of transcripts of multiple types. Among these, a small set of protein-coding mRNAs play a disproportionately large role in defining phenotypes. Due to their sequence conservation, orthology can be established, making it possible to define the universal catalog of eukaryotic protein-coding genes. This catalog should substantially contribute to uncovering the genomic events underlying the emergence of eukaryotic phenotypes. This piece briefly reviews the basics of protein-coding gene prediction, discusses challenges in finalizing annotation of the human genome, and proposes strategies for producing annotations across the eukaryotic Tree of Life. This lays the groundwork for obtaining the catalog of all genes: the Earth's code of life.
Affiliation(s)
- Roderic Guigó
  - Bioinformatics and Genomics, Center for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology (BIST), Dr. Aiguader 88, 08003 Barcelona, Catalonia
  - Universitat Pompeu Fabra (UPF), Barcelona, Catalonia
4
Dérozier S, Bossy R, Deléger L, Ba M, Chaix E, Harlé O, Loux V, Falentin H, Nédellec C. Omnicrobe, an open-access database of microbial habitats and phenotypes using a comprehensive text mining and data fusion approach. PLoS One 2023; 18:e0272473. [PMID: 36662691 PMCID: PMC9858090 DOI: 10.1371/journal.pone.0272473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 01/04/2023] [Indexed: 01/21/2023]
Abstract
The dramatic increase in the number of microbe descriptions in databases, reports, and papers presents a two-fold challenge for accessing the information: integration of heterogeneous data in a standard ontology-based representation and normalization of the textual descriptions by semantic analysis. Recent text mining methods offer powerful ways to extract textual information and generate ontology-based representation. This paper describes the design of the Omnicrobe application that gathers comprehensive information on habitats, phenotypes, and usages of microbes from scientific sources of high interest to the microbiology community. The Omnicrobe database contains around 1 million descriptions of microbe properties. These descriptions are created by analyzing and combining six information sources of various kinds, i.e. biological resource catalogs, sequence databases and scientific literature. The microbe properties are indexed by the Ontobiotope ontology and their taxa are indexed by an extended version of the taxonomy maintained by the National Center for Biotechnology Information. The Omnicrobe application covers all domains of microbiology. With simple or rich ontology-based queries, it provides easy-to-use support in the resolution of scientific questions related to the habitats, phenotypes, and uses of microbes. We illustrate the potential of Omnicrobe with a use case from the food innovation domain.
Affiliation(s)
- Sandra Dérozier
  - Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
- Robert Bossy
  - Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
- Louise Deléger
  - Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
- Mouhamadou Ba
  - Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
  - Université Paris-Saclay, INRAE, BioinfOmics, MIGALE Bioinformatics Facility, Jouy-en-Josas, France
- Estelle Chaix
  - Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
- Valentin Loux
  - Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
  - Université Paris-Saclay, INRAE, BioinfOmics, MIGALE Bioinformatics Facility, Jouy-en-Josas, France
- Claire Nédellec
  - Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
5
Vitali F, Zinno P, Schifano E, Gori A, Costa A, De Filippo C, Koroušić Seljak B, Panov P, Devirgiliis C, Cavalieri D. Semantics of Dairy Fermented Foods: A Microbiologist’s Perspective. Foods 2022; 11:foods11131939. [PMID: 35804753 PMCID: PMC9265904 DOI: 10.3390/foods11131939] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/19/2022] [Revised: 06/17/2022] [Accepted: 06/26/2022] [Indexed: 01/05/2023]
Abstract
Food ontologies are acquiring a central role in human nutrition, providing a standardized terminology for a proper description of intervention and observational trials. In addition to bioactive molecules, several fermented foods, particularly dairy products, provide the host with live microorganisms, thus carrying potential “genetic/functional” nutrients. To date, a proper ontology to structure and formalize the concepts used to describe fermented foods is lacking. Here we describe a semantic representation of concepts revolving around what consuming fermented foods entails, from both a technological and a health point of view, focusing on kefir and Parmigiano Reggiano as representatives of fresh and ripened dairy products. We included concepts related to the connection of specific microbial taxa to the dairy fermentation process, demonstrating the potential of ontologies to formalize the various gene pathways involved in raw ingredient transformation, connect them to resulting metabolites, and finally to their consequences on the fermented product, including technological, health and sensory aspects. Our work advances the goal of creating a harmonized semantic model for integrating different aspects of modern nutritional science. Such a model, besides formalizing multifaceted knowledge, will be pivotal for a rich annotation of data in public repositories, as a prerequisite to generalized meta-analysis.
Affiliation(s)
- Francesco Vitali
  - Institute of Agricultural Biology and Biotechnology (IBBA), National Research Council (CNR), Via Moruzzi 1, 56124 Pisa, Italy
  - Research Centre for Agriculture and Environment, CREA (Consiglio per la Ricerca in Agricoltura e l’Analisi dell’Economia Agraria), Via di Lanciola 12/A, 50125 Florence, Italy
- Paola Zinno
  - Research Centre for Food and Nutrition, CREA (Consiglio per la Ricerca in Agricoltura e l’Analisi dell’Economia Agraria), Via Ardeatina 546, 00178 Rome, Italy
- Emily Schifano
  - Research Centre for Food and Nutrition, CREA (Consiglio per la Ricerca in Agricoltura e l’Analisi dell’Economia Agraria), Via Ardeatina 546, 00178 Rome, Italy
- Agnese Gori
  - Department of Biology, University of Florence, Via Madonna del Piano 6, 50019 Sesto Fiorentino, Italy
- Ana Costa
  - Department of Biology, University of Florence, Via Madonna del Piano 6, 50019 Sesto Fiorentino, Italy
- Carlotta De Filippo
  - Institute of Agricultural Biology and Biotechnology (IBBA), National Research Council (CNR), Via Moruzzi 1, 56124 Pisa, Italy
- Barbara Koroušić Seljak
  - Computer Systems Department, Jozef Stefan Institute, Jamova Cesta 39, 1000 Ljubljana, Slovenia
- Panče Panov
  - Department of Knowledge Technologies, Jozef Stefan Institute, Jamova Cesta 39, 1000 Ljubljana, Slovenia
- Chiara Devirgiliis
  - Research Centre for Food and Nutrition, CREA (Consiglio per la Ricerca in Agricoltura e l’Analisi dell’Economia Agraria), Via Ardeatina 546, 00178 Rome, Italy
  - Correspondence: (C.D.); (D.C.)
- Duccio Cavalieri
  - Department of Biology, University of Florence, Via Madonna del Piano 6, 50019 Sesto Fiorentino, Italy
  - Correspondence: (C.D.); (D.C.)
6
Djemiel C, Maron PA, Terrat S, Dequiedt S, Cottin A, Ranjard L. Inferring microbiota functions from taxonomic genes: a review. Gigascience 2022; 11:giab090. [PMID: 35022702 PMCID: PMC8756179 DOI: 10.1093/gigascience/giab090] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 10/04/2021] [Revised: 12/02/2021] [Accepted: 12/02/2021] [Indexed: 12/13/2022]
Abstract
Deciphering microbiota functions is crucial to predict ecosystem sustainability in response to global change. High-throughput sequencing at the individual or community level has revolutionized our understanding of microbial ecology, leading to the big data era and improving our ability to link microbial diversity with microbial functions. Recent advances in bioinformatics have been key for developing functional prediction tools based on DNA metabarcoding data and using taxonomic gene information. This approach, cheaper in every respect, serves as an alternative to shotgun sequencing. Although these tools are increasingly used by ecologists, an objective evaluation of their modularity, portability, and robustness is lacking. Here, we reviewed 100 scientific papers on functional inference and ecological trait assignment to rank the advantages, specificities, and drawbacks of these tools, using scientific benchmarking. To date, inference tools have been mainly devoted to bacterial functions, and ecological trait assignment tools to fungal functions. A major limitation is the lack of reference genomes (compared with the human microbiota), especially for complex ecosystems such as soils. Finally, we explore applied research prospects. These tools are promising and already provide relevant information on ecosystem functioning, but standardized indicators and corresponding repositories are still lacking that would enable them to be used for operational diagnosis.
Affiliation(s)
- Christophe Djemiel
  - Agroécologie, AgroSup Dijon, INRAE, Université de Bourgogne, Université de Bourgogne Franche-Comté, F-21000 Dijon, France
- Pierre-Alain Maron
  - Agroécologie, AgroSup Dijon, INRAE, Université de Bourgogne, Université de Bourgogne Franche-Comté, F-21000 Dijon, France
- Sébastien Terrat
  - Agroécologie, AgroSup Dijon, INRAE, Université de Bourgogne, Université de Bourgogne Franche-Comté, F-21000 Dijon, France
- Samuel Dequiedt
  - Agroécologie, AgroSup Dijon, INRAE, Université de Bourgogne, Université de Bourgogne Franche-Comté, F-21000 Dijon, France
- Aurélien Cottin
  - Agroécologie, AgroSup Dijon, INRAE, Université de Bourgogne, Université de Bourgogne Franche-Comté, F-21000 Dijon, France
- Lionel Ranjard
  - Agroécologie, AgroSup Dijon, INRAE, Université de Bourgogne, Université de Bourgogne Franche-Comté, F-21000 Dijon, France
7
Buckley SJ, Harvey RJ. Lessons Learnt From Using the Machine Learning Random Forest Algorithm to Predict Virulence in Streptococcus pyogenes. Front Cell Infect Microbiol 2022; 11:809560. [PMID: 35004362 PMCID: PMC8739889 DOI: 10.3389/fcimb.2021.809560] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/05/2021] [Accepted: 12/13/2021] [Indexed: 11/13/2022]
Abstract
Group A Streptococcus (GAS) is a globally significant human pathogen. The extensive variability of the GAS genome, virulence phenotypes and clinical outcomes renders it an excellent candidate for the application of genotype-phenotype association studies in the era of whole-genome sequencing. We have catalogued the distribution and diversity of the transcription regulators of GAS, and employed phylogenetics, concordance metrics and machine learning (ML) to test for associations. In this review, we communicate the lessons learnt in the context of the recent bacterial genotype-phenotype association studies of others that have utilised both genome-wide association studies (GWAS) and ML. We envisage a promising future for the application of GWAS in bacterial genotype-phenotype association studies and foresee the increasing use of ML. However, progress in this field is hindered by several outstanding bottlenecks. These include the shortcomings that are observed when GWAS techniques that have been fine-tuned on human genomes are applied to bacterial genomes. Furthermore, there is a deficit of easy-to-use end-to-end workflows, and a lag in the collection of detailed phenotype and clinical genomic metadata. We propose a novel quality control protocol for the collection of high-quality GAS virulence phenotype coupled to clinical outcome data. Finally, we incorporate this protocol into a workflow for testing genotype-phenotype associations using ML and ‘linked’ patient-microbe genome sets that better represent the infection event.
Affiliation(s)
- Sean J Buckley
  - School of Health and Behavioural Sciences, University of the Sunshine Coast, Maroochydore DC, QLD, Australia
- Robert J Harvey
  - School of Health and Behavioural Sciences, University of the Sunshine Coast, Maroochydore DC, QLD, Australia
  - Sunshine Coast Health Institute, Birtinya, QLD, Australia
8
Nadendla S, Jackson R, Munro J, Quaglia F, Mészáros B, Olley D, Hobbs ET, Goralski SM, Chibucos M, Mungall CJ, Tosatto SCE, Erill I, Giglio MG. ECO: the Evidence and Conclusion Ontology, an update for 2022. Nucleic Acids Res 2022; 50:D1515-D1521. [PMID: 34986598 PMCID: PMC8728134 DOI: 10.1093/nar/gkab1025] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 09/15/2021] [Revised: 10/12/2021] [Accepted: 10/18/2021] [Indexed: 11/12/2022]
Abstract
The Evidence and Conclusion Ontology (ECO) is a community resource that provides an ontology of terms used to capture the type of evidence that supports biomedical annotations and assertions. Consistent capture of evidence information with ECO allows tracking of annotation provenance, establishment of quality control measures, and evidence-based data mining. ECO is in use by dozens of data repositories and resources with both specific and general areas of focus. ECO is continually being expanded and enhanced in response to user requests as well as our aim to adhere to community best-practices for ontology development. The ECO support team engages in multiple collaborations with other ontologies and annotating groups. Here we report on recent updates to the ECO ontology itself as well as associated resources that are available through this project. ECO project products are freely available for download from the project website (https://evidenceontology.org/) and GitHub (https://github.com/evidenceontology/evidenceontology). ECO is released into the public domain under a CC0 1.0 Universal license.
Affiliation(s)
- Suvarna Nadendla
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Rebecca Jackson
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
- James Munro
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Federica Quaglia
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
- Bálint Mészáros
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Dustin Olley
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Elizabeth T Hobbs
  - Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States
- Stephen M Goralski
  - Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States
- Marcus Chibucos
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Christopher John Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Lab, Berkeley, California, USA
- Ivan Erill
  - Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States
- Michelle G Giglio
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
9
Gupta G, Ndiaye A, Filteau M. Leveraging Experimental Strategies to Capture Different Dimensions of Microbial Interactions. Front Microbiol 2021; 12:700752. [PMID: 34646243 PMCID: PMC8503676 DOI: 10.3389/fmicb.2021.700752] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/26/2021] [Accepted: 08/31/2021] [Indexed: 12/27/2022]
Abstract
Microorganisms are a fundamental part of virtually every ecosystem on earth. Understanding how they collectively interact, assemble, and function as communities has become a prevalent topic both in fundamental and applied research. Owing to multiple advances in technology, answering questions at the microbial system or network level is now within our grasp. To map and characterize microbial interaction networks, numerous computational approaches have been developed; however, experimentally validating microbial interactions is no trivial task. Microbial interactions are context-dependent, and their complex nature can result in an array of outcomes, not only in terms of fitness or growth, but also in other relevant functions and phenotypes. Thus, approaches to experimentally capture microbial interactions involve a combination of culture methods and phenotypic or functional characterization methods. Here, from our perspective as food microbiologists, we highlight the breadth of innovative and promising experimental strategies for their potential to capture the different dimensions of microbial interactions and their high-throughput application to answer the question: are microbial interaction patterns or network architecture similar along different contextual scales? We further discuss the experimental approaches used to build various types of networks and study their architecture in the context of cell biology and how they translate at the level of microbial ecosystem.
Affiliation(s)
- Gunjan Gupta
  - Département des Sciences des aliments, Université Laval, Québec, QC, Canada
  - Institut sur la Nutrition et les Aliments Fonctionnels (INAF), Québec, QC, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Amadou Ndiaye
  - Département des Sciences des aliments, Université Laval, Québec, QC, Canada
  - Institut sur la Nutrition et les Aliments Fonctionnels (INAF), Québec, QC, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Marie Filteau
  - Département des Sciences des aliments, Université Laval, Québec, QC, Canada
  - Institut sur la Nutrition et les Aliments Fonctionnels (INAF), Québec, QC, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
10
Hobbs ET, Goralski SM, Mitchell A, Simpson A, Leka D, Kotey E, Sekira M, Munro JB, Nadendla S, Jackson R, Gonzalez-Aguirre A, Krallinger M, Giglio M, Erill I. ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts. Front Res Metr Anal 2021; 6:674205. [PMID: 34327299 PMCID: PMC8313968 DOI: 10.3389/frma.2021.674205] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/28/2021] [Accepted: 06/28/2021] [Indexed: 11/20/2022]
Abstract
Analysis of high-throughput experiments in the life sciences frequently relies upon standardized information about genes, gene products, and other biological entities. To provide this information, expert curators are increasingly relying on text mining tools to identify, extract and harmonize statements from biomedical journal articles that discuss findings of interest. For determining reliability of the statements, curators need the evidence used by the authors to support their assertions. It is important to annotate the evidence directly used by authors to qualify their findings rather than simply annotating mentions of experimental methods without the context of what findings they support. Text mining tools require tuning and adaptation to achieve accurate performance. Many annotated corpora exist to enable developing and tuning text mining tools; however, none currently provides annotations of evidence based on the extensive and widely used Evidence and Conclusion Ontology. We present the ECO-CollecTF corpus, a novel, freely available, biomedical corpus of 84 documents that captures high-quality, evidence-based statements annotated with the Evidence and Conclusion Ontology.
Affiliation(s)
- Elizabeth T Hobbs
  - Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- Stephen M Goralski
  - Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- Ashley Mitchell
  - Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- Andrew Simpson
  - Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- Dorjan Leka
  - Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- Emmanuel Kotey
  - Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- Matt Sekira
  - Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- James B Munro
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, United States
- Suvarna Nadendla
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, United States
- Rebecca Jackson
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, United States
- Martin Krallinger
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
  - Centro Nacional de Investigaciones Oncológicas (CNIO), Madrid, Spain
- Michelle Giglio
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, United States
- Ivan Erill
  - Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
11
Wu PIF, Ross C, Siegele DA, Hu JC. Insights from the reanalysis of high-throughput chemical genomics data for Escherichia coli K-12. G3: Genes, Genomes, Genetics 2021; 11:6044125. [PMID: 33561236 PMCID: PMC8022724 DOI: 10.1093/g3journal/jkaa035] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/14/2020] [Accepted: 11/11/2020] [Indexed: 11/14/2022]
Abstract
Despite the demonstrated success of genome-wide genetic screens and chemical genomics studies at predicting functions for genes of unknown function or predicting new functions for well-characterized genes, their potential to provide insights into gene function has not been fully explored. We systematically reanalyzed a published high-throughput phenotypic dataset for the model Gram-negative bacterium Escherichia coli K-12. The availability of high-quality annotation sets allowed us to compare the power of different metrics for measuring phenotypic profile similarity to correctly infer gene function. We conclude that there is no single best method; the three metrics tested gave comparable results for most gene pairs. We also assessed how converting quantitative phenotypes to discrete, qualitative phenotypes affected the association between phenotype and function. Our results indicate that this approach may allow phenotypic data from different studies to be combined to produce a larger dataset that may reveal functional connections between genes not detected in individual studies.
Affiliation(s)
- Peter I-Fan Wu
- Department of Biochemistry and Biophysics, Texas A&M University and Texas Agrilife Research, College Station, TX 77843-2128, USA
| | - Curtis Ross
- Department of Biochemistry and Biophysics, Texas A&M University and Texas Agrilife Research, College Station, TX 77843-2128, USA
| | - Deborah A Siegele
- Department of Biology, Texas A&M University, College Station, TX 77843-3258, USA
| | - James C Hu
- Department of Biochemistry and Biophysics, Texas A&M University and Texas Agrilife Research, College Station, TX 77843-2128, USA
12
Giglio M, Tauber R, Nadendla S, Munro J, Olley D, Ball S, Mitraka E, Schriml LM, Gaudet P, Hobbs ET, Erill I, Siegele DA, Hu JC, Mungall C, Chibucos MC. ECO, the Evidence & Conclusion Ontology: community standard for evidence information. Nucleic Acids Res 2020; 47:D1186-D1194. [PMID: 30407590 PMCID: PMC6323956 DOI: 10.1093/nar/gky1036] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 09/14/2018] [Accepted: 10/16/2018] [Indexed: 12/03/2022] Open
Abstract
The Evidence and Conclusion Ontology (ECO) contains terms (classes) that describe types of evidence and assertion methods. ECO terms are used in the process of biocuration to capture the evidence that supports biological assertions (e.g. gene product X has function Y as supported by evidence Z). Capture of this information allows tracking of annotation provenance, establishment of quality control measures and query of evidence. ECO contains over 1500 terms and is in use by many leading biological resources including the Gene Ontology, UniProt and several model organism databases. ECO is continually being expanded and revised based on the needs of the biocuration community. The ontology is freely available for download from GitHub (https://github.com/evidenceontology/) or the project’s website (http://evidenceontology.org/). Users can request new terms or changes to existing terms through the project’s GitHub site. ECO is released into the public domain under CC0 1.0 Universal.
Affiliation(s)
- Michelle Giglio
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
| | - Rebecca Tauber
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
| | - Suvarna Nadendla
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
| | - James Munro
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
| | - Dustin Olley
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
| | - Shoshannah Ball
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
| | - Elvira Mitraka
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
| | - Lynn M Schriml
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
| | - Pascale Gaudet
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Elizabeth T Hobbs
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD 21250, USA
| | - Ivan Erill
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD 21250, USA
| | - Deborah A Siegele
- Department of Biology, Texas A&M University, College Station, TX 77840, USA
| | - James C Hu
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX 77840, USA
| | - Chris Mungall
- Molecular Ecosystems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Marcus C Chibucos
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
13
Bartley BA, Beal J, Karr JR, Strychalski EA. Organizing genome engineering for the gigabase scale. Nat Commun 2020; 11:689. [PMID: 32019919 PMCID: PMC7000699 DOI: 10.1038/s41467-020-14314-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 08/24/2019] [Accepted: 12/18/2019] [Indexed: 12/11/2022] Open
Abstract
Genome-scale engineering holds great potential to impact science, industry, medicine, and society, and recent improvements in DNA synthesis have enabled the manipulation of megabase genomes. However, coordinating and integrating the workflows and large teams necessary for gigabase genome engineering remains a considerable challenge. We examine this issue and recommend a path forward by: 1) adopting and extending existing representations for designs, assembly plans, samples, data, and workflows; 2) developing new technologies for data curation and quality control; 3) conducting fundamental research on genome-scale modeling and design; and 4) developing new legal and contractual infrastructure to facilitate collaboration.
Affiliation(s)
| | - Jacob Beal
- Raytheon BBN Technologies, Cambridge, MA, 02138, USA.
| | - Jonathan R Karr
- Icahn Institute and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10128, USA
14
San JE, Baichoo S, Kanzi A, Moosa Y, Lessells R, Fonseca V, Mogaka J, Power R, de Oliveira T. Current Affairs of Microbial Genome-Wide Association Studies: Approaches, Bottlenecks and Analytical Pitfalls. Front Microbiol 2020; 10:3119. [PMID: 32082269 PMCID: PMC7002396 DOI: 10.3389/fmicb.2019.03119] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 10/03/2019] [Accepted: 12/24/2019] [Indexed: 12/12/2022] Open
Abstract
Microbial genome-wide association studies (mGWAS) are a new and exciting research field that is adapting human GWAS methods to understand how variations in microbial genomes affect host or pathogen phenotypes, such as drug resistance, virulence, host specificity and prognosis. Several computational tools and methods have been developed or adapted from human GWAS to facilitate the discovery of novel mutations and structural variations that are associated with the phenotypes of interest. However, no comprehensive, end-to-end, user-friendly tool is currently available. The development of a broadly applicable pipeline presents a real opportunity among computational biologists. Here, (i) we review the prominent and promising tools, (ii) discuss analytical pitfalls and bottlenecks in mGWAS, (iii) provide insights into the selection of appropriate tools, (iv) highlight the gaps that still need to be filled and how users and developers can work together to overcome these bottlenecks. Use of mGWAS research can inform drug repositioning decisions as well as accelerate the discovery and development of more effective vaccines and antimicrobials for pressing infectious diseases of global health significance, such as HIV, TB, influenza, and malaria.
Affiliation(s)
- James Emmanuel San
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
| | - Shakuntala Baichoo
- Department of Digital Technologies, FoICDT, University of Mauritius, Réduit, Mauritius
| | - Aquillah Kanzi
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
| | - Yumna Moosa
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
| | - Richard Lessells
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
| | - Vagner Fonseca
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Laboratório de Genética Celular e Molecular, ICB, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - John Mogaka
- Discipline of Public Health, University of Kwazulu-Natal, Durban, South Africa
| | - Robert Power
- St Edmund Hall, Oxford University, Oxford, United Kingdom
| | - Tulio de Oliveira
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Department of Global Health, University of Washington, Seattle, WA, United States
15
He Y, Wang H, Zheng J, Beiting DP, Masci AM, Yu H, Liu K, Wu J, Curtis JL, Smith B, Alekseyenko AV, Obeid JS. OHMI: the ontology of host-microbiome interactions. J Biomed Semantics 2019; 10:25. [PMID: 31888755 PMCID: PMC6937947 DOI: 10.1186/s13326-019-0217-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 02/09/2019] [Accepted: 12/04/2019] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Host-microbiome interactions (HMIs) are critical for the modulation of biological processes and are associated with several diseases. Extensive HMI studies have generated large amounts of data. We propose that the logical representation of the knowledge derived from these data and the standardized representation of experimental variables and processes can foster integration of data and reproducibility of experiments and thereby further HMI knowledge discovery. METHODS Through a multi-institutional collaboration, a community-based Ontology of Host-Microbiome Interactions (OHMI) was developed following the Open Biological/Biomedical Ontologies (OBO) Foundry principles. As an OBO library ontology, OHMI leverages established ontologies to create logically structured representations of (1) microbiomes, microbial taxonomy, host species, host anatomical entities, and HMIs under different conditions and (2) associated study protocols and types of data analysis and experimental results. RESULTS Aligned with the Basic Formal Ontology, OHMI comprises over 1000 terms, including terms imported from more than 10 existing ontologies together with some 500 OHMI-specific terms. A specific OHMI design pattern was generated to represent typical host-microbiome interaction studies. As one major OHMI use case, drawing on data from over 50 peer-reviewed publications, we identified over 100 bacteria and fungi from the gut, oral cavity, skin, and airway that are associated with six rheumatic diseases including rheumatoid arthritis. Our ontological study identified new high-level microbiota taxonomical structures. Two microbiome-related competency questions were also designed and addressed. We were also able to use OHMI to represent statistically significant results identified from a large existing microbiome database data analysis. CONCLUSION OHMI represents entities and relations in the domain of HMIs. It supports shared knowledge representation, data and metadata standardization and integration, and can be used in formulation of advanced queries for purposes of data analysis.
Affiliation(s)
- Yongqun He
- University of Michigan Medical School, Ann Arbor, MI 48109 USA
| | - Haihe Wang
- University of Michigan Medical School, Ann Arbor, MI 48109 USA
- Daqing Branch of Harbin Medical University, Daqing, 163319 Heilongjiang China
| | - Jie Zheng
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104 USA
| | - Daniel P. Beiting
- University of Pennsylvania School of Veterinary Medicine, Philadelphia, PA 19104 USA
| | - Anna Maria Masci
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710 USA
| | - Hong Yu
- University of Michigan Medical School, Ann Arbor, MI 48109 USA
- People’s Hospital of Guizhou Province, Guiyang, 550025 Guizhou China
| | - Kaiyong Liu
- School of Public Health, Anhui Medical University, No 81 Meishan Road, Hefei, 230032 Anhui China
| | - Jianmin Wu
- Center for Cancer Bioinformatics, Peking University Cancer Hospital & Institute, Beijing, 100142 China
| | - Jeffrey L. Curtis
- University of Michigan Medical School, Ann Arbor, MI 48109 USA
- Pulmonary & Critical Care Medicine Section, Medical Service, VA Ann Arbor Healthcare System, Ann Arbor, MI 48105 USA
| | - Barry Smith
- University at Buffalo, Buffalo, NY 14260 USA
| | - Alexander V. Alekseyenko
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425 USA
| | - Jihad S. Obeid
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425 USA
16
Siegele DA, LaBonte SA, Wu PIF, Chibucos MC, Nandendla S, Giglio MG, Hu JC. Phenotype annotation with the ontology of microbial phenotypes (OMP). J Biomed Semantics 2019; 10:13. [PMID: 31307550 PMCID: PMC6631659 DOI: 10.1186/s13326-019-0205-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 09/18/2018] [Accepted: 06/19/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Microbial genetics has formed a foundation for understanding many aspects of biology. Systematic annotation that supports computational data mining should reveal further insights for microbes, microbiomes, and conserved functions beyond microbes. The Ontology of Microbial Phenotypes (OMP) was created to support such annotation. RESULTS We define standards for an OMP-based annotation framework that supports the capture of a variety of phenotypes and provides flexibility for different levels of detail based on a combination of pre- and post-composition using OMP and other Open Biomedical Ontology (OBO) projects. A system for entering and viewing OMP annotations has been added to our online, public, web-based data portal. CONCLUSIONS The annotation framework described here is ready to support projects to capture phenotypes from the experimental literature for a variety of microbes. Defining the OMP annotation standard should support the development of new software tools for data mining and analysis in comparative phenomics.
Affiliation(s)
- Deborah A Siegele
- Department of Biology, Texas A&M University, College Station, TX, USA
| | - Sandra A LaBonte
- Department of Biochemistry and Biophysics, Texas A&M University and Texas AgriLife Research, College Station, TX, USA
| | - Peter I-Fan Wu
- Department of Biochemistry and Biophysics, Texas A&M University and Texas AgriLife Research, College Station, TX, USA
| | - Marcus C Chibucos
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Suvarna Nandendla
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Michelle G Giglio
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - James C Hu
- Department of Biochemistry and Biophysics, Texas A&M University and Texas AgriLife Research, College Station, TX, USA.
17
Lajoie G, Kembel SW. Making the Most of Trait-Based Approaches for Microbial Ecology. Trends Microbiol 2019; 27:814-823. [PMID: 31296406 DOI: 10.1016/j.tim.2019.06.003] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 03/04/2019] [Revised: 06/12/2019] [Accepted: 06/13/2019] [Indexed: 12/13/2022]
Abstract
There is an increasing interest in applying trait-based approaches to microbial ecology, but the question of how and why to do it is still lagging behind. By anchoring our discussion of these questions in a framework derived from epistemology, we broaden the scope of trait-based approaches to microbial ecology from one oriented mostly around explanation towards one inclusive of the predictive and integrative potential of these approaches. We use case studies from macro-organismal ecology to concretely show how these goals for knowledge development can be fulfilled and propose clear directions, adapted to the biological reality of microbes, to make the most of recent advancements in the measurement of microbial phenotypes and traits.
Affiliation(s)
- Geneviève Lajoie
- Département des Sciences Biologiques, Université du Québec à Montréal, 141 Avenue du Président-Kennedy, Montréal, Canada, H2X 1Y4.
| | - Steven W Kembel
- Département des Sciences Biologiques, Université du Québec à Montréal, 141 Avenue du Président-Kennedy, Montréal, Canada, H2X 1Y4
18
Weissman JL, Laljani RMR, Fagan WF, Johnson PLF. Visualization and prediction of CRISPR incidence in microbial trait-space to identify drivers of antiviral immune strategy. ISME JOURNAL 2019; 13:2589-2602. [PMID: 31239539 PMCID: PMC6776019 DOI: 10.1038/s41396-019-0411-2] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 10/04/2018] [Revised: 03/15/2019] [Accepted: 03/24/2019] [Indexed: 01/21/2023]
Abstract
Bacteria and archaea are locked in a near-constant battle with their viral pathogens. Despite previous mechanistic characterization of numerous prokaryotic defense strategies, the underlying ecological drivers of different strategies remain largely unknown and predicting which species will take which strategies remains a challenge. Here, we focus on the CRISPR immune strategy and develop a phylogenetically-corrected machine learning approach to build a predictive model of CRISPR incidence using data on over 100 traits across over 2600 species. We discover a strong but hitherto-unknown negative interaction between CRISPR and aerobicity, which we hypothesize may result from interference between CRISPR-associated proteins and non-homologous end-joining DNA repair due to oxidative stress. Our predictive model also quantitatively confirms previous observations of an association between CRISPR and temperature. Finally, we contrast the environmental associations of different CRISPR system types (I, II, III) and restriction modification systems, all of which act as intracellular immune systems.
Affiliation(s)
- Jake L Weissman
- Department of Biology, University of Maryland, College Park, MD, USA
| | - Rohan M R Laljani
- Department of Biology, University of Maryland, College Park, MD, USA
| | - William F Fagan
- Department of Biology, University of Maryland, College Park, MD, USA
19
Röttjers L, Faust K. From hairballs to hypotheses-biological insights from microbial networks. FEMS Microbiol Rev 2018; 42:761-780. [PMID: 30085090 PMCID: PMC6199531 DOI: 10.1093/femsre/fuy030] [Citation(s) in RCA: 297] [Impact Index Per Article: 42.4] [Received: 02/26/2018] [Accepted: 07/24/2018] [Indexed: 12/19/2022] Open
Abstract
Microbial networks are an increasingly popular tool to investigate microbial community structure, as they integrate multiple types of information and may represent systems-level behaviour. Interpreting these networks is not straightforward, and the biological implications of network properties are unclear. Analysis of microbial networks allows researchers to predict hub species and species interactions. Additionally, such analyses can help identify alternative community states and niches. Here, we review factors that can result in spurious predictions and address emergent properties that may be meaningful in the context of the microbiome. We also give an overview of studies that analyse microbial networks to identify new hypotheses. Moreover, we show in a simulation how network properties are affected by tool choice and environmental factors. For example, hub species are not consistent across tools, and environmental heterogeneity induces modularity. We highlight the need for robust microbial network inference and suggest strategies to infer networks more reliably.
Affiliation(s)
- Lisa Röttjers
- KU Leuven, Department of Microbiology and Immunology, Rega Institute, Laboratory of Molecular Bacteriology, Leuven, Belgium
| | - Karoline Faust
- KU Leuven, Department of Microbiology and Immunology, Rega Institute, Laboratory of Molecular Bacteriology, Leuven, Belgium
20
Gkoutos GV, Schofield PN, Hoehndorf R. The anatomy of phenotype ontologies: principles, properties and applications. Brief Bioinform 2018; 19:1008-1021. [PMID: 28387809 PMCID: PMC6169674 DOI: 10.1093/bib/bbx035] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 01/01/2017] [Revised: 02/05/2017] [Indexed: 12/14/2022] Open
Abstract
The past decade has seen an explosion in the collection of genotype data in domains as diverse as medicine, ecology, livestock and plant breeding. Along with this comes the challenge of dealing with the related phenotype data, which is not only large but also highly multidimensional. Computational analysis of phenotypes has therefore become critical for our ability to understand the biological meaning of genomic data in the biological sciences. At the heart of computational phenotype analysis are the phenotype ontologies. A large number of these ontologies have been developed across many domains, and we are now at a point where the knowledge captured in the structure of these ontologies can be used for the integration and analysis of large interrelated data sets. The Phenotype And Trait Ontology framework provides a method for formal definitions of phenotypes and associated data sets and has proved to be key to our ability to develop methods for the integration and analysis of phenotype data. Here, we describe the development and products of the ontological approach to phenotype capture, the formal content of phenotype ontologies and how their content can be used computationally.
Affiliation(s)
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal
21
Chaix E, Deléger L, Bossy R, Nédellec C. Text mining tools for extracting information about microbial biodiversity in food. Food Microbiol 2018; 81:63-75. [PMID: 30910089 PMCID: PMC6460834 DOI: 10.1016/j.fm.2018.04.011] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 11/08/2017] [Revised: 03/26/2018] [Accepted: 04/17/2018] [Indexed: 12/20/2022]
Abstract
Information on food microbial diversity is scattered across millions of scientific papers. Researchers need tools to assist their bibliographic search in such large collections. Text mining and knowledge engineering methods are useful to automatically and efficiently find relevant information in Life Science. This work describes how the Alvis text mining platform has been applied to a large collection of PubMed abstracts of scientific papers in the food microbiology domain. The information targeted by our work is microorganisms, their habitats and phenotypes. Two knowledge resources, the NCBI taxonomy and the OntoBiotope ontology were used to detect this information in texts. The result of the text mining process was indexed and is presented through the AlvisIR Food on-line semantic search engine. In this paper, we also show through two illustrative examples the great potential of this new tool to assist in studies on ecological diversity and the origin of microbial presence in food. We present new text-mining tools to extract information in food microbiology. The results of the extraction are available in an on-line semantic search engine. Taxa, habitats, phenotypes and links between them can be queried in PubMed abstracts. Text-mining tools could assist to browse past and recent scientific literature. Two use-cases are presented: fruit microbiota and spore-forming bacteria in food.
Affiliation(s)
- Estelle Chaix
- MaIAGE, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France.
| | - Louise Deléger
- MaIAGE, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France
| | - Robert Bossy
- MaIAGE, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France
| | - Claire Nédellec
- MaIAGE, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France.
22
Henry VJ, Goelzer A, Ferré A, Fischer S, Dinh M, Loux V, Froidevaux C, Fromion V. The bacterial interlocked process ONtology (BiPON): a systemic multi-scale unified representation of biological processes in prokaryotes. J Biomed Semantics 2017; 8:53. [PMID: 29169408 PMCID: PMC5701433 DOI: 10.1186/s13326-017-0165-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/25/2016] [Accepted: 11/10/2017] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND High-throughput technologies produce huge amounts of heterogeneous biological data at all cellular levels. Structuring these data together with biological knowledge is a critical issue in biology and requires integrative tools and methods such as bio-ontologies to extract and share valuable information. In parallel, the development of recent whole-cell models using a systemic cell description opened alternatives for data integration. Integrating a systemic cell description within a bio-ontology would help to progress in whole-cell data integration and modeling synergistically. RESULTS We present BiPON, an ontology integrating a multi-scale systemic representation of bacterial cellular processes. BiPON consists of two sub-ontologies, bioBiPON and modelBiPON. bioBiPON organizes the systemic description of biological information while modelBiPON describes the mathematical models (including parameters) associated with biological processes. bioBiPON and modelBiPON are related using bridge rules on classes during automatic reasoning. Biological processes are thus automatically related to mathematical models. 37% of BiPON classes stem from different well-established bio-ontologies, while the others have been manually defined and curated. Currently, BiPON integrates the main processes involved in bacterial gene expression. CONCLUSIONS BiPON is a proof of concept of how to formally combine systems biology and bio-ontology. The knowledge formalization is highly flexible and generic. Most of the known cellular processes, new participants or new mathematical models could be inserted in BiPON. Altogether, BiPON opens up promising perspectives for knowledge integration and sharing and can be used by biologists, systems and computational biologists, and the emerging community of whole-cell modeling.
Affiliation(s)
- Vincent J. Henry
- Laboratoire de Recherche en Informatique (LRI), UMR 8623, CNRS, Université Paris-Sud/Université Paris-Saclay, Orsay, France
- INRA, UR1404, MaIAGE, Université Paris-Saclay, Jouy-en-Josas, France
| | - Anne Goelzer
- INRA, UR1404, MaIAGE, Université Paris-Saclay, Jouy-en-Josas, France
| | - Arnaud Ferré
- Laboratoire de Recherche en Informatique (LRI), UMR 8623, CNRS, Université Paris-Sud/Université Paris-Saclay, Orsay, France
| | - Stephan Fischer
- INRA, UR1404, MaIAGE, Université Paris-Saclay, Jouy-en-Josas, France
| | - Marc Dinh
- INRA, UR1404, MaIAGE, Université Paris-Saclay, Jouy-en-Josas, France
| | - Valentin Loux
- INRA, UR1404, MaIAGE, Université Paris-Saclay, Jouy-en-Josas, France
| | - Christine Froidevaux
- Laboratoire de Recherche en Informatique (LRI), UMR 8623, CNRS, Université Paris-Sud/Université Paris-Saclay, Orsay, France
| | - Vincent Fromion
- INRA, UR1404, MaIAGE, Université Paris-Saclay, Jouy-en-Josas, France
23
Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey KJ, Prill RJ, Tripathi A, Gibbons SM, Ackermann G, Navas-Molina JA, Janssen S, Kopylova E, Vázquez-Baeza Y, González A, Morton JT, Mirarab S, Zech Xu Z, Jiang L, Haroon MF, Kanbar J, Zhu Q, Jin Song S, Kosciolek T, Bokulich NA, Lefler J, Brislawn CJ, Humphrey G, Owens SM, Hampton-Marcell J, Berg-Lyons D, McKenzie V, Fierer N, Fuhrman JA, Clauset A, Stevens RL, Shade A, Pollard KS, Goodwin KD, Jansson JK, Gilbert JA, Knight R. A communal catalogue reveals Earth's multiscale microbial diversity. Nature 2017; 551:457-463. [PMID: 29088705 PMCID: PMC6192678 DOI: 10.1038/nature24621] [Citation(s) in RCA: 1446] [Impact Index Per Article: 180.8] [Received: 03/13/2017] [Accepted: 10/10/2017] [Indexed: 02/07/2023]
Abstract
Our growing awareness of the microbial world's importance and diversity contrasts starkly with our limited understanding of its fundamental structure. Despite recent advances in DNA sequencing, a lack of standardized protocols and common analytical frameworks impedes comparisons among studies, hindering the development of global inferences about microbial life on Earth. Here we present a meta-analysis of microbial community samples collected by hundreds of researchers for the Earth Microbiome Project. Coordinated protocols and new analytical methods, particularly the use of exact sequences instead of clustered operational taxonomic units, enable bacterial and archaeal ribosomal RNA gene sequences to be followed across multiple studies and allow us to explore patterns of diversity at an unprecedented scale. The result is both a reference database giving global context to DNA sequence data and a framework for incorporating data from future studies, fostering increasingly complete characterization of Earth's microbial diversity.
Affiliation(s)
- Luke R Thompson
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
- Department of Biological Sciences and Northern Gulf Institute, University of Southern Mississippi, Hattiesburg, Mississippi, USA
- Ocean Chemistry and Ecosystems Division, Atlantic Oceanographic and Meteorological Laboratory, National Oceanic and Atmospheric Administration, stationed at Southwest Fisheries Science Center, La Jolla, California, USA
| | - Jon G Sanders
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Daniel McDonald
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Amnon Amir
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Joshua Ladau
- The Gladstone Institutes and University of California San Francisco, San Francisco, California, USA
| | - Kenneth J Locey
- Department of Biology, Indiana University, Bloomington, Indiana, USA
| | - Robert J Prill
- Industrial and Applied Genomics, IBM Almaden Research Center, San Jose, California, USA
| | - Anupriya Tripathi
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
- Division of Biological Sciences, University of California San Diego, La Jolla, California, USA
- Skaggs School of Pharmacy, University of California San Diego, La Jolla, California, USA
| | - Sean M Gibbons
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.,The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Gail Ackermann
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Jose A Navas-Molina
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA.,Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA
| | - Stefan Janssen
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Evguenia Kopylova
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Yoshiki Vázquez-Baeza
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA.,Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA
| | - Antonio González
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - James T Morton
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA.,Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA
| | - Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, California, USA
| | - Zhenjiang Zech Xu
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Lingjing Jiang
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA.,Department of Family Medicine and Public Health, University of California San Diego, La Jolla, California, USA
| | - Mohamed F Haroon
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA
| | - Jad Kanbar
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Qiyun Zhu
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Se Jin Song
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Tomasz Kosciolek
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Nicholas A Bokulich
- Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, USA
| | - Joshua Lefler
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Colin J Brislawn
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Gregory Humphrey
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA
| | - Sarah M Owens
- Biosciences Division, Argonne National Laboratory, Argonne, Illinois, USA
| | - Jarrad Hampton-Marcell
- Biosciences Division, Argonne National Laboratory, Argonne, Illinois, USA.,Department of Biological Sciences, University of Illinois at Chicago, Chicago, Illinois, USA
| | - Donna Berg-Lyons
- BioFrontiers Institute, University of Colorado, Boulder, Colorado, USA
| | - Valerie McKenzie
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado, USA
| | - Noah Fierer
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado, USA.,Cooperative Institute for Research in Environmental Sciences, University of Colorado, Boulder, Colorado, USA
| | - Jed A Fuhrman
- Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
| | - Aaron Clauset
- BioFrontiers Institute, University of Colorado, Boulder, Colorado, USA.,Department of Computer Science, University of Colorado, Boulder, Colorado, USA
| | - Rick L Stevens
- Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, Illinois, USA.,Department of Computer Science, University of Chicago, Chicago, Illinois, USA
| | - Ashley Shade
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, USA.,Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA.,Program in Ecology, Evolutionary Biology and Behavior, Michigan State University, East Lansing, Michigan, USA
| | - Katherine S Pollard
- The Gladstone Institutes and University of California San Francisco, San Francisco, California, USA
| | - Kelly D Goodwin
- Ocean Chemistry and Ecosystems Division, Atlantic Oceanographic and Meteorological Laboratory, National Oceanic and Atmospheric Administration, stationed at Southwest Fisheries Science Center, La Jolla, California, USA
| | - Janet K Jansson
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Jack A Gilbert
- Biosciences Division, Argonne National Laboratory, Argonne, Illinois, USA.,Department of Surgery, University of Chicago, Chicago, Illinois, USA
| | - Rob Knight
- Department of Pediatrics, University of California San Diego, La Jolla, California, USA.,Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA.,Center for Microbiome Innovation, University of California San Diego, La Jolla, California, USA
|
24
|
Modelling plankton ecosystems in the meta-omics era. Are we ready? Mar Genomics 2017; 32:1-17. [DOI: 10.1016/j.margen.2017.02.006] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 11/20/2016] [Revised: 02/24/2017] [Accepted: 02/25/2017] [Indexed: 12/30/2022]
|
25
|
Chibucos MC, Siegele DA, Hu JC, Giglio M. The Evidence and Conclusion Ontology (ECO): Supporting GO Annotations. Methods Mol Biol 2017; 1446:245-259. [PMID: 27812948 DOI: 10.1007/978-1-4939-3743-1_18] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 12/04/2022]
Abstract
The Evidence and Conclusion Ontology (ECO) is a community resource for describing the various types of evidence that are generated during the course of a scientific study and which are typically used to support assertions made by researchers. ECO describes multiple evidence types, including evidence resulting from experimental (i.e., wet lab) techniques, evidence arising from computational methods, statements made by authors (whether or not supported by evidence), and inferences drawn by researchers curating the literature. In addition to summarizing the evidence that supports a particular assertion, ECO also offers a means to document whether a computer or a human performed the process of making the annotation. Incorporating ECO into an annotation system makes it possible to leverage the structure of the ontology such that associated data can be grouped hierarchically, users can select data associated with particular evidence types, and quality control pipelines can be optimized. Today, over 30 resources, including the Gene Ontology, use the Evidence and Conclusion Ontology to represent both evidence and how annotations are made.
Affiliation(s)
- Marcus C Chibucos
  - Department of Microbiology and Immunology, Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore Street, Baltimore, MD, 21201, USA
- Deborah A Siegele
  - Department of Biology, Texas A&M University, College Station, TX, 77843, USA
- James C Hu
  - Department of Biochemistry and Biophysics, Texas A&M University and Texas AgriLife Research, College Station, TX, 77843, USA
- Michelle Giglio
  - Department of Medicine, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
|
26
|
Blank CE, Cui H, Moore LR, Walls RL. MicrO: an ontology of phenotypic and metabolic characters, assays, and culture media found in prokaryotic taxonomic descriptions. J Biomed Semantics 2016; 7:18. [PMID: 27076900 PMCID: PMC4830071 DOI: 10.1186/s13326-016-0060-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 12/03/2015] [Accepted: 04/02/2016] [Indexed: 12/03/2022] Open
Abstract
Background: MicrO is an ontology of microbiological terms, including prokaryotic qualities and processes, material entities (such as cell components), chemical entities (such as microbiological culture media and medium ingredients), and assays. The ontology was built to support the ongoing development of a natural language processing algorithm, MicroPIE (or, Microbial Phenomics Information Extractor). During the MicroPIE design process, we realized there was a need for a prokaryotic ontology which would capture the evolutionary diversity of phenotypes and metabolic processes across the tree of life, capture the diversity of synonyms and information contained in the taxonomic literature, and relate microbiological entities and processes to terms in a large number of other ontologies, most particularly the Gene Ontology (GO), the Phenotypic Quality Ontology (PATO), and the Chemical Entities of Biological Interest (ChEBI). We thus constructed MicrO to be rich in logical axioms and synonyms gathered from the taxonomic literature.
Results: MicrO currently has ~14,550 classes (~2,550 of which are new, the remainder being microbiologically-relevant classes imported from other ontologies), connected by ~24,130 logical axioms (5,446 of which are new), and is available at http://purl.obolibrary.org/obo/MicrO.owl and on the project website at https://github.com/carrineblank/MicrO. MicrO has been integrated into the OBO Foundry Library (http://www.obofoundry.org/ontology/micro.html), so that other ontologies can borrow and re-use classes. Term requests and user feedback can be made using MicrO’s Issue Tracker in GitHub. We designed MicrO such that it can support the ongoing and future development of algorithms that can leverage the controlled vocabulary and logical inference power provided by the ontology.
Conclusions: By connecting microbial classes with large numbers of chemical entities, material entities, biological processes, molecular functions, and qualities using a dense array of logical axioms, we intend MicrO to be a powerful new tool to increase the computing power of bioinformatics tools such as the automated text mining of prokaryotic taxonomic descriptions using natural language processing. We also intend MicrO to support the development of new bioinformatics tools that aim to develop new connections between microbial phenotypes and genotypes (i.e., the gene content in genomes). Future ontology development will include incorporation of pathogenic phenotypes and prokaryotic habitats.
Electronic supplementary material: The online version of this article (doi:10.1186/s13326-016-0060-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Carrine E Blank
  - Department of Geosciences, University of Montana, Missoula, MT 59812 USA
- Hong Cui
  - School of Information, University of Arizona, Tucson, AZ 85719 USA
- Lisa R Moore
  - Department of Biological Sciences, University of Southern Maine, Portland, ME 04104 USA
|
27
|
Feldbauer R, Schulz F, Horn M, Rattei T. Prediction of microbial phenotypes based on comparative genomics. BMC Bioinformatics 2015; 16 Suppl 14:S1. [PMID: 26451672 PMCID: PMC4603748 DOI: 10.1186/1471-2105-16-s14-s1] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Indexed: 01/18/2023] Open
Abstract
The accessibility of almost complete genome sequences of uncultivable microbial species from metagenomes necessitates computational methods predicting microbial phenotypes solely based on genomic data. Here we investigate how comparative genomics can be utilized for the prediction of microbial phenotypes. The PICA framework facilitates application and comparison of different machine learning techniques for phenotypic trait prediction. We have improved and extended PICA's support vector machine plug-in and suggest its applicability to large-scale genome databases and incomplete genome sequences. We have demonstrated the stability of the predictive power for phenotypic traits, not perturbed by the rapid growth of genome databases. A new software tool facilitates the in-depth analysis of phenotype models, which associate expected and unexpected protein functions with particular traits. Most of the traits can be reliably predicted in only 60-70% complete genomes. We have established a new phenotypic model that predicts intracellular microorganisms. Thereby we could demonstrate that also independently evolved phenotypic traits, characterized by genome reduction, can be reliably predicted based on comparative genomics. Our results suggest that the extended PICA framework can be used to automatically annotate phenotypes in near-complete microbial genome sequences, as generated in large numbers in current metagenomics studies.
|