1
De Filippis GM, Amalfitano D, Russo C, Tommasino C, Rinaldi AM. A systematic mapping study of semantic technologies in multi-omics data integration. J Biomed Inform 2025; 165:104809. [PMID: 40154721 DOI: 10.1016/j.jbi.2025.104809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2024] [Revised: 02/03/2025] [Accepted: 03/07/2025] [Indexed: 04/01/2025]
Abstract
OBJECTIVE The integration of multi-omics data is essential for understanding complex biological systems, providing insights beyond single-omics approaches. However, challenges related to data heterogeneity, standardization, and computational scalability persist. This study explores the interdisciplinary application of semantic technologies to enhance data integration, standardization, and analysis in multi-omics research. METHODS We performed a systematic mapping study assessing literature from 2014 to 2024, focusing on the utilization of ontologies, knowledge graphs, and graph-based methods for multi-omics integration. RESULTS Our findings indicate a growing number of publications in this field, predominantly appearing in high-impact journals. The deployment of semantic technologies has notably improved data visualization, querying, and management, thus enhancing gene and pathway discovery, and providing deeper disease insights and more accurate predictive modeling. CONCLUSION The study underscores the significance of semantic technologies in overcoming multi-omics integration challenges. Future research should focus on integrating diverse data types, developing advanced computational tools, and incorporating AI and machine learning to foster personalized medicine applications.
Affiliation(s)
- Giovanni Maria De Filippis
- Department of Electrical Engineering and Information Technology DIETI, University of Naples Federico II, Via Claudio, 21, Naples, 80125, Italy.
- Domenico Amalfitano
- Department of Electrical Engineering and Information Technology DIETI, University of Naples Federico II, Via Claudio, 21, Naples, 80125, Italy.
- Cristiano Russo
- Department of Electrical Engineering and Information Technology DIETI, University of Naples Federico II, Via Claudio, 21, Naples, 80125, Italy.
- Cristian Tommasino
- Department of Electrical Engineering and Information Technology DIETI, University of Naples Federico II, Via Claudio, 21, Naples, 80125, Italy.
- Antonio Maria Rinaldi
- Department of Electrical Engineering and Information Technology DIETI, University of Naples Federico II, Via Claudio, 21, Naples, 80125, Italy.
2
Imbert B, Kreplak J, Flores RG, Aubert G, Burstin J, Tayeh N. Development of a knowledge graph framework to ease and empower translational approaches in plant research: a use-case on grain legumes. Front Artif Intell 2023; 6:1191122. [PMID: 37601035 PMCID: PMC10435283 DOI: 10.3389/frai.2023.1191122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 07/10/2023] [Indexed: 08/22/2023] Open
Abstract
While the continuing decline in genotyping and sequencing costs has largely benefited plant research, some key species for meeting the challenges of agriculture remain mostly understudied. As a result, heterogeneous datasets for different traits are available for a significant number of these species. As gene structures and functions are to some extent conserved through evolution, comparative genomics can be used to transfer available knowledge from one species to another. However, such a translational research approach is complex due to the multiplicity of data sources and the non-harmonized description of the data. Here, we provide two pipelines, referred to as structural and functional pipelines, to create a framework for a NoSQL graph-database (Neo4j) to integrate and query heterogeneous data from multiple species. We call this framework Orthology-driven knowledge base framework for translational research (Ortho_KB). The structural pipeline builds bridges across species based on orthology. The functional pipeline integrates biological information, including QTL, and RNA-sequencing datasets, and uses the backbone from the structural pipeline to connect orthologs in the database. Queries can be written using the Neo4j Cypher language and can, for instance, lead to identify genes controlling a common trait across species. To explore the possibilities offered by such a framework, we populated Ortho_KB to obtain OrthoLegKB, an instance dedicated to legumes. The proposed model was evaluated by studying the conservation of a flowering-promoting gene. Through a series of queries, we have demonstrated that our knowledge graph base provides an intuitive and powerful platform to support research and development programmes.
Affiliation(s)
- Baptiste Imbert
- Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France
- Jonathan Kreplak
- Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France
- Raphaël-Gauthier Flores
- Université Paris-Saclay, INRAE, URGI, Versailles, France
- Université Paris-Saclay, INRAE, BioinfOmics, Plant Bioinformatics Facility, Versailles, France
- Grégoire Aubert
- Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France
- Judith Burstin
- Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France
- Nadim Tayeh
- Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France
3
Wang Z, Kim W, Wang YW, Yakubovich E, Dong C, Trail F, Townsend JP, Yarden O. The Sordariomycetes: an expanding resource with Big Data for mining in evolutionary genomics and transcriptomics. Front Fungal Biol 2023; 4:1214537. [PMID: 37746130 PMCID: PMC10512317 DOI: 10.3389/ffunb.2023.1214537] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/30/2023] [Accepted: 06/06/2023] [Indexed: 09/26/2023]
Abstract
Advances in genomics and transcriptomics accompanying the rapid accumulation of omics data have provided new tools that have transformed and expanded the traditional concepts of model fungi. Evolutionary genomics and transcriptomics have flourished with the use of classical and newer fungal models that facilitate the study of diverse topics encompassing fungal biology and development. Technological advances have also created the opportunity to obtain and mine large datasets. One such continuously growing dataset is that of the Sordariomycetes, which exhibit a richness of species, ecological diversity, economic importance, and a profound research history on amenable models. Currently, 3,574 species of this class have been sequenced, comprising nearly one-third of the available ascomycete genomes. Among these genomes, multiple representatives of the model genera Fusarium, Neurospora, and Trichoderma are present. In this review, we examine recently published studies and data on the Sordariomycetes that have contributed novel insights to the field of fungal evolution via integrative analyses of the genetic, pathogenic, and other biological characteristics of the fungi. Some of these studies applied ancestral state analysis of gene expression among divergent lineages to infer regulatory network models, identify key genetic elements in fungal sexual development, and investigate the regulation of conidial germination and secondary metabolism. Such multispecies investigations address challenges in the study of fungal evolutionary genomics derived from studies that are often based on limited model genomes and that primarily focus on the aspects of biology driven by knowledge drawn from a few model species. 
Rapidly accumulating information and expanding capabilities for systems biological analysis of Big Data are setting the stage for the expansion of the concept of model systems from unitary taxonomic species/genera to inclusive clusters of well-studied models that can facilitate both the in-depth study of specific lineages and also investigation of trait diversity across lineages. The Sordariomycetes class, in particular, offers abundant omics data and a large and active global research community. As such, the Sordariomycetes can form a core omics clade, providing a blueprint for the expansion of our knowledge of evolution at the genomic scale in the exciting era of Big Data and artificial intelligence, and serving as a reference for the future analysis of different taxonomic levels within the fungal kingdom.
Affiliation(s)
- Zheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States
- Wonyong Kim
- Korean Lichen Research Institute, Sunchon National University, Suncheon, Republic of Korea
- Yen-Wen Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States
- Elizabeta Yakubovich
- Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Caihong Dong
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Frances Trail
- Department of Plant Biology, Michigan State University, East Lansing, MI, United States
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, United States
- Jeffrey P. Townsend
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States
- Department of Ecology and Evolutionary Biology, Program in Microbiology, and Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States
- Oded Yarden
- Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
4
Lobanov V, Gobet A, Joyce A. Ecosystem-specific microbiota and microbiome databases in the era of big data. Environ Microbiome 2022; 17:37. [PMID: 35842686 PMCID: PMC9287977 DOI: 10.1186/s40793-022-00433-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/10/2022] [Accepted: 06/29/2022] [Indexed: 05/05/2023]
Abstract
The rapid development of sequencing methods over the past decades has accelerated both the potential scope and depth of microbiota and microbiome studies. Recent developments in the field have been marked by an expansion away from purely categorical studies towards a greater investigation of community functionality. As in-depth genomic and environmental coverage is often distributed unequally across major taxa and ecosystems, it can be difficult to identify or substantiate relationships within microbial communities. Generic databases containing datasets from diverse ecosystems have opened a new era of data accessibility despite costs in terms of data quality and heterogeneity. This challenge is readily embodied in the integration of meta-omics data alongside habitat-specific standards which help contextualise datasets both in terms of sample processing and background within the ecosystem. A special case of large genomic repositories, ecosystem-specific databases (ES-DB's), have emerged to consolidate and better standardise sample processing and analysis protocols around individual ecosystems under study, allowing independent studies to produce comparable datasets. Here, we provide a comprehensive review of this emerging tool for microbial community analysis in relation to current trends in the field. We focus on the factors leading to the formation of ES-DB's, their comparison to traditional microbial databases, the potential for ES-DB integration with meta-omics platforms, as well as inherent limitations in the applicability of ES-DB's.
Affiliation(s)
- Victor Lobanov
- Department of Marine Sciences, University of Gothenburg, Box 461, 405 30, Gothenburg, Sweden
- Alyssa Joyce
- Department of Marine Sciences, University of Gothenburg, Box 461, 405 30, Gothenburg, Sweden.
5
Cerullo AR, Lai TY, Allam B, Baer A, Barnes WJP, Barrientos Z, Deheyn DD, Fudge DS, Gould J, Harrington MJ, Holford M, Hung CS, Jain G, Mayer G, Medina M, Monge-Nájera J, Napolitano T, Espinosa EP, Schmidt S, Thompson EM, Braunschweig AB. Comparative Animal Mucomics: Inspiration for Functional Materials from Ubiquitous and Understudied Biopolymers. ACS Biomater Sci Eng 2020; 6:5377-5398. [DOI: 10.1021/acsbiomaterials.0c00713] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 02/06/2023]
Affiliation(s)
- Antonio R. Cerullo
- The PhD Program in Biochemistry, Graduate Center of the City University of New York, 365 Fifth Avenue, New York, New York 10016, United States
- The Advanced Science Research Center, Graduate Center of the City University of New York, 85 St. Nicholas Terrace, New York, New York 10031, United States
- Department of Chemistry and Biochemistry, Hunter College, 695 Park Avenue, New York, New York 10065, United States
- Tsoi Ying Lai
- The Advanced Science Research Center, Graduate Center of the City University of New York, 85 St. Nicholas Terrace, New York, New York 10031, United States
- Bassem Allam
- School of Marine and Atmospheric Sciences, Stony Brook University, Stony Brook, New York 11794-5000, United States
- Alexander Baer
- Department of Zoology, Institute of Biology, University of Kassel, Heinrich-Plett-Strasse 40, 34132 Kassel, Germany
- W. Jon P. Barnes
- Centre for Cell Engineering, Joseph Black Building, University of Glasgow, Glasgow G12 8QQ, Scotland, U.K
- Zaidett Barrientos
- Laboratorio de Ecología Urbana, Universidad Estatal a Distancia, Mercedes de Montes de Oca, San José 474-2050, Costa Rica
- Dimitri D. Deheyn
- Marine Biology Research Division-0202, Scripps Institute of Oceanography, UCSD, 9500 Gilman Drive, La Jolla, California 92093, United States
- Douglas S. Fudge
- Schmid College of Science and Technology, Chapman University, 1 University Drive, Orange, California 92866, United States
- John Gould
- School of Environmental and Life Sciences, University of Newcastle, University Drive, Callaghan, New South Wales 2308, Australia
- Matthew J. Harrington
- Department of Chemistry, McGill University, 801 Sherbrooke Street West, Montreal, Quebec H3A 0B8, Canada
- Mandë Holford
- The PhD Program in Biochemistry, Graduate Center of the City University of New York, 365 Fifth Avenue, New York, New York 10016, United States
- Department of Chemistry and Biochemistry, Hunter College, 695 Park Avenue, New York, New York 10065, United States
- Department of Invertebrate Zoology, The American Museum of Natural History, New York, New York 10024, United States
- The PhD Program in Chemistry, Graduate Center of the City University of New York, 365 Fifth Avenue, New York, New York 10016, United States
- The PhD Program in Biology, Graduate Center of the City University of New York, 365 Fifth Avenue, New York, New York 10016, United States
- Chia-Suei Hung
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, Dayton, Ohio 45433, United States
- Gaurav Jain
- Schmid College of Science and Technology, Chapman University, 1 University Drive, Orange, California 92866, United States
- Georg Mayer
- Department of Zoology, Institute of Biology, University of Kassel, Heinrich-Plett-Strasse 40, 34132 Kassel, Germany
- Mónica Medina
- Department of Biology, Pennsylvania State University, 208 Mueller Lab, University Park, Pennsylvania 16802, United States
- Julian Monge-Nájera
- Laboratorio de Ecología Urbana, Universidad Estatal a Distancia, Mercedes de Montes de Oca, San José 474-2050, Costa Rica
- Tanya Napolitano
- The PhD Program in Biochemistry, Graduate Center of the City University of New York, 365 Fifth Avenue, New York, New York 10016, United States
- Department of Chemistry and Biochemistry, Hunter College, 695 Park Avenue, New York, New York 10065, United States
- Emmanuelle Pales Espinosa
- School of Marine and Atmospheric Sciences, Stony Brook University, Stony Brook, New York 11794-5000, United States
- Stephan Schmidt
- Institute of Organic and Macromolecular Chemistry, Heinrich-Heine-Universität Düsseldorf, Universitätsstrasse 1, 40225 Düsseldorf, Germany
- Eric M. Thompson
- Sars Centre for Marine Molecular Biology, Thormøhlensgt. 55, 5020 Bergen, Norway
- Department of Biological Sciences, University of Bergen, N-5006 Bergen, Norway
- Adam B. Braunschweig
- The PhD Program in Biochemistry, Graduate Center of the City University of New York, 365 Fifth Avenue, New York, New York 10016, United States
- The Advanced Science Research Center, Graduate Center of the City University of New York, 85 St. Nicholas Terrace, New York, New York 10031, United States
- Department of Chemistry and Biochemistry, Hunter College, 695 Park Avenue, New York, New York 10065, United States
- The PhD Program in Chemistry, Graduate Center of the City University of New York, 365 Fifth Avenue, New York, New York 10016, United States
6
Bolduc B, Hodgkins SB, Varner RK, Crill PM, McCalley CK, Chanton JP, Tyson GW, Riley WJ, Palace M, Duhaime MB, Hough MA, Saleska SR, Sullivan MB, Rich VI. The IsoGenie database: an interdisciplinary data management solution for ecosystems biology and environmental research. PeerJ 2020. [DOI: 10.7717/peerj.9467] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/20/2022] Open
Abstract
Modern microbial and ecosystem sciences require diverse interdisciplinary teams that are often challenged in “speaking” to one another due to different languages and data product types. Here we introduce the IsoGenie Database (IsoGenieDB; https://isogenie-db.asc.ohio-state.edu/), a de novo developed data management and exploration platform, as a solution to this challenge of accurately representing and integrating heterogenous environmental and microbial data across ecosystem scales. The IsoGenieDB is a public and private data infrastructure designed to store and query data generated by the IsoGenie Project, a ~10 year DOE-funded project focused on discovering ecosystem climate feedbacks in a thawing permafrost landscape. The IsoGenieDB provides (i) a platform for IsoGenie Project members to explore the project’s interdisciplinary datasets across scales through the inherent relationships among data entities, (ii) a framework to consolidate and harmonize the datasets needed by the team’s modelers, and (iii) a public venue that leverages the same spatially explicit, disciplinarily integrated data structure to share published datasets. The IsoGenieDB is also being expanded to cover the NASA-funded Archaea to Atmosphere (A2A) project, which scales the findings of IsoGenie to a broader suite of Arctic peatlands, via the umbrella A2A Database (A2A-DB). The IsoGenieDB’s expandability and flexible architecture allow it to serve as an example ecosystems database.
Affiliation(s)
- Benjamin Bolduc
- Department of Microbiology, The Ohio State University, Columbus, OH, USA
- Ruth K. Varner
- Earth Systems Research Center, Institute for the Study of Earth, Oceans and Space, University of New Hampshire, Durham, NH, USA
- Department of Earth Sciences, College of Engineering and Physical Sciences, University of New Hampshire, Durham, NH, USA
- Patrick M. Crill
- Department of Geological Sciences and Bolin Centre for Climate Research, Stockholm University, Stockholm, Sweden
- Carmody K. McCalley
- Thomas H. Gosnell School of Life Sciences, Rochester Institute of Technology, Rochester, NY, USA
- Jeffrey P. Chanton
- Department of Earth, Ocean, and Atmospheric Science, Florida State University, Tallahassee, FL, USA
- Gene W. Tyson
- Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- William J. Riley
- Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Michael Palace
- Earth Systems Research Center, Institute for the Study of Earth, Oceans and Space, University of New Hampshire, Durham, NH, USA
- Department of Earth Sciences, College of Engineering and Physical Sciences, University of New Hampshire, Durham, NH, USA
- Melissa B. Duhaime
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Moira A. Hough
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Scott R. Saleska
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Matthew B. Sullivan
- Department of Microbiology, The Ohio State University, Columbus, OH, USA
- Department of Civil, Environmental and Geodetic Engineering, The Ohio State University, Columbus, OH, USA
- Virginia I. Rich
- Department of Microbiology, The Ohio State University, Columbus, OH, USA
7
Liu Z, He X, Wang L, Zhang Y, Hai Y, Gao R. Chinese Herbal Medicine Hepatotoxicity: The Evaluation and Recognization Based on Large-scale Evidence Database. Curr Drug Metab 2019; 20:138-146. [PMID: 30101702 PMCID: PMC6635764 DOI: 10.2174/1389200219666180813144114] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 03/22/2018] [Revised: 05/28/2018] [Accepted: 06/27/2018] [Indexed: 12/17/2022]
Abstract
Background: Due to the special nature of Chinese Herbal medicine and the complexity of its clinical use, it is difficult to identify and evaluate its toxicity and resulting herb induced liver injury (HILI). Methods: First, the database would provide full profile of HILI from the basic ingredients to clinical outcomes by the most advanced algorithms of artificial intelligence, and it is also possible that we can predict possibilities of HILI after patients taking Chinese herbs by individual patient evaluation and prediction. Second, the database would solve the chaos and lack of the relevant data faced by the current basic research and clinical practice of Chinese Herbal Medicine. Third, we can also screen the susceptible patients from the database and thus prevent the accidents of HILI from the very beginning. Results: The Roussel Uclaf Causality Assessment Method (RUCAM) is the most accepted method to evaluate DILI, but at present before using the RUCAM evaluation method, data resource collection and analysis are yet to be perfected. Based on existing research on drug-metabolizing enzymes mediating reactive metabolites (RMs), the aim of this study is to explore the possibilities and methods of building multidimensional hierarchical database composing of RMs evidence library, Chinese herbal evidence library, and individualized reports evidence library of herb induced liver injury HILI. Conclusion: The potential benefits lie in its ability to organize, use vast amounts of evidence and use big data mining techniques at the center for Chinese herbal medicine liver toxicity research, which is the most difficult key point of scientific research to be investigated in the next few years.
Affiliation(s)
- Zhi Liu
- Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China; Tianjin State Key Laboratory of Modern Chinese Medicine, Tianjin 300193, China
- Xin He
- Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China; Tianjin State Key Laboratory of Modern Chinese Medicine, Tianjin 300193, China
- Lili Wang
- Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China; Tianjin State Key Laboratory of Modern Chinese Medicine, Tianjin 300193, China
- Yunhua Zhang
- Tianjin Clinda Medical Technology Co., Ltd., Tianjin, China
- Yue Hai
- Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China
- Rui Gao
- Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China
8
Misra BB, Langefeld CD, Olivier M, Cox LA. Integrated Omics: Tools, Advances, and Future Approaches. J Mol Endocrinol 2018; 62:JME-18-0055. [PMID: 30006342 DOI: 10.1530/jme-18-0055] [Citation(s) in RCA: 253] [Impact Index Per Article: 36.1] [Received: 02/24/2018] [Revised: 07/02/2018] [Accepted: 07/12/2018] [Indexed: 12/13/2022]
Abstract
With the rapid adoption of high-throughput omic approaches to analyze biological samples such as genomics, transcriptomics, proteomics, and metabolomics, each analysis can generate tera- to peta-byte sized data files on a daily basis. These data file sizes, together with differences in nomenclature among these data types, make the integration of these multi-dimensional omics data into biologically meaningful context challenging. Variously named as integrated omics, multi-omics, poly-omics, trans-omics, pan-omics, or shortened to just 'omics', the challenges include differences in data cleaning, normalization, biomolecule identification, data dimensionality reduction, biological contextualization, statistical validation, data storage and handling, sharing, and data archiving. The ultimate goal is towards the holistic realization of a 'systems biology' understanding of the biological question in hand. Commonly used approaches in these efforts are currently limited by the 3 i's - integration, interpretation, and insights. Post integration, these very large datasets aim to yield unprecedented views of cellular systems at exquisite resolution for transformative insights into processes, events, and diseases through various computational and informatics frameworks. With the continued reduction in costs and processing time for sample analyses, and increasing types of omics datasets generated such as glycomics, lipidomics, microbiomics, and phenomics, an increasing number of scientists in this interdisciplinary domain of bioinformatics face these challenges. We discuss recent approaches, existing tools, and potential caveats in the integration of omics datasets for development of standardized analytical pipelines that could be adopted by the global omics research community.
Affiliation(s)
- Biswapriya B Misra
- B Misra, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
- Carl D Langefeld
- C Langefeld, Biostatistical Sciences, Wake Forest University School of Medicine, Winston-Salem, United States
- Michael Olivier
- M Olivier, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
- Laura A Cox
- L Cox, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
9
Nelson M, Guhlin J, Epstein B, Tiffin P, Sadowsky MJ. The complete replicons of 16 Ensifer meliloti strains offer insights into intra- and inter-replicon gene transfer, transposon-associated loci, and repeat elements. Microb Genom 2018; 4. [PMID: 29671722 PMCID: PMC5994717 DOI: 10.1099/mgen.0.000174] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 12/12/2022] Open
Abstract
Ensifer meliloti (formerly Rhizobium meliloti and Sinorhizobium meliloti) is a model bacterium for understanding legume–rhizobial symbioses. The tripartite genome of E. meliloti consists of a chromosome, pSymA and pSymB, and in some instances strain-specific accessory plasmids. The majority of previous sequencing studies have relied on the use of assemblies generated from short read sequencing, which leads to gaps and assembly errors. Here we used PacBio-based, long-read assemblies and were able to assemble, de novo, complete circular replicons. In this study, we sequenced, de novo-assembled and analysed 10 E. meliloti strains. Sequence comparisons were also done with data from six previously published genomes. We identified genome differences between the replicons, including mol% G+C and gene content, nucleotide repeats, and transposon-associated loci. Additionally, genomic rearrangements both within and between replicons were identified, providing insight into evolutionary processes at the structural level. There were few cases of inter-replicon gene transfer of core genes between the main replicons. Accessory plasmids were more similar to pSymA than to either pSymB or the chromosome, with respect to gene content, transposon content and G+C content. In our population, the accessory plasmids appeared to share an open genome with pSymA, which contains many nodulation- and nitrogen fixation-related genes. This may explain previous observations that horizontal gene transfer has a greater effect on the content of pSymA than pSymB, or the chromosome, and why some rhizobia show unstable nodulation phenotypes on legume hosts.
Affiliation(s)
- Matthew Nelson
- Biotechnology Institute and Department of Soil, Water, and Climate, University of Minnesota, St. Paul, MN 55108, USA
- Joseph Guhlin
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN 55108, USA
- Brendan Epstein
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN 55108, USA
- Peter Tiffin
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN 55108, USA
- Michael J Sadowsky
- Biotechnology Institute and Department of Soil, Water, and Climate, University of Minnesota, St. Paul, MN 55108, USA
10
Misra BB. New tools and resources in metabolomics: 2016-2017. Electrophoresis 2018; 39:909-923. [PMID: 29292835 DOI: 10.1002/elps.201700441] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 11/18/2017] [Revised: 12/17/2017] [Accepted: 12/18/2017] [Indexed: 01/07/2023]
Abstract
Rapid advances in mass spectrometry (MS) and nuclear magnetic resonance (NMR)-based platforms for metabolomics have led to an upsurge of data every single year. Newer high-throughput platforms, hyphenated technologies, miniaturization, and tool kits in data acquisition efforts in metabolomics have led to additional challenges in metabolomics data pre-processing, analysis, interpretation, and integration. Thanks to the informatics, statistics, and computational community, new resources continue to develop for metabolomics researchers. The purpose of this review is to provide a summary of the metabolomics tools, software, and databases that were developed or improved during 2016-2017, thus, enabling readers, developers, and researchers access to a succinct but thorough list of resources for further improvisation, implementation, and application in due course of time.
Affiliation(s)
- Biswapriya B Misra
- Department of Internal Medicine, Section of Molecular Medicine, Medical Center Boulevard, Winston-Salem, NC, USA