2
|
Ruscheweyh HJ, Milanese A, Paoli L, Karcher N, Clayssen Q, Keller MI, Wirbel J, Bork P, Mende DR, Zeller G, Sunagawa S. Cultivation-independent genomes greatly expand taxonomic-profiling capabilities of mOTUs across various environments. MICROBIOME 2022; 10:212. [PMID: 36464731 PMCID: PMC9721005 DOI: 10.1186/s40168-022-01410-z] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/17/2022] [Accepted: 11/03/2022] [Indexed: 06/17/2023]
Abstract
BACKGROUND Taxonomic profiling is a fundamental task in microbiome research that aims to detect and quantify the relative abundance of microorganisms in biological samples. Available methods using shotgun metagenomic data generally depend on the deposition of sequenced and taxonomically annotated genomes, usually from cultures of isolated strains, in reference databases (reference genomes). However, the majority of microorganisms have not been cultured yet. Thus, a substantial fraction of microbial community members remains unaccounted for during taxonomic profiling, particularly in samples from underexplored environments. To address this issue, we developed the mOTU profiler, a tool that enables reference genome-independent species-level profiling of metagenomes. As such, it supports the identification and quantification of both "known" and "unknown" species based on a set of select marker genes. RESULTS We present mOTUs3, a command line tool that enables the profiling of metagenomes for >33,000 species-level operational taxonomic units. To achieve this, we leveraged the reconstruction of >600,000 draft genomes, most of which are metagenome-assembled genomes (MAGs), from diverse microbiomes, including soil, freshwater systems, and the gastrointestinal tract of ruminants and other animals, which we found to be underrepresented by reference genomes. Overall, two thirds of all species-level taxa lacked a reference genome. The cumulative relative abundance of these newly included taxa was low in well-studied microbiomes, such as the human body sites (6-11%). By contrast, they accounted for substantial proportions (ocean, freshwater, soil: 43-63%) or even the majority (pig, fish, cattle: 60-80%) of the relative abundance across diverse non-human-associated microbiomes. Using community-developed benchmarks and datasets, we found mOTUs3 to be more accurate than other methods and to be more congruent with 16S rRNA gene-based methods for taxonomic profiling. Furthermore, we demonstrate that mOTUs3 increases the resolution of well-known microbial groups into species-level taxa and helps identify new differentially abundant taxa in comparative metagenomic studies. CONCLUSIONS We developed mOTUs3 to enable accurate species-level profiling of metagenomes. Compared to other methods, it provides a more comprehensive view of prokaryotic community diversity, in particular for currently underexplored microbiomes. To facilitate comparative analyses by the research community, it is released with >11,000 precomputed profiles for publicly available metagenomes and is freely available at: https://github.com/motu-tool/mOTUs . Video Abstract.
Collapse
Affiliation(s)
- Hans-Joachim Ruscheweyh
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, 8093 Zürich, Switzerland
| | - Alessio Milanese
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, 8093 Zürich, Switzerland
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Lucas Paoli
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, 8093 Zürich, Switzerland
| | - Nicolai Karcher
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Quentin Clayssen
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, 8093 Zürich, Switzerland
| | - Marisa Isabell Keller
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Jakob Wirbel
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, Robert-Rössle-Str. 10, 13092 Berlin, Germany
- Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, 97074 Würzburg, Germany
| | - Daniel R. Mende
- Department of Medical Microbiology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
| | - Georg Zeller
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Shinichi Sunagawa
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, 8093 Zürich, Switzerland
| |
Collapse
|
4
|
Pan B, Ren L, Onuchic V, Guan M, Kusko R, Bruinsma S, Trigg L, Scherer A, Ning B, Zhang C, Glidewell-Kenney C, Xiao C, Donaldson E, Sedlazeck FJ, Schroth G, Yavas G, Grunenwald H, Chen H, Meinholz H, Meehan J, Wang J, Yang J, Foox J, Shang J, Miclaus K, Dong L, Shi L, Mohiyuddin M, Pirooznia M, Gong P, Golshani R, Wolfinger R, Lababidi S, Sahraeian SME, Sherry S, Han T, Chen T, Shi T, Hou W, Ge W, Zou W, Guo W, Bao W, Xiao W, Fan X, Gondo Y, Yu Y, Zhao Y, Su Z, Liu Z, Tong W, Xiao W, Zook JM, Zheng Y, Hong H. Assessing reproducibility of inherited variants detected with short-read whole genome sequencing. Genome Biol 2022; 23:2. [PMID: 34980216 PMCID: PMC8722114 DOI: 10.1186/s13059-021-02569-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2021] [Accepted: 12/06/2021] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. RESULTS To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×. CONCLUSIONS Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.
Collapse
Affiliation(s)
- Bohu Pan
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Human Phenome Institute, Fudan University, Shanghai, 200438, China
| | | | | | | | | | - Len Trigg
- Real Time Genomics, Hamilton, New Zealand
| | - Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
| | - Baitang Ning
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, 39406, USA
| | | | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Eric Donaldson
- Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, 20993, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
| | | | - Gokhan Yavas
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | | | | | | | - Joe Meehan
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Jing Wang
- Center for Advanced Measurement Science, National Institute of Metrology, Beijing, 100013, China
| | - Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Human Phenome Institute, Fudan University, Shanghai, 200438, China
| | - Jonathan Foox
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10021, USA
| | - Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Human Phenome Institute, Fudan University, Shanghai, 200438, China
| | | | - Lianhua Dong
- Center for Advanced Measurement Science, National Institute of Metrology, Beijing, 100013, China
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Human Phenome Institute, Fudan University, Shanghai, 200438, China
| | | | - Mehdi Pirooznia
- Bioinformatics and Computational Biology Laboratory, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Ping Gong
- Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, 39180, USA
| | | | | | - Samir Lababidi
- Office of Health Informatics, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD, 20993, USA
| | | | - Steve Sherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Tao Han
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Tao Chen
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Tieliu Shi
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
| | - Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Human Phenome Institute, Fudan University, Shanghai, 200438, China
| | - Weigong Ge
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Wen Zou
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Wenjing Guo
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Wenjun Bao
- SAS Institute Inc., Cary, NC, 27513, USA
| | - Wenzhong Xiao
- Stanford Genome Technology Center, Stanford University School of Medicine, Palo Alto, CA, 94305, USA
| | - Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Yoichi Gondo
- Department of Molecular Life Sciences, Tokai University School of Medicine, 143 Shimokasuya, Isehara, 259-1193, Japan
| | - Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Human Phenome Institute, Fudan University, Shanghai, 200438, China
| | - Yongmei Zhao
- CCR-SF Bioinformatics Group, Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21701, USA
| | - Zhenqiang Su
- Takeda Pharmaceuticals, Cambridge, MA, 02139, USA
| | - Zhichao Liu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Wenming Xiao
- Division of Molecular Genetics and Pathology, Center for Device and Radiological Health, US Food and Drug Administration, Silver Spring, MD, 20993, USA
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA.
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China.
- Human Phenome Institute, Fudan University, Shanghai, 200438, China.
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA.
| |
Collapse
|
5
|
Decamps C, Arnaud A, Petitprez F, Ayadi M, Baurès A, Armenoult L, Escalera S, Guyon I, Nicolle R, Tomasini R, de Reyniès A, Cros J, Blum Y, Richard M. DECONbench: a benchmarking platform dedicated to deconvolution methods for tumor heterogeneity quantification. BMC Bioinformatics 2021; 22:473. [PMID: 34600479 PMCID: PMC8487526 DOI: 10.1186/s12859-021-04381-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2020] [Accepted: 09/20/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Quantification of tumor heterogeneity is essential to better understand cancer progression and to adapt therapeutic treatments to patient specificities. Bioinformatic tools to assess the different cell populations from single-omic datasets as bulk transcriptome or methylome samples have been recently developed, including reference-based and reference-free methods. Improved methods using multi-omic datasets are yet to be developed in the future and the community would need systematic tools to perform a comparative evaluation of these algorithms on controlled data. RESULTS We present DECONbench, a standardized unbiased benchmarking resource, applied to the evaluation of computational methods quantifying cell-type heterogeneity in cancer. DECONbench includes gold standard simulated benchmark datasets, consisting of transcriptome and methylome profiles mimicking pancreatic adenocarcinoma molecular heterogeneity, and a set of baseline deconvolution methods (reference-free algorithms inferring cell-type proportions). DECONbench performs a systematic performance evaluation of each new methodological contribution and provides the possibility to publicly share source code and scoring. CONCLUSION DECONbench allows continuous submission of new methods in a user-friendly fashion, each novel contribution being automatically compared to the reference baseline methods, which enables crowdsourced benchmarking. DECONbench is designed to serve as a reference platform for the benchmarking of deconvolution methods in the evaluation of cancer heterogeneity. We believe it will contribute to leverage the benchmarking practices in the biomedical and life science communities. DECONbench is hosted on the open source Codalab competition platform. It is freely available at: https://competitions.codalab.org/competitions/27453 .
Collapse
Affiliation(s)
- Clémentine Decamps
- Laboratory TIMC-IMAG, UMR 5525, CNRS, Univ. Grenoble Alpes, Grenoble, France
| | - Alexis Arnaud
- Data Institute, Univ. Grenoble Alpes, Grenoble, France
| | - Florent Petitprez
- Programme Cartes d'Identité des Tumeurs (CIT), Ligue Nationale Contre le Cancer, Paris, France
| | - Mira Ayadi
- Programme Cartes d'Identité des Tumeurs (CIT), Ligue Nationale Contre le Cancer, Paris, France
| | - Aurélia Baurès
- Programme Cartes d'Identité des Tumeurs (CIT), Ligue Nationale Contre le Cancer, Paris, France
| | - Lucile Armenoult
- Programme Cartes d'Identité des Tumeurs (CIT), Ligue Nationale Contre le Cancer, Paris, France
| | | | - Sergio Escalera
- Universitat de Barcelona and Computer Vision Center, Barcelona, Spain
| | - Isabelle Guyon
- LISN (INRIA/CNRS), Université Paris-Saclay, Gif-sur-Yvette, France
| | - Rémy Nicolle
- Programme Cartes d'Identité des Tumeurs (CIT), Ligue Nationale Contre le Cancer, Paris, France
| | | | - Aurélien de Reyniès
- Programme Cartes d'Identité des Tumeurs (CIT), Ligue Nationale Contre le Cancer, Paris, France
| | - Jérôme Cros
- Dpt of Pathology, Beaujon Hospital, Univ. Paris-INSERM U1149, Clichy, France
| | - Yuna Blum
- Programme Cartes d'Identité des Tumeurs (CIT), Ligue Nationale Contre le Cancer, Paris, France. .,IGDR UMR 6290, CNRS, Université de Rennes 1, Rennes, France.
| | - Magali Richard
- Laboratory TIMC-IMAG, UMR 5525, CNRS, Univ. Grenoble Alpes, Grenoble, France.
| |
Collapse
|