1
Qayyum H, Talib MS, Ali A, Kayani MUR. Evaluating the potential of assembler-binner combinations in recovering low-abundance and strain-resolved genomes from human metagenomes. Heliyon 2025; 11:e41938. [PMID: 39897886 PMCID: PMC11786835 DOI: 10.1016/j.heliyon.2025.e41938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Revised: 01/08/2025] [Accepted: 01/13/2025] [Indexed: 02/04/2025] Open
Abstract
Human-associated microbial communities are a complex mixture of bacterial species and diverse strains prevalent at varying abundances. Due to the inherent limitations of metagenomic assemblers and genome binning tools in recovering low-abundance species (<1%) and strains, we lack comprehensive insight into these communities. Although many bioinformatics approaches are available for recovering metagenome-assembled genomes, their effectiveness in recovering low-abundance species and strains is often questioned. Moreover, each tool has its trade-offs, which makes selecting the right tools challenging. In this study, we investigated the combinatory effect of various assemblers and binning tools on the recovery of low-abundance species and strain-resolved genomes from real and simulated human metagenomes. We evaluated the performance of nine combinations of metagenome assemblers and genome binning tools for their potential to recover genomes of usable quality. Our results revealed that the metaSPAdes-MetaBAT2 combination is highly effective in recovering low-abundance species, while MEGAHIT-MetaBAT2 excels in recovering strain-resolved genomes. These findings highlight the significant variation in the performance of different combinations, even when aiming for the same objective. This suggests that selecting the right assembler-binner combination has a profound impact on metagenome analyses. We believe this study will be a cornerstone for the scientific community, guiding the choice of tools by highlighting their complementary effects. Furthermore, it underscores the potential of existing tools to address the current challenges in the field and to improve the recovery of information from metagenomes.
Affiliation(s)
- Hajra Qayyum
  - Integrative Biology Laboratory, Department of Microbiology and Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Srinagar Highway, Sector H-12, Islamabad, Pakistan
  - Capital University of Science & Technology, Islamabad Expressway, Kahuta Road Zone-V Sihala, Islamabad, Pakistan
- Muhammad Sarfraz Talib
  - Integrative Biology Laboratory, Department of Microbiology and Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Srinagar Highway, Sector H-12, Islamabad, Pakistan
- Amjad Ali
  - Integrative Biology Laboratory, Department of Microbiology and Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Srinagar Highway, Sector H-12, Islamabad, Pakistan
- Masood Ur Rehman Kayani
  - Metagenomics Discovery Lab, School of Interdisciplinary Engineering and Sciences (SINES), National University of Sciences and Technology (NUST), Srinagar Highway, Sector H-12, Islamabad, Pakistan
2
Song H, Tithi SS, Brown C, Aylward FO, Jensen R, Zhang L. Virseqimprover: an integrated pipeline for viral contig error correction, extension, and annotation. PeerJ 2025; 13:e18515. [PMID: 39807156 PMCID: PMC11727651 DOI: 10.7717/peerj.18515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Accepted: 10/21/2024] [Indexed: 01/16/2025] Open
Abstract
Despite the recent surge of viral metagenomic studies, it remains a significant challenge to recover complete virus genomes from metagenomic data. The majority of viral contigs generated from de novo assembly programs are highly fragmented, presenting significant challenges to downstream analysis and inference. To address this issue, we have developed Virseqimprover, a computational pipeline that can extend assembled contigs to complete or nearly complete genomes while maintaining extension quality. Virseqimprover first examines whether there is any chimeric sequence based on read coverage, breaks the sequence into segments if there is, then extends the longest segment with uniform depth of coverage, and repeats these procedures until the sequence cannot be extended. Finally, Virseqimprover annotates the gene content of the resulting sequence. Results show that Virseqimprover has good performances on correcting and extending viral contigs to their full lengths, hence can be a useful tool to improve the completeness and minimize the assembly errors of viral contigs. Both a web server and a conda package for Virseqimprover are provided to the research community free of charge.
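The coverage-based chimera check described in this abstract can be illustrated with a small sketch. This is not Virseqimprover's code: the function names `split_by_coverage` and `longest_segment`, the `max_ratio` threshold, and the running-mean rule are all assumptions made for illustration, and the extension and annotation steps are not modelled. The idea shown is only that a sharp change in read depth along a contig suggests a junction between sequences of different origin, so the contig is cut there and the longest uniformly covered piece is kept as the seed for extension.

```python
def split_by_coverage(depth, max_ratio=2.0):
    """Split a contig into (start, end) segments of roughly uniform depth.

    depth: per-base read depth along the contig (hypothetical input).
    A new segment starts when a position's depth differs from the running
    mean of the current segment by more than max_ratio (assumed rule).
    """
    if not depth:
        return []
    segments = []
    start, total = 0, depth[0]
    for i in range(1, len(depth)):
        seg_mean = total / (i - start)
        low, high = sorted((seg_mean, depth[i]))
        if high > max_ratio * low:
            # Depth jumped or dropped sharply: close the current segment here.
            segments.append((start, i))
            start, total = i, depth[i]
        else:
            total += depth[i]
    segments.append((start, len(depth)))
    return segments


def longest_segment(segments):
    """Pick the longest segment; ties resolve to the earliest one."""
    return max(segments, key=lambda s: s[1] - s[0])


if __name__ == "__main__":
    # Toy depth profile: a ~10x region joined to a ~40x region.
    toy_depth = [10, 11, 9, 10, 40, 42, 41, 39, 40, 38]
    parts = split_by_coverage(toy_depth)
    print(parts, longest_segment(parts))
```

In a real pipeline the depth profile would come from mapping reads back to the contig, and the chosen segment would then be extended and re-checked until no further extension is possible, as the abstract describes.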
Affiliation(s)
- Haoqiu Song
  - Department of Computer Science, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
- Saima Sultana Tithi
  - Department of Cell & Molecular Biology, St. Jude Children’s Research Hospital, Memphis, TN, United States of America
- Connor Brown
  - Department of Civil and Environmental Engineering, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
- Frank O. Aylward
  - Department of Biological Sciences, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
- Roderick Jensen
  - Department of Biological Sciences, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
- Liqing Zhang
  - Department of Computer Science, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
3
Ejaz MR, Badr K, Hassan ZU, Al-Thani R, Jaoua S. Metagenomic approaches and opportunities in arid soil research. The Science of the Total Environment 2024; 953:176173. [PMID: 39260494 DOI: 10.1016/j.scitotenv.2024.176173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Revised: 09/04/2024] [Accepted: 09/07/2024] [Indexed: 09/13/2024]
Abstract
Arid soils present unique challenges and opportunities for studying microbial diversity and bioactive potential due to the extreme environmental conditions they bear. This review article investigates soil metagenomics as an emerging tool to explore complex microbial dynamics and unexplored bioactive potential in harsh environments. Utilizing advanced metagenomic techniques, diverse microbial populations that grow under extreme conditions such as high temperatures, salinity, high pH levels, and exposure to metals and radiation can be studied. The use of extremophiles to discover novel natural products and biocatalysts emphasizes the role of functional metagenomics in identifying enzymes and secondary metabolites for industrial and pharmaceutical purposes. Metagenomic sequencing uncovers a complex network of microbial diversity, offering significant potential for discovering new bioactive compounds. Functional metagenomics, connecting taxonomic diversity to genetic capabilities, provides a pathway to identify microbes' mechanisms to synthesize valuable secondary metabolites and other bioactive substances. Contrary to the common perception of desert soil as barren land, the metagenomic analysis reveals a rich diversity of life forms adept at extreme survival. It provides valuable findings into their resilience and potential applications in biotechnology. Moreover, the challenges associated with metagenomics in arid soils, such as low microbial biomass, high DNA degradation rates, and DNA extraction inhibitors and strategies to overcome these issues, outline the latest advancements in extraction methods, high-throughput sequencing, and bioinformatics. The importance of metagenomics for investigating diverse environments opens the way for future research to develop sustainable solutions in agriculture, industry, and medicine. Extensive studies are necessary to utilize the full potential of these powerful microbial communities. 
This research will significantly improve our understanding of microbial ecology and biotechnology in arid environments.
Affiliation(s)
- Muhammad Riaz Ejaz
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
- Kareem Badr
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
- Zahoor Ul Hassan
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
- Roda Al-Thani
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
- Samir Jaoua
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
4
Abramova A, Karkman A, Bengtsson-Palme J. Metagenomic assemblies tend to break around antibiotic resistance genes. BMC Genomics 2024; 25:959. [PMID: 39402510 PMCID: PMC11479545 DOI: 10.1186/s12864-024-10876-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Accepted: 10/08/2024] [Indexed: 10/19/2024] Open
Abstract
BACKGROUND Assembly of metagenomic samples can provide essential information about the mobility potential and taxonomic origin of antibiotic resistance genes (ARGs) and inform interventions to prevent further spread of resistant bacteria. However, similar to other conserved regions, such as ribosomal RNA genes and mobile genetic elements, almost identical ARGs typically occur in multiple genomic contexts across different species, representing a considerable challenge for the assembly process. Usually, this results in many fragmented contigs of unclear origin, complicating the risk assessment of ARG detections. To systematically investigate the impact of this issue on detection, quantification and contextualization of ARGs, we evaluated the performance of different assembly approaches, including genomic-, metagenomic- and transcriptomic-specialized assemblers. We quantified recovery and accuracy rates of each tool for ARGs both from in silico spiked metagenomic samples as well as real samples sequenced using both long- and short-read sequencing technologies. RESULTS The results revealed that none of the investigated tools can accurately capture genomic contexts present in samples of high complexity. The transcriptomic assembler Trinity showed a better performance in terms of reconstructing longer and fewer contigs matching unique genomic contexts, which can be beneficial for deciphering the taxonomic origin of ARGs. The currently commonly used metagenomic assembly tools metaSPAdes and MEGAHIT were able to identify the ARG repertoire but failed to fully recover the diversity of genomic contexts present in a sample. On top of that, in a complex scenario MEGAHIT produced very short contigs, which can lead to considerable underestimation of the resistome in a given sample. 
CONCLUSIONS Our study shows that metaSPAdes and Trinity would be the preferable tools in terms of accuracy to recover correct genomic contexts around ARGs in metagenomic samples characterized by uneven coverages. Overall, the inability of assemblers to reconstruct long ARG-containing contigs has impacts on ARG quantification, suggesting that directly mapping reads to an ARG database should be performed as a complementary strategy to get accurate ARG abundance and diversity measures.
Affiliation(s)
- Anna Abramova
  - Department of Infectious Diseases, Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Guldhedsgatan 10A, Gothenburg, 413 46, Sweden
  - Division of Systems and Synthetic Biology, Department of Life Sciences, SciLifeLab, Chalmers University of Technology, Gothenburg, 412 96, Sweden
  - Centre for Antibiotic Resistance Research (CARe), Gothenburg, Sweden
- Antti Karkman
  - Department of Microbiology, University of Helsinki, Helsinki, Finland
- Johan Bengtsson-Palme
  - Department of Infectious Diseases, Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Guldhedsgatan 10A, Gothenburg, 413 46, Sweden
  - Division of Systems and Synthetic Biology, Department of Life Sciences, SciLifeLab, Chalmers University of Technology, Gothenburg, 412 96, Sweden
  - Centre for Antibiotic Resistance Research (CARe), Gothenburg, Sweden
5
Goussarov G, Mysara M, Cleenwerck I, Claesen J, Leys N, Vandamme P, Van Houdt R. Benchmarking short-, long- and hybrid-read assemblers for metagenome sequencing of complex microbial communities. Microbiology (Reading, England) 2024; 170:001469. [PMID: 38916949 PMCID: PMC11261854 DOI: 10.1099/mic.0.001469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 05/23/2024] [Indexed: 06/26/2024]
Abstract
Metagenome community analysis, driven by the continued development of sequencing technology, is rapidly providing insights into many aspects of microbiology and becoming a cornerstone tool. Illumina, Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) are the leading technologies, each with its own advantages and drawbacks. Illumina provides accurate reads at a low cost, but their length is too short to close bacterial genomes. Long reads overcome this limitation, but these technologies produce reads with lower accuracy (ONT) or with lower throughput (PacBio high-fidelity reads). In a critical first analysis step, reads are assembled to reconstruct genomes or individual genes within the community. However, to date, the performance of existing assemblers has never been challenged with a complex mock metagenome. Here, we evaluate the performance of current assemblers that use short, long or both read types on a complex mock metagenome consisting of 227 bacterial strains with varying degrees of relatedness. We show that many of the current assemblers are not suited to handle such a complex metagenome. In addition, hybrid assemblies do not fulfil their potential. We conclude that ONT reads assembled with CANU and Illumina reads assembled with SPAdes offer the best value for reconstructing genomes and individual genes of complex metagenomes, respectively.
Affiliation(s)
- Gleb Goussarov
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Mohamed Mysara
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
  - Bioinformatics group, Information Technology & Computer Science, Nile University, Giza, Egypt
- Ilse Cleenwerck
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Jürgen Claesen
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Natalie Leys
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Peter Vandamme
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Rob Van Houdt
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
6
Dindhoria K, Kumar R, Bhargava B, Kumar R. Metagenomic assembled genomes indicated the potential application of hypersaline microbiome for plant growth promotion and stress alleviation in salinized soils. mSystems 2024; 9:e0105023. [PMID: 38377278 PMCID: PMC10949518 DOI: 10.1128/msystems.01050-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Accepted: 01/19/2024] [Indexed: 02/22/2024] Open
Abstract
Climate change is causing unpredictable seasonal variations globally. As the Earth's surface temperature continues to rise, the rate of water evaporation increases, creating a problem of soil salinization, especially in arid and semi-arid regions. The accumulation of salt degrades soil quality, impairs plant growth, and reduces agricultural yields. Salt-tolerant, plant-growth-promoting microorganisms may offer a solution, enhancing crop productivity and soil fertility in salinized areas. In the current study, genome-resolved metagenomic analysis was performed to investigate the salt-tolerating and plant growth-promoting potential of two hypersaline ecosystems, Sambhar Lake and Drang Mine. The samples were co-assembled independently by the MEGAHIT, metaSPAdes, and IDBA-UD tools. A total of 67 metagenomic assembled genomes (MAGs) were reconstructed following the binning process, including 15 from the MEGAHIT, 26 from the metaSPAdes, and 26 from the IDBA-UD assemblies. Compared with the other assemblers, the MAGs obtained by metaSPAdes were of superior quality, with a completeness range of 12.95%-96.56% and a contamination range of 0%-8.65%. The medium and high-quality MAGs from metaSPAdes, upon functional annotation, revealed properties such as salt tolerance (91.3%), heavy metal tolerance (95.6%), exopolysaccharide (95.6%), and antioxidant (60.86%) biosynthesis. Several plant growth-promoting attributes, including phosphate solubilization and indole-3-acetic acid (IAA) production, were consistently identified across all obtained MAGs. Conversely, characteristics such as iron acquisition and potassium solubilization were observed in a substantial majority, specifically 91.3%, of the MAGs. The present study indicates that hypersaline microflora can be used as bio-fertilizing agents for agricultural practices in salinized areas by alleviating prevalent stresses.
IMPORTANCE The strategic implementation of metagenomic assembled genomes (MAGs) in exploring the properties and harnessing microorganisms from ecosystems like hypersaline niches has transformative potential in agriculture. This approach promises to redefine our comprehension of microbial diversity and its ecosystem roles. Recovery and decoding of MAGs unlock genetic resources, enabling the development of new solutions for agricultural challenges. Enhanced understanding of these microbial communities can lead to more efficient nutrient cycling, pest control, and soil health maintenance. Consequently, traditional agricultural practices can be improved, resulting in increased yields, reduced environmental impacts, and heightened sustainability. MAGs offer a promising avenue for sustainable agriculture, bridging the gap between cutting-edge genomics and practical field applications.
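The abstract above reports completeness and contamination ranges and refers to "medium and high-quality" MAGs without stating the cutoffs used. As a minimal sketch, assuming the commonly cited MIMAG-style thresholds (high quality: completeness above 90% and contamination below 5%; medium quality: completeness of at least 50% and contamination below 10%), a classification could look like the following. The function name `mag_quality` and the example values are hypothetical, the rRNA/tRNA requirements that MIMAG adds for high-quality drafts are ignored, and anything not meeting the medium cutoffs is simply lumped into "low".

```python
def mag_quality(completeness, contamination):
    """Classify a MAG from completeness and contamination percentages.

    Thresholds are an assumption (MIMAG-style cutoffs), not taken from
    the study; marker-gene requirements for high quality are not checked.
    """
    if not (0.0 <= completeness <= 100.0) or contamination < 0.0:
        raise ValueError("completeness must be 0-100 and contamination >= 0")
    if completeness > 90.0 and contamination < 5.0:
        return "high"
    if completeness >= 50.0 and contamination < 10.0:
        return "medium"
    return "low"


if __name__ == "__main__":
    # Hypothetical (completeness, contamination) pairs, not data from the paper.
    for comp, cont in [(95.0, 1.0), (70.0, 6.0), (30.0, 2.0)]:
        print(comp, cont, mag_quality(comp, cont))
```

Completeness and contamination estimates of this kind are typically produced by a marker-gene tool and read from its output table before a filter like this is applied.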
Affiliation(s)
- Kiran Dindhoria
  - Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Raghawendra Kumar
  - Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Bhavya Bhargava
  - Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Rakshak Kumar
  - Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
7
Sun J, Xie F, Wang J, Luo J, Chen T, Jiang Q, Xi Q, Liu GE, Zhang Y. Integrated meta-omics reveals the regulatory landscape involved in lipid metabolism between pig breeds. Microbiome 2024; 12:33. [PMID: 38374121 PMCID: PMC10877772 DOI: 10.1186/s40168-023-01743-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Accepted: 12/19/2023] [Indexed: 02/21/2024]
Abstract
BACKGROUND Domesticated pigs serve as an ideal animal model for biomedical research and also provide the majority of meat for human consumption in China. Porcine intramuscular fat content is associated with human health and diseases and is essential to pork quality. The molecular mechanisms controlling lipid metabolism and intramuscular fat accretion across tissues in pigs, and how these change in response to pig breed, remain largely unknown. RESULTS We surveyed the tissue-resident cell types of the porcine jejunum, colon, liver, and longissimus dorsi muscle between Lantang and Landrace breeds by single-cell RNA sequencing. Combining lipidomics and metagenomics approaches, we also characterized gene signatures and determined key discriminating markers of lipid digestibility, absorption, conversion, and deposition across tissues in two pig breeds. In Landrace, lean-meat swine mainly exhibited breed-specific advantages in lipid absorption and oxidation for energy supply in small and large intestinal epitheliums, nascent high-density lipoprotein synthesis for reverse cholesterol transport in enterocytes and hepatocytes, bile acid formation, and secretion for fat emulsification in hepatocytes, as well as intestinal-microbiota gene expression involved in lipid accumulation product. In Lantang, obese-meat swine showed a higher synthesis capacity of chylomicrons responsible for high serum triacylglycerol levels in small intestinal epitheliums, the predominant characteristics of lipid absorption in muscle tissue, and greater intramuscular adipocytogenesis potential from the muscular fibro-adipogenic progenitor subpopulation. CONCLUSIONS The findings enhanced our understanding of the cellular biology of lipid metabolism and opened new avenues to improve animal production and human diseases.
Affiliation(s)
- Jiajie Sun
  - Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, 510642, China
- Fang Xie
  - Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, 510642, China
- Jing Wang
  - Institute of Animal Husbandry and Veterinary Medicine, Henan Academy of Agricultural Sciences, Zhengzhou, 450002, China
- Junyi Luo
  - Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, 510642, China
- Ting Chen
  - Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, 510642, China
- Qingyan Jiang
  - Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, 510642, China
- Qianyun Xi
  - Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, 510642, China
- George E Liu
  - Animal Genomics and Improvement Laboratory, USDA-ARS, BARC-East, Beltsville, MD, 20705, USA
- Yongliang Zhang
  - Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, 510642, China
8
Walsh LH, Coakley M, Walsh AM, O'Toole PW, Cotter PD. Bioinformatic approaches for studying the microbiome of fermented food. Crit Rev Microbiol 2023; 49:693-725. [PMID: 36287644 DOI: 10.1080/1040841x.2022.2132850] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/11/2022] [Revised: 08/11/2022] [Accepted: 09/28/2022] [Indexed: 11/03/2022]
Abstract
High-throughput DNA sequencing-based approaches continue to revolutionise our understanding of microbial ecosystems, including those associated with fermented foods. Metagenomic and metatranscriptomic approaches are state-of-the-art biological profiling methods and are employed to investigate a wide variety of characteristics of microbial communities, such as taxonomic membership, gene content and the range and level at which these genes are expressed. Individual groups and consortia of researchers are utilising these approaches to produce increasingly large and complex datasets, representing vast populations of microorganisms. There is a corresponding requirement for the development and application of appropriate bioinformatic tools and pipelines to interpret this data. This review critically analyses the tools and pipelines that have been used or that could be applied to the analysis of metagenomic and metatranscriptomic data from fermented foods. In addition, we critically analyse a number of studies of fermented foods in which these tools have previously been applied, to highlight the insights that these approaches can provide.
Affiliation(s)
- Liam H Walsh
  - Teagasc Food Research Centre, Moorepark, Fermoy, Cork, Ireland
  - School of Microbiology, University College Cork, Ireland
- Mairéad Coakley
  - Teagasc Food Research Centre, Moorepark, Fermoy, Cork, Ireland
- Aaron M Walsh
  - Teagasc Food Research Centre, Moorepark, Fermoy, Cork, Ireland
- Paul W O'Toole
  - School of Microbiology, University College Cork, Ireland
  - APC Microbiome Ireland, University College Cork, Ireland
- Paul D Cotter
  - Teagasc Food Research Centre, Moorepark, Fermoy, Cork, Ireland
  - APC Microbiome Ireland, University College Cork, Ireland
  - VistaMilk SFI Research Centre, Teagasc, Moorepark, Fermoy, Cork, Ireland
9
Orlov YL, Orlova NG. Bioinformatics tools for the sequence complexity estimates. Biophys Rev 2023; 15:1367-1378. [PMID: 37974990 PMCID: PMC10643780 DOI: 10.1007/s12551-023-01140-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/15/2023] [Accepted: 09/01/2023] [Indexed: 11/19/2023] Open
Abstract
We review current methods and bioinformatics tools for text complexity estimates (information and entropy measures). The search for DNA regions with extreme statistical characteristics, such as low complexity regions, is important for biophysical models of chromosome function and gene transcription regulation at genome scale. We discuss complexity profiling for segmentation and delineation of genome sequences, the search for genome repeats and transposable elements, and applications to next-generation sequencing reads. We review the complexity methods and new application fields: analysis of mutation hotspot loci, analysis of short sequencing reads with quality control, and alignment-free genome comparisons. Algorithms implementing various numerical measures of text complexity, including combinatorial and linguistic measures, were developed before the genome sequencing era. A series of tools for estimating sequence complexity use compression approaches, mainly modifications of Lempel-Ziv compression. Most of the tools are available online, providing large-scale service for whole-genome analysis. Novel machine learning applications for classification of complete genome sequences also include sequence compression and complexity algorithms. We present a comparison of the complexity methods on different sequence sets and their applications to the analysis of gene transcription regulatory regions. Furthermore, we discuss approaches to and applications of sequence complexity for proteins. The complexity measures for amino acid sequences can be calculated by the same entropy and compression-based algorithms. However, the functional and evolutionary roles of low complexity regions in proteins have specific features that differ from those in DNA. Tools for protein sequence complexity are aimed at protein structural constraints. It has been shown that low complexity regions in protein sequences are evolutionarily conserved and have important biological and structural functions.
Finally, we summarize recent findings in large scale genome complexity comparison and applications for coronavirus genome analysis.
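The entropy and compression-based measures mentioned in this abstract can be illustrated with a short sketch. It is not taken from any of the reviewed tools: the function names, the window and step parameters, and the use of Python's standard `zlib` (a DEFLATE compressor built on an LZ77-style scheme, used here as a stand-in for the Lempel-Ziv modifications the review refers to) are assumptions for illustration.

```python
import zlib
from collections import Counter
from math import log2


def shannon_entropy(seq):
    """Shannon entropy (bits per symbol) of the symbol frequencies in seq."""
    n = len(seq)
    counts = Counter(seq)
    return sum(-(c / n) * log2(c / n) for c in counts.values())


def entropy_profile(seq, window, step):
    """Entropy of each sliding window; low values flag low-complexity regions."""
    return [
        shannon_entropy(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


def compression_ratio(seq):
    """Compressed size divided by original size; lower means more redundant."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


if __name__ == "__main__":
    toy = "AAAAACGT"  # toy sequence, not real data
    print(entropy_profile(toy, window=4, step=4))  # [0.0, 2.0]
    print(compression_ratio("A" * 1000))
```

For DNA the entropy of single nucleotides is bounded by 2 bits per symbol, so a window profile near zero marks homopolymer-like stretches, while the compression ratio also picks up longer repeats that single-symbol entropy cannot see.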
Affiliation(s)
- Yuriy L. Orlov
  - The Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Russian Ministry of Health (Sechenov University), Moscow, 119991 Russia
  - Institute of Cytology and Genetics SB RAS, 630090 Novosibirsk, Russia
  - Agrarian and Technological Institute, Peoples’ Friendship University of Russia, 117198 Moscow, Russia
- Nina G. Orlova
  - Department of Mathematics, Financial University under the Government of the Russian Federation, Moscow, 125167 Russia
10
Wang Y, Zhou J, Ye J, Sun Z, He Y, Zhao Y, Ren S, Zhang G, Liu M, Zheng P, Wang G, Yang J. Multi-omics reveal microbial determinants impacting the treatment outcome of antidepressants in major depressive disorder. Microbiome 2023; 11:195. [PMID: 37641148 PMCID: PMC10464022 DOI: 10.1186/s40168-023-01635-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/10/2023] [Accepted: 07/30/2023] [Indexed: 08/31/2023]
Abstract
BACKGROUND There is a growing body of evidence suggesting that disturbance of the gut-brain axis may be one of the potential causes of major depressive disorder (MDD). However, the effects of antidepressants on the gut microbiota, and the role of gut microbiota in influencing antidepressant efficacy are still not fully understood. RESULTS To address this knowledge gap, a multi-omics study was undertaken involving 110 MDD patients treated with escitalopram (ESC) for a period of 12 weeks. This study was conducted within a cohort and compared to a reference group of 166 healthy individuals. It was found that ESC ameliorated abnormal blood metabolism by upregulating MDD-depleted amino acids and downregulating MDD-enriched fatty acids. On the other hand, the use of ESC showed a relatively weak inhibitory effect on the gut microbiota, leading to a reduction in microbial richness and functions. Machine learning-based multi-omics integrative analysis revealed that gut microbiota contributed to the changes in plasma metabolites and was associated with several amino acids such as tryptophan and its gut microbiota-derived metabolite, indole-3-propionic acid (I3PA). Notably, a significant correlation was observed between the baseline microbial richness and clinical remission at week 12. Compared to non-remitters, individuals who achieved remission had a higher baseline microbial richness, a lower dysbiosis score, and a more complex and well-organized community structure and bacterial networks within their microbiota. These findings indicate a more resilient microbiota community in remitters. Furthermore, we also demonstrated that it was not the composition of the gut microbiota itself, but rather the presence of sporulation genes at baseline that could predict the likelihood of clinical remission following ESC treatment. The predictive model based on these genes revealed an area under the curve (AUC) performance metric of 0.71. 
CONCLUSION This study provides valuable insights into the role of the gut microbiota in the mechanism of ESC treatment efficacy for patients with MDD. The findings represent a significant advancement in understanding the intricate relationship among antidepressants, gut microbiota, and the blood metabolome. Additionally, this study offers a microbiota-centered perspective that can potentially improve antidepressant efficacy in clinical practice. By shedding light on the interplay between these factors, this research contributes to our broader understanding of the complex mechanisms underlying the treatment of MDD and opens new avenues for optimizing therapeutic approaches.
Affiliation(s)
- Yaping Wang
- Beijing Key Laboratory of Mental Disorders, National Clinical Research Center for Mental Disorders & National Center for Mental Disorders, Beijing Anding Hospital, Capital Medical University, Beijing, 100088, China
- Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, 100069, China
| | - Jingjing Zhou
- Beijing Key Laboratory of Mental Disorders, National Clinical Research Center for Mental Disorders & National Center for Mental Disorders, Beijing Anding Hospital, Capital Medical University, Beijing, 100088, China
- Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, 100069, China
| | - Junbin Ye
- Beijing WeGenome Paradigm Co., Ltd, Beijing, China
| | - Zuoli Sun
- Beijing Key Laboratory of Mental Disorders, National Clinical Research Center for Mental Disorders & National Center for Mental Disorders, Beijing Anding Hospital, Capital Medical University, Beijing, 100088, China
- Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, 100069, China
| | - Yi He
- Beijing Key Laboratory of Mental Disorders, National Clinical Research Center for Mental Disorders & National Center for Mental Disorders, Beijing Anding Hospital, Capital Medical University, Beijing, 100088, China
- Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, 100069, China
| | - Yingxin Zhao
- Beijing Key Laboratory of Mental Disorders, National Clinical Research Center for Mental Disorders & National Center for Mental Disorders, Beijing Anding Hospital, Capital Medical University, Beijing, 100088, China
- Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, 100069, China
| | - Siyu Ren
- Beijing Key Laboratory of Mental Disorders, National Clinical Research Center for Mental Disorders & National Center for Mental Disorders, Beijing Anding Hospital, Capital Medical University, Beijing, 100088, China
- Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, 100069, China
| | - Guofu Zhang
- Beijing Key Laboratory of Mental Disorders, National Clinical Research Center for Mental Disorders & National Center for Mental Disorders, Beijing Anding Hospital, Capital Medical University, Beijing, 100088, China
- Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, 100069, China
| | - Min Liu
- Beijing Key Laboratory of Mental Disorders, National Clinical Research Center for Mental Disorders & National Center for Mental Disorders, Beijing Anding Hospital, Capital Medical University, Beijing, 100088, China
- Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, 100069, China
| | - Peng Zheng
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment On Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Gang Wang
- Beijing Key Laboratory of Mental Disorders, National Clinical Research Center for Mental Disorders & National Center for Mental Disorders, Beijing Anding Hospital, Capital Medical University, Beijing, 100088, China.
- Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, 100069, China.
| | - Jian Yang
- Beijing Key Laboratory of Mental Disorders, National Clinical Research Center for Mental Disorders & National Center for Mental Disorders, Beijing Anding Hospital, Capital Medical University, Beijing, 100088, China.
- Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, 100069, China.
|
11
|
Wu Y, Li A, Liu H, Zhang Z, Zhang C, Ma C, Zhang L, Zhang J. Lactobacillus plantarum HNU082 alleviates dextran sulfate sodium-induced ulcerative colitis in mice through regulating gut microbiome. Food Funct 2022; 13:10171-10185. [PMID: 36111438 DOI: 10.1039/d2fo02303b] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Indexed: 12/02/2022]
Abstract
Probiotics have shown good efficacy in the prevention of ulcerative colitis (UC), but the specific mechanism remains unclear. Therefore, shotgun metagenomic and transcriptome analyses were performed to explore the preventive effect of a potential probiotic Lactobacillus plantarum HNU082 (Lp082) on UC and its specific mechanism. The results showed that Lp082 intervention ameliorated dextran sulfate sodium (DSS)-induced UC in mice, which was manifested in the increase in body weight, water intake, food intake, and colon length and the decrease in the DAI index, immune organ index, inflammatory factors and histopathological scores after Lp082 intake. The mechanism is deeply studied and it is discovered that Lp082 improves the intestinal mucosal barrier by co-optimizing biological barriers, chemical barriers, mechanical barriers, and immune barriers. Specifically, Lp082 improved the biological barrier by increasing the diversity, optimizing the species composition and the structure of the gut microbiota, increasing bacteria producing short chain fatty acids (SCFAs), and activating microbial metabolic pathways producing SCFAs so as to enhance the content of SCFAs. Lp082 optimized the chemical barrier by decreasing the mRNA expression of ICAM-1 and VCAM and by increasing the content of goblet cells and the mRNA expression and immunofluorescent protein content of mucin2. Lp082 ameliorated the mechanical barrier by decreasing the mRNA expression of claudin-1 and claudin-2, and by increasing the mRNA expression of ZO-1 and ZO-2 and the immunofluorescent protein content of ZO-1. Lp082 also optimized the immune barrier by increasing the mRNA expression of IL-10, TGF-β1, and TGF-β2 and by decreasing the mRNA expression and protein contents of IL-6, tumour necrosis factor-alpha (TNF-α) and myeloperoxidase (MPO). 
In addition, Lp082 can also regulate the metabolic pathways of inflammation and disease in mice, and notably, Lp082 inhibits the NF-κB signaling pathway by inhibiting NF-κB signaling molecules to alleviate UC. In conclusion, improving gut microbiota dysbiosis, protecting the intestinal mucosal barrier, regulating inflammatory and disease pathways, and affecting neutrophil infiltration are the potential mechanisms of probiotic Lp082 in alleviating UC. Our study enriches the mechanism and provides a new prospect for Lactobacillus plantarum HNU082 in the prevention of colitis, provides support for the development of probiotic-based microbial products as an alternative prevention strategy for UC, and provides guidance for the future probiotic prevention of human colitis.
Affiliation(s)
- Yuqing Wu
- Key Laboratory of Food Nutrition and Functional Food of Hainan Province, College of Food Science and Engineering, Hainan University, Haikou 570228, China.
| | - Ao Li
- Key Laboratory of Food Nutrition and Functional Food of Hainan Province, College of Food Science and Engineering, Hainan University, Haikou 570228, China.
| | - Huanwei Liu
- Key Laboratory of Food Nutrition and Functional Food of Hainan Province, College of Food Science and Engineering, Hainan University, Haikou 570228, China.
| | - Zeng Zhang
- Key Laboratory of Food Nutrition and Functional Food of Hainan Province, College of Food Science and Engineering, Hainan University, Haikou 570228, China.
| | - Chengcheng Zhang
- School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
| | - Chenchen Ma
- Key Laboratory of Food Nutrition and Functional Food of Hainan Province, College of Food Science and Engineering, Hainan University, Haikou 570228, China.
| | - Lin Zhang
- Key Laboratory of Food Nutrition and Functional Food of Hainan Province, College of Food Science and Engineering, Hainan University, Haikou 570228, China.
| | - Jiachao Zhang
- Key Laboratory of Food Nutrition and Functional Food of Hainan Province, College of Food Science and Engineering, Hainan University, Haikou 570228, China; One Health Institute, Hainan University, Haikou, Hainan 570228, China
|
12
|
Karr AF, Hauzel J, Porter AA, Schaefer M. Measuring quality of DNA sequence data via degradation. PLoS One 2022; 17:e0271970. [PMID: 35921272 PMCID: PMC9348684 DOI: 10.1371/journal.pone.0271970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Accepted: 07/11/2022] [Indexed: 11/26/2022] Open
Abstract
We formulate and apply a novel paradigm for characterization of genome data quality, which quantifies the effects of intentional degradation of quality. The rationale is that the higher the initial quality, the more fragile the genome and the greater the effects of degradation. We demonstrate that this phenomenon is ubiquitous, and that quantified measures of degradation can be used for multiple purposes, illustrated by outlier detection. We focus on identifying outliers that may be problematic with respect to data quality, but might also be true anomalies or even attempts to subvert the database.
Affiliation(s)
- Alan F. Karr
- Fraunhofer USA Center Mid-Atlantic, Riverdale, MD, United States of America
| | - Jason Hauzel
- Fraunhofer USA Center Mid-Atlantic, Riverdale, MD, United States of America
| | - Adam A. Porter
- Fraunhofer USA Center Mid-Atlantic, Riverdale, MD, United States of America
- Department of Computer Science, University of Maryland, College Park, MD, United States of America
| | - Marcel Schaefer
- Fraunhofer USA Center Mid-Atlantic, Riverdale, MD, United States of America
|
13
|
Fernández-López M, Sánchez-Reyes A, Barcelos C, Sidón-Ceseña K, Leite RB, Lago-Lestón A. Deep-Sea Sediments from the Southern Gulf of Mexico Harbor a Wide Diversity of PKS I Genes. Antibiotics (Basel) 2022; 11:antibiotics11070887. [PMID: 35884142 PMCID: PMC9311598 DOI: 10.3390/antibiotics11070887] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/18/2022] [Revised: 06/08/2022] [Accepted: 06/20/2022] [Indexed: 11/19/2022] Open
Abstract
The excessive use of antibiotics has triggered the appearance of new resistant strains, which is why great interest has been taken in the search for new bioactive compounds capable of overcoming this emergency in recent years. Massive sequencing tools have enabled the detection of new microorganisms that cannot be cultured in a laboratory, thus opening the door to the search for new biosynthetic genes. The great variety in oceanic environments in terms of pressure, salinity, temperature, and nutrients enables marine microorganisms to develop unique biochemical and physiological properties for their survival, enhancing the production of secondary metabolites that can vary from those produced by terrestrial microorganisms. We performed a search for type I PKS genes in metagenomes obtained from the marine sediments of the deep waters of the Gulf of Mexico using Hidden Markov Models. More than 2000 candidate genes were detected in the metagenomes that code for type I PKS domains, while biosynthetic pathways that may code for other secondary metabolites were also detected. Our research demonstrates the great potential use of the marine sediments of the Gulf of Mexico for identifying genes that code for new secondary metabolites.
Affiliation(s)
- Maikel Fernández-López
- Centro de Investigación en Dinámica Celular, Instituto de Investigación en Ciencias Básicas y Aplicadas, Universidad Autónoma del Estado de Morelos, Av. Universidad 1001, Col. Chamilpa, Cuernavaca 62209, Mexico
| | - Ayixon Sánchez-Reyes
- CONACYT-Instituto de Biotecnología, Universidad Nacional Autónoma de México (UNAM), Av. Universidad 2001, Col. Chamilpa, Cuernavaca 62210, Mexico
| | - Clara Barcelos
- Posgrado de Ciencias de la Vida, Centro de Investigación Científica y de Educación Superior de Ensenada, Carretera Ensenada-Tijuana No. 3918, Zona Playitas, Ensenada 22860, Mexico
- Departamento de Innovación Biomédica, Centro de Investigación Científica y de Educación Superior de Ensenada, Carretera Ensenada-Tijuana No. 3918, Zona Playitas, Ensenada 22860, Mexico
| | - Karla Sidón-Ceseña
- Posgrado de Ciencias de la Vida, Centro de Investigación Científica y de Educación Superior de Ensenada, Carretera Ensenada-Tijuana No. 3918, Zona Playitas, Ensenada 22860, Mexico
- Departamento de Innovación Biomédica, Centro de Investigación Científica y de Educación Superior de Ensenada, Carretera Ensenada-Tijuana No. 3918, Zona Playitas, Ensenada 22860, Mexico
| | - Ricardo B. Leite
- Instituto Gulbenkian de Ciência, Rua da Quinta Grande, 6, 2780-156 Oeiras, Portugal
| | - Asunción Lago-Lestón
- Departamento de Innovación Biomédica, Centro de Investigación Científica y de Educación Superior de Ensenada, Carretera Ensenada-Tijuana No. 3918, Zona Playitas, Ensenada 22860, Mexico
|
14
|
Goussarov G, Mysara M, Vandamme P, Van Houdt R. Introduction to the principles and methods underlying the recovery of metagenome-assembled genomes from metagenomic data. Microbiologyopen 2022; 11:e1298. [PMID: 35765182 PMCID: PMC9179125 DOI: 10.1002/mbo3.1298] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/23/2022] [Revised: 05/19/2022] [Accepted: 05/19/2022] [Indexed: 11/18/2022] Open
Abstract
The rise of metagenomics offers a leap forward for understanding the genetic diversity of microorganisms in many different complex environments by providing a platform that can identify potentially unlimited numbers of known and novel microorganisms. As such, it is impossible to imagine new major initiatives without metagenomics. Nevertheless, it represents a relatively new discipline with various levels of complexity and demands on bioinformatics. The underlying principles and methods used in metagenomics are often seen as common knowledge and often not detailed or fragmented. Therefore, we reviewed these to guide microbiologists in taking the first steps into metagenomics. We specifically focus on a workflow aimed at reconstructing individual genomes, that is, metagenome-assembled genomes, integrating DNA sequencing, assembly, binning, identification and annotation.
Affiliation(s)
- Gleb Goussarov
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
| | - Mohamed Mysara
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
| | - Peter Vandamme
- Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
| | - Rob Van Houdt
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
|
15
|
Robinson SL, Piel J, Sunagawa S. A roadmap for metagenomic enzyme discovery. Nat Prod Rep 2021; 38:1994-2023. [PMID: 34821235 PMCID: PMC8597712 DOI: 10.1039/d1np00006c] [Citation(s) in RCA: 73] [Impact Index Per Article: 18.3] [Received: 01/31/2021] [Indexed: 12/13/2022]
Abstract
Covering: up to 2021. Metagenomics has yielded massive amounts of sequencing data offering a glimpse into the biosynthetic potential of the uncultivated microbial majority. While genome-resolved information about microbial communities from nearly every environment on earth is now available, the ability to accurately predict biocatalytic functions directly from sequencing data remains challenging. Compared to primary metabolic pathways, enzymes involved in secondary metabolism often catalyze specialized reactions with diverse substrates, making these pathways rich resources for the discovery of new enzymology. To date, functional insights gained from studies on environmental DNA (eDNA) have largely relied on PCR- or activity-based screening of eDNA fragments cloned in fosmid or cosmid libraries. As an alternative, shotgun metagenomics holds underexplored potential for the discovery of new enzymes directly from eDNA by avoiding common biases introduced through PCR- or activity-guided functional metagenomics workflows. However, inferring new enzyme functions directly from eDNA is similar to searching for a 'needle in a haystack' without direct links between genotype and phenotype. The goal of this review is to provide a roadmap to navigate shotgun metagenomic sequencing data and identify new candidate biosynthetic enzymes. We cover both computational and experimental strategies to mine metagenomes and explore protein sequence space with a spotlight on natural product biosynthesis. Specifically, we compare in silico methods for enzyme discovery including phylogenetics, sequence similarity networks, genomic context, 3D structure-based approaches, and machine learning techniques. We also discuss various experimental strategies to test computational predictions including heterologous expression and screening.
Finally, we provide an outlook for future directions in the field with an emphasis on meta-omics, single-cell genomics, cell-free expression systems, and sequence-independent methods.
Affiliation(s)
| | - Jörn Piel
- Eidgenössische Technische Hochschule (ETH), Zürich, Switzerland.
|
16
|
Jouffret V, Miotello G, Culotta K, Ayrault S, Pible O, Armengaud J. Increasing the power of interpretation for soil metaproteomics data. Microbiome 2021; 9:195. [PMID: 34587999 PMCID: PMC8482631 DOI: 10.1186/s40168-021-01139-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 07/05/2021] [Accepted: 07/29/2021] [Indexed: 05/07/2023]
Abstract
BACKGROUND Soil and sediment microorganisms are highly phylogenetically diverse but are currently largely under-represented in public molecular databases. Their functional characterization by means of metaproteomics is usually performed using metagenomic sequences acquired for the same sample. However, such hugely diverse metagenomic datasets are difficult to assemble; in parallel, theoretical proteomes from isolates available in generic databases are of high quality. Both these factors advocate for the use of theoretical proteomes in metaproteomics interpretation pipelines. Here, we examined a number of database construction strategies with a view to increasing the outputs of metaproteomics studies performed on soil samples. RESULTS The number of peptide-spectrum matches was found to be of comparable magnitude when using public or sample-specific metagenomics-derived databases. However, numbers were significantly increased when a combination of both types of information was used in a two-step cascaded search. Our data also indicate that the functional annotation of the metaproteomics dataset can be maximized by using a combination of both types of databases. CONCLUSIONS A two-step strategy combining sample-specific metagenome database and public databases such as the non-redundant NCBI database and a massive soil gene catalog allows maximizing the metaproteomic interpretation both in terms of ratio of assigned spectra and retrieval of function-derived information.
Affiliation(s)
- Virginie Jouffret
- Université Paris-Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé (DMTS), SPI, F-30200, Bagnols-sur-Cèze, France
- Laboratoire des Sciences et de l'Environnement (LSCE-IPSL), UMR 8212 (CEA/CNRS/UVSQ), CEA Saclay, Université Paris-Saclay, Orme des Merisiers, F-91191, Gif-sur-Yvette, France
- Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Université de Montpellier, F-30207, Bagnols-sur-Cèze, France
| | - Guylaine Miotello
- Université Paris-Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé (DMTS), SPI, F-30200, Bagnols-sur-Cèze, France
| | - Karen Culotta
- Université Paris-Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé (DMTS), SPI, F-30200, Bagnols-sur-Cèze, France
| | - Sophie Ayrault
- Laboratoire des Sciences et de l'Environnement (LSCE-IPSL), UMR 8212 (CEA/CNRS/UVSQ), CEA Saclay, Université Paris-Saclay, Orme des Merisiers, F-91191, Gif-sur-Yvette, France
| | - Olivier Pible
- Université Paris-Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé (DMTS), SPI, F-30200, Bagnols-sur-Cèze, France
| | - Jean Armengaud
- Université Paris-Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé (DMTS), SPI, F-30200, Bagnols-sur-Cèze, France.
|
17
|
Hofmeyr S, Egan R, Georganas E, Copeland AC, Riley R, Clum A, Eloe-Fadrosh E, Roux S, Goltsman E, Buluç A, Rokhsar D, Oliker L, Yelick K. Terabase-scale metagenome coassembly with MetaHipMer. Sci Rep 2020; 10:10689. [PMID: 32612216 PMCID: PMC7329831 DOI: 10.1038/s41598-020-67416-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/13/2020] [Accepted: 06/05/2020] [Indexed: 01/13/2023] Open
Abstract
Metagenome sequence datasets can contain terabytes of reads, too many to be coassembled together on a single shared-memory computer; consequently, they have only been assembled sample by sample (multiassembly) and combining the results is challenging. We can now perform coassembly of the largest datasets using MetaHipMer, a metagenome assembler designed to run on supercomputers and large clusters of compute nodes. We have reported on the implementation of MetaHipMer previously; in this paper we focus on analyzing the impact of very large coassembly. In particular, we show that coassembly recovers a larger genome fraction than multiassembly and enables the discovery of more complete genomes, with lower error rates, whereas multiassembly recovers more dominant strain variation. Being able to coassemble a large dataset does not preclude one from multiassembly; rather, having a fast, scalable metagenome assembler enables a user to more easily perform coassembly and multiassembly, and assemble both abundant, high strain variation genomes, and low-abundance, rare genomes. We present several assemblies of terabyte datasets that could never be coassembled before, demonstrating MetaHipMer’s scaling power. MetaHipMer is available for public use under an open source license and all datasets used in the paper are available for public download.
Affiliation(s)
- Steven Hofmeyr
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
| | - Rob Egan
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | | | - Alex C Copeland
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Robert Riley
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Alicia Clum
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Emiley Eloe-Fadrosh
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Simon Roux
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Eugene Goltsman
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Aydın Buluç
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, 94720, USA
| | - Daniel Rokhsar
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Department of Molecular and Cellular Biology, University of California, Berkeley, CA, 94720, USA
| | - Leonid Oliker
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Katherine Yelick
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, 94720, USA
|
18
|
Nawaz A, Purahong W, Herrmann M, Küsel K, Buscot F, Wubet T. DNA- and RNA- Derived Fungal Communities in Subsurface Aquifers Only Partly Overlap but React Similarly to Environmental Factors. Microorganisms 2019; 7:microorganisms7090341. [PMID: 31514383 PMCID: PMC6780912 DOI: 10.3390/microorganisms7090341] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 08/19/2019] [Revised: 09/08/2019] [Accepted: 09/09/2019] [Indexed: 12/15/2022] Open
Abstract
Recent advances in high-throughput sequencing (HTS) technologies have revolutionized our understanding of microbial diversity and composition in relation to their environment. HTS-based characterization of metabolically active (RNA-derived) and total (DNA-derived) fungal communities in different terrestrial habitats has revealed profound differences in both richness and community compositions. However, such DNA- and RNA-based HTS comparisons are widely missing for fungal communities of groundwater aquifers in the terrestrial biogeosphere. Therefore, in this study, we extracted DNA and RNA from groundwater samples of two pristine aquifers in the Hainich CZE and employed paired-end Illumina sequencing of the fungal nuclear ribosomal internal transcribed spacer 2 (ITS2) region to comprehensively test difference/similarities in the “total” and “active” fungal communities. We found no significant differences in the species richness between the DNA- and RNA-derived fungal communities, but the relative abundances of various fungal operational taxonomic units (OTUs) appeared to differ. We also found the same set of environmental parameters to shape the “total” and “active” fungal communities in the targeted aquifers. Furthermore, our comparison also underlined that about 30%–40% of the fungal OTUs were only detected in RNA-derived communities. This implies that the active fungal communities analyzed by HTS methods in the subsurface aquifers are actually not a subset of supposedly total fungal communities. In general, our study highlights the importance of differentiating the potential (DNA-derived) and expressed (RNA-derived) members of the fungal communities in aquatic ecosystems.
Affiliation(s)
- Ali Nawaz
- Helmholtz Centre for Environmental Research-UFZ, Department of Soil Ecology, 06120 Halle (Saale), Germany.
- Helmholtz Centre for Environmental Research-UFZ, Department of Community Ecology, 06120 Halle (Saale), Germany.
- Institute of Biology, University of Leipzig, 04103 Leipzig, Germany.
| | - Witoon Purahong
- Helmholtz Centre for Environmental Research-UFZ, Department of Soil Ecology, 06120 Halle (Saale), Germany.
| | - Martina Herrmann
- Institute of Biodiversity, Friedrich Schiller University Jena, Dornburger Straße 159, 07743 Jena, Germany.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, 04103 Leipzig, Germany.
| | - Kirsten Küsel
- Institute of Biodiversity, Friedrich Schiller University Jena, Dornburger Straße 159, 07743 Jena, Germany.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, 04103 Leipzig, Germany.
| | - François Buscot
- Helmholtz Centre for Environmental Research-UFZ, Department of Soil Ecology, 06120 Halle (Saale), Germany.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, 04103 Leipzig, Germany.
| | - Tesfaye Wubet
- Helmholtz Centre for Environmental Research-UFZ, Department of Soil Ecology, 06120 Halle (Saale), Germany.
- Helmholtz Centre for Environmental Research-UFZ, Department of Community Ecology, 06120 Halle (Saale), Germany.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, 04103 Leipzig, Germany.
|