1
Abstract
Invertebrates, particularly sponges, have been a dominant source of new marine natural products. For example, lasonolide A (LSA) is a potential anticancer molecule isolated from the marine sponge Forcepia sp., with nanomolar growth inhibitory activity and a unique cytotoxicity profile against the National Cancer Institute 60-cell-line screen. Here, we identified the putative biosynthetic pathway for LSA. Genomic binning of the Forcepia sponge metagenome revealed a Gram-negative bacterium belonging to the phylum Verrucomicrobia as the candidate producer of LSA. Phylogenetic analysis showed that this bacterium, here named "Candidatus Thermopylae lasonolidus," has only 88.78% 16S rRNA identity with the closest relative, Pedosphaera parvula Ellin514, indicating that it represents a new genus. The lasonolide A (las) biosynthetic gene cluster (BGC) was identified as a trans-acyltransferase (AT) polyketide synthase (PKS) pathway. Compared with its host genome, the las BGC exhibits a significantly different GC content and pentanucleotide frequency, suggesting a potential horizontal acquisition of the gene cluster. Furthermore, three copies of the putative las pathway were identified in the candidate producer genome. Differences between the three las repeats were observed, including the presence of three insertions, two single-nucleotide polymorphisms, and the absence of a stand-alone acyl carrier protein in one of the repeats. Even though the verrucomicrobial producer shows signs of genome reduction, its genome size is still fairly large (about 5 Mbp), and, compared to its closest free-living relative, it contains most of the primary metabolic pathways, suggesting that it is in the early stages of reduction.
IMPORTANCE: While sponges are valuable sources of bioactive natural products, a majority of these compounds are produced in small quantities by uncultured symbionts, hampering the study and clinical development of these unique compounds. Lasonolide A (LSA), isolated from marine sponge Forcepia sp., is a cytotoxic molecule active at nanomolar concentrations, which causes premature chromosome condensation, blebbing, cell contraction, and loss of cell adhesion, indicating a novel mechanism of action and making it a potential anticancer drug lead. However, its limited supply hampers progression to clinical trials. We investigated the microbiome of Forcepia sp. using culture-independent DNA sequencing, identified genes likely responsible for LSA synthesis in an uncultured bacterium, and assembled the symbiont's genome. These insights provide future opportunities for heterologous expression and cultivation efforts that may minimize LSA's supply problem.
2
Smith AR, Mueller R, Fisk MR, Colwell FS. Ancient Metabolisms of a Thermophilic Subseafloor Bacterium. Front Microbiol 2021; 12:764631. [PMID: 34925271 PMCID: PMC8671834 DOI: 10.3389/fmicb.2021.764631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2021] [Accepted: 10/22/2021] [Indexed: 11/13/2022]
Abstract
The ancient origins of metabolism may be rooted deep in oceanic crust, and these early metabolisms may have persisted in the habitable thermal anoxic aquifer where conditions remain similar to those when they first appeared. The Wood–Ljungdahl pathway for acetogenesis is a key early biosynthetic pathway with the potential to influence ocean chemistry and productivity, but its contemporary role in oceanic crust is not well established. Here, we describe the genome of a novel acetogen from a thermal suboceanic aquifer olivine biofilm in the basaltic crust of the Juan de Fuca Ridge (JdFR) whose genome suggests it may utilize an ancient chemosynthetic lifestyle. This organism encodes the genes for the complete canonical Wood–Ljungdahl pathway, but is potentially unable to use sulfate and certain organic carbon sources such as lipids and carbohydrates to supplement its energy requirements, unlike other known acetogens. Instead, this organism may use peptides and amino acids for energy or as organic carbon sources. Additionally, genes involved in surface adhesion, the import of metallic cations found in Fe-bearing minerals, and use of molecular hydrogen, a product of serpentinization reactions between water and olivine, are prevalent within the genome. These adaptations are likely a reflection of local environmental micro-niches, where cells are adapted to life in biofilms using ancient chemosynthetic metabolisms dependent on H2 and iron minerals. Since this organism is phylogenetically distinct from a related acetogenic group of Clostridiales, we propose it as a new species, Candidatus Acetocimmeria pyornia.
Affiliation(s)
- Amy R Smith
  - Department of Science, Mathematics, and Computing, Bard College at Simon's Rock, Great Barrington, MA, United States
  - Department of Marine Chemistry and Geochemistry, Woods Hole Oceanographic Institution, Woods Hole, MA, United States
  - College of Earth, Ocean, and Atmospheric Sciences, Oregon State University, Corvallis, OR, United States
- Ryan Mueller
  - College of Earth, Ocean, and Atmospheric Sciences, Oregon State University, Corvallis, OR, United States
- Martin R Fisk
  - College of Earth, Ocean, and Atmospheric Sciences, Oregon State University, Corvallis, OR, United States
- Frederick S Colwell
  - College of Earth, Ocean, and Atmospheric Sciences, Oregon State University, Corvallis, OR, United States
3
Discovering multiple types of DNA methylation from bacteria and microbiome using nanopore sequencing. Nat Methods 2021; 18:491-498. [PMID: 33820988 PMCID: PMC8107137 DOI: 10.1038/s41592-021-01109-3] [Citation(s) in RCA: 66] [Impact Index Per Article: 16.5] [Received: 03/12/2019] [Accepted: 03/03/2021] [Indexed: 01/09/2023]
Abstract
Bacterial DNA methylation occurs at diverse sequence contexts and plays important functional roles in cellular defense and gene regulation. Existing methods for detecting DNA modification from nanopore sequencing data do not effectively support de novo study of unknown bacterial methylomes. In this work, we observed that a nanopore sequencing signal displays complex heterogeneity across methylation events of the same type. To enable nanopore sequencing for broadly applicable methylation discovery, we generated a training dataset from an assortment of bacterial species and developed a method, named nanodisco ( https://github.com/fanglab/nanodisco ), that couples the identification and fine mapping of the three forms of methylation into a multi-label classification framework. We applied it to individual bacteria and the mouse gut microbiome for reliable methylation discovery. In addition, we demonstrated the use of DNA methylation for binning metagenomic contigs, associating mobile genetic elements with their host genomes and identifying misassembled metagenomic contigs.
4
Wickramarachchi A, Mallawaarachchi V, Rajan V, Lin Y. MetaBCC-LR: metagenomics binning by coverage and composition for long reads. Bioinformatics 2020; 36:i3-i11. [PMID: 32657364 PMCID: PMC7355282 DOI: 10.1093/bioinformatics/btaa441] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 02/06/2023]
Abstract
MOTIVATION: Metagenomics studies have provided key insights into the composition and structure of microbial communities found in different environments. Among the techniques used to analyse metagenomic data, binning is considered a crucial step to characterize the different species of micro-organisms present. The use of short-read data in most binning tools poses several limitations, such as insufficient species-specific signal, and the emergence of long-read sequencing technologies offers us opportunities to surmount them. However, most current metagenomic binning tools have been developed for short reads. The few tools that can process long reads either do not scale with increasing input size or require a database with reference genomes that are often unknown. In this article, we present MetaBCC-LR, a scalable reference-free binning method which clusters long reads directly based on their k-mer coverage histograms and oligonucleotide composition.
RESULTS: We evaluate MetaBCC-LR on multiple simulated and real metagenomic long-read datasets with varying coverages and error rates. Our experiments demonstrate that MetaBCC-LR substantially outperforms state-of-the-art reference-free binning tools, achieving ∼13% improvement in F1-score and ∼30% improvement in ARI compared to the best previous tools. Moreover, we show that using MetaBCC-LR before long-read assembly helps to enhance the assembly quality while significantly reducing the assembly cost in terms of time and memory usage. The efficiency and accuracy of MetaBCC-LR pave the way for more effective long-read-based metagenomics analyses to support a wide range of applications.
AVAILABILITY AND IMPLEMENTATION: The source code is freely available at: https://github.com/anuradhawick/MetaBCC-LR.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
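The two per-read features named in this abstract, oligonucleotide composition and a k-mer coverage histogram, can be illustrated with a minimal Python sketch. It is not the MetaBCC-LR implementation: the function names, the choice of trinucleotides, and the `bin_width` and `n_bins` parameters are assumptions made here for illustration only.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count every overlapping k-mer of seq that contains only A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def composition_vector(seq, k=3):
    """Normalised k-mer frequencies, listed in lexicographic k-mer order."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[m] / total if total else 0.0 for m in kmers]


def coverage_histogram(read, dataset_counts, k, bin_width=10, n_bins=5):
    """Histogram of how often each k-mer of `read` occurs in the whole dataset.

    bin_width and n_bins are hypothetical parameters; the last bin collects
    every k-mer whose dataset count is at or above the histogram range.
    """
    hist = [0] * n_bins
    for i in range(len(read) - k + 1):
        count = dataset_counts.get(read[i:i + k].upper(), 0)
        hist[min(count // bin_width, n_bins - 1)] += 1
    return hist


# Toy data: three short "reads" standing in for long reads.
reads = ["ACGTACGTAC", "ACGTACGTTT", "GGGGCCCCGG"]
dataset = Counter()
for r in reads:
    dataset.update(kmer_counts(r, 4))

composition = composition_vector(reads[0])  # 64 trinucleotide frequencies
histogram = coverage_histogram(reads[0], dataset, k=4, bin_width=1, n_bins=4)
```

Reads from the same genome would be expected to give similar vectors for both features, which is what makes them usable as clustering input; the clustering step itself is not shown here.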
Affiliation(s)
- Anuradha Wickramarachchi
  - Research School of Computer Science, College of Engineering and Computer Science, Australian National University, Canberra, ACT 0200, Australia
- Vijini Mallawaarachchi
  - Research School of Computer Science, College of Engineering and Computer Science, Australian National University, Canberra, ACT 0200, Australia
- Vaibhav Rajan
  - Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 117417, Singapore
- Yu Lin
  - Research School of Computer Science, College of Engineering and Computer Science, Australian National University, Canberra, ACT 0200, Australia
5
Miller IJ, Rees ER, Ross J, Miller I, Baxa J, Lopera J, Kerby RL, Rey FE, Kwan JC. Autometa: automated extraction of microbial genomes from individual shotgun metagenomes. Nucleic Acids Res 2019; 47:e57. [PMID: 30838416 PMCID: PMC6547426 DOI: 10.1093/nar/gkz148] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 05/31/2018] [Revised: 02/15/2019] [Accepted: 02/21/2019] [Indexed: 12/28/2022]
Abstract
Shotgun metagenomics is a powerful, high-resolution technique enabling the study of microbial communities in situ. However, species-level resolution is only achieved after a process of 'binning' where contigs predicted to originate from the same genome are clustered. Such culture-independent sequencing frequently unearths novel microbes, and so various methods have been devised for reference-free binning. As novel microbiomes of increasing complexity are explored, sometimes associated with non-model hosts, robust automated binning methods are required. Existing methods struggle with eukaryotic contamination and cannot handle highly complex single metagenomes. We therefore developed an automated binning pipeline, termed 'Autometa', to address these issues. This command-line application integrates sequence homology, nucleotide composition, coverage and the presence of single-copy marker genes to separate microbial genomes from non-model host genomes and other eukaryotic contaminants, before deconvoluting individual genomes from single metagenomes. The method is able to effectively separate over 1000 genomes from a metagenome, allowing the study of previously intractably complex environments at the level of single species. Autometa is freely available at https://bitbucket.org/jason_c_kwan/autometa and as a docker image at https://hub.docker.com/r/jasonkwan/autometa under the GNU Affero General Public License 3 (AGPL 3).
Affiliation(s)
- Ian J Miller
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
- Evan R Rees
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
- Jennifer Ross
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
- Izaak Miller
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
- Jared Baxa
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
- Juan Lopera
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
- Robert L Kerby
  - Department of Bacteriology, University of Wisconsin–Madison, 1550 Linden Drive, Madison, WI 53706, USA
- Federico E Rey
  - Department of Bacteriology, University of Wisconsin–Madison, 1550 Linden Drive, Madison, WI 53706, USA
- Jason C Kwan
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
6
Carbon fixation and energy metabolisms of a subseafloor olivine biofilm. ISME J 2019; 13:1737-1749. [PMID: 30867546 DOI: 10.1038/s41396-019-0385-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/06/2018] [Revised: 02/15/2019] [Accepted: 02/28/2019] [Indexed: 11/08/2022]
Abstract
Earth's largest aquifer ecosystem resides in igneous oceanic crust, where chemosynthesis and water-rock reactions provide the carbon and energy that support an active deep biosphere. The Calvin Cycle is the predominant carbon fixation pathway in cool, oxic, crust; however, the energy and carbon metabolisms in the deep thermal basaltic aquifer are poorly understood. Anaerobic carbon fixation pathways such as the Wood-Ljungdahl pathway, which uses hydrogen (H2) and CO2, may be common in thermal aquifers since water-rock reactions can produce H2 in hydrothermal environments and bicarbonate is abundant in seawater. To test this, we reconstructed the metabolisms of eleven bacterial and archaeal metagenome-assembled genomes from an olivine biofilm obtained from a Juan de Fuca Ridge basaltic aquifer. We found that the dominant carbon fixation pathway was the Wood-Ljungdahl pathway, which was present in seven of the eight bacterial genomes. Anaerobic respiration appears to be driven by sulfate reduction, and one bacterial genome contained a complete nitrogen fixation pathway. This study reveals the potential pathways for carbon and energy flux in the deep anoxic thermal aquifer ecosystem, and suggests that ancient H2-based chemolithoautotrophy, which once dominated Earth's early biosphere, may thus remain one of the dominant metabolisms in the suboceanic aquifer today.
7
Wampach L, Heintz-Buschart A, Fritz JV, Ramiro-Garcia J, Habier J, Herold M, Narayanasamy S, Kaysen A, Hogan AH, Bindl L, Bottu J, Halder R, Sjöqvist C, May P, Andersson AF, de Beaufort C, Wilmes P. Birth mode is associated with earliest strain-conferred gut microbiome functions and immunostimulatory potential. Nat Commun 2018; 9:5091. [PMID: 30504906 PMCID: PMC6269548 DOI: 10.1038/s41467-018-07631-x] [Citation(s) in RCA: 203] [Impact Index Per Article: 29.0] [Received: 08/03/2017] [Accepted: 11/13/2018] [Indexed: 01/07/2023]
Abstract
The rate of caesarean section delivery (CSD) is increasing worldwide. It remains unclear whether disruption of mother-to-neonate transmission of microbiota through CSD occurs and whether it affects human physiology. Here we perform metagenomic analysis of earliest gut microbial community structures and functions. We identify differences in encoded functions between microbiomes of vaginally delivered (VD) and CSD neonates. Several functional pathways are over-represented in VD neonates, including lipopolysaccharide (LPS) biosynthesis. We link these enriched functions to individual-specific strains, which are transmitted from mothers to neonates in case of VD. The stimulation of primary human immune cells with LPS isolated from early stool samples of VD neonates results in higher levels of tumour necrosis factor (TNF-α) and interleukin 18 (IL-18). Accordingly, the observed levels of TNF-α and IL-18 in neonatal blood plasma are higher after VD. Taken together, our results support that CSD disrupts mother-to-neonate transmission of specific microbial strains, linked functional repertoires and immune-stimulatory potential during a critical window for neonatal immune system priming.
Affiliation(s)
- Linda Wampach
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, avenue des Hauts-Fourneaux 7, 4362, Esch-sur-Alzette, Luxembourg
  - Laboratoire National de Santé, rue Louis Rech 1, 3555, Dudelange, Luxembourg
- Anna Heintz-Buschart
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, avenue des Hauts-Fourneaux 7, 4362, Esch-sur-Alzette, Luxembourg
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, 04103, Leipzig, Germany
  - Helmholtz Centre for Environmental Research GmbH - UFZ, Theodor-Lieser-Str. 4, 06120, Halle (Saale), Germany
- Joëlle V Fritz
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, avenue des Hauts-Fourneaux 7, 4362, Esch-sur-Alzette, Luxembourg
  - Centre Hospitalier de Luxembourg, rue Nicolas Ernest Barblé 4, 1210, Luxembourg, Luxembourg
- Javier Ramiro-Garcia
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, avenue des Hauts-Fourneaux 7, 4362, Esch-sur-Alzette, Luxembourg
- Janine Habier
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, avenue des Hauts-Fourneaux 7, 4362, Esch-sur-Alzette, Luxembourg
- Malte Herold
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, avenue des Hauts-Fourneaux 7, 4362, Esch-sur-Alzette, Luxembourg
- Shaman Narayanasamy
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, avenue des Hauts-Fourneaux 7, 4362, Esch-sur-Alzette, Luxembourg
  - Megeno S.A., avenue des Hauts-Fourneaux 9, 4362, Esch-sur-Alzette, Luxembourg
- Anne Kaysen
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, avenue des Hauts-Fourneaux 7, 4362, Esch-sur-Alzette, Luxembourg
  - Centre Hospitalier de Luxembourg, rue Nicolas Ernest Barblé 4, 1210, Luxembourg, Luxembourg
- Angela H Hogan
  - Integrated BioBank of Luxembourg, rue Louis Rech 1, 3555, Dudelange, Luxembourg
- Lutz Bindl
  - Centre Hospitalier de Luxembourg, rue Nicolas Ernest Barblé 4, 1210, Luxembourg, Luxembourg
- Jean Bottu
  - Centre Hospitalier de Luxembourg, rue Nicolas Ernest Barblé 4, 1210, Luxembourg, Luxembourg
- Rashi Halder
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, avenue des Hauts-Fourneaux 7, 4362, Esch-sur-Alzette, Luxembourg
- Conny Sjöqvist
  - KTH Royal Institute of Technology, Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, Tomtebodavägen 23a, 17165, Solna, Sweden
  - Environmental and Marine Biology, Åbo Akademi University, Tykistökatu 6, 20520, Turku, Finland
- Patrick May
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, avenue des Hauts-Fourneaux 7, 4362, Esch-sur-Alzette, Luxembourg
- Anders F Andersson
  - KTH Royal Institute of Technology, Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, Tomtebodavägen 23a, 17165, Solna, Sweden
- Carine de Beaufort
  - Centre Hospitalier de Luxembourg, rue Nicolas Ernest Barblé 4, 1210, Luxembourg, Luxembourg
- Paul Wilmes
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, avenue des Hauts-Fourneaux 7, 4362, Esch-sur-Alzette, Luxembourg
8
Ariza-Jimenez L, Quintero OL, Pinel N. Unsupervised fuzzy binning of metagenomic sequence fragments on three-dimensional Barnes-Hut t-Stochastic Neighbor Embeddings. Annu Int Conf IEEE Eng Med Biol Soc 2018; 2018:1315-1318. [PMID: 30440633 DOI: 10.1109/embc.2018.8512529] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Abstract
Shotgun metagenomic studies attempt to reconstruct population genome sequences from complex microbial communities. In some traditional genome demarcation approaches, high-dimensional sequence data are embedded into two-dimensional spaces and subsequently binned into candidate genomic populations. One such approach uses a combination of the Barnes-Hut approximation and the t-Stochastic Neighbor Embedding (BH-SNE) algorithm for dimensionality reduction of DNA sequence data pentamer profiles; and demarcation of groups based on Gaussian mixture models within human-imposed boundaries. We found that genome demarcation from three-dimensional BH-SNE embeddings consistently results in more accurate binnings than 2-D embeddings. We further addressed the lack of a priori population number information by developing an unsupervised binning approach based on the Subtractive and Fuzzy c-means (FCM) clustering algorithms combined with internal clustering validity indices. Lastly, we addressed the subject of shared membership of individual data objects in a mixed community by assigning a degree of membership to individual objects using the FCM algorithm, and discriminated between confidently binned and uncertain sequence data objects from the community for subsequent biological interpretation. The binning of metagenome sequence fragments according to thresholds in the degree of membership opens the door for the identification of horizontally transferred elements and other genomic regions of uncertain assignment in which biologically meaningful information resides. The reported approach improves the unsupervised genome demarcation of populations within complex communities, increases the confidence in the coherence of the binned elements, and enables the identification of evolutionary processes ignored in hard-binning approaches in shotgun metagenomic studies.
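The degree-of-membership idea in this abstract can be sketched with the standard fuzzy c-means membership formula, u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)), applied to points in a three-dimensional embedding. The sketch below assumes the cluster centres are already known; the function names, the fuzzifier `m=2.0`, and the `threshold=0.8` cut-off are illustrative assumptions, not values taken from the paper.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return, for each point, its degree of membership in every cluster.

    Each row sums to 1. m is the usual FCM fuzzifier (m > 1).
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with one or more centres: share full
            # membership among those centres and give none to the others.
            zeros = dists.count(0.0)
            memberships.append([1.0 / zeros if d == 0.0 else 0.0 for d in dists])
            continue
        row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]
        memberships.append(row)
    return memberships


def split_by_confidence(memberships, threshold=0.8):
    """Separate point indices into confidently binned and uncertain ones."""
    confident, uncertain = [], []
    for idx, row in enumerate(memberships):
        (confident if max(row) >= threshold else uncertain).append(idx)
    return confident, uncertain


# Toy 3-D embedding: two points on the centres, one exactly between them.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
u = fcm_memberships(points, centers)
confident, uncertain = split_by_confidence(u)
```

In this toy case the midpoint receives membership 0.5 in each cluster and lands in the uncertain group, which is the kind of fragment the abstract suggests setting aside for closer biological interpretation.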
9
Herath D, Tang SL, Tandon K, Ackland D, Halgamuge SK. CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision. BMC Bioinformatics 2017; 18:571. [PMID: 29297295 PMCID: PMC5751405 DOI: 10.1186/s12859-017-1967-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/26/2022]
Abstract
Background: In metagenomics, the separation of nucleotide sequences belonging to an individual or closely matched populations is termed binning. Binning helps the evaluation of underlying microbial population structure as well as the recovery of individual genomes from a sample of uncultivable microbial organisms. Both supervised and unsupervised learning methods have been employed in binning; however, characterizing a metagenomic sample containing multiple strains remains a significant challenge. In this study, we designed and implemented a new workflow, Coverage and composition based binning of Metagenomes (CoMet), for binning contigs in a single metagenomic sample. CoMet utilizes coverage values and the compositional features of metagenomic contigs. The binning strategy in CoMet includes the initial grouping of contigs in guanine-cytosine (GC) content-coverage space and refinement of bins in tetranucleotide frequencies space in a purely unsupervised manner. With CoMet, the clustering algorithm DBSCAN is employed for binning contigs. The performances of CoMet were compared against four existing approaches for binning a single metagenomic sample, including MaxBin, Metawatt, MyCC (default) and MyCC (coverage) using multiple datasets including a sample comprised of multiple strains.
Results: Binning methods based on both compositional features and coverages of contigs had higher performances than the method which is based only on compositional features of contigs. CoMet yielded higher or comparable precision in comparison to the existing binning methods on benchmark datasets of varying complexities. MyCC (coverage) had the highest ranking score in F1-score. However, the performances of CoMet were higher than MyCC (coverage) on the dataset containing multiple strains. Furthermore, CoMet recovered contigs of more species and was 18-39% higher in precision than the compared existing methods in discriminating species from the sample of multiple strains. CoMet resulted in higher precision than MyCC (default) and MyCC (coverage) on a real metagenome.
Conclusions: The approach proposed with CoMet for binning contigs improves the precision of binning while characterizing more species in a single metagenomic sample and in a sample containing multiple strains. The F1-scores obtained from different binning strategies vary with different datasets; however, CoMet yields the highest F1-score with a sample comprised of multiple strains.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1967-3) contains supplementary material, which is available to authorized users.
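The first stage described in this abstract, grouping contigs in GC content-coverage space with DBSCAN, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the CoMet workflow: the log10 scaling of coverage, the `eps` and `min_pts` values, the toy contigs, and the hand-written `dbscan` function are all choices made here, and the tetranucleotide refinement stage is not shown.

```python
import math


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: one label per point, with -1 meaning noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Includes i itself, so min_pts counts the point being tested.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels


# Hypothetical contigs: (sequence, mean read coverage).
contigs = {
    "c1": ("ATATATATGC", 10.0),
    "c2": ("ATATATGCAT", 11.0),
    "c3": ("ATTATAGCAA", 9.5),
    "c4": ("GCGCGCGCAT", 50.0),
    "c5": ("GCGCGGCCAT", 52.0),
    "c6": ("GCGCCGCGTA", 49.0),
    "c7": ("ACGTACGTAC", 200.0),
}
points = [(gc_content(seq), math.log10(cov)) for seq, cov in contigs.values()]
labels = dbscan(points, eps=0.15, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The low-GC, low-coverage contigs and the high-GC, higher-coverage contigs form two groups, while the isolated contig is left as noise. A density-based method suits this setting because the number of genomes in a sample is not known in advance.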
Affiliation(s)
- Damayanthi Herath
  - Department of Mechanical Engineering, The University of Melbourne, Parkville, Melbourne, 3010, Australia
  - Department of Computer Engineering, University of Peradeniya, Prof. E. O. E. Pereira Mawatha, Peradeniya, 20400, Sri Lanka
- Sen-Lin Tang
  - Biodiversity Research Center, Academia Sinica, Nan-Kang, Taipei, 11529, Taiwan
- Kshitij Tandon
  - Biodiversity Research Center, Academia Sinica, Nan-Kang, Taipei, 11529, Taiwan
  - Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu, 300, Taiwan
  - Bioinformatics Program, Institute of Information Science, Taiwan International Graduate Program, Academia Sinica, Taipei, 115, Taiwan
- David Ackland
  - Department of Biomedical Engineering, The University of Melbourne, Victoria, 3010, Australia
- Saman Kumara Halgamuge
  - Research School of Engineering, College of Engineering and Computer Science, The Australian National University, Canberra ACT, 2601, Australia
10
Metagenomic binning and association of plasmids with bacterial host genomes using DNA methylation. Nat Biotechnol 2017; 36:61-69. [PMID: 29227468 PMCID: PMC5762413 DOI: 10.1038/nbt.4037] [Citation(s) in RCA: 86] [Impact Index Per Article: 10.8] [Received: 09/13/2016] [Accepted: 11/13/2017] [Indexed: 02/04/2023]
Abstract
Shotgun metagenomics methods enable characterization of microbial communities in human microbiome and environmental samples. Assembly of metagenome sequences does not output whole genomes, so computational binning methods have been developed to cluster sequences into genome ‘bins’. These methods exploit sequence composition, species abundance, or chromosome organization but cannot fully distinguish closely related species and strains. We present a binning method that incorporates bacterial DNA methylation signatures, which are detected using single-molecule real-time sequencing. Our method takes advantage of these endogenous epigenetic barcodes to resolve individual reads and assembled contigs into species- and strain-level bins. We validated our method using synthetic and real microbiome sequences. In addition to genome binning, we show that our method links plasmids and other mobile genetic elements to their host species in a real microbiome sample. Incorporation of DNA methylation information into shotgun metagenomics analyses will complement existing methods to enable more accurate sequence binning.
11
Schulz A, Brinkrolf J, Hammer B. Efficient kernelisation of discriminative dimensionality reduction. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.01.104] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/25/2022]
12
Metagenome-Assembled Genome Sequences of Acetobacterium sp. Strain MES1 and Desulfovibrio sp. Strain MES5 from a Cathode-Associated Acetogenic Microbial Community. Genome Announc 2017; 5:e00938-17. [PMID: 28883141 PMCID: PMC5589535 DOI: 10.1128/genomea.00938-17] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
Abstract
Draft genome sequences of Acetobacterium sp. strain MES1 and Desulfovibrio sp. strain MES5 were obtained from the metagenome of a cathode-associated community enriched within a microbial electrosynthesis system (MES). The draft genome sequences provide insight into the functional potential of these microorganisms within an MES and a foundation for future comparative analyses.
13
Integrated meta-omic analyses of the gastrointestinal tract microbiome in patients undergoing allogeneic hematopoietic stem cell transplantation. Transl Res 2017; 186:79-94.e1. [PMID: 28686852 DOI: 10.1016/j.trsl.2017.06.008] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 02/08/2017] [Revised: 05/17/2017] [Accepted: 06/12/2017] [Indexed: 02/06/2023]
Abstract
In patients undergoing allogeneic hematopoietic stem cell transplantation (allo-HSCT), treatment-induced changes to the gastrointestinal tract (GIT) microbiome have been linked to adverse outcomes, most notably graft-versus-host disease (GvHD). However, it is presently unknown whether this relationship is causal or consequential. Here, we performed an integrated meta-omic analysis to probe deeper into the GIT microbiome changes during allo-HSCT and its accompanying treatments. We used 16S and 18S rRNA gene amplicon sequencing to resolve archaea, bacteria, and eukaryotes within the GIT microbiomes of 16 patients undergoing allo-HSCT for the treatment of hematologic malignancies. These results revealed a major shift in the GIT microbiome after allo-HSCT including a marked reduction in bacterial diversity, accompanied by only limited changes in eukaryotes and archaea. An integrated analysis of metagenomic and metatranscriptomic data was performed on samples collected from a patient before and after allo-HSCT for acute myeloid leukemia. This patient developed severe GvHD, leading to death 9 months after allo-HSCT. In addition to drastically decreased bacterial diversity, the post-treatment microbiome showed a higher overall number and higher expression levels of antibiotic resistance genes (ARGs). One specific Escherichia coli strain causing a paravertebral abscess was linked to GIT dysbiosis, suggesting loss of intestinal barrier integrity. The apparent selection for bacteria expressing ARGs suggests that prophylactic antibiotic administration may adversely affect the overall treatment outcome. We therefore assert that such analyses including information about the selection of pathogenic bacteria expressing ARGs may assist clinicians in "personalizing" regimens for individual patients to improve overall outcomes.
14
Laczny CC, Kiefer C, Galata V, Fehlmann T, Backes C, Keller A. BusyBee Web: metagenomic data analysis by bootstrapped supervised binning and annotation. Nucleic Acids Res 2017; 45:W171-W179. [PMID: 28472498 PMCID: PMC5570254 DOI: 10.1093/nar/gkx348] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Received: 01/31/2017] [Revised: 04/11/2017] [Accepted: 04/21/2017] [Indexed: 12/20/2022] Open
Abstract
Metagenomics-based studies of mixed microbial communities are impacting biotechnology, life sciences and medicine. Computational binning of metagenomic data is a powerful approach for the culture-independent recovery of population-resolved genomic sequences, i.e. from individual or closely related, constituent microorganisms. Existing binning solutions often require a priori characterized reference genomes and/or dedicated compute resources. Extending currently available reference-independent binning tools, we developed the BusyBee Web server for the automated deconvolution of metagenomic data into population-level genomic bins using assembled contigs (Illumina) or long reads (Pacific Biosciences, Oxford Nanopore Technologies). A reversible compression step as well as bootstrapped supervised binning enable quick turnaround times. The binning results are represented in interactive 2D scatterplots. Moreover, bin quality estimates, taxonomic annotations and annotations of antibiotic resistance genes are computed and visualized. Ground truth-based benchmarks of BusyBee Web demonstrate comparably high performance to state-of-the-art binning solutions for assembled contigs and markedly improved performance for long reads (median F1 scores: 70.02-95.21%). Furthermore, the applicability to real-world metagenomic datasets is shown. In conclusion, our reference-independent approach automatically bins assembled contigs or long reads, exhibits high sensitivity and precision, enables intuitive inspection of the results, and only requires FASTA-formatted input. The web-based application is freely accessible at: https://ccb-microbe.cs.uni-saarland.de/busybee.
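The F1 scores quoted in this abstract combine precision and sensitivity (recall) into a single number. The sketch below uses the standard definition of F1 as the harmonic mean of the two. The abstract does not say how the benchmark counts were obtained per bin, so the function name `f1_score` and its count-based arguments are illustrative assumptions rather than BusyBee Web's own code.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Standard F1: harmonic mean of precision and recall (sensitivity).

    Hypothetical helper for illustration; the counts could be, for example,
    contigs or base pairs assigned to a bin versus a ground-truth genome.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # Guard against empty bins or empty reference genomes.
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # A bin with 8 correctly assigned contigs, 2 foreign contigs, and 2 missed contigs.
    print(f1_score(8, 2, 2))
```

Because the harmonic mean is dominated by the smaller of the two values, a bin scores well only when it is both pure (high precision) and complete (high recall).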
Affiliation(s)
- Cedric C. Laczny: Chair for Clinical Bioinformatics, Saarland University, Campus Building E2.1, 66123 Saarbrücken, Germany
- Christina Kiefer: Chair for Clinical Bioinformatics, Saarland University, Campus Building E2.1, 66123 Saarbrücken, Germany
- Valentina Galata: Chair for Clinical Bioinformatics, Saarland University, Campus Building E2.1, 66123 Saarbrücken, Germany
- Tobias Fehlmann: Chair for Clinical Bioinformatics, Saarland University, Campus Building E2.1, 66123 Saarbrücken, Germany
- Christina Backes: Chair for Clinical Bioinformatics, Saarland University, Campus Building E2.1, 66123 Saarbrücken, Germany
- Andreas Keller: Chair for Clinical Bioinformatics, Saarland University, Campus Building E2.1, 66123 Saarbrücken, Germany
15
Interpreting Microbial Biosynthesis in the Genomic Age: Biological and Practical Considerations. Mar Drugs 2017; 15:md15060165. [PMID: 28587290 PMCID: PMC5484115 DOI: 10.3390/md15060165] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 04/28/2017] [Revised: 05/22/2017] [Accepted: 05/31/2017] [Indexed: 02/06/2023] Open
Abstract
Genome mining has become an increasingly powerful, scalable, and economically accessible tool for the study of natural product biosynthesis and drug discovery. However, there remain important biological and practical problems that can complicate or obscure biosynthetic analysis in genomic and metagenomic sequencing projects. Here, we focus on limitations of available technology as well as computational and experimental strategies to overcome them. We review the unique challenges and approaches in the study of symbiotic and uncultured systems, as well as those associated with biosynthetic gene cluster (BGC) assembly and product prediction. Finally, to explore sequencing parameters that affect the recovery and contiguity of large and repetitive BGCs assembled de novo, we simulate Illumina and PacBio sequencing of the Salinispora tropica genome focusing on assembly of the salinilactam (slm) BGC.
16
Lux M, Krüger J, Rinke C, Maus I, Schlüter A, Woyke T, Sczyrba A, Hammer B. acdc - Automated Contamination Detection and Confidence estimation for single-cell genome data. BMC Bioinformatics 2016; 17:543. [PMID: 27998267 PMCID: PMC5168860 DOI: 10.1186/s12859-016-1397-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 07/19/2016] [Accepted: 11/29/2016] [Indexed: 01/05/2023] Open
Abstract
Background: A major obstacle in single-cell sequencing is sample contamination with foreign DNA. To guarantee clean genome assemblies and to prevent the introduction of contamination into public databases, considerable quality control efforts are put into post-sequencing analysis. Contamination screening generally relies on reference-based methods such as database alignment or marker gene search, which limits the set of detectable contaminants to organisms with closely related reference species. As genomic coverage in the tree of life is highly fragmented, there is an urgent need for a reference-free methodology for contaminant identification in sequence data. Results: We present acdc, a tool specifically developed to aid the quality control process of genomic sequence data. By combining supervised and unsupervised methods, it reliably detects both known and de novo contaminants. First, 16S rRNA gene prediction and the inclusion of ultrafast exact alignment techniques allow sequence classification using existing knowledge from databases. Second, reference-free inspection is enabled by the use of state-of-the-art machine learning techniques that include fast, non-linear dimensionality reduction of oligonucleotide signatures and subsequent clustering algorithms that automatically estimate the number of clusters. The latter also enables the removal of any contaminant, yielding a clean sample. Furthermore, given the data complexity and the ill-posedness of clustering, acdc employs bootstrapping techniques to provide statistically profound confidence values. Tested on a large number of samples from diverse sequencing projects, our software is able to quickly and accurately identify contamination. Results are displayed in an interactive user interface. Acdc can be run from the web as well as from a dedicated command line application, which allows easy integration into large sequencing project analysis workflows.
Conclusions: Acdc can reliably detect contamination in single-cell genome data. In addition to database-driven detection, it complements existing tools by its unsupervised techniques, which allow for the detection of de novo contaminants. Our contribution has the potential to drastically reduce the amount of resources put into these processes, particularly in the context of limited availability of reference species. As single-cell genome data continues to grow rapidly, acdc adds to the toolkit of crucial quality assurance tools.
Affiliation(s)
- Markus Lux: Computational Methods for the Analysis of the Diversity and Dynamics of Genomes, Bielefeld University, Universitätsstr. 25, Bielefeld, 33615, Germany
- Jan Krüger: Center for Biotechnology - CeBiTec, Bielefeld University, Universitätsstr. 27, Bielefeld, 33615, Germany
- Christian Rinke: Australian Centre for Ecogenomics, University of Queensland, St Lucia, Brisbane, QLD 4072, Australia
- Irena Maus: Center for Biotechnology - CeBiTec, Bielefeld University, Universitätsstr. 27, Bielefeld, 33615, Germany
- Andreas Schlüter: Center for Biotechnology - CeBiTec, Bielefeld University, Universitätsstr. 27, Bielefeld, 33615, Germany
- Tanja Woyke: 2800 Mitchell Drive, Walnut Creek, 94598, CA, USA
- Alexander Sczyrba: Center for Biotechnology - CeBiTec, Bielefeld University, Universitätsstr. 27, Bielefeld, 33615, Germany
- Barbara Hammer: CITEC centre of excellence, Bielefeld University, Inspiration 1, Bielefeld, 33619, Germany
17
Narayanasamy S, Jarosz Y, Muller EEL, Heintz-Buschart A, Herold M, Kaysen A, Laczny CC, Pinel N, May P, Wilmes P. IMP: a pipeline for reproducible reference-independent integrated metagenomic and metatranscriptomic analyses. Genome Biol 2016; 17:260. [PMID: 27986083 PMCID: PMC5159968 DOI: 10.1186/s13059-016-1116-8] [Citation(s) in RCA: 86] [Impact Index Per Article: 9.6] [Received: 10/18/2016] [Accepted: 11/22/2016] [Indexed: 01/28/2023] Open
Abstract
Existing workflows for the analysis of multi-omic microbiome datasets are lab-specific and often result in sub-optimal data usage. Here we present IMP, a reproducible and modular pipeline for the integrated and reference-independent analysis of coupled metagenomic and metatranscriptomic data. IMP incorporates robust read preprocessing, iterative co-assembly, analyses of microbial community structure and function, automated binning, as well as genomic signature-based visualizations. The IMP-based data integration strategy enhances data usage, output volume, and output quality as demonstrated using relevant use-cases. Finally, IMP is encapsulated within a user-friendly implementation using Python and Docker. IMP is available at http://r3lab.uni.lu/web/imp/ (MIT license).
Affiliation(s)
- Shaman Narayanasamy: Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette, L-4362 Luxembourg
- Yohan Jarosz: Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette, L-4362 Luxembourg
- Emilie E. L. Muller: Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette, L-4362 Luxembourg. Present address: Department of Microbiology, Genomics and the Environment, UMR 7156 UNISTRA-CNRS, Université de Strasbourg, Strasbourg, France
- Anna Heintz-Buschart: Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette, L-4362 Luxembourg
- Malte Herold: Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette, L-4362 Luxembourg
- Anne Kaysen: Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette, L-4362 Luxembourg
- Cédric C. Laczny: Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette, L-4362 Luxembourg. Present address: Saarland University, Building E2 1, Saarbrücken, 66123 Germany
- Nicolás Pinel: Institute of Systems Biology, 401 Terry Avenue North, Seattle, WA 98109 USA. Present address: Universidad EAFIT, Carrera 49 No 7 sur 50, Medellín, Colombia
- Patrick May: Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette, L-4362 Luxembourg
- Paul Wilmes: Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette, L-4362 Luxembourg
18
Sedlar K, Kupkova K, Provaznik I. Bioinformatics strategies for taxonomy independent binning and visualization of sequences in shotgun metagenomics. Comput Struct Biotechnol J 2016; 15:48-55. [PMID: 27980708 PMCID: PMC5148923 DOI: 10.1016/j.csbj.2016.11.005] [Citation(s) in RCA: 76] [Impact Index Per Article: 8.4] [Received: 08/30/2016] [Revised: 11/24/2016] [Accepted: 11/26/2016] [Indexed: 12/11/2022] Open
Abstract
One of the main steps in a study of microbial communities is resolving their composition, diversity and function. In the past, these issues were mostly addressed by amplicon sequencing of a target gene because of its reasonable price and the easier computational postprocessing of the data. With the advancement of sequencing techniques, the main focus shifted to whole metagenome shotgun sequencing, which allows much more detailed analysis of the metagenomic data, including reconstruction of novel microbial genomes and insight into the genetic potential and metabolic capacities of whole environments. On the other hand, the output of whole metagenome shotgun sequencing is a mixture of short DNA fragments belonging to various genomes; this approach therefore requires more sophisticated computational algorithms for clustering related sequences, commonly referred to as sequence binning. There are currently two types of binning methods: taxonomy dependent and taxonomy independent. The first type classifies the DNA fragments by performing a standard homology inference against a reference database, while the second performs reference-free binning by applying clustering techniques to features extracted from the sequences. In this review, we describe the strategies within the second approach. Although these strategies do not require prior knowledge, they place higher demands on sequence length. Besides their basic principle, an overview of particular methods and tools is provided. Furthermore, the review covers the use of the methods in relation to sequence length and discusses the need for metagenomic data preprocessing in the form of initial assembly prior to binning.
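As a rough illustration of the "features extracted from the sequences" that taxonomy-independent binning relies on, the sketch below computes a normalized k-mer frequency vector (a simple genomic signature) for one contig. It is a minimal example under stated assumptions: the function names are hypothetical, tetranucleotides (k = 4) and merging of reverse-complement k-mers are common choices rather than details taken from the abstract, and no specific tool covered by the review is being reproduced.

```python
from collections import Counter
from itertools import product

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def kmer_signature(seq: str, k: int = 4) -> dict:
    """Relative frequency of each canonical k-mer in `seq` (hypothetical helper).

    A k-mer and its reverse complement are counted together, because the
    strand of an assembled contig is arbitrary. Windows containing
    ambiguous bases (for example N) are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[min(kmer, reverse_complement(kmer))] += 1
    # Fixed, sorted feature order so vectors from different contigs align.
    canonical = sorted(
        {min("".join(p), reverse_complement("".join(p))) for p in product("ACGT", repeat=k)}
    )
    total = sum(counts.values())
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in canonical}


if __name__ == "__main__":
    signature = kmer_signature("ACGTACGT")
    print(len(signature))  # number of canonical tetranucleotides
    print(signature["ACGT"])
```

Vectors of this kind, one per contig, are what a clustering or dimension-reduction step would then operate on. Short sequences give noisy frequency estimates, which is consistent with the abstract's remark that these strategies place higher demands on sequence length.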
Affiliation(s)
- Karel Sedlar: Department of Biomedical Engineering, Brno University of Technology, Technicka 12, Brno, Czech Republic
19
Integrated multi-omics of the human gut microbiome in a case study of familial type 1 diabetes. Nat Microbiol 2016; 2:16180. [PMID: 27723761 DOI: 10.1038/nmicrobiol.2016.180] [Citation(s) in RCA: 169] [Impact Index Per Article: 18.8] [Received: 03/22/2016] [Accepted: 08/23/2016] [Indexed: 12/21/2022]
Abstract
The gastrointestinal microbiome is a complex ecosystem with functions that shape human health. Studying the relationship between taxonomic alterations and functional repercussions linked to disease remains challenging. Here, we present an integrative approach to resolve the taxonomic and functional attributes of gastrointestinal microbiota at the metagenomic, metatranscriptomic and metaproteomic levels. We apply our methods to samples from four families with multiple cases of type 1 diabetes mellitus (T1DM). Analysis of intra- and inter-individual variation demonstrates that family membership has a pronounced effect on the structural and functional composition of the gastrointestinal microbiome. In the context of T1DM, consistent taxonomic differences were absent across families, but certain human exocrine pancreatic proteins were found at lower levels. The associated microbial functional signatures were linked to metabolic traits in distinct taxa. The methodologies and results provide a foundation for future large-scale integrated multi-omic analyses of the gastrointestinal microbiome in the context of host-microbe interactions in human health and disease.
20
Single sample resolution of rare microbial dark matter in a marine invertebrate metagenome. Sci Rep 2016; 6:34362. [PMID: 27681823 PMCID: PMC5041132 DOI: 10.1038/srep34362] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 05/13/2016] [Accepted: 09/13/2016] [Indexed: 12/31/2022] Open
Abstract
Direct, untargeted sequencing of environmental samples (metagenomics) and de novo genome assembly enable the study of uncultured and phylogenetically divergent organisms. However, separating individual genomes from a mixed community has often relied on the differential-coverage analysis of multiple, deeply sequenced samples. In the metagenomic investigation of the marine bryozoan Bugula neritina, we uncovered seven bacterial genomes associated with a single B. neritina individual that appeared to be transient associates, two of which were unique to one individual and undetectable using certain “universal” 16S rRNA primers and probes. We recovered high quality genome assemblies for several rare instances of “microbial dark matter,” or phylogenetically divergent bacteria lacking genomes in reference databases, from a single tissue sample that was not subjected to any physical or chemical pre-treatment. One of these rare, divergent organisms has a small (593 kbp), poorly annotated genome with low GC content (20.9%) and a 16S rRNA gene with just 65% sequence similarity to the closest reference sequence. Our findings illustrate the importance of sampling strategy and de novo assembly of metagenomic reads to understand the extent and function of bacterial biodiversity.
21
Laczny CC, Muller EEL, Heintz-Buschart A, Herold M, Lebrun LA, Hogan A, May P, de Beaufort C, Wilmes P. Identification, Recovery, and Refinement of Hitherto Undescribed Population-Level Genomes from the Human Gastrointestinal Tract. Front Microbiol 2016; 7:884. [PMID: 27445992 PMCID: PMC4914512 DOI: 10.3389/fmicb.2016.00884] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 01/29/2016] [Accepted: 05/25/2016] [Indexed: 12/05/2022] Open
Abstract
Linking taxonomic identity and functional potential at the population-level is important for the study of mixed microbial communities and is greatly facilitated by the availability of microbial reference genomes. While the culture-independent recovery of population-level genomes from environmental samples using the binning of metagenomic data has expanded available reference genome catalogs, several microbial lineages remain underrepresented. Here, we present two reference-independent approaches for the identification, recovery, and refinement of hitherto undescribed population-level genomes. The first approach is aimed at genome recovery of varied taxa and involves multi-sample automated binning using CANOPY CLUSTERING complemented by visualization and human-augmented binning using VIZBIN post hoc. The second approach is particularly well-suited for the study of specific taxa and employs VIZBIN de novo. Using these approaches, we reconstructed a total of six population-level genomes of distinct and divergent representatives of the Alphaproteobacteria class, the Mollicutes class, the Clostridiales order, and the Melainabacteria class from human gastrointestinal tract-derived metagenomic data. Our results demonstrate that, while automated binning approaches provide great potential for large-scale studies of mixed microbial communities, these approaches should be complemented with informative visualizations because expert-driven inspection and refinements are critical for the recovery of high-quality population-level genomes.
Affiliation(s)
- Cedric C. Laczny: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Emilie E. L. Muller: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Anna Heintz-Buschart: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Malte Herold: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Laura A. Lebrun: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Angela Hogan: Integrated Biobank of Luxembourg, Luxembourg, Luxembourg
- Patrick May: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Carine de Beaufort: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg; Centre Hospitalier de Luxembourg, Luxembourg, Luxembourg
- Paul Wilmes: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
22
Lin HH, Liao YC. Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes. Sci Rep 2016; 6:24175. [PMID: 27067514 PMCID: PMC4828714 DOI: 10.1038/srep24175] [Citation(s) in RCA: 140] [Impact Index Per Article: 15.6] [Received: 10/06/2015] [Accepted: 03/22/2016] [Indexed: 12/18/2022] Open
Abstract
Metagenomics, the application of shotgun sequencing, facilitates the reconstruction of the genomes of individual species from natural environments. A major challenge in the genome recovery domain is to agglomerate or ‘bin’ sequences assembled from metagenomic reads into individual groups. Metagenomic binning without consideration of reference sequences enables the comprehensive discovery of new microbial organisms and aids in the microbial genome reconstruction process. Here we present MyCC, an automated binning tool that combines genomic signatures, marker genes and optional contig coverages within one or multiple samples, in order to visualize the metagenomes and to identify the reconstructed genomic fragments. We demonstrate the superior performance of MyCC compared to other binning tools including CONCOCT, GroopM, MaxBin and MetaBAT on both synthetic and real human gut communities with a small sample size (one to 11 samples), as well as on a large metagenome dataset (over 250 samples). Moreover, we demonstrate the visualization of metagenomes in MyCC to aid in the reconstruction of genomes from distinct bins. MyCC is freely available at http://sourceforge.net/projects/sb2nhri/files/MyCC/.
Affiliation(s)
- Hsin-Hung Lin: Institute of Population Health Sciences, National Health Research Institutes, Miaoli County 35053, Taiwan
- Yu-Chieh Liao: Institute of Population Health Sciences, National Health Research Institutes, Miaoli County 35053, Taiwan
23
Xie M, Ren M, Yang C, Yi H, Li Z, Li T, Zhao J. Metagenomic Analysis Reveals Symbiotic Relationship among Bacteria in Microcystis-Dominated Community. Front Microbiol 2016; 7:56. [PMID: 26870018 PMCID: PMC4735357 DOI: 10.3389/fmicb.2016.00056] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 11/02/2015] [Accepted: 01/13/2016] [Indexed: 11/13/2022] Open
Abstract
Microcystis bloom, a cyanobacterial mass occurrence often found in eutrophicated water bodies, is one of the most serious threats to freshwater ecosystems worldwide. In nature, Microcystis forms aggregates or colonies that contain heterotrophic bacteria. The Microcystis-bacteria colonies were persistent even when they were maintained in lab culture for a long period. The relationship between Microcystis and the associated bacteria was investigated by a metagenomic approach in this study. We developed a visualization-guided method of binning for genome assembly after total colony DNA sequencing. We found that the method was effective in grouping sequences and it did not require reference genome sequence. Individual genomes of the colony bacteria were obtained and they provided valuable insights into microbial community structures. Analysis of metabolic pathways based on these genomes revealed that while all heterotrophic bacteria were dependent upon Microcystis for carbon and energy, Vitamin B12 biosynthesis, which is required for growth by Microcystis, was accomplished in a cooperative fashion among the bacteria. Our analysis also suggests that individual bacteria in the colony community contributed a complete pathway for degradation of benzoate, which is inhibitory to the cyanobacterial growth, and its ecological implication for Microcystis bloom is discussed.
Affiliation(s)
- Meili Xie: Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China; University of Chinese Academy of Sciences, Beijing, China
- Minglei Ren: Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China; University of Chinese Academy of Sciences, Beijing, China
- Chen Yang: Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China; University of Chinese Academy of Sciences, Beijing, China
- Haisi Yi: Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China; University of Chinese Academy of Sciences, Beijing, China
- Zhe Li: State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
- Tao Li: Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Jindong Zhao: Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China; College of Life Science, Peking University, Beijing, China
24
Bauer E, Laczny CC, Magnusdottir S, Wilmes P, Thiele I. Phenotypic differentiation of gastrointestinal microbes is reflected in their encoded metabolic repertoires. MICROBIOME 2015; 3:55. [PMID: 26617277 PMCID: PMC4663747 DOI: 10.1186/s40168-015-0121-6] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 05/18/2015] [Accepted: 09/30/2015] [Indexed: 05/27/2023]
Abstract
BACKGROUND The human gastrointestinal tract harbors a diverse microbial community, in which metabolic phenotypes play important roles for the human host. Recent developments in meta-omics attempt to unravel metabolic roles of microbes by linking genotypic and phenotypic characteristics. This connection, however, still remains poorly understood with respect to its evolutionary and ecological context. RESULTS We generated automatically refined draft genome-scale metabolic models of 301 representative intestinal microbes in silico. We applied a combination of unsupervised machine-learning and systems biology techniques to study individual and global differences in genomic content and inferred metabolic capabilities. Based on the global metabolic differences, we found that energy metabolism and membrane synthesis play important roles in delineating different taxonomic groups. Furthermore, we found an exponential relationship between phylogeny and the reaction composition, meaning that closely related microbes of the same genus can exhibit pronounced differences with respect to their metabolic capabilities while at the family level only marginal metabolic differences can be observed. This finding was further substantiated by the metabolic divergence within different genera. In particular, we could distinguish three sub-type clusters based on membrane and energy metabolism within the Lactobacilli as well as two clusters within the Bifidobacteria and Bacteroides. CONCLUSIONS We demonstrate that phenotypic differentiation within closely related species could be explained by their metabolic repertoire rather than their phylogenetic relationships. These results have important implications in our understanding of the ecological and evolutionary complexity of the human gastrointestinal microbiome.
Affiliation(s)
- Eugen Bauer: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Cedric Christian Laczny: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Stefania Magnusdottir: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Paul Wilmes: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Ines Thiele: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
25
Narayanasamy S, Muller EEL, Sheik AR, Wilmes P. Integrated omics for the identification of key functionalities in biological wastewater treatment microbial communities. Microb Biotechnol 2015; 8:363-8. [PMID: 25678254 PMCID: PMC4408170 DOI: 10.1111/1751-7915.12255] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 08/21/2014] [Revised: 11/11/2014] [Accepted: 11/13/2014] [Indexed: 11/30/2022] Open
Abstract
Biological wastewater treatment plants harbour diverse and complex microbial communities which prominently serve as models for microbial ecology and mixed culture biotechnological processes. Integrated omic analyses (combined metagenomics, metatranscriptomics, metaproteomics and metabolomics) are currently gaining momentum towards providing enhanced understanding of community structure, function and dynamics in situ as well as offering the potential to discover novel biological functionalities within the framework of Eco-Systems Biology. The integration of information from genome to metabolome allows the establishment of associations between genetic potential and final phenotype, a feature not realizable by only considering single ‘omes’. Therefore, in our opinion, integrated omics will become the future standard for large-scale characterization of microbial consortia including those underpinning biological wastewater treatment processes. Systematically obtained time and space-resolved omic datasets will allow deconvolution of structure–function relationships by identifying key members and functions. Such knowledge will form the foundation for discovering novel genes on a much larger scale compared with previous efforts. In general, these insights will allow us to optimize microbial biotechnological processes either through better control of mixed culture processes or by use of more efficient enzymes in bioengineering applications.
Affiliation(s)
- Shaman Narayanasamy: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 7 avenue des Hauts-Fourneaux, Esch-Sur-Alzette, L-4362, Luxembourg
26
Laczny CC, Sternal T, Plugaru V, Gawron P, Atashpendar A, Margossian HH, Coronado S, der Maaten LV, Vlassis N, Wilmes P. VizBin - an application for reference-independent visualization and human-augmented binning of metagenomic data. MICROBIOME 2015; 3:1. [PMID: 25621171 PMCID: PMC4305225 DOI: 10.1186/s40168-014-0066-1] [Citation(s) in RCA: 198] [Impact Index Per Article: 19.8] [Received: 09/15/2014] [Accepted: 12/18/2014] [Indexed: 05/20/2023]
Abstract
BACKGROUND Metagenomics is limited in its ability to link distinct microbial populations to genetic potential due to a current lack of representative isolate genome sequences. Reference-independent approaches, which exploit for example inherent genomic signatures for the clustering of metagenomic fragments (binning), offer the prospect to resolve and reconstruct population-level genomic complements without the need for prior knowledge. RESULTS We present VizBin, a Java™-based application which offers efficient and intuitive reference-independent visualization of metagenomic datasets from single samples for subsequent human-in-the-loop inspection and binning. The method is based on nonlinear dimension reduction of genomic signatures and exploits the superior pattern recognition capabilities of the human eye-brain system for cluster identification and delineation. We demonstrate the general applicability of VizBin for the analysis of metagenomic sequence data by presenting results from two cellulolytic microbial communities and one human-borne microbial consortium. The superior performance of our application compared to other analogous metagenomic visualization and binning methods is also presented. CONCLUSIONS VizBin can be applied de novo for the visualization and subsequent binning of metagenomic datasets from single samples, and it can be used for the post hoc inspection and refinement of automatically generated bins. Due to its computational efficiency, it can be run on common desktop machines and enables the analysis of complex metagenomic datasets in a matter of minutes. The software implementation is available at https://claczny.github.io/VizBin under the BSD License (four-clause) and runs under Microsoft Windows™, Apple Mac OS X™ (10.7 to 10.10), and Linux.
Affiliation(s)
- Cedric C Laczny: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, 4362 Luxembourg
- Tomasz Sternal: Institute of Computing Science, Poznan University of Technology, Poznan, 60-965 Poland
- Valentin Plugaru: Computer Science and Communications Research Unit, University of Luxembourg, Luxembourg, 1359 Luxembourg
- Piotr Gawron: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, 4362 Luxembourg
- Arash Atashpendar: Computer Science and Communications Research Unit, University of Luxembourg, Luxembourg, 1359 Luxembourg
- Houry Hera Margossian: Computer Science and Communications Research Unit, University of Luxembourg, Luxembourg, 1359 Luxembourg
- Sergio Coronado: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, 4362 Luxembourg
- Laurens van der Maaten: Pattern Recognition and Bioinformatics Group, Delft University of Technology, CD Delft, 2628 Netherlands
- Paul Wilmes: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, 4362 Luxembourg
27
Community-integrated omics links dominance of a microbial generalist to fine-tuned resource usage. Nat Commun 2014; 5:5603. [PMID: 25424998 PMCID: PMC4263124 DOI: 10.1038/ncomms6603] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Received: 08/15/2014] [Accepted: 10/20/2014] [Indexed: 11/08/2022]
Abstract
Microbial communities are complex and dynamic systems that are primarily structured according to their members’ ecological niches. To investigate how niche breadth (generalist versus specialist lifestyle strategies) relates to ecological success, we develop and apply an integrative workflow for the multi-omic analysis of oleaginous mixed microbial communities from a biological wastewater treatment plant. Time- and space-resolved coupled metabolomic and taxonomic analyses demonstrate that the community-wide lipid accumulation phenotype is associated with the dominance of the generalist bacterium Candidatus Microthrix spp. By integrating population-level genomic reconstructions (reflecting fundamental niches) with transcriptomic and proteomic data (realised niches), we identify finely tuned gene expression governing resource usage by Candidatus Microthrix parvicella over time. Moreover, our results indicate that the fluctuating environmental conditions constrain the accumulation of genetic variation in Candidatus Microthrix parvicella likely due to fitness trade-offs. Based on our observations, niche breadth has to be considered as an important factor for understanding the evolutionary processes governing (microbial) population sizes and structures in situ. Within microbial communities, microorganisms adopt different lifestyle strategies to use the available resources. Here, the authors use an integrated ‘multi-omic’ approach to study niche breadth (generalist versus specialist lifestyles) in oleaginous microbial assemblages from an anoxic wastewater treatment tank.
28
Wu YW, Tang YH, Tringe SG, Simmons BA, Singer SW. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome 2014; 2:26. [PMID: 25136443 PMCID: PMC4129434 DOI: 10.1186/2049-2618-2-26] [Citation(s) in RCA: 421] [Impact Index Per Article: 38.3] [Received: 03/04/2014] [Accepted: 06/04/2014] [Indexed: 05/11/2023]
Abstract
BACKGROUND Recovering individual genomes from metagenomic datasets allows access to uncultivated microbial populations that may have important roles in natural and engineered ecosystems. Understanding the roles of these uncultivated populations has broad application in ecology, evolution, biotechnology and medicine. Accurate binning of assembled metagenomic sequences is an essential step in recovering the genomes and understanding microbial functions. RESULTS We have developed a binning algorithm, MaxBin, which automates the binning of assembled metagenomic scaffolds using an expectation-maximization algorithm after the assembly of metagenomic sequencing reads. Binning of simulated metagenomic datasets demonstrated that MaxBin had high levels of accuracy in binning microbial genomes. MaxBin was used to recover genomes from metagenomic data obtained through the Human Microbiome Project, which demonstrated its ability to recover genomes from real metagenomic datasets with variable sequencing coverages. Application of MaxBin to metagenomes obtained from microbial consortia adapted to grow on cellulose allowed genomic analysis of new, uncultivated, cellulolytic bacterial populations, including an abundant myxobacterial population distantly related to Sorangium cellulosum that possessed a much smaller genome (5 MB versus 13 to 14 MB) but has a more extensive set of genes for biomass deconstruction. For the cellulolytic consortia, the MaxBin results were compared to binning using emergent self-organizing maps (ESOMs) and differential coverage binning, demonstrating that it performed comparably to these methods but had distinct advantages in automation, resolution of related genomes and sensitivity. CONCLUSIONS The automatic binning software that we developed successfully classifies assembled sequences in metagenomic datasets into recovered individual genomes. 
The isolation of dozens of species in cellulolytic microbial consortia, including a novel species of myxobacteria that has the smallest genome among all sequenced aerobic myxobacteria, was easily achieved using the binning software. This work demonstrates that the processes required for recovering genomes from assembled metagenomic datasets can be readily automated, an important advance in understanding the metabolic potential of microbes in natural environments. MaxBin is available at https://sourceforge.net/projects/maxbin/.
Affiliation(s)
- Yu-Wei Wu: Joint BioEnergy Institute, Emeryville, CA 94608, USA; Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Yung-Hsu Tang: Joint BioEnergy Institute, Emeryville, CA 94608, USA; City College of San Francisco, San Francisco, CA 94112, USA
- Susannah G Tringe: Joint Genome Institute, Walnut Creek, CA 94598, USA; Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Blake A Simmons: Joint BioEnergy Institute, Emeryville, CA 94608, USA; Biological and Materials Sciences Center, Sandia National Laboratories, Livermore, CA 94551, USA
- Steven W Singer: Joint BioEnergy Institute, Emeryville, CA 94608, USA; Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA