2051
|
Afiahayati, Sato K, Sakakibara Y. An extended genovo metagenomic assembler by incorporating paired-end information. PeerJ 2013; 1:e196. [PMID: 24281688 PMCID: PMC3817583 DOI: 10.7717/peerj.196] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 09/09/2013] [Accepted: 10/10/2013] [Indexed: 11/25/2022]
Abstract
Metagenome assembly is challenging because multiple genomes must be assembled from mixed reads of multiple species. An assembler designed for single genomes does not adapt well to this case. Genovo is a de novo metagenomic assembler based on a generative probabilistic model. Genovo assembles all reads without discarding any in a preprocessing step, and is therefore able to extract more information from metagenomic data and, in principle, generate better assembly results. Paired-end sequencing is now widely used, yet Genovo was designed for 454 single-end reads. In this research, we extended Genovo by incorporating paired-end information, yielding Xgenovo, so that it generates higher-quality assemblies from paired-end reads. First, we added a bonus parameter to the Chinese Restaurant Process that serves as the prior accounting for the unknown number of genomes in the sample. This bonus parameter favors placing both reads of a pair in the same contig and is intended to address chimeric contigs. Second, we modified the sampling process for the location of a read in a contig. For the number of trials in the symmetric geometric distribution, we used a relative distance instead of the distance between the offset and the center of the contig used in Genovo. With this relative distance, a read sampled at the appropriate location has higher probability. Therefore, a read is more likely to be mapped to the correct location. Results of extensive experiments on simulated metagenomic datasets, ranging from simple to complex and with species coverage following uniform and lognormal distributions, showed that Xgenovo can be superior to the original Genovo and to MAP, a recently proposed metagenome assembler for 454 reads. Xgenovo generated longer N50 than Genovo and MAP while maintaining assembly quality, even for very complex metagenomic datasets consisting of 115 species. Xgenovo also demonstrated the potential to decrease the computational cost. This means that our strategy worked well. The software and all simulated datasets are publicly available online at http://xgenovo.dna.bio.keio.ac.jp.
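The idea of a pair bonus in a Chinese Restaurant Process prior can be illustrated with a minimal Python sketch. This is not Xgenovo's actual formula, which the abstract does not give. It assumes the standard CRP weights (reads already in a contig, or a concentration parameter `alpha` for a new contig) and a hypothetical additive `bonus` on the contig that holds the read's mate. The function name, the `"new"` key, and the additive form are all assumptions made for illustration.

```python
from collections import Counter


def crp_weights(assignments, alpha, mate_contig=None, bonus=0.0):
    """Return normalized seating probabilities for one read.

    assignments: contig ids of the reads already placed.
    alpha: CRP concentration parameter (weight of opening a new contig).
    mate_contig: contig holding this read's mate, if it is already placed.
    bonus: hypothetical additive weight favoring the mate's contig
           (an assumption, not the published Xgenovo formula).
    """
    counts = Counter(assignments)
    # Standard CRP: an existing contig is weighted by its current read count.
    weights = {contig: float(n) for contig, n in counts.items()}
    # Assumed pair bonus: add extra weight to the contig containing the mate.
    if mate_contig in weights:
        weights[mate_contig] += bonus
    # The key "new" stands for opening a new contig; contig ids must not use it.
    weights["new"] = alpha
    total = sum(weights.values())
    return {contig: w / total for contig, w in weights.items()}


placed = ["c1", "c1", "c1", "c2"]
print(crp_weights(placed, alpha=1.0))
print(crp_weights(placed, alpha=1.0, mate_contig="c2", bonus=3.0))
```

In this toy setting the bonus raises the probability of joining the mate's contig `c2` from 0.2 to 0.5, which is the qualitative effect the abstract describes.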
Affiliation(s)
- Afiahayati
- Department of Biosciences and Informatics, Keio University , Hiyoshi, Kohoku-ku, Yokohama , Japan
|
2052
|
Abstract
Cultivation-independent surveys of microbial diversity have revealed many bacterial phyla that lack cultured representatives. These lineages, referred to as candidate phyla, have been detected across many environments. Here, we deeply sequenced microbial communities from acetate-stimulated aquifer sediment to recover the complete and essentially complete genomes of single representatives of the candidate phyla SR1, WWE3, TM7, and OD1. All four of these genomes are very small, 0.7 to 1.2 Mbp, and have large inventories of novel proteins. Additionally, all lack identifiable biosynthetic pathways for several key metabolites. The SR1 genome uses the UGA codon to encode glycine, and the same codon is very rare in the OD1 genome, suggesting that the OD1 organism could also transition to alternate coding. Interestingly, the relative abundance of the members of SR1 increased with the appearance of sulfide in groundwater, a pattern mirrored by a member of the phylum Tenericutes. All four genomes encode type IV pili, which may be involved in interorganism interaction. On the basis of these results and other recently published research, metabolic dependence on other organisms may be widely distributed across multiple bacterial candidate phyla. Few or no genomic sequences exist for members of the numerous bacterial phyla lacking cultivated representatives, making it difficult to assess their roles in the environment. This paper presents three complete and one essentially complete genomes of members of four candidate phyla, documents consistently small genome size, and predicts metabolic capabilities on the basis of gene content. These metagenomic analyses expand our view of a lifestyle apparently common across these candidate phyla.
|
2053
|
Di Rienzi SC, Sharon I, Wrighton KC, Koren O, Hug LA, Thomas BC, Goodrich JK, Bell JT, Spector TD, Banfield JF, Ley RE. The human gut and groundwater harbor non-photosynthetic bacteria belonging to a new candidate phylum sibling to Cyanobacteria. eLife 2013; 2:e01102. [PMID: 24137540 PMCID: PMC3787301 DOI: 10.7554/elife.01102] [Citation(s) in RCA: 270] [Impact Index Per Article: 22.5] [Received: 06/19/2013] [Accepted: 08/22/2013] [Indexed: 12/21/2022]
Abstract
Cyanobacteria were responsible for the oxygenation of the ancient atmosphere; however, the evolution of this phylum is enigmatic, as relatives have not been characterized. Here we use whole genome reconstruction of human fecal and subsurface aquifer metagenomic samples to obtain complete genomes for members of a new candidate phylum sibling to Cyanobacteria, for which we propose the designation 'Melainabacteria'. Metabolic analysis suggests that the ancestors to both lineages were non-photosynthetic, anaerobic, motile, and obligately fermentative. Cyanobacterial light sensing may have been facilitated by regulators present in the ancestor of these lineages. The subsurface organism has the capacity for nitrogen fixation using a nitrogenase distinct from that in Cyanobacteria, suggesting nitrogen fixation evolved separately in the two lineages. We hypothesize that Cyanobacteria split from Melainabacteria prior to or due to the acquisition of oxygenic photosynthesis. Melainabacteria remained in anoxic zones and differentiated by niche adaptation, including for symbiosis in the mammalian gut. DOI:http://dx.doi.org/10.7554/eLife.01102.001.
Affiliation(s)
- Sara C Di Rienzi
- Department of Microbiology, Cornell University, Ithaca, United States
- Itai Sharon
- Department of Earth and Planetary Science, University of California, Berkeley, Berkeley, United States
- Kelly C Wrighton
- Department of Earth and Planetary Science, University of California, Berkeley, Berkeley, United States
- Omry Koren
- Department of Microbiology, Cornell University, Ithaca, United States
- Laura A Hug
- Department of Earth and Planetary Science, University of California, Berkeley, Berkeley, United States
- Brian C Thomas
- Department of Earth and Planetary Science, University of California, Berkeley, Berkeley, United States
- Julia K Goodrich
- Department of Microbiology, Cornell University, Ithaca, United States
- Jordana T Bell
- Department of Twin Research and Genetic Epidemiology, King’s College London, London, United Kingdom
- Timothy D Spector
- Department of Twin Research and Genetic Epidemiology, King’s College London, London, United Kingdom
- Jillian F Banfield
- Department of Earth and Planetary Science, University of California, Berkeley, Berkeley, United States
- Department of Environmental Science, Policy, and Management, University of California, Berkeley, Berkeley, United States
- Ruth E Ley
- Department of Microbiology, Cornell University, Ithaca, United States
|
2054
|
Hug LA, Castelle CJ, Wrighton KC, Thomas BC, Sharon I, Frischkorn KR, Williams KH, Tringe SG, Banfield JF. Community genomic analyses constrain the distribution of metabolic traits across the Chloroflexi phylum and indicate roles in sediment carbon cycling. MICROBIOME 2013; 1:22. [PMID: 24450983 PMCID: PMC3971608 DOI: 10.1186/2049-2618-1-22] [Citation(s) in RCA: 336] [Impact Index Per Article: 28.0] [Received: 05/15/2013] [Accepted: 07/24/2013] [Indexed: 05/19/2023]
Abstract
BACKGROUND Sediments are massive reservoirs of carbon compounds and host a large fraction of microbial life. Microorganisms within terrestrial aquifer sediments control buried organic carbon turnover, degrade organic contaminants, and impact drinking water quality. Recent 16S rRNA gene profiling indicates that members of the bacterial phylum Chloroflexi are common in sediment. Only the role of the class Dehalococcoidia, which degrade halogenated solvents, is well understood. Genomic sampling is available for only six of the approximately 30 Chloroflexi classes, so little is known about the phylogenetic distribution of reductive dehalogenation or about the broader metabolic characteristics of Chloroflexi in sediment. RESULTS We used metagenomics to directly evaluate the metabolic potential and diversity of Chloroflexi in aquifer sediments. We sampled genomic sequence from 86 Chloroflexi representing 15 distinct lineages, including members of eight classes previously characterized only by 16S rRNA sequences. Unlike in the Dehalococcoidia, genes for organohalide respiration are rare within the Chloroflexi genomes sampled here. Near-complete genomes were reconstructed for three Chloroflexi. One, a member of an unsequenced lineage in the Anaerolinea, is an aerobe with the potential for respiring diverse carbon compounds. The others represent two genomically unsampled classes sibling to the Dehalococcoidia, and are anaerobes likely involved in sugar and plant-derived-compound degradation to acetate. Both fix CO2 via the Wood-Ljungdahl pathway, a pathway not previously documented in Chloroflexi. The genomes each encode unique traits apparently acquired from Archaea, including mechanisms of motility and ATP synthesis. CONCLUSIONS Chloroflexi in the aquifer sediments are abundant and highly diverse. Genomic analyses provide new evolutionary boundaries for obligate organohalide respiration. We expand the potential roles of Chloroflexi in sediment carbon cycling beyond organohalide respiration to include respiration of sugars, fermentation, CO2 fixation, and acetogenesis with ATP formation by substrate-level phosphorylation.
Affiliation(s)
- Laura A Hug
- Department of Earth and Planetary Science, UC Berkeley, Berkeley, CA, USA
- Cindy J Castelle
- Department of Earth and Planetary Science, UC Berkeley, Berkeley, CA, USA
- Kelly C Wrighton
- Department of Earth and Planetary Science, UC Berkeley, Berkeley, CA, USA
- Brian C Thomas
- Department of Earth and Planetary Science, UC Berkeley, Berkeley, CA, USA
- Itai Sharon
- Department of Earth and Planetary Science, UC Berkeley, Berkeley, CA, USA
- Kyle R Frischkorn
- Department of Earth and Planetary Science, UC Berkeley, Berkeley, CA, USA
- Kenneth H Williams
- Geophysics Department, Earth Sciences Division, Lawrence Berkeley National Lab, Berkeley, CA, USA
- Susannah G Tringe
- Metagenome Program, DOE Joint Genome Institute, Walnut Creek, CA, USA
- Jillian F Banfield
- Department of Earth and Planetary Science, UC Berkeley, Berkeley, CA, USA
|
2055
|
Taghavi Z, Movahedi NS, Draghici S, Chitsaz H. Distilled single-cell genome sequencing and de novo assembly for sparse microbial communities. Bioinformatics 2013; 29:2395-401. [PMID: 23918251 DOI: 10.1093/bioinformatics/btt420] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/24/2022]
Abstract
MOTIVATION Identification of every single genome present in a microbial sample is an important and challenging task with crucial applications. It is challenging because there are typically millions of cells in a microbial sample, the vast majority of which elude cultivation. The most accurate method to date is exhaustive single-cell sequencing using multiple displacement amplification, which is simply intractable for a large number of cells. However, there is hope for breaking this barrier, as the number of different cell types with distinct genome sequences is usually much smaller than the number of cells. RESULTS Here, we present a novel divide and conquer method to sequence and de novo assemble all distinct genomes present in a microbial sample with a sequencing cost and computational complexity proportional to the number of genome types, rather than the number of cells. The method is implemented in a tool called Squeezambler. We evaluated Squeezambler on simulated data. The proposed divide and conquer method successfully reduces the cost of sequencing in comparison with the naïve exhaustive approach. AVAILABILITY Squeezambler and datasets are available at http://compbio.cs.wayne.edu/software/squeezambler/.
Affiliation(s)
- Zeinab Taghavi
- Department of Computer Science and Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI 48202, USA
|
2056
|
Wylie KM, Weinstock GM, Storch GA. Virome genomics: a tool for defining the human virome. Curr Opin Microbiol 2013; 16:479-84. [PMID: 23706900 PMCID: PMC3755052 DOI: 10.1016/j.mib.2013.04.006] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 02/25/2013] [Revised: 04/19/2013] [Accepted: 04/23/2013] [Indexed: 11/21/2022]
Abstract
High throughput, deep sequencing assays are powerful tools for gaining insights into virus-host interactions. Sequencing assays can discover novel viruses and describe the genomes of novel and known viruses. Genomic information can predict viral proteins that can be characterized, describe important genes in the host that control infections, and evaluate gene expression of viruses and hosts during infection. Sequencing can also describe variation and evolution of viruses during replication and transmission. This review recounts some of the major advances in the studies of virus-host interactions from the last two years, and discusses the uses of sequencing technologies relating to these studies.
Affiliation(s)
- Kristine M Wylie
- The Genome Institute, Washington University School of Medicine, Campus Box 8501, 4444 Forest Park Avenue, St. Louis, MO 63108, United States.
|
2057
|
Bourgeois YXC, Lhuillier E, Cézard T, Bertrand JAM, Delahaie B, Cornuault J, Duval T, Bouchez O, Milá B, Thébaud C. Mass production of SNP markers in a nonmodel passerine bird through RAD sequencing and contig mapping to the zebra finch genome. Mol Ecol Resour 2013; 13:899-907. [DOI: 10.1111/1755-0998.12137] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 03/26/2013] [Revised: 05/24/2013] [Accepted: 06/04/2013] [Indexed: 01/01/2023]
Affiliation(s)
- Yann X. C. Bourgeois
- Laboratoire Évolution et Diversité Biologique UMR 5174 CNRS ‐ Université Paul Sabatier – ENFA 118 route de Narbonne, Bâtiment 4R1 F‐31062 Toulouse Cedex 9 France
- Emeline Lhuillier
- INRA UAR 1209 Département de Génétique Animale INRA Auzeville F‐31326 Castanet‐Tolosan France
- GeT‐PlaGe Genotoul INRA Auzeville F‐31326 Castanet‐Tolosan France
- Timothée Cézard
- The GenePool Ashworth Laboratories The University of Edinburgh The King's Building Edinburgh EH9 3JT UK
- Joris A. M. Bertrand
- Laboratoire Évolution et Diversité Biologique UMR 5174 CNRS ‐ Université Paul Sabatier – ENFA 118 route de Narbonne, Bâtiment 4R1 F‐31062 Toulouse Cedex 9 France
- Boris Delahaie
- Laboratoire Évolution et Diversité Biologique UMR 5174 CNRS ‐ Université Paul Sabatier – ENFA 118 route de Narbonne, Bâtiment 4R1 F‐31062 Toulouse Cedex 9 France
- Josselin Cornuault
- Laboratoire Évolution et Diversité Biologique UMR 5174 CNRS ‐ Université Paul Sabatier – ENFA 118 route de Narbonne, Bâtiment 4R1 F‐31062 Toulouse Cedex 9 France
- Thomas Duval
- Société Calédonienne d'Ornithologie Nord BP 236 F‐98822 Poindimié Nouvelle Calédonie France
- Olivier Bouchez
- GeT‐PlaGe Genotoul INRA Auzeville F‐31326 Castanet‐Tolosan France
- INRA UMR 444 Laboratoire de Génétique Cellulaire INRA Auzeville F‐31326 Castanet‐Tolosan France
- Borja Milá
- Museo Nacional de Ciencias Naturales CSIC José Gutiérrez Abascal 2 Madrid 28006 Spain
- Christophe Thébaud
- Laboratoire Évolution et Diversité Biologique UMR 5174 CNRS ‐ Université Paul Sabatier – ENFA 118 route de Narbonne, Bâtiment 4R1 F‐31062 Toulouse Cedex 9 France
|
2058
|
Abstract
Humans are colonized by immense populations of viruses, which metagenomic analysis shows are mostly unique to each individual. To investigate the origin and evolution of the human gut virome, we analyzed the viral community of one adult individual over 2.5 y by extremely deep metagenomic sequencing (56 billion bases of purified viral sequence from 24 longitudinal fecal samples). After assembly, 478 well-determined contigs could be identified, which are inferred to correspond mostly to previously unstudied bacteriophage genomes. Fully 80% of these types persisted throughout the duration of the 2.5-y study, indicating long-term global stability. Mechanisms of base substitution, rates of accumulation, and the amount of variation varied among viral types. Temperate phages showed relatively lower mutation rates, consistent with replication by accurate bacterial DNA polymerases in the integrated prophage state. In contrast, Microviridae, which are lytic bacteriophages with single-stranded circular DNA genomes, showed high substitution rates (>10⁻⁵ per nucleotide each day), so that sequence divergence over the 2.5-y period studied approached values sufficient to distinguish new viral species. Longitudinal changes also were associated with diversity-generating retroelements and virus-encoded Clustered Regularly Interspaced Short Palindromic Repeats arrays. We infer that the extreme interpersonal diversity of human gut viruses derives from two sources, persistence of a small portion of the global virome within the gut of each individual and rapid evolution of some long-term virome members.
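As a rough back-of-the-envelope check on the substitution-rate figure above, the Python sketch below multiplies a per-day rate by the study duration. It assumes simple linear accumulation, with no correction for repeated substitutions at the same site, and a 365-day year. The function name is hypothetical, and the result is only an order-of-magnitude illustration, not a value reported in the abstract.

```python
def expected_divergence(rate_per_site_per_day, days):
    """Expected fraction of substituted sites under linear accumulation.

    Assumes a constant rate and ignores multiple hits at one site,
    which is reasonable only while the result stays small.
    """
    return rate_per_site_per_day * days


study_days = 2.5 * 365  # 2.5 years, assuming 365-day years
divergence = expected_divergence(1e-5, study_days)
print(f"{divergence:.4%} of sites substituted")
```

At the lower bound of 10⁻⁵ substitutions per nucleotide per day, this gives a divergence on the order of 1% over the study period.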
|
2059
|
Leung HC, Yiu SM, Parkinson J, Chin FY. IDBA-MT: De Novo Assembler for Metatranscriptomic Data Generated from Next-Generation Sequencing Technology. J Comput Biol 2013; 20:540-50. [DOI: 10.1089/cmb.2013.0042] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Indexed: 01/01/2023]
Affiliation(s)
- Henry C.M. Leung
- Department of Computer Science, The University of Hong Kong, Hong Kong, People's Republic of China
- Siu-Ming Yiu
- Department of Computer Science, The University of Hong Kong, Hong Kong, People's Republic of China
- John Parkinson
- Biochemistry & Molecular and Medical Genetics, University of Toronto, Toronto, Ontario, Canada
- Francis Y.L. Chin
- Department of Computer Science, The University of Hong Kong, Hong Kong, People's Republic of China
|
2060
|
Virus-host and CRISPR dynamics in Archaea-dominated hypersaline Lake Tyrrell, Victoria, Australia. ARCHAEA-AN INTERNATIONAL MICROBIOLOGICAL JOURNAL 2013; 2013:370871. [PMID: 23853523 PMCID: PMC3703381 DOI: 10.1155/2013/370871] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Received: 12/07/2012] [Revised: 05/17/2013] [Accepted: 05/27/2013] [Indexed: 11/29/2022]
Abstract
The study of natural archaeal assemblages requires community context, namely, a concurrent assessment of the dynamics of archaeal, bacterial, and viral populations. Here, we use filter size-resolved metagenomic analyses to report the dynamics of 101 archaeal and bacterial OTUs and 140 viral populations across 17 samples collected over different timescales from 2007–2010 from Australian hypersaline Lake Tyrrell (LT). All samples were dominated by Archaea (75–95%). Archaeal, bacterial, and viral populations were found to be dynamic on timescales of months to years, and different viral assemblages were present in planktonic, relative to host-associated (active and provirus) size fractions. Analyses of clustered regularly interspaced short palindromic repeat (CRISPR) regions indicate that both rare and abundant viruses were targeted, primarily by lower abundance hosts. Although very few spacers had hits to the NCBI nr database or to the 140 LT viral populations, 21% had hits to unassembled LT viral concentrate reads. This suggests local adaptation to LT-specific viruses and/or undersampling of haloviral assemblages in public databases, along with successful CRISPR-mediated maintenance of viral populations at abundances low enough to preclude genomic assembly. This is the first metagenomic report evaluating widespread archaeal dynamics at the population level on short timescales in a hypersaline system.
|
2061
|
Chikhi R, Medvedev P. Informed and automated k-mer size selection for genome assembly. Bioinformatics 2013; 30:31-7. [DOI: 10.1093/bioinformatics/btt310] [Citation(s) in RCA: 481] [Impact Index Per Article: 40.1] [Indexed: 11/15/2022]
|
2062
|
PRICE: software for the targeted assembly of components of (Meta) genomic sequence data. G3-GENES GENOMES GENETICS 2013; 3:865-80. [PMID: 23550143 PMCID: PMC3656733 DOI: 10.1534/g3.113.005967] [Citation(s) in RCA: 194] [Impact Index Per Article: 16.2] [Indexed: 02/05/2023]
Abstract
Low-cost DNA sequencing technologies have expanded the role for direct nucleic acid sequencing in the analysis of genomes, transcriptomes, and the metagenomes of whole ecosystems. Human and machine comprehension of such large datasets can be simplified via synthesis of sequence fragments into long, contiguous blocks of sequence (contigs), but most of the progress in the field of assembly has focused on genomes in isolation rather than metagenomes. Here, we present software for paired-read iterative contig extension (PRICE), a strategy for focused assembly of particular nucleic acid species using complex metagenomic data as input. We describe the assembly strategy implemented by PRICE and provide examples of its application to the sequence of particular genes, transcripts, and virus genomes from complex multicomponent datasets, including an assembly of the BCBL-1 strain of Kaposi's sarcoma-associated herpesvirus. PRICE is open-source and available for free download (derisilab.ucsf.edu/software/price/ or sourceforge.net/projects/pricedenovo/).
|
2063
|
Segata N, Boernigen D, Tickle TL, Morgan XC, Garrett WS, Huttenhower C. Computational meta'omics for microbial community studies. Mol Syst Biol 2013; 9:666. [PMID: 23670539 PMCID: PMC4039370 DOI: 10.1038/msb.2013.22] [Citation(s) in RCA: 198] [Impact Index Per Article: 16.5] [Received: 01/02/2013] [Accepted: 04/03/2013] [Indexed: 12/16/2022]
Abstract
Complex microbial communities are an integral part of the Earth's ecosystem and of our bodies in health and disease. In the last two decades, culture-independent approaches have provided new insights into their structure and function, with the exponentially decreasing cost of high-throughput sequencing resulting in broadly available tools for microbial surveys. However, the field remains far from reaching a technological plateau, as both computational techniques and nucleotide sequencing platforms for microbial genomic and transcriptional content continue to improve. Current microbiome analyses are thus starting to adopt multiple and complementary meta'omic approaches, leading to unprecedented opportunities to comprehensively and accurately characterize microbial communities and their interactions with their environments and hosts. This diversity of available assays, analysis methods, and public data is in turn beginning to enable microbiome-based predictive and modeling tools. We thus review here the technological and computational meta'omics approaches that are already available, those that are under active development, their success in biological discovery, and several outstanding challenges.
Affiliation(s)
- Nicola Segata
- Biostatistics Department, Harvard School of Public Health, Boston, MA, USA
- Present address: Centre for Integrative Biology, University of Trento, Trento, Italy
- Daniela Boernigen
- Biostatistics Department, Harvard School of Public Health, Boston, MA, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Timothy L Tickle
- Biostatistics Department, Harvard School of Public Health, Boston, MA, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Xochitl C Morgan
- Biostatistics Department, Harvard School of Public Health, Boston, MA, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Wendy S Garrett
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Curtis Huttenhower
- Biostatistics Department, Harvard School of Public Health, Boston, MA, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
2064
|
Genome Sequences of Two Klebsiella pneumoniae Isolates from Different Geographical Regions, Argentina (Strain JHCK1) and the United States (Strain VA360). GENOME ANNOUNCEMENTS 2013; 1:1/2/e00168-13. [PMID: 23640195 PMCID: PMC3642250 DOI: 10.1128/genomea.00168-13] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/20/2022]
Abstract
We report the sequences of two Klebsiella pneumoniae clinical isolates, strains JHCK1 and VA360, from a newborn with meningitis in Buenos Aires, Argentina, and from a tertiary care medical center in Cleveland, OH, respectively. Both isolates contain one chromosome and at least five plasmids; isolate VA360 contains the Klebsiella pneumoniae carbapenemase (KPC) gene.
|
2065
|
Blainey PC. The future is now: single-cell genomics of bacteria and archaea. FEMS Microbiol Rev 2013; 37:407-27. [PMID: 23298390 PMCID: PMC3878092 DOI: 10.1111/1574-6976.12015] [Citation(s) in RCA: 202] [Impact Index Per Article: 16.8] [Received: 06/07/2012] [Revised: 11/28/2012] [Accepted: 12/20/2012] [Indexed: 01/08/2023]
Abstract
Interest in the expanding catalog of uncultivated microorganisms, increasing recognition of heterogeneity among seemingly similar cells, and technological advances in whole-genome amplification and single-cell manipulation are driving considerable progress in single-cell genomics. Here, the spectrum of applications for single-cell genomics, key advances in the development of the field, and emerging methodology for single-cell genome sequencing are reviewed by example with attention to the diversity of approaches and their unique characteristics. Experimental strategies transcending specific methodologies are identified and organized as a road map for future studies in single-cell genomics of environmental microorganisms. Over the next decade, increasingly powerful tools for single-cell genome sequencing and analysis will play key roles in accessing the genomes of uncultivated organisms, determining the basis of microbial community functions, and fundamental aspects of microbial population biology.
|
2066
|
Desai A, Marwah VS, Yadav A, Jha V, Dhaygude K, Bangar U, Kulkarni V, Jere A. Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data. PLoS One 2013; 8:e60204. [PMID: 23593174 PMCID: PMC3625192 DOI: 10.1371/journal.pone.0060204] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.9] [Received: 10/09/2012] [Accepted: 02/22/2013] [Indexed: 12/13/2022]
Abstract
Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly using these short reads is computationally very intensive. Due to the lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, no report is currently available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rate, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly of different sized genomes using graph-based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing, which will enable optimum utilization of sequencing as well as analysis resources.
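The relationship between read depth, read count, and genome size that underlies a figure such as 50X can be sketched in Python using the standard coverage formula, depth = reads × read length / genome size. The function names and the 100 bp read length are assumptions chosen for illustration; the abstract does not state the read lengths used in the study.

```python
import math


def reads_for_depth(genome_size_bp, read_length_bp, depth):
    """Number of reads needed to reach a target average depth."""
    return math.ceil(depth * genome_size_bp / read_length_bp)


def depth_from_reads(n_reads, read_length_bp, genome_size_bp):
    """Average depth obtained from a given number of reads."""
    return n_reads * read_length_bp / genome_size_bp


# Genome sizes quoted in the abstract; the 100 bp read length is assumed.
genomes = {
    "E. coli": 4_600_000,
    "S. kudriavzevii": 11_180_000,
    "C. elegans": 100_000_000,
}
for name, size in genomes.items():
    print(name, reads_for_depth(size, read_length_bp=100, depth=50))
```

Under these assumptions, 50X coverage of the 4.6 Mb E. coli genome needs about 2.3 million 100 bp reads, and the requirement scales linearly with genome size.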
Affiliation(s)
- Aarti Desai
- Persistent LABS, Persistent Systems Ltd., Pune, Maharashtra, India
- Akshay Yadav
- Persistent LABS, Persistent Systems Ltd., Pune, Maharashtra, India
- Vineet Jha
- Persistent LABS, Persistent Systems Ltd., Pune, Maharashtra, India
- Kishor Dhaygude
- Persistent LABS, Persistent Systems Ltd., Pune, Maharashtra, India
- Ujwala Bangar
- Persistent LABS, Persistent Systems Ltd., Pune, Maharashtra, India
- Vivek Kulkarni
- Persistent LABS, Persistent Systems Ltd., Pune, Maharashtra, India
- Abhay Jere
- Persistent LABS, Persistent Systems Ltd., Pune, Maharashtra, India
|
2067
|
Zhou Q, Su X, Wang A, Xu J, Ning K. QC-Chain: fast and holistic quality control method for next-generation sequencing data. PLoS One 2013; 8:e60234. [PMID: 23565205 PMCID: PMC3615005 DOI: 10.1371/journal.pone.0060234] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.8] [Received: 11/30/2012] [Accepted: 02/23/2013] [Indexed: 02/01/2023]
Abstract
Next-generation sequencing (NGS) technologies have been widely used in life sciences. However, several kinds of sequencing artifacts, including low-quality reads and contaminating reads, are quite common in raw sequencing data and compromise downstream analysis. Therefore, quality control (QC) is essential for raw NGS data. Although a few NGS data quality control tools are publicly available, they have two limitations. First, their processing speed cannot cope with the rapid increase in data volume. Second, with respect to removing contaminating reads, none of them can identify contaminating sources de novo; they rely heavily on prior information about the contaminating species, which is usually not available in advance. Here we report QC-Chain, a fast, accurate and holistic NGS data quality-control method. The tool synergistically combines user-friendly tools for (1) quality assessment and trimming of raw reads using Parallel-QC, a fast read processing tool; and (2) identification, quantification and filtration of unknown contamination to obtain high-quality clean reads. It was optimized based on parallel computation, so its processing speed is significantly higher than that of other QC methods. Experiments on simulated and real NGS data have shown that reads with low sequencing quality could be identified and filtered. Possible contaminating sources could be identified and quantified de novo, accurately and quickly. Comparison between raw reads and processed reads also showed that the results of subsequent analyses (genome assembly, gene prediction, gene annotation, etc.) based on processed reads improved significantly in completeness and accuracy. With regard to processing speed, QC-Chain achieves a 7–8-fold speed-up based on parallel computation compared to traditional methods.
Therefore, QC-Chain is a fast and useful quality control tool for read quality process and de novo contamination filtration of NGS reads, which could significantly facilitate downstream analysis. QC-Chain is publicly available at: http://www.computationalbioenergy.org/qc-chain.html.
Affiliation(s)
- Qian Zhou
- CAS Key Laboratory of Biofuels and Shandong Key Laboratory of Energy Genetics, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- Xiaoquan Su
- CAS Key Laboratory of Biofuels and Shandong Key Laboratory of Energy Genetics, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- Anhui Wang
- CAS Key Laboratory of Biofuels and Shandong Key Laboratory of Energy Genetics, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- College of Computer and Information Technology, China Three Gorges University, Yichang, Hubei, China
- Jian Xu
- CAS Key Laboratory of Biofuels and Shandong Key Laboratory of Energy Genetics, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
- Kang Ning
- CAS Key Laboratory of Biofuels and Shandong Key Laboratory of Energy Genetics, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
|
2068
|
Yang Y, Yooseph S. SPA: a short peptide assembler for metagenomic data. Nucleic Acids Res 2013; 41:e91. [PMID: 23435317 PMCID: PMC3632116 DOI: 10.1093/nar/gkt118] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 10/02/2012] [Revised: 02/01/2013] [Accepted: 02/05/2013] [Indexed: 12/22/2022] Open
Abstract
The metagenomic paradigm allows for an understanding of the metabolic and functional potential of microbes in a community via a study of their proteins. The substrate for protein identification is either the set of individual nucleotide reads generated from metagenomic samples or the set of contig sequences produced by assembling these reads. However, a read-based strategy using reads generated by next-generation sequencing (NGS) technologies results in an overwhelming majority of partial-length protein predictions. A nucleotide assembly-based strategy does not fare much better, as metagenomic assemblies are typically fragmented and also leave a large fraction of reads unassembled. Here, we present a method for reconstructing complete protein sequences directly from NGS metagenomic data. Our framework is based on a novel short peptide assembler (SPA) that assembles protein sequences from their constituent peptide fragments identified on short reads. The SPA algorithm is based on informed traversals of a de Bruijn graph, defined on an amino acid alphabet, to identify probable paths that correspond to proteins. Using large simulated and real metagenomic data sets, we show that our method outperforms the alternate approach of identifying genes on nucleotide sequence assemblies and generates longer protein sequences that can be more effectively analysed.
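The idea of a de Bruijn graph over an amino acid alphabet can be illustrated with a small sketch. This is not the SPA algorithm itself: SPA's "informed traversals" are not described in the abstract, so the code below swaps in the simplest possible traversal, which extends a path only while the next node is unambiguous. The function names, the toy peptide fragments and the choice of k = 4 are all assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(peptides, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some peptide."""
    graph = defaultdict(set)
    for pep in peptides:
        for i in range(len(pep) - k + 1):
            kmer = pep[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Follow edges from `start` while exactly one successor exists.

    Stops at a branch, a dead end, or a node already visited (to avoid cycles).
    """
    path = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        path += nxt[-1]  # each step adds one amino acid to the sequence
        seen.add(nxt)
        node = nxt
    return path


if __name__ == "__main__":
    # Hypothetical overlapping peptide fragments, as if predicted on short reads.
    fragments = ["MKTAY", "TAYIA", "YIAKQ"]
    graph = build_de_bruijn(fragments, k=4)
    print(extend_unambiguous(graph, "MKT"))
```

A real assembler would also need to score competing branches and tolerate sequencing errors; the sketch only shows how overlapping peptide fragments chain into a longer protein sequence.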
Affiliation(s)
- Shibu Yooseph
- Informatics Department, J. Craig Venter Institute, San Diego, CA 92121, USA
|
2069
|
De Filippo C, Ramazzotti M, Fontana P, Cavalieri D. Bioinformatic approaches for functional annotation and pathway inference in metagenomics data. Brief Bioinform 2013; 13:696-710. [PMID: 23175748 PMCID: PMC3505041 DOI: 10.1093/bib/bbs070] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Indexed: 02/06/2023] Open
Abstract
Metagenomic approaches are increasingly recognized as a baseline for understanding the ecology and evolution of microbial ecosystems. The development of methods for pathway inference from metagenomics data is of paramount importance to link a phenotype to a cascade of events stemming from a series of connected sets of genes or proteins. Biochemical and regulatory pathways have until recently been thought of and modelled within one cell type, one organism, one species. This vision is being dramatically changed by the advent of whole microbiome sequencing studies, revealing the role of symbiotic microbial populations in fundamental biochemical functions. The new landscape we face requires a clear picture of the potentialities of existing tools and the development of new tools to characterize, reconstruct and model biochemical and regulatory pathways as the result of integration of function in complex symbiotic interactions of ontologically and evolutionarily distinct cell types.
|
2070
|
Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 2013; 29:1072-5. [PMID: 23422339 DOI: 10.1093/bioinformatics/btt086] [Citation(s) in RCA: 6436] [Impact Index Per Article: 536.3] [Indexed: 11/13/2022]
Abstract
SUMMARY: Limitations of genome sequencing techniques have led to dozens of assembly algorithms, none of which is perfect. A number of methods for comparing assemblers have been developed, but none is yet a recognized benchmark. Further, most existing methods for comparing assemblies are only applicable to new assemblies of finished genomes; the problem of evaluating assemblies of previously unsequenced species has not been adequately considered. Here, we present QUAST, a quality assessment tool for evaluating and comparing genome assemblies. This tool improves on leading assembly comparison software with new ideas and quality metrics. QUAST can evaluate assemblies both with and without a reference genome. QUAST produces many reports, summary tables and plots to help scientists in their research and in their publications. In this study, we used QUAST to compare several genome assemblers on three datasets. QUAST tables and plots for all of them are available in the Supplementary Material, and interactive versions of these reports are on the QUAST website. AVAILABILITY: http://bioinf.spbau.ru/quast. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alexey Gurevich
- Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia.
|
2071
|
Abstract
Vibrio vulnificus, which can lead to rapidly expanding cellulitis or septicemia, is present in the marine environment. Here, we present the draft genome sequence of strain B2, which was isolated from a septicemia patient in 2010.
|
2072
|
Genome sequence of the pathogenic Herbaspirillum seropedicae strain Os45, isolated from rice roots. J Bacteriol 2013; 194:6995-6. [PMID: 23209242 DOI: 10.1128/jb.01935-12] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022] Open
Abstract
Most Herbaspirillum seropedicae strains are beneficial to plants. In contrast, H. seropedicae strain Os45, isolated from rice roots, is pathogenic. The draft genome sequence of strain Os45 presented here allows an in-depth comparative genome analysis to understand the subtle mechanisms of beneficial and pathogenic Herbaspirillum-plant interactions.
|
2073
|
Abstract
Advances in sequencing technologies and increased access to sequencing services have led to renewed interest in sequence and genome assembly. Concurrently, new applications for sequencing have emerged, including gene expression analysis, discovery of genomic variants and metagenomics, and each of these has different needs and challenges in terms of assembly. We survey the theoretical foundations that underlie modern assembly and highlight the options and practical trade-offs that need to be considered, focusing on how individual features address the needs of specific applications. We also review key software and the interplay between experimental design and efficacy of assembly.
Affiliation(s)
- Niranjan Nagarajan
- Computational and Systems Biology, Genome Institute of Singapore, 138672 Singapore
|
2074
|
Assembling Genomes and Mini-metagenomes from Highly Chimeric Reads. LECTURE NOTES IN COMPUTER SCIENCE 2013. [DOI: 10.1007/978-3-642-37195-0_13] [Citation(s) in RCA: 333] [Impact Index Per Article: 27.8] [Indexed: 12/13/2022]
|
2075
|
Solonenko SA, Sullivan MB. Preparation of metagenomic libraries from naturally occurring marine viruses. Methods Enzymol 2013; 531:143-65. [PMID: 24060120 DOI: 10.1016/b978-0-12-407863-5.00008-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 12/23/2022]
Abstract
Microbes are now well recognized as major drivers of the biogeochemical cycling that fuels the Earth, and their viruses (phages) are known to be abundant and important in microbial mortality, horizontal gene transfer, and modulating microbial metabolic output. Investigation of environmental phages has been frustrated by an inability to culture the vast majority of naturally occurring diversity coupled with the lack of robust, quantitative, culture-independent methods for studying this uncultured majority. However, for double-stranded DNA phages, a quantitative viral metagenomic sample-to-sequence workflow now exists. Here, we review these advances with special emphasis on the technical details of preparing DNA sequencing libraries for metagenomic sequencing from environmentally relevant low-input DNA samples. Library preparation steps broadly involve manipulating the sample DNA by fragmentation, end repair and adaptor ligation, size fractionation, and amplification. One critical area of future research and development is parallel advances for alternate nucleic acid types such as single-stranded DNA and RNA viruses that are also abundant in nature. Combinations of recent advances in fragmentation (e.g., acoustic shearing and tagmentation), ligation reactions (adaptor-to-template ratio reference table availability), size fractionation (non-gel-sizing), and amplification (linear amplification for deep sequencing and linker amplification protocols) enhance our ability to generate quantitatively representative metagenomic datasets from low-input DNA samples. Such datasets are already providing new insights into the role of viruses in marine systems and will continue to do so as new environments are explored and synergies and paradigms emerge from large-scale comparative analyses.
Affiliation(s)
- Sergei A Solonenko
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona, USA
|
2076
|
Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y, Tang J, Wu G, Zhang H, Shi Y, Liu Y, Yu C, Wang B, Lu Y, Han C, Cheung DW, Yiu SM, Peng S, Xiaoqian Z, Liu G, Liao X, Li Y, Yang H, Wang J, Lam TW, Wang J. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 2012; 1:18. [PMID: 23587118 PMCID: PMC3626529 DOI: 10.1186/2047-217x-1-18] [Citation(s) in RCA: 3644] [Impact Index Per Article: 280.3] [Received: 07/24/2012] [Accepted: 12/10/2012] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND: There is a rapidly increasing amount of de novo genome assembly using next-generation sequencing (NGS) short reads; however, several big challenges remain to be overcome in order for this to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions. FINDINGS: To overcome these challenges, we have developed its successor, SOAPdenovo2, which has the advantage of a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes. CONCLUSIONS: Benchmarks using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive with other assemblers on both assembly length and accuracy. We also provide an updated assembly version of the 2008 Asian (YH) genome using SOAPdenovo2. Here, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp, respectively, which is 3-fold and 50-fold longer than the first published version. The genome coverage increased from 81.16% to 93.91%, and memory consumption was ~2/3 lower at the point of peak memory consumption.
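Because the abstract reports contig and scaffold N50 values, a short sketch of the usual N50 definition may help: it is the length L such that sequences of length at least L together cover at least half of the total assembly length. The function name `n50` and the toy lengths are assumptions for illustration; this is not code from SOAPdenovo2.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; N50 is the length of the
    sequence at which the running total first reaches half of the overall sum.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise ValueError("n50() requires at least one sequence length")


if __name__ == "__main__":
    # Toy contig lengths in bp (hypothetical, not taken from the YH assembly).
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

N50 rewards contiguity only, so it is normally read alongside accuracy and genome coverage figures such as those quoted in the abstract.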
Affiliation(s)
- Ruibang Luo
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong
- Binghang Liu
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong
- Yinlong Xie
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong
- School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, 510006, China
- Zhenyu Li
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong
- Weihua Huang
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- Jianying Yuan
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- Guangzhu He
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- Yanxiang Chen
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- Qi Pan
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- Yunjie Liu
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- Jingbo Tang
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- Gengxiong Wu
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- Hao Zhang
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- Yujian Shi
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- Yong Liu
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- Chang Yu
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- Bo Wang
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- Yao Lu
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- Changlei Han
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- David W Cheung
- HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong
- Siu-Ming Yiu
- HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong
- Shaoliang Peng
- School of Computer Science, National University of Defense Technology, No.47, Yanwachi street, Kaifu District, Changsha, Hunan, 410073, China
- Zhu Xiaoqian
- School of Computer Science, National University of Defense Technology, No.47, Yanwachi street, Kaifu District, Changsha, Hunan, 410073, China
- Guangming Liu
- School of Computer Science, National University of Defense Technology, No.47, Yanwachi street, Kaifu District, Changsha, Hunan, 410073, China
- Xiangke Liao
- School of Computer Science, National University of Defense Technology, No.47, Yanwachi street, Kaifu District, Changsha, Hunan, 410073, China
- Yingrui Li
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong
- Huanming Yang
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- Jian Wang
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
- Tak-Wah Lam
- HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong
- Jun Wang
- BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong
|
2077
|
Ren X, Liu T, Dong J, Sun L, Yang J, Zhu Y, Jin Q. Evaluating de Bruijn graph assemblers on 454 transcriptomic data. PLoS One 2012; 7:e51188. [PMID: 23236450 PMCID: PMC3517413 DOI: 10.1371/journal.pone.0051188] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 06/26/2012] [Accepted: 10/31/2012] [Indexed: 12/29/2022] Open
Abstract
Next generation sequencing (NGS) technologies have greatly changed the landscape of transcriptomic studies of non-model organisms. Since there is no reference genome available, de novo assembly methods play key roles in the analysis of these data sets. Because of the huge amount of data generated by NGS technologies in each run, many assemblers, e.g., ABySS, Velvet and Trinity, have been developed based on a de Bruijn graph because of its time and space efficiency. However, most of these assemblers were developed initially for the Illumina/Solexa platform, and their performance on 454 transcriptomic data is unknown. In this study, we evaluated and compared the relative performance of these de Bruijn graph-based assemblers on both simulated and real 454 transcriptomic data. The results suggest that Trinity, the Illumina/Solexa-specialized transcriptomic assembler, performs the best among the de Bruijn graph assemblers, comparable to or even outperforming the standard 454 assembler Newbler, which is based on the overlap-layout-consensus algorithm. Our evaluation is expected to provide helpful guidance for researchers choosing assemblers when analyzing 454 transcriptomic data.
Affiliation(s)
- Xianwen Ren
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Tao Liu
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Jie Dong
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Lilian Sun
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Jian Yang
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Yafang Zhu
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Qi Jin
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
|
2078
|
Genome sequence of the biocontrol agent Microbacterium barkeri strain 2011-R4. J Bacteriol 2012; 194:6666-7. [PMID: 23144410 DOI: 10.1128/jb.01468-12] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/20/2022] Open
Abstract
Microbacterium barkeri strain 2011-R4 is a Gram-positive epiphyte which has been confirmed as a biocontrol agent against several plant pathogens in our previous studies. Here, we present the draft genome sequence of this strain, which was isolated from the rice rhizosphere in Tonglu city, Zhejiang province, China.
|
2079
|
Prakash T, Taylor TD. Functional assignment of metagenomic data: challenges and applications. Brief Bioinform 2012; 13:711-27. [PMID: 22772835 PMCID: PMC3504928 DOI: 10.1093/bib/bbs033] [Citation(s) in RCA: 101] [Impact Index Per Article: 7.8] [Received: 03/05/2012] [Accepted: 05/26/2012] [Indexed: 12/14/2022] Open
Abstract
Metagenomic sequencing provides a unique opportunity to explore Earth's limitless environments harboring scores of yet unknown and mostly unculturable microbes and other organisms. Functional analysis of the metagenomic data plays a central role in projects aiming to explore the most essential questions in microbiology, namely 'In a given environment, among the microbes present, what are they doing, and how are they doing it?' Toward this goal, several large-scale metagenomic projects have recently been conducted or are currently underway. Functional analysis of metagenomic data mainly suffers from the vast amount of data generated in these projects. The sheer amount of data requires much computational time and storage space. These problems are compounded by other factors potentially affecting the functional analysis, including sample preparation, sequencing method and average genome size of the metagenomic samples. In addition, the read lengths generated during sequencing influence sequence assembly, gene prediction and subsequently the functional analysis. The level of confidence for functional predictions increases with increasing read length. Usually, the most reliable functional annotations for metagenomic sequences are achieved using homology-based approaches against publicly available reference sequence databases. Here, we present an overview of the current state of functional analysis of metagenomic sequence data, bottlenecks frequently encountered and possible solutions in light of currently available resources and tools. Finally, we provide some examples of applications from recent metagenomic studies which have been successfully conducted in spite of the known difficulties.
|
2080
|
Davey JW, Cezard T, Fuentes-Utrilla P, Eland C, Gharbi K, Blaxter ML. Special features of RAD Sequencing data: implications for genotyping. Mol Ecol 2012; 22:3151-64. [PMID: 23110438 PMCID: PMC3712469 DOI: 10.1111/mec.12084] [Citation(s) in RCA: 236] [Impact Index Per Article: 18.2] [Received: 06/29/2012] [Revised: 09/07/2012] [Accepted: 09/12/2012] [Indexed: 12/17/2022]
Abstract
Restriction site-associated DNA Sequencing (RAD-Seq) is an economical and efficient method for SNP discovery and genotyping. As with other sequencing-by-synthesis methods, RAD-Seq produces stochastic count data and requires sensitive analysis to develop or genotype markers accurately. We show that there are several sources of bias specific to RAD-Seq that are not explicitly addressed by current genotyping tools, namely restriction fragment bias, restriction site heterozygosity and PCR GC content bias. We explore the performance of existing analysis tools given these biases and discuss approaches to limiting or handling biases in RAD-Seq data. While these biases need to be taken seriously, we believe RAD loci affected by them can be excluded or processed with relative ease in most cases and that most RAD loci will be accurately genotyped by existing tools.
Affiliation(s)
- John W Davey
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, West Mains Road, Edinburgh, EH9 3JT, UK.
|
2081
|
Stepanauskas R. Single cell genomics: an individual look at microbes. Curr Opin Microbiol 2012; 15:613-20. [PMID: 23026140 DOI: 10.1016/j.mib.2012.09.001] [Citation(s) in RCA: 163] [Impact Index Per Article: 12.5] [Received: 09/10/2012] [Accepted: 09/12/2012] [Indexed: 12/18/2022]
Abstract
Single cell genomics (SCG) uncovers hereditary information at the most basic level of biological organization. It is emerging as a powerful complement to cultivation-based and microbial community-focused research approaches. SCG has been instrumental in identifying metabolic features, evolutionary histories and inter-organismal interactions of the uncultured microbial groups that dominate many environments and biogeochemical cycles. The SCG approach also holds great promise in microbial microevolution studies and industrial bioprospecting. Methods for SCG consist of a series of integrated processes, beginning with the collection and preservation of environmental samples, followed by physical separation, lysis and whole genome amplification of individual cells, and culminating in genomic sequencing and the inference of encoded biological features.
|
2082
|
Abstract
Second-generation sequencing technologies are revolutionizing the study of metagenomes. Whole-genome shotgun sequencing of metagenomic DNA may become an attractive alternative to the current widely used ribosomal RNA gene studies. Large data sets of short sequence reads are mapped onto a custom microbial reference sequence. If a bacterial pangenome of completely sequenced genomes is taken as a reference, the output consists of the distribution of bacterial taxa in and bacterial gene contents of the metagenome. The relative abundance of functional categories and of individual pathways and fitness traits encoded by the metagenomic gene pool provides insight into habitat-specific features of the microbial community. Polymorphic sites in sequence reads may resolve the number and abundance of individual clonal complexes of dominant species in the polymicrobial community. These SNPs and de novo mutations may be exploited to trace the spatiotemporal spread of clones and the emergence of novel traits such as fitness or resistance determinants. In conclusion, massively parallel sequencing of metagenomic DNA allows deep insights into the composition and the genetic repertoire of polymicrobial communities.
|
2083
|
Kamke J, Bayer K, Woyke T, Hentschel U. Exploring symbioses by single-cell genomics. THE BIOLOGICAL BULLETIN 2012; 223:30-43. [PMID: 22983031 DOI: 10.1086/bblv223n1p30] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 06/01/2023]
Abstract
Single-cell genomics has advanced the field of microbiology from the analysis of microbial metagenomes where information is "drowning in a sea of sequences," to recognizing each microbial cell as a separate and unique entity. Single-cell genomics employs Phi29 polymerase-mediated whole-genome amplification to yield microgram-range genomic DNA from single microbial cells. This method has now been applied to a handful of symbiotic systems, including bacterial symbionts of marine sponges, insects (grasshoppers, termites), and vertebrates (mouse, human). In each case, novel insights were obtained into the functional genomic repertoire of the bacterial partner, which, in turn, led to an improved understanding of the corresponding host. Single-cell genomics is particularly valuable when dealing with uncultivated microorganisms, as is still the case for many bacterial symbionts. In this review, we explore the power of single-cell genomics for symbiosis research and highlight recent insights into the symbiotic systems that were obtained by this approach.
Affiliation(s)
- Janine Kamke
- Julius-von-Sachs Institute for Biological Sciences, University of Würzburg, Julius-von-Sachs Platz 3, 97082 Würzburg, Germany
|