51
Li C, Lin F, An D, Wang W, Huang R. Genome Sequencing and Assembly by Long Reads in Plants. Genes (Basel) 2017; 9:E6. [PMID: 29283420 PMCID: PMC5793159 DOI: 10.3390/genes9010006] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Received: 11/20/2017] [Revised: 12/18/2017] [Accepted: 12/18/2017] [Indexed: 11/17/2022]
Abstract
Plant genomes generated by Sanger and Next Generation Sequencing (NGS) have provided insight into species diversity and evolution. However, Sanger sequencing is limited in its applications due to high cost, labor intensity, and low throughput, while NGS reads are too short to resolve abundant repeats and polyploidy, leading to incomplete or ambiguous assemblies. The advent and improvement of long-read sequencing by Third Generation Sequencing (TGS) methods such as PacBio and Nanopore have shown promise in producing high-quality assemblies for complex genomes. Here, we review the development of sequencing, introducing the application as well as considerations of experimental design in TGS of plant genomes. We also introduce recent revolutionary scaffolding technologies including BioNano, Hi-C, and 10× Genomics. We expect that the informative guidance for genome sequencing and assembly by long reads will benefit the initiation of scientists' projects.
Affiliation(s)
- Changsheng Li
  - College of Agronomy, Shenyang Agricultural University, 120 Dongling Road, Shenyang 110866, China.
- Feng Lin
  - College of Bioscience and Biotechnology, Shenyang Agricultural University, 120 Dongling Road, Shenyang 110866, China.
- Dong An
  - School of Agriculture and Biology, Shanghai Jiao Tong University, 800 Dong Chuan Road, Shanghai 200240, China.
- Wenqin Wang
  - School of Agriculture and Biology, Shanghai Jiao Tong University, 800 Dong Chuan Road, Shanghai 200240, China.
- Ruidong Huang
  - College of Agronomy, Shenyang Agricultural University, 120 Dongling Road, Shenyang 110866, China.
52
Worthey EA. Analysis and Annotation of Whole-Genome or Whole-Exome Sequencing Derived Variants for Clinical Diagnosis. Curr Protoc Hum Genet 2017; 95:9.24.1-9.24.28. [PMID: 29044471 DOI: 10.1002/cphg.49] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 12/17/2022]
Abstract
Over the last 10 years, next-generation sequencing (NGS) has transformed genomic research through substantial advances in technology and reduction in the cost of sequencing, and also in the systems required for analysis of these large volumes of data. This technology is now being used as a standard molecular diagnostic test in some clinical settings. The advances in sequencing have come so rapidly that the major bottleneck in identification of causal variants is no longer the sequencing or analysis (given access to appropriate tools), but rather clinical interpretation. Interpretation of genetic findings in a complex and ever changing clinical setting is scarcely a new challenge, but the task is increasingly complex in clinical genome-wide sequencing given the dramatic increase in dataset size and complexity. This increase requires application of appropriate interpretation tools, as well as development and application of appropriate methodologies and standard procedures. This unit provides an overview of these items. Specific challenges related to implementation of genome-wide sequencing in a clinical setting are discussed. © 2017 by John Wiley & Sons, Inc.
53
Draft Genome Sequence of Plasmodium gonderi, a Malaria Parasite of African Old World Monkeys. GENOME ANNOUNCEMENTS 2017; 5:5/28/e00612-17. [PMID: 28705975 PMCID: PMC5511914 DOI: 10.1128/genomea.00612-17] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/20/2022]
Abstract
Plasmodium gonderi is a primate parasite whose natural hosts are African Old World monkeys. Here, we report the draft genome sequence for P. gonderi. The data are useful not only for understanding the evolution of malaria but also for enabling comparative genomics of malaria parasites.
54
Baichoo S, Ouzounis CA. Computational complexity of algorithms for sequence comparison, short-read assembly and genome alignment. Biosystems 2017; 156-157:72-85. [PMID: 28392341 DOI: 10.1016/j.biosystems.2017.03.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/07/2017] [Revised: 03/21/2017] [Accepted: 03/22/2017] [Indexed: 12/12/2022]
Abstract
A multitude of algorithms for sequence comparison, short-read assembly and whole-genome alignment have been developed in the general context of molecular biology, to support technology development for high-throughput sequencing, numerous applications in genome biology and fundamental research on comparative genomics. The computational complexity of these algorithms has been previously reported in original research papers, yet this often neglected property has not been reviewed previously in a systematic manner and for a wider audience. We provide a review of space and time complexity of key sequence analysis algorithms and highlight their properties in a comprehensive manner, in order to identify potential opportunities for further research in algorithm or data structure optimization. The complexity aspect is poised to become pivotal as we will be facing challenges related to the continuous increase of genomic data on unprecedented scales and complexity in the foreseeable future, when robust biological simulation at the cell level and above becomes a reality.
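As an illustration of the kind of space and time trade-off such a review covers, the sketch below computes the Levenshtein edit distance between two sequences with the classic dynamic-programming recurrence. It is a generic textbook example chosen here for illustration, not code taken from the reviewed paper; the function name `edit_distance` is hypothetical. The time cost is proportional to the product of the two lengths, while keeping only two rows of the table reduces memory to the length of the shorter sequence.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: O(len(a) * len(b)) time, O(min(len(a), len(b))) space."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence along the row to save memory
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting i characters of a reaches the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("GATTACA", "GACTATA"))
```

Only the distance is returned; recovering the alignment itself would require keeping the full table or a divide-and-conquer scheme, which is exactly where space complexity starts to matter for long sequences.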
Affiliation(s)
- Shakuntala Baichoo
  - Department of Computer Science & Engineering, University of Mauritius, Réduit 80837, Mauritius.
- Christos A Ouzounis
  - Biological Computation & Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thessalonica 57001, Greece.
55
Genome Sequence of Pseudomonas citronellolis SJTE-3, an Estrogen- and Polycyclic Aromatic Hydrocarbon-Degrading Bacterium. GENOME ANNOUNCEMENTS 2016; 4:4/6/e01373-16. [PMID: 27932659 PMCID: PMC5146451 DOI: 10.1128/genomea.01373-16] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 01/11/2023]
Abstract
Pseudomonas citronellolis SJTE-3, isolated from the active sludge of a wastewater treatment plant in China, can utilize a series of environmental estrogens and estrogen-like toxicants. Here, we report its whole-genome sequence, containing one circular chromosome and one circular plasmid. Genes involved in estrogen biodegradation in this bacterium were predicted.
56
Rusconi B, Sanjar F, Koenig SSK, Mammel MK, Tarr PI, Eppinger M. Whole Genome Sequencing for Genomics-Guided Investigations of Escherichia coli O157:H7 Outbreaks. Front Microbiol 2016; 7:985. [PMID: 27446025 PMCID: PMC4928038 DOI: 10.3389/fmicb.2016.00985] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 03/06/2016] [Accepted: 06/08/2016] [Indexed: 01/29/2023]
Abstract
Multi-isolate whole genome sequencing (WGS) and typing for outbreak investigations has become a reality in the post-genomics era. We applied this technology to strains from Escherichia coli O157:H7 outbreaks. These include isolates from seven North American outbreaks, as well as multiple isolates from the same patient and from different infected individuals in the same household. Customized high-resolution bioinformatics sequence typing strategies were developed to assess the core genome and mobilome plasticity. Sequence typing was performed using an in-house single nucleotide polymorphism (SNP) discovery and validation pipeline. Discriminatory power becomes of particular importance for the investigation of isolates from outbreaks in which macrogenomic techniques such as pulsed-field gel electrophoresis or multiple locus variable number tandem repeat analysis do not differentiate closely related organisms. We also characterized differences in the phage inventory, allowing us to identify plasticity among outbreak strains that is not detectable at the core genome level. Our comprehensive analysis of the mobilome identified multiple plasmids that have not previously been associated with this lineage. Applied phylogenomics approaches provide strong molecular evidence for exceptionally little heterogeneity of strains within outbreaks and demonstrate the value of intra-cluster comparisons, rather than basing the analysis on archetypal reference strains. Next generation sequencing and whole genome typing strategies provide the technological foundation for genomic epidemiology outbreak investigation, offering significantly higher sample throughput, cost efficiency, and phylogenetic relatedness accuracy. These phylogenomics approaches have major public health relevance in translating information from the sequence-based survey to support timely and informed countermeasures. Polymorphisms identified in this work offer robust phylogenetic signals that index both short- and long-term evolution and can complement currently employed typing schemes for outbreak exclusion and inclusion, diagnostics, surveillance, and forensic studies.
Affiliation(s)
- Brigida Rusconi
  - South Texas Center for Emerging Infectious Diseases, University of Texas at San Antonio, San Antonio, TX, USA; Department of Biology, University of Texas at San Antonio, San Antonio, TX, USA
- Fatemeh Sanjar
  - South Texas Center for Emerging Infectious Diseases, University of Texas at San Antonio, San Antonio, TX, USA; Department of Biology, University of Texas at San Antonio, San Antonio, TX, USA
- Sara S K Koenig
  - South Texas Center for Emerging Infectious Diseases, University of Texas at San Antonio, San Antonio, TX, USA; Department of Biology, University of Texas at San Antonio, San Antonio, TX, USA
- Mark K Mammel
  - Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Laurel, MD, USA
- Phillip I Tarr
  - Department of Pediatrics, Washington University School of Medicine, St. Louis, MO, USA
- Mark Eppinger
  - South Texas Center for Emerging Infectious Diseases, University of Texas at San Antonio, San Antonio, TX, USA; Department of Biology, University of Texas at San Antonio, San Antonio, TX, USA
57
Whole-Genome Sequencing of a Haarlem Extensively Drug-Resistant Mycobacterium tuberculosis Clinical Isolate from Medellín, Colombia. GENOME ANNOUNCEMENTS 2016; 4:4/3/e00566-16. [PMID: 27313305 PMCID: PMC4911484 DOI: 10.1128/genomea.00566-16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
Colombia is one of the 105 countries that have reported at least one case of extensively drug-resistant tuberculosis (XDR-TB). The Mycobacterium tuberculosis Haarlem genotype is ubiquitous worldwide. Here, we report the high-quality draft genome sequence of a Colombian Haarlem XDR-TB clinical isolate composed of 4,329,127 bp with 4,386 genes.
58
Thangam M, Gopal RK. CRCDA--Comprehensive resources for cancer NGS data analysis. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2015; 2015:bav092. [PMID: 26450948 PMCID: PMC4597977 DOI: 10.1093/database/bav092] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/27/2015] [Accepted: 08/31/2015] [Indexed: 12/24/2022]
Abstract
Next generation sequencing (NGS) innovations mark a compelling landmark in life science and have changed the direction of research in clinical oncology through their capacity to help diagnose and treat cancer. The aim of our portal, comprehensive resources for cancer NGS data analysis (CRCDA), is to provide a collection of different NGS tools and pipelines under diverse classes, together with cancer pathways and databases and, furthermore, literature information from PubMed. The literature data were restricted to the 18 most common cancer types, such as breast cancer, colon cancer and other cancers that occur in the worldwide population. For convenience, NGS cancer tools have been categorized into cancer genomics, cancer transcriptomics, cancer epigenomics, quality control and visualization. Pipelines for variant detection, quality control and data analysis are listed to provide an out-of-the-box solution for NGS data analysis, which may help researchers to overcome challenges in selecting and configuring individual tools for analysing exome, whole genome and transcriptome data. An extensive search page was developed that can be queried by using (i) type of data [literature, gene data and sequence read archive (SRA) data] and (ii) type of cancer (selected based on global incidence and accessibility of data). For each category of analysis, a variety of tools is available, and the biggest challenge is in searching for and using the right tool for the right application. The objective of the work is to collect the tools in each category available at various places and to arrange the tools and other data in a simple and user-friendly manner so that biologists and oncologists can find information more easily. To the best of our knowledge, we have collected and presented a comprehensive package of most of the resources available in cancer for NGS data analysis. Given these factors, we believe that this website will be a useful resource to the NGS research community working on cancer.
Database URL: http://bioinfo.au-kbc.org.in/ngs/ngshome.html.
Affiliation(s)
- Manonanthini Thangam
  - AU-KBC Research Centre, MIT Campus of Anna University, Chromepet, Chennai, India
- Ramesh Kumar Gopal
  - AU-KBC Research Centre, MIT Campus of Anna University, Chromepet, Chennai, India
59
Kosugi S, Hirakawa H, Tabata S. GMcloser: closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments. Bioinformatics 2015; 31:3733-41. [PMID: 26261222 DOI: 10.1093/bioinformatics/btv465] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Received: 01/06/2015] [Accepted: 08/04/2015] [Indexed: 11/13/2022]
Abstract
MOTIVATION Genome assemblies generated with next-generation sequencing (NGS) reads usually contain a number of gaps. Several tools have recently been developed to close the gaps in these assemblies with NGS reads. Although these gap-closing tools efficiently close the gaps, they entail a high rate of misassembly at gap-closing sites. RESULTS We have found that the assembly error rates caused by these tools are 20-500-fold higher than the rate of errors introduced into contigs by de novo assemblers. We here describe GMcloser, a tool that accurately closes these gaps with a preassembled contig set or a long read set (i.e., error-corrected PacBio reads). GMcloser uses likelihood-based classifiers calculated from the alignment statistics between scaffolds, contigs and paired-end reads to correctly assign contigs or long reads to gap regions of scaffolds, thereby achieving accurate and efficient gap closure. We demonstrate with sequencing data from various organisms that the gap-closing accuracy of GMcloser is 3-100-fold higher than those of other available tools, with similar efficiency. AVAILABILITY AND IMPLEMENTATION GMcloser and an accompanying tool (GMvalue) for evaluating the assembly and correcting misassemblies except SNPs and short indels in the assembly are available at https://sourceforge.net/projects/gmcloser/. CONTACT shunichi.kosugi@riken.jp. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shunichi Kosugi
  - Department of Technology Development, Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan
- Hideki Hirakawa
  - Department of Technology Development, Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan
- Satoshi Tabata
  - Department of Technology Development, Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan
60
Ikegami T, Inatsugi T, Kojima I, Umemura M, Hagiwara H, Machida M, Asai K. Hybrid De Novo Genome Assembly Using MiSeq and SOLiD Short Read Data. PLoS One 2015; 10:e0126289. [PMID: 25919614 PMCID: PMC4412624 DOI: 10.1371/journal.pone.0126289] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 12/26/2014] [Accepted: 03/31/2015] [Indexed: 12/04/2022]
Abstract
A hybrid de novo assembly pipeline was constructed to use MiSeq and SOLiD short read data in combination. The short read data were converted to the pipeline's standard format and supplied to pipeline components such as ABySS and SOAPdenovo. The assembly pipeline proceeded through several stages, and MiSeq paired-end data, SOLiD mate-paired data, or both could be specified as input at each stage separately. The pipeline was examined on the filamentous fungus Aspergillus oryzae RIB40 by aligning the assembly results against the reference sequences. Using both the MiSeq and the SOLiD data in the hybrid assembly, the alignment length was improved by a factor of 3 to 8, compared with the assemblies using either one of the data types. The number of reproduced gene cluster regions encoding secondary metabolite biosyntheses (SMB) was also improved by the hybrid assemblies. These results imply that the MiSeq data with long read length are essential to construct accurate nucleotide sequences, while the SOLiD mate-paired reads with long insertion length enhance long-range arrangements of the sequences. The pipeline was also tested on the actinomycete Streptomyces avermitilis MA-4680, whose genome is known to have a high GC content. Although the quality of the SOLiD reads was too low to perform any meaningful assemblies by themselves, the alignment length to the reference was improved by a factor of 2, compared with the assembly using only the MiSeq data.
Affiliation(s)
- Tsutomu Ikegami
  - Information Technology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan
- Isao Kojima
  - Information Technology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan
- Myco Umemura
  - Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Sapporo, Hokkaido, Japan
- Hiroko Hagiwara
  - Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Sapporo, Hokkaido, Japan
- Masayuki Machida
  - Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Sapporo, Hokkaido, Japan
- Kiyoshi Asai
  - Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
61
ARYANA: Aligning Reads by Yet Another Approach. BMC Bioinformatics 2014; 15 Suppl 9:S12. [PMID: 25252881 PMCID: PMC4168712 DOI: 10.1186/1471-2105-15-s9-s12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Motivation Although there are many different algorithms and software tools for aligning sequencing reads, fast gapped sequence search is far from solved. Strong interest in fast alignment is best reflected in the $10^6 prize for the Innocentive competition on aligning a collection of reads to a given database of reference genomes. In addition, de novo assembly of next-generation sequencing long reads requires fast overlap-layout-consensus algorithms which depend on fast and accurate alignment. Contribution We introduce ARYANA, a fast gapped read aligner, developed on the base of BWA indexing infrastructure with a completely new alignment engine that makes it significantly faster than three other aligners: Bowtie2, BWA and SeqAlto, with comparable generality and accuracy. Instead of the time-consuming backtracking procedures for handling mismatches, ARYANA comes with the seed-and-extend algorithmic framework and a significantly improved efficiency by integrating novel algorithmic techniques including dynamic seed selection, bidirectional seed extension, reset-free hash tables, and gap-filling dynamic programming. As the read length increases, ARYANA's superiority in terms of speed and alignment rate becomes more evident. This is in perfect harmony with the read length trend as the sequencing technologies evolve. The algorithmic platform of ARYANA makes it easy to develop mission-specific aligners for other applications using the ARYANA engine. Availability ARYANA with complete source code can be obtained from http://github.com/aryana-aligner
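The seed-and-extend idea named in the abstract can be illustrated with a deliberately simplified sketch. This is not ARYANA's implementation: it assumes a plain k-mer hash index rather than the BWA index, tries every fixed-length seed rather than selecting seeds dynamically, and extends without gaps by counting mismatches. The names `build_index` and `align` are hypothetical.

```python
from collections import defaultdict


def build_index(ref: str, k: int) -> dict:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def align(read: str, ref: str, index: dict, k: int, max_mismatches: int = 2):
    """Return (start, mismatches) of the best ungapped placement, or None."""
    best = None
    tried = set()  # candidate start positions already extended
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):  # exact seed hits in the reference
            start = pos - offset  # implied start of the whole read
            if start < 0 or start + len(read) > len(ref) or start in tried:
                continue
            tried.add(start)
            # Extension step: compare the full read against the reference window.
            window = ref[start:start + len(read)]
            mismatches = sum(1 for x, y in zip(read, window) if x != y)
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches)
    return best


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTA"
    idx = build_index(reference, 4)
    print(align("TAGCCGTTAGG", reference, idx, 4))
```

Exact seed lookups narrow the search to a handful of candidate positions, so the costlier comparison runs only there; a real aligner would replace the mismatch count with gapped dynamic programming.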
62
Jiang X, Peery A, Hall AB, Sharma A, Chen XG, Waterhouse RM, Komissarov A, Riehle MM, Shouche Y, Sharakhova MV, Lawson D, Pakpour N, Arensburger P, Davidson VLM, Eiglmeier K, Emrich S, George P, Kennedy RC, Mane SP, Maslen G, Oringanje C, Qi Y, Settlage R, Tojo M, Tubio JMC, Unger MF, Wang B, Vernick KD, Ribeiro JMC, James AA, Michel K, Riehle MA, Luckhart S, Sharakhov IV, Tu Z. Genome analysis of a major urban malaria vector mosquito, Anopheles stephensi. Genome Biol 2014; 15:459. [PMID: 25244985 PMCID: PMC4195908 DOI: 10.1186/s13059-014-0459-2] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.7] [Received: 04/27/2014] [Accepted: 09/03/2014] [Indexed: 12/24/2022]
Abstract
Background Anopheles stephensi is the key vector of malaria throughout the Indian subcontinent and Middle East and an emerging model for molecular and genetic studies of mosquito-parasite interactions. The type form of the species is responsible for the majority of urban malaria transmission across its range. Results Here, we report the genome sequence and annotation of the Indian strain of the type form of An. stephensi. The 221 Mb genome assembly represents more than 92% of the entire genome and was produced using a combination of 454, Illumina, and PacBio sequencing. Physical mapping assigned 62% of the genome onto chromosomes, enabling chromosome-based analysis. Comparisons between An. stephensi and An. gambiae reveal that the rate of gene order reshuffling on the X chromosome was three times higher than that on the autosomes. An. stephensi has more heterochromatin in pericentric regions but less repetitive DNA in chromosome arms than An. gambiae. We also identify a number of Y-chromosome contigs and BACs. Interspersed repeats constitute 7.1% of the assembled genome while LTR retrotransposons alone comprise more than 49% of the Y contigs. RNA-seq analyses provide new insights into mosquito innate immunity, development, and sexual dimorphism. Conclusions The genome analysis described in this manuscript provides a resource and platform for fundamental and translational research into a major urban malaria vector. Chromosome-based investigations provide unique perspectives on Anopheles chromosome evolution. RNA-seq analysis and studies of immunity genes offer new insights into mosquito biology and mosquito-parasite interactions. Electronic supplementary material The online version of this article (doi:10.1186/s13059-014-0459-2) contains supplementary material, which is available to authorized users.
63
Draft Genome Sequences of Three Escherichia coli Strains Investigated for the Effects of Lysogeny on Niche Diversification. GENOME ANNOUNCEMENTS 2014; 2:2/5/e00955-14. [PMID: 25291771 PMCID: PMC4175207 DOI: 10.1128/genomea.00955-14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022]
Abstract
During the course of investigating the effects of lysogeny on niche diversification of Escherichia coli, we used the temperate phages induced from one E. coli strain to infect another and created an isogenic lysogen of the latter. The draft genome sequences of the three E. coli strains are reported herein.
64
Tallon LJ, Liu X, Bennuru S, Chibucos MC, Godinez A, Ott S, Zhao X, Sadzewicz L, Fraser CM, Nutman TB, Dunning Hotopp JC. Single molecule sequencing and genome assembly of a clinical specimen of Loa loa, the causative agent of loiasis. BMC Genomics 2014; 15:788. [PMID: 25217238 PMCID: PMC4175631 DOI: 10.1186/1471-2164-15-788] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 04/10/2014] [Accepted: 09/02/2014] [Indexed: 12/31/2022]
Abstract
Background More than 20% of the world’s population is at risk for infection by filarial nematodes and >180 million people worldwide are already infected. Along with infection comes significant morbidity that has a socioeconomic impact. The eight filarial nematodes that infect humans are Wuchereria bancrofti, Brugia malayi, Brugia timori, Onchocerca volvulus, Loa loa, Mansonella perstans, Mansonella streptocerca, and Mansonella ozzardi, of which three have published draft genome sequences. Since all have humans as the definitive host, standard avenues of research that rely on culturing and genetics have often not been possible. Therefore, genome sequencing provides an important window into understanding the biology of these parasites. The need for large amounts of high quality genomic DNA from homozygous, inbred lines; the availability of only short sequence reads from next-generation sequencing platforms at a reasonable expense; and the lack of random large insert libraries has limited our ability to generate high quality genome sequences for these parasites. However, the Pacific Biosciences single molecule, real-time sequencing platform holds great promise in reducing input amounts and generating sufficiently long sequences that bypass the need for large insert paired libraries. Results Here, we report on efforts to generate a more complete genome assembly for L. loa using genetically heterogeneous DNA isolated from a single clinical sample and sequenced on the Pacific Biosciences platform. To obtain the best assembly, numerous assemblers and sequencing datasets were analyzed, combined, and compared. Quiver-informed trimming of an assembly of only Pacific Biosciences reads by HGAP2 was selected as the final assembly of 96.4 Mbp in 2,250 contigs. This results in ~9% more of the genome in ~85% fewer contigs from ~80% less starting material at a fraction of the cost of previous Roche 454-based sequencing efforts. 
Conclusions The result is the most complete filarial nematode assembly produced thus far and demonstrates the utility of single molecule sequencing on the Pacific Biosciences platform for genetically heterogeneous metazoan genomes. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-788) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Julie C Dunning Hotopp
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA.
65
Schürch AC, Schipper D, Bijl MA, Dau J, Beckmen KB, Schapendonk CME, Raj VS, Osterhaus ADME, Haagmans BL, Tryland M, Smits SL. Metagenomic survey for viruses in Western Arctic caribou, Alaska, through iterative assembly of taxonomic units. PLoS One 2014; 9:e105227. [PMID: 25140520 PMCID: PMC4139337 DOI: 10.1371/journal.pone.0105227] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 04/18/2014] [Accepted: 07/18/2014] [Indexed: 12/16/2022]
Abstract
Pathogen surveillance in animals does not provide a sufficient level of vigilance because it is generally confined to surveillance of pathogens with known economic impact in domestic animals and practically nonexistent in wildlife species. As most (re-)emerging viral infections originate from animal sources, it is important to obtain insight into viral pathogens present in the wildlife reservoir from a public health perspective. When monitoring living, free-ranging wildlife for viruses, sample collection can be challenging and availability of nucleic acids isolated from samples is often limited. The development of viral metagenomics platforms allows a more comprehensive inventory of viruses present in wildlife. We report a metagenomic viral survey of the Western Arctic herd of barren ground caribou (Rangifer tarandus granti) in Alaska, USA. The presence of mammalian viruses in eye and nose swabs of 39 free-ranging caribou was investigated by random amplification combined with a metagenomic analysis approach that applied exhaustive iterative assembly of sequencing results to define taxonomic units of each metagenome. Through homology search methods we identified the presence of several mammalian viruses, including different papillomaviruses, a novel parvovirus, polyomavirus, and a virus that potentially represents a member of a novel genus in the family Coronaviridae.
Affiliation(s)
- Anita C. Schürch
  - Department of Viroscience, Erasmus Medical Center, Rotterdam, The Netherlands
- Debby Schipper
  - Department of Viroscience, Erasmus Medical Center, Rotterdam, The Netherlands
- Maarten A. Bijl
  - Department of Viroscience, Erasmus Medical Center, Rotterdam, The Netherlands
- Jim Dau
  - Alaska Department of Fish and Game, Kotzebue, Alaska, United States of America
- Kimberlee B. Beckmen
  - Alaska Department of Fish and Game, Division of Wildlife Conservation, Fairbanks, Alaska, United States of America
- V. Stalin Raj
  - Department of Viroscience, Erasmus Medical Center, Rotterdam, The Netherlands
- Albert D. M. E. Osterhaus
  - Department of Viroscience, Erasmus Medical Center, Rotterdam, The Netherlands
  - Viroclinics Biosciences, Rotterdam, The Netherlands
- Bart L. Haagmans
  - Department of Viroscience, Erasmus Medical Center, Rotterdam, The Netherlands
- Morten Tryland
  - Research Group for Arctic Infection Biology, Department of Arctic and Marine Biology, UiT - the Arctic University of Norway, Tromsø, Norway
  - Genøk - Centre for Biosafety, Tromsø, Norway
- Saskia L. Smits
  - Department of Viroscience, Erasmus Medical Center, Rotterdam, The Netherlands
  - Viroclinics Biosciences, Rotterdam, The Netherlands
66
O'Neil ST, Dzurisin JDK, Williams CM, Lobo NF, Higgins JK, Deines JM, Carmichael RD, Zeng E, Tan JC, Wu GC, Emrich SJ, Hellmann JJ. Gene expression in closely related species mirrors local adaptation: consequences for responses to a warming world. Mol Ecol 2014; 23:2686-98. [PMID: 24766086 DOI: 10.1111/mec.12773] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 10/08/2013] [Revised: 04/18/2014] [Accepted: 04/23/2014] [Indexed: 11/27/2022]
Abstract
Local adaptation of populations could preclude or slow range expansions in response to changing climate, particularly when dispersal is limited. To investigate the differential responses of populations to changing climatic conditions, we exposed poleward peripheral and central populations of two Lepidoptera to reciprocal, common-garden climatic conditions and compared their whole-transcriptome expression. We found evidence of simple population differentiation in both species, and in the species with previously identified population structure and phenotypic local adaptation, we found several hundred genes that responded in a synchronized and localized fashion. These genes were primarily involved in energy metabolism and oxidative stress, and expression levels were most divergent between populations in the same environment in which we previously detected divergence for metabolism. We found no localized genes in the species with less population structure and for which no local adaptation was previously detected. These results challenge the assumption that species are functionally similar across their ranges and poleward peripheral populations are preadapted to warmer conditions. Rather, some taxa deserve population-level consideration when predicting the effects of climate change because they respond in genetically based, distinctive ways to changing conditions.
Affiliation(s)
- Shawn T O'Neil
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
|
67
|
Olano C, Cano-Prieto C, Losada AA, Bull AT, Goodfellow M, Fiedler HP, Méndez C, Salas JA. Draft Genome Sequence of Marine Actinomycete Streptomyces sp. Strain NTK 937, Producer of the Benzoxazole Antibiotic Caboxamycin. Genome Announcements 2014; 2:e00534-14. [PMID: 24994793 PMCID: PMC4081993 DOI: 10.1128/genomea.00534-14] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/13/2014] [Accepted: 06/16/2014] [Indexed: 11/20/2022]
Abstract
Streptomyces sp. strain NTK 937 is the producer of the benzoxazole antibiotic caboxamycin, which has been shown to exert inhibitory activity against Gram-positive bacteria, cytotoxic activity against several human tumor cell lines, and inhibition of the enzyme phosphodiesterase. In this genome announcement, we present a draft genome sequence of Streptomyces sp. NTK 937 in which we identified at least 35 putative secondary metabolite biosynthetic gene clusters.
Affiliation(s)
- Carlos Olano
- Departamento de Biología Funcional e Instituto Universitario de Oncología del Principado de Asturias (I.U.O.P.A.), Universidad de Oviedo, Oviedo, Spain
| | - Carolina Cano-Prieto
- Departamento de Biología Funcional e Instituto Universitario de Oncología del Principado de Asturias (I.U.O.P.A.), Universidad de Oviedo, Oviedo, Spain
| | - Armando A Losada
- Departamento de Biología Funcional e Instituto Universitario de Oncología del Principado de Asturias (I.U.O.P.A.), Universidad de Oviedo, Oviedo, Spain
| | - Alan T Bull
- School of Biosciences, University of Kent, Canterbury, Kent, United Kingdom
| | - Michael Goodfellow
- School of Biology, University of Newcastle, Newcastle upon Tyne, United Kingdom
| | | | - Carmen Méndez
- Departamento de Biología Funcional e Instituto Universitario de Oncología del Principado de Asturias (I.U.O.P.A.), Universidad de Oviedo, Oviedo, Spain
| | - José A Salas
- Departamento de Biología Funcional e Instituto Universitario de Oncología del Principado de Asturias (I.U.O.P.A.), Universidad de Oviedo, Oviedo, Spain
|
68
|
Ekblom R, Wolf JBW. A field guide to whole-genome sequencing, assembly and annotation. Evol Appl 2014; 7:1026-42. [PMID: 25553065 PMCID: PMC4231593 DOI: 10.1111/eva.12178] [Citation(s) in RCA: 194] [Impact Index Per Article: 17.6] [Received: 02/04/2014] [Accepted: 05/20/2014] [Indexed: 12/12/2022] Open
Abstract
Genome sequencing projects were long confined to biomedical model organisms and required the concerted effort of large consortia. Rapid progress in high-throughput sequencing technology and the simultaneous development of bioinformatic tools have democratized the field. It is now within reach for individual research groups in the eco-evolutionary and conservation community to generate de novo draft genome sequences for any organism of choice. Because of the cost and considerable effort involved in such an endeavour, the important first step is to thoroughly consider whether a genome sequence is necessary for addressing the biological question at hand. Once this decision is taken, a genome project requires careful planning with respect to the organism involved and the intended quality of the genome draft. Here, we briefly review the state of the art within this field and provide a step-by-step introduction to the workflow involved in genome sequencing, assembly and annotation with particular reference to large and complex genomes. This tutorial is targeted at scientists with a background in conservation genetics, but more generally, provides useful practical guidance for researchers engaging in whole-genome sequencing projects.
Affiliation(s)
- Robert Ekblom
- Department of Evolutionary Biology, Uppsala University, Uppsala, Sweden
| | - Jochen B W Wolf
- Department of Evolutionary Biology, Uppsala University, Uppsala, Sweden
|
69
|
|
70
|
|
71
|
Worthey EA. Analysis and annotation of whole-genome or whole-exome sequencing-derived variants for clinical diagnosis. Current Protocols in Human Genetics 2013; 79:9.24.1-9.24.24. [PMID: 24510652 DOI: 10.1002/0471142905.hg0924s79] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 01/07/2023]
Abstract
Over the last several years, next-generation sequencing (NGS) has transformed genomic research through substantial advances in technology and reduction in the cost of sequencing, and also in the systems required for analysis of these large volumes of data. This technology is now being used as a standard molecular diagnostic test under particular circumstances in some clinical settings. The advances in sequencing have come so rapidly that the major bottleneck in identification of causal variants is no longer the sequencing but rather the analysis and interpretation. Interpretation of genetic findings in a clinical setting is scarcely a new challenge, but the task is increasingly complex in clinical genome-wide sequencing given the dramatic increase in dataset size and complexity. This increase requires the development of novel or repositioned analysis tools, methodologies, and processes. This unit provides an overview of these items. Specific challenges related to implementation in a clinical setting are discussed.
Affiliation(s)
- Elizabeth A Worthey
- Department of Pediatrics, Medical College of Wisconsin, Milwaukee, Wisconsin; The Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, Wisconsin; Department of Computer Science, University of Wisconsin, Milwaukee, Wisconsin
|
72
|
Zheng W, Huang L, Huang J, Wang X, Chen X, Zhao J, Guo J, Zhuang H, Qiu C, Liu J, Liu H, Huang X, Pei G, Zhan G, Tang C, Cheng Y, Liu M, Zhang J, Zhao Z, Zhang S, Han Q, Han D, Zhang H, Zhao J, Gao X, Wang J, Ni P, Dong W, Yang L, Yang H, Xu JR, Zhang G, Kang Z. High genome heterozygosity and endemic genetic recombination in the wheat stripe rust fungus. Nat Commun 2013; 4:2673. [PMID: 24150273 PMCID: PMC3826619 DOI: 10.1038/ncomms3673] [Citation(s) in RCA: 168] [Impact Index Per Article: 14.0] [Received: 03/27/2013] [Accepted: 09/26/2013] [Indexed: 11/08/2022] Open
Abstract
Stripe rust, caused by Puccinia striiformis f. sp. tritici (Pst), is one of the most destructive diseases of wheat. Here we report a 110-Mb draft sequence of Pst isolate CY32, obtained using a 'fosmid-to-fosmid' strategy, to better understand its race evolution and pathogenesis. The Pst genome is highly heterozygous and contains 25,288 protein-coding genes. Compared with non-obligate fungal pathogens, Pst has a more diverse gene composition and more genes encoding secreted proteins. Re-sequencing analysis indicates significant genetic variation among six isolates collected from different continents. Approximately 35% of SNPs are in the coding sequence regions, and half of them are non-synonymous. High genetic diversity in Pst suggests that sexual reproduction has an important role in the origin of different regional races. Our results show the effectiveness of the 'fosmid-to-fosmid' strategy for sequencing dikaryotic genomes and the feasibility of genome analysis to understand race evolution in Pst and other obligate pathogens.
Affiliation(s)
- Wenming Zheng
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Physiology, Ecology and Genetic Improvement of Food Crop in Henan Province and College of Life Sciences, Henan Agricultural University, Zhengzhou, Henan Province 450002, China
- These authors contributed equally to this work
| | - Lili Huang
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- These authors contributed equally to this work
| | - Jinqun Huang
- State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen 518083, China
- These authors contributed equally to this work
| | - Xiaojie Wang
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Xianming Chen
- USDA-ARS and Department of Plant Pathology, Washington State University, Pullman, Washington 99164-6430, USA
| | - Jie Zhao
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Jun Guo
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Hua Zhuang
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Chuangzhao Qiu
- State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen 518083, China
| | - Jie Liu
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Huiquan Liu
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Xueling Huang
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Guoliang Pei
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Gangming Zhan
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Chunlei Tang
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Yulin Cheng
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Minjie Liu
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Jinshan Zhang
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Zhongtao Zhao
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Shijie Zhang
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Qingmei Han
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Dejun Han
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Hongchang Zhang
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Jing Zhao
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Xiaoning Gao
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Jianfeng Wang
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Peixiang Ni
- State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen 518083, China
| | - Wei Dong
- State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen 518083, China
| | - Linfeng Yang
- State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen 518083, China
| | - Huanming Yang
- State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen 518083, China
| | - Jin-Rong Xu
- Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN 47907, USA
| | - Gengyun Zhang
- State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen 518083, China
| | - Zhensheng Kang
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
|
73
|
Abstract
The next-generation sequencing (NGS) revolution has drastically reduced time and cost requirements for sequencing of large genomes, and also qualitatively changed the problem of assembly. This article reviews the state of the art in de novo genome assembly, paying particular attention to mammalian-sized genomes. The strengths and weaknesses of the main sequencing platforms are highlighted, leading to a discussion of assembly and the new challenges associated with NGS data. Current approaches to assembly are outlined and the various software packages available are introduced and compared. The question of whether quality assemblies can be produced using short-read NGS data alone, or whether it must be combined with more expensive sequencing techniques, is considered. Prospects for future assemblers and tests of assembly performance are also discussed.
Affiliation(s)
- Joseph Henson
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - German Tischler
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Zemin Ning
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
|
74
|
O'Neil ST, Emrich SJ. Haplotype and minimum-chimerism consensus determination using short sequence data. BMC Genomics 2012; 13 Suppl 2:S4. [PMID: 22537299 PMCID: PMC3394418 DOI: 10.1186/1471-2164-13-s2-s4] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022] Open
Abstract
Background: Assembling haplotypes given sequence data derived from a single individual is a well studied problem, but only recently has haplotype assembly been considered for population-sampled data. We discuss a software tool called Hapler, which is designed specifically for low-diversity, low-coverage data such as ecological samples derived from natural populations. Because such data may contain error as well as ambiguous haplotype information, we developed methods that increase confidence in these assemblies. Hapler also reconstructs full consensus sequences while minimizing and identifying possible chimeric points. Results: Experiments on simulated data indicate that Hapler is effective at assembling haplotypes from gene-sized alignments of short reads. Further, in our tests Hapler-generated consensus sequences are less chimeric than the alternative consensus approaches of majority vote and viral quasispecies estimation regardless of error rate, read length, or population haplotype bias. Conclusions: The analysis of genetically diverse sequence data is increasingly common, particularly in the field of ecoinformatics where transcriptome sequencing of natural populations is a cost effective alternative to genome sequencing. For such studies, it is important to consider and identify haplotype diversity. Hapler provides robust haplotype information and identifies possible phasing errors in consensus sequences, providing valuable information for population studies and downstream usage of resulting assemblies.
Affiliation(s)
- Shawn T O'Neil
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA.
|
75
|
Carnevali P, Baccash J, Halpern AL, Nazarenko I, Nilsen GB, Pant KP, Ebert JC, Brownley A, Morenzoni M, Karpinchyk V, Martin B, Ballinger DG, Drmanac R. Computational techniques for human genome resequencing using mated gapped reads. J Comput Biol 2011; 19:279-92. [PMID: 22175250 DOI: 10.1089/cmb.2011.0201] [Citation(s) in RCA: 85] [Impact Index Per Article: 6.1] [Indexed: 11/12/2022] Open
Abstract
Unchained base reads on self-assembling DNA nanoarrays have recently emerged as a promising approach to low-cost, high-quality resequencing of human genomes. Because of unique characteristics of these mated pair reads, existing computational methods for resequencing assembly, such as those based on map-consensus calling, are not adequate for accurate variant calling. We describe novel computational methods developed for accurate calling of SNPs and short substitutions and indels (<100 bp); the same methods apply to evaluation of hypothesized larger, structural variations. We use an optimization process that iteratively adjusts the genome sequence to maximize its a posteriori probability given the observed reads. For each candidate sequence, this probability is computed using Bayesian statistics with a simple read generation model and simplifying assumptions that make the problem computationally tractable. The optimization process iteratively applies one-base substitutions, insertions, and deletions until convergence is achieved to an optimum diploid sequence. A local de novo assembly procedure that generalizes approaches based on De Bruijn graphs is used to seed the optimization process in order to reduce the chance of converging to local optima. Finally, a correlation-based filter is applied to reduce the false positive rate caused by the presence of repetitive regions in the reference genome.
Affiliation(s)
- Paolo Carnevali
- Complete Genomics Inc., Mountain View, California 94043, USA
|
76
|
Dalloul RA, Long JA, Zimin AV, Aslam L, Beal K, Ann Blomberg L, Bouffard P, Burt DW, Crasta O, Crooijmans RPMA, Cooper K, Coulombe RA, De S, Delany ME, Dodgson JB, Dong JJ, Evans C, Frederickson KM, Flicek P, Florea L, Folkerts O, Groenen MAM, Harkins TT, Herrero J, Hoffmann S, Megens HJ, Jiang A, de Jong P, Kaiser P, Kim H, Kim KW, Kim S, Langenberger D, Lee MK, Lee T, Mane S, Marcais G, Marz M, McElroy AP, Modise T, Nefedov M, Notredame C, Paton IR, Payne WS, Pertea G, Prickett D, Puiu D, Qioa D, Raineri E, Ruffier M, Salzberg SL, Schatz MC, Scheuring C, Schmidt CJ, Schroeder S, Searle SMJ, Smith EJ, Smith J, Sonstegard TS, Stadler PF, Tafer H, Tu Z(J), Van Tassell CP, Vilella AJ, Williams KP, Yorke JA, Zhang L, Zhang HB, Zhang X, Zhang Y, Reed KM. Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol 2010; 8:e1000475. [PMID: 20838655 PMCID: PMC2935454 DOI: 10.1371/journal.pbio.1000475] [Citation(s) in RCA: 292] [Impact Index Per Article: 19.5] [Received: 12/21/2009] [Accepted: 07/27/2010] [Indexed: 12/11/2022] Open
Abstract
A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.
Affiliation(s)
- Rami A. Dalloul
- Avian Immunobiology Laboratory, Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
| | - Julie A. Long
- Animal Biosciences and Biotechnology Laboratory, USDA Agricultural Research Service, Beltsville, Maryland, United States of America
| | - Aleksey V. Zimin
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America
| | - Luqman Aslam
- Animal Breeding and Genomics Centre, Wageningen University, Wageningen, the Netherlands
| | - Kathryn Beal
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Le Ann Blomberg
- Animal Biosciences and Biotechnology Laboratory, USDA Agricultural Research Service, Beltsville, Maryland, United States of America
| | - Pascal Bouffard
- Roche Applied Science, Indianapolis, Indiana, United States of America
| | - David W. Burt
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, Midlothian, United Kingdom
| | - Oswald Crasta
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
- Chromatin Inc., Champaign, Illinois, United States of America
| | | | - Kristal Cooper
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
| | - Roger A. Coulombe
- Department of Veterinary Sciences, Utah State University, Logan, Utah, United States of America
| | - Supriyo De
- Gene Expression and Genomics Unit, National Institute on Aging, National Institutes of Health, Baltimore, Maryland, United States of America
| | - Mary E. Delany
- Department of Animal Science, University of California, Davis, California, United States of America
| | - Jerry B. Dodgson
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
| | - Jennifer J. Dong
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, United States of America
| | - Clive Evans
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
| | | | - Paul Flicek
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Liliana Florea
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
| | - Otto Folkerts
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
- Chromatin Inc., Champaign, Illinois, United States of America
| | - Martien A. M. Groenen
- Animal Breeding and Genomics Centre, Wageningen University, Wageningen, the Netherlands
| | - Tim T. Harkins
- Roche Applied Science, Indianapolis, Indiana, United States of America
| | - Javier Herrero
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Steve Hoffmann
- Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- LIFE Project, University of Leipzig, Leipzig, Germany
| | - Hendrik-Jan Megens
- Animal Breeding and Genomics Centre, Wageningen University, Wageningen, the Netherlands
| | - Andrew Jiang
- Department of Animal Science, University of California, Davis, California, United States of America
| | - Pieter de Jong
- Children's Hospital and Research Center at Oakland, Oakland, California, United States of America
| | - Pete Kaiser
- Institute for Animal Health, Compton, Berkshire, United Kingdom
| | - Heebal Kim
- Laboratory of Bioinformatics and Population Genetics, Department of Agricultural Biotechnology, Seoul National University, Seoul, Korea
| | - Kyu-Won Kim
- Laboratory of Bioinformatics and Population Genetics, Department of Agricultural Biotechnology, Seoul National University, Seoul, Korea
| | - Sungwon Kim
- Avian Immunobiology Laboratory, Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
| | - David Langenberger
- Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
| | - Mi-Kyung Lee
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, United States of America
| | - Taeheon Lee
- Laboratory of Bioinformatics and Population Genetics, Department of Agricultural Biotechnology, Seoul National University, Seoul, Korea
| | - Shrinivasrao Mane
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
| | - Guillaume Marcais
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America
| | - Manja Marz
- Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Philipps-Universität Marburg, Pharmazeutische Chemie, Marburg, Germany
| | - Audrey P. McElroy
- Avian Immunobiology Laboratory, Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
| | - Thero Modise
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
| | - Mikhail Nefedov
- Children's Hospital and Research Center at Oakland, Oakland, California, United States of America
| | - Cédric Notredame
- Comparative Bioinformatics, Centre for Genomic Regulation (CRG), Universitat Pompeus Fabre, Barcelona, Spain
| | - Ian R. Paton
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, Midlothian, United Kingdom
| | - William S. Payne
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
| | - Geo Pertea
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
| | - Dennis Prickett
- Institute for Animal Health, Compton, Berkshire, United Kingdom
| | - Daniela Puiu
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
| | - Dan Qioa
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
| | - Emanuele Raineri
- Comparative Bioinformatics, Centre for Genomic Regulation (CRG), Universitat Pompeus Fabre, Barcelona, Spain
| | - Magali Ruffier
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Steven L. Salzberg
- Center for Bioinformatics and Computational Biology, Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
| | - Michael C. Schatz
- Center for Bioinformatics and Computational Biology, Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
| | - Chantel Scheuring
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, United States of America
| | - Carl J. Schmidt
- Department of Animal and Food Sciences, University of Delaware, Newark, Delaware, United States of America
| | - Steven Schroeder
- Bovine Functional Genomics Laboratory, USDA Agricultural Research Service, Beltsville Agricultural Research Center, Beltsville, Maryland, United States of America
| | - Stephen M. J. Searle
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Edward J. Smith
- Avian Immunobiology Laboratory, Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
| | - Jacqueline Smith
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, Midlothian, United Kingdom
| | - Tad S. Sonstegard
- Bovine Functional Genomics Laboratory, USDA Agricultural Research Service, Beltsville Agricultural Research Center, Beltsville, Maryland, United States of America
| | - Peter F. Stadler
- Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
- Fraunhofer Institut für Zelltherapie und Immunologie, Leipzig, Germany
- Department of Theoretical Chemistry, University of Vienna, Vienna, Austria
- Santa Fe Institute, Santa Fe, New Mexico, United States of America
| | - Hakim Tafer
- Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Department of Theoretical Chemistry, University of Vienna, Vienna, Austria
| | - Zhijian (Jake) Tu
- Department of Biochemistry, Virginia Tech, Blacksburg, Virginia, United States of America
| | - Curtis P. Van Tassell
- Bovine Functional Genomics Laboratory, USDA Agricultural Research Service, Beltsville Agricultural Research Center, Beltsville, Maryland, United States of America
- Animal Improvement Programs Laboratory, USDA Agricultural Research Service, Beltsville Agricultural Research Center, Beltsville, Maryland, United States of America
| | - Albert J. Vilella
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Kelly P. Williams
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
| | - James A. Yorke
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America
| | - Liqing Zhang
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
| | - Hong-Bin Zhang
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, United States of America
| | - Xiaojun Zhang
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, United States of America
| | - Yang Zhang
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, United States of America
| | - Kent M. Reed
- Department of Veterinary and Biomedical Sciences, College of Veterinary Medicine, University of Minnesota, St. Paul, Minnesota, United States of America
|
77
|
Lévesque CA, Brouwer H, Cano L, Hamilton JP, Holt C, Huitema E, Raffaele S, Robideau GP, Thines M, Win J, Zerillo MM, Beakes GW, Boore JL, Busam D, Dumas B, Ferriera S, Fuerstenberg SI, Gachon CMM, Gaulin E, Govers F, Grenville-Briggs L, Horner N, Hostetler J, Jiang RHY, Johnson J, Krajaejun T, Lin H, Meijer HJG, Moore B, Morris P, Phuntmart V, Puiu D, Shetty J, Stajich JE, Tripathy S, Wawra S, van West P, Whitty BR, Coutinho PM, Henrissat B, Martin F, Thomas PD, Tyler BM, De Vries RP, Kamoun S, Yandell M, Tisserat N, Buell CR. Genome sequence of the necrotrophic plant pathogen Pythium ultimum reveals original pathogenicity mechanisms and effector repertoire. Genome Biol 2010; 11:R73. [PMID: 20626842 PMCID: PMC2926784 DOI: 10.1186/gb-2010-11-7-r73] [Citation(s) in RCA: 273] [Impact Index Per Article: 18.2] [Received: 02/17/2010] [Revised: 05/02/2010] [Accepted: 07/13/2010] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND: Pythium ultimum is a ubiquitous oomycete plant pathogen responsible for a variety of diseases on a broad range of crop and ornamental species.
RESULTS: The P. ultimum genome (42.8 Mb) encodes 15,290 genes and has extensive sequence similarity and synteny with related Phytophthora species, including the potato blight pathogen Phytophthora infestans. Whole transcriptome sequencing revealed expression of 86% of genes, with detectable differential expression of suites of genes under abiotic stress and in the presence of a host. The predicted proteome includes a large repertoire of proteins involved in plant-pathogen interactions, although, surprisingly, the P. ultimum genome does not encode any classical RXLR effectors and relatively few Crinkler genes in comparison to related phytopathogenic oomycetes. A lower number of enzymes involved in carbohydrate metabolism were present compared to Phytophthora species, with the notable absence of cutinases, suggesting a significant difference in virulence mechanisms between P. ultimum and more host-specific oomycete species. Although we observed a high degree of orthology with Phytophthora genomes, there were novel features of the P. ultimum proteome, including an expansion of genes involved in proteolysis and genes unique to Pythium. We identified a small gene family of cadherins, proteins involved in cell adhesion, the first report of these in a genome outside the metazoans.
CONCLUSIONS: Access to the P. ultimum genome has revealed not only core pathogenic mechanisms within the oomycetes but also lineage-specific genes associated with the alternative virulence and lifestyles found within the pythiaceous lineages compared to the Peronosporaceae.
Affiliation(s)
- C André Lévesque
- Agriculture and Agri-Food Canada, 960 Carling Ave, Ottawa, ON, K1A 0C6, Canada
- Department of Biology, Carleton University, Ottawa, ON, K1S 5B6, Canada
| | - Henk Brouwer
- CBS-KNAW, Fungal Biodiversity Centre, Uppsalalaan 8, Utrecht, 3584 CT, The Netherlands
| | | | - John P Hamilton
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
| | - Carson Holt
- Eccles Institute of Human Genetics, University of Utah, 15 North 2030 East, Room 2100, Salt Lake City, UT 84112-5330, USA
| | | | | | - Gregg P Robideau
- Agriculture and Agri-Food Canada, 960 Carling Ave, Ottawa, ON, K1A 0C6, Canada
- Department of Biology, Carleton University, Ottawa, ON, K1S 5B6, Canada
| | - Marco Thines
- Biodiversity and Climate Research Centre, Georg-Voigt-Str 14-16, D-60325, Frankfurt, Germany
- Department of Biological Sciences, Institute of Ecology, Evolution and Diversity, Johann Wolfgang Goethe University, Siesmayerstr. 70, D-60323 Frankfurt, Germany
| | - Joe Win
- The Sainsbury Laboratory, Norwich, NR4 7UH, UK
| | - Marcelo M Zerillo
- Department of Bioagricultural Sciences and Pest Management, Colorado State University, Fort Collins, CO 80523-1177, USA
| | - Gordon W Beakes
- School of Biology, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
| | - Jeffrey L Boore
- Genome Project Solutions, 1024 Promenade Street, Hercules, CA 94547, USA
| | - Dana Busam
- J Craig Venter Institute, 9704 Medical Center Dr., Rockville, MD 20850, USA
| | - Bernard Dumas
- Surfaces Cellulaires et Signalisation chez les Végétaux, UMR5546 CNRS-Université de Toulouse, 24 chemin de Borde Rouge, BP42617, Auzeville, Castanet-Tolosan, F-31326, France
| | - Steve Ferriera
- J Craig Venter Institute, 9704 Medical Center Dr., Rockville, MD 20850, USA
| | | | | | - Elodie Gaulin
- Surfaces Cellulaires et Signalisation chez les Végétaux, UMR5546 CNRS-Université de Toulouse, 24 chemin de Borde Rouge, BP42617, Auzeville, Castanet-Tolosan, F-31326, France
| | - Francine Govers
- Laboratory of Phytopathology, Wageningen University, NL-1-6708 PB, Wageningen, The Netherlands
- Centre for BioSystems Genomics (CBSG), PO Box 98, 6700 AB Wageningen, The Netherlands
| | - Laura Grenville-Briggs
- Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen, AB25 2ZD, UK
| | - Neil Horner
- Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen, AB25 2ZD, UK
| | - Jessica Hostetler
- J Craig Venter Institute, 9704 Medical Center Dr., Rockville, MD 20850, USA
| | - Rays HY Jiang
- The Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
| | - Justin Johnson
- J Craig Venter Institute, 9704 Medical Center Dr., Rockville, MD 20850, USA
| | - Theerapong Krajaejun
- Department of Pathology, Faculty of Medicine-Ramathibodi Hospital, Mahidol University, Rama 6 Road, Bangkok, 10400, Thailand
| | - Haining Lin
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
| | - Harold JG Meijer
- Laboratory of Phytopathology, Wageningen University, NL-1-6708 PB, Wageningen, The Netherlands
| | - Barry Moore
- Eccles Institute of Human Genetics, University of Utah, 15 North 2030 East, Room 2100, Salt Lake City, UT 84112-5330, USA
| | - Paul Morris
- Department of Biological Sciences, Bowling Green State University, Bowling Green, OH 43403, USA
| | - Vipaporn Phuntmart
- Department of Biological Sciences, Bowling Green State University, Bowling Green, OH 43403, USA
| | - Daniela Puiu
- J Craig Venter Institute, 9704 Medical Center Dr., Rockville, MD 20850, USA
| | - Jyoti Shetty
- J Craig Venter Institute, 9704 Medical Center Dr., Rockville, MD 20850, USA
| | - Jason E Stajich
- Department of Plant Pathology and Microbiology, University of California, Riverside, CA 92521, USA
| | - Sucheta Tripathy
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Washington Street, Blacksburg, VA 24061-0477, USA
| | - Stephan Wawra
- Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen, AB25 2ZD, UK
| | - Pieter van West
- Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen, AB25 2ZD, UK
| | - Brett R Whitty
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
| | - Pedro M Coutinho
- Architecture et Fonction des Macromolecules Biologiques, UMR6098, CNRS, Univ. Aix-Marseille I & II, 163 Avenue de Luminy, 13288 Marseille, France
| | - Bernard Henrissat
- Architecture et Fonction des Macromolecules Biologiques, UMR6098, CNRS, Univ. Aix-Marseille I & II, 163 Avenue de Luminy, 13288 Marseille, France
| | - Frank Martin
- USDA-ARS, 1636 East Alisal St, Salinas, CA, 93905, USA
| | - Paul D Thomas
- Evolutionary Systems Biology, SRI International, Room AE207, 333 Ravenswood Ave, Menlo Park, CA 94025, USA
| | - Brett M Tyler
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Washington Street, Blacksburg, VA 24061-0477, USA
| | - Ronald P De Vries
- CBS-KNAW, Fungal Biodiversity Centre, Uppsalalaan 8, Utrecht, 3584 CT, The Netherlands
| | | | - Mark Yandell
- Eccles Institute of Human Genetics, University of Utah, 15 North 2030 East, Room 2100, Salt Lake City, UT 84112-5330, USA
| | - Ned Tisserat
- Department of Bioagricultural Sciences and Pest Management, Colorado State University, Fort Collins, CO 80523-1177, USA
| | - C Robin Buell
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
|
78
|
O'Neil ST, Dzurisin JDK, Carmichael RD, Lobo NF, Emrich SJ, Hellmann JJ. Population-level transcriptome sequencing of nonmodel organisms Erynnis propertius and Papilio zelicaon. BMC Genomics 2010; 11:310. [PMID: 20478048 PMCID: PMC2887415 DOI: 10.1186/1471-2164-11-310] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.1] [Received: 08/24/2009] [Accepted: 05/17/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Several recent studies have demonstrated the use of Roche 454 sequencing technology for de novo transcriptome analysis. Low error rates and high coverage also allow for effective SNP discovery and genetic diversity estimates. However, genetically diverse datasets, such as those sourced from natural populations, pose challenges for assembly programs and subsequent analysis. Further, estimating the effectiveness of transcript discovery using Roche 454 transcriptome data is still a difficult task.
RESULTS: Using the Roche 454 FLX Titanium platform, we sequenced and assembled larval transcriptomes for two butterfly species: the Propertius duskywing, Erynnis propertius (Lepidoptera: Hesperiidae) and the Anise swallowtail, Papilio zelicaon (Lepidoptera: Papilionidae). The Expressed Sequence Tags (ESTs) generated represent a diverse sample drawn from multiple populations, developmental stages, and stress treatments. Despite this diversity, > 95% of the ESTs assembled into long (> 714 bp on average) and highly covered (> 9.6x on average) contigs. To estimate the effectiveness of transcript discovery, we compared the number of bases in the hit region of unigenes (contigs and singletons) to the length of the best-match silkworm (Bombyx mori) protein; this "ortholog hit ratio" gives a close estimate of the amount of the transcript discovered relative to a model lepidopteran genome. For each species, we tested two assembly programs and two parameter sets; although CAP3 is commonly used for such data, the assemblies produced by Celera Assembler with modified parameters were chosen over those produced by CAP3 based on contig and singleton counts as well as ortholog hit ratio analysis. In the final assemblies, 1,413 E. propertius and 1,940 P. zelicaon unigenes had a ratio > 0.8; 2,866 E. propertius and 4,015 P. zelicaon unigenes had a ratio > 0.5.
CONCLUSIONS: Ultimately, these assemblies and SNP data will be used to generate microarrays for ecoinformatics examining climate change tolerance of different natural populations. These studies will benefit from high quality assemblies with few singletons (less than 26% of bases for each assembled transcriptome are present in unassembled singleton ESTs) and effective transcript discovery (over 6,500 of our putative orthologs cover at least 50% of the corresponding model silkworm gene).
Affiliation(s)
- Shawn T O'Neil
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
|
79
|
Kelley DR, Salzberg SL. Detection and correction of false segmental duplications caused by genome mis-assembly. Genome Biol 2010; 11:R28. [PMID: 20219098 PMCID: PMC2864568 DOI: 10.1186/gb-2010-11-3-r28] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.1] [Received: 10/12/2009] [Revised: 12/11/2009] [Accepted: 03/10/2010] [Indexed: 11/23/2022] Open
Abstract
Diploid genomes with divergent chromosomes present special problems for assembly software as two copies of especially polymorphic regions may be mistakenly constructed, creating the appearance of a recent segmental duplication. We developed a method for identifying such false duplications and applied it to four vertebrate genomes. For each genome, we corrected mis-assemblies, improved estimates of the amount of duplicated sequence, and recovered polymorphisms between the sequenced chromosomes.
Affiliation(s)
- David R Kelley
- Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA
| | - Steven L Salzberg
- Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA
|
80
|
Rausch T, Koren S, Denisov G, Weese D, Emde AK, Döring A, Reinert K. A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads. Bioinformatics 2009; 25:1118-24. [PMID: 19269990 PMCID: PMC2732307 DOI: 10.1093/bioinformatics/btp131] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 11/03/2008] [Revised: 01/23/2009] [Accepted: 03/02/2009] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Novel high-throughput sequencing technologies pose new algorithmic challenges in handling massive amounts of short-read, high-coverage data. A robust and versatile consensus tool is of particular interest for such data since a sound multi-read alignment is a prerequisite for variation analyses, accurate genome assemblies and insert sequencing.
RESULTS: A multi-read alignment algorithm for de novo or reference-guided genome assembly is presented. The program identifies segments shared by multiple reads and then aligns these segments using a consistency-enhanced alignment graph. On real de novo sequencing data obtained from the newly established NCBI Short Read Archive, the program performs similarly in quality to other comparable programs. On more challenging simulated datasets for insert sequencing and variation analyses, our program outperforms the other tools.
AVAILABILITY: The consensus program can be downloaded from http://www.seqan.de/projects/consensus.html. It can be used stand-alone or in conjunction with the Celera Assembler. Both application scenarios as well as the usage of the tool are described in the documentation.
Affiliation(s)
- Tobias Rausch
- International Max Planck Research School for Computational Biology and Scientific Computing, Ihnestr. 63-73, Algorithmische Bioinformatik, Institut für Informatik, Takustr. 9, 14195 Berlin, Germany.
|
81
|
Miller JR, Delcher AL, Koren S, Venter E, Walenz BP, Brownley A, Johnson J, Li K, Mobarry C, Sutton G. Aggressive assembly of pyrosequencing reads with mates. Bioinformatics 2008; 24:2818-24. [PMID: 18952627 PMCID: PMC2639302 DOI: 10.1093/bioinformatics/btn548] [Citation(s) in RCA: 374] [Impact Index Per Article: 22.0] [Received: 06/20/2008] [Revised: 10/17/2008] [Accepted: 10/20/2008] [Indexed: 12/21/2022] Open
Abstract
MOTIVATION: DNA sequence reads from Sanger and pyrosequencing platforms differ in cost, accuracy, typical coverage, average read length and the variety of available paired-end protocols. Both read types can complement one another in a 'hybrid' approach to whole-genome shotgun sequencing projects, but assembly software must be modified to accommodate their different characteristics. This is true even of pyrosequencing mated and unmated read combinations. Without special modifications, assemblers tuned for homogeneous sequence data may perform poorly on hybrid data.
RESULTS: Celera Assembler was modified for combinations of ABI 3730 and 454 FLX reads. The revised pipeline called CABOG (Celera Assembler with the Best Overlap Graph) is robust to homopolymer run length uncertainty, high read coverage and heterogeneous read lengths. In tests on four genomes, it generated the longest contigs among all assemblers tested. It exploited the mate constraints provided by paired-end reads from either platform to build larger contigs and scaffolds, which were validated by comparison to a finished reference sequence. A low rate of contig mis-assembly was detected in some CABOG assemblies, but this was reduced in the presence of sufficient mate pair data.
AVAILABILITY: The software is freely available as open-source from http://wgs-assembler.sf.net under the GNU Public License.
Affiliation(s)
- Jason R Miller
- The J. Craig Venter Institute, 9712 Medical Center Drive, Rockville MD 20850, USA.
|
82
|
Axelrod N, Lin Y, Ng PC, Stockwell TB, Crabtree J, Huang J, Kirkness E, Strausberg RL, Frazier ME, Venter JC, Kravitz S, Levy S. The HuRef Browser: a web resource for individual human genomics. Nucleic Acids Res 2008; 37:D1018-24. [PMID: 19036787 PMCID: PMC2686481 DOI: 10.1093/nar/gkn939] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022] Open
Abstract
The HuRef Genome Browser is a web application for the navigation and analysis of the previously published genome of a human individual, termed HuRef. The browser provides a comparative view between the NCBI human reference sequence and the HuRef assembly, and it enables the navigation of the HuRef genome in the context of HuRef, NCBI and Ensembl annotations. Single nucleotide polymorphisms, indels, inversions, structural and copy-number variations are shown in the context of existing functional annotations on either genome in the comparative view. Demonstrated here are some potential uses of the browser to enable a better understanding of individual human genetic variation. The browser provides full access to the underlying reads with sequence and quality information, the genome assembly and the evidence supporting the identification of DNA polymorphisms. The HuRef Browser is a unique and versatile tool for browsing genome assemblies and studying individual human sequence variation in a diploid context. The browser is available online at http://huref.jcvi.org.
Affiliation(s)
- Nelson Axelrod
- J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA.
|