51
|
Skern-Mauritzen R, Malde K, Besnier F, Nilsen F, Jonassen I, Reinhardt R, Koop B, Dalvin S, Mæhle S, Kongshaug H, Glover K. How does sequence variability affect de novo assembly quality? J Nat Hist 2013. [DOI: 10.1080/00222933.2012.738833] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/27/2022]
|
52
|
Singhal S. De novo transcriptomic analyses for non‐model organisms: an evaluation of methods across a multi‐species data set. Mol Ecol Resour 2013; 13:403-16. [DOI: 10.1111/1755-0998.12077] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Received: 11/07/2012] [Revised: 12/13/2012] [Accepted: 12/22/2012] [Indexed: 01/09/2023]
Affiliation(s)
- Sonal Singhal
- Museum of Vertebrate Zoology University of California, Berkeley 3101 Valley Life Sciences Building Berkeley CA 94720‐3160 USA
- Department of Integrative Biology University of California, Berkeley 1005 Valley Life Sciences Building Berkeley CA 94720‐3140 USA
|
53
|
Shin H, Liu T, Duan X, Zhang Y, Liu XS. Computational methodology for ChIP-seq analysis. Quantitative Biology 2013; 1:54-70. [PMID: 25741452 DOI: 10.1007/s40484-013-0006-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 12/15/2022]
Abstract
Chromatin immunoprecipitation coupled with massive parallel sequencing (ChIP-seq) is a powerful technology to identify the genome-wide locations of DNA binding proteins such as transcription factors or modified histones. As more and more experimental laboratories are adopting ChIP-seq to unravel the transcriptional and epigenetic regulatory mechanisms, computational analyses of ChIP-seq also become increasingly comprehensive and sophisticated. In this article, we review current computational methodology for ChIP-seq analysis, recommend useful algorithms and workflows, and introduce quality control measures at different analytical steps. We also discuss how ChIP-seq could be integrated with other types of genomic assays, such as gene expression profiling and genome-wide association studies, to provide a more comprehensive view of gene regulatory mechanisms in important physiological and pathological processes.
Affiliation(s)
- Hyunjin Shin
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute/Harvard School of Public Health, Boston, MA 02115, USA
- Tao Liu
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute/Harvard School of Public Health, Boston, MA 02115, USA
- Xikun Duan
- Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai 200092, China
- Yong Zhang
- Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai 200092, China
- X Shirley Liu
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute/Harvard School of Public Health, Boston, MA 02115, USA
|
54
|
Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, Krabichler B, Speicher MR, Zschocke J, Trajanoski Z. A survey of tools for variant analysis of next-generation genome sequencing data. Brief Bioinform 2013; 15:256-78. [PMID: 23341494 PMCID: PMC3956068 DOI: 10.1093/bib/bbs086] [Citation(s) in RCA: 347] [Impact Index Per Article: 28.9] [Indexed: 01/11/2023]
Abstract
Recent advances in genome sequencing technologies provide unprecedented opportunities to characterize individual genomic landscapes and identify mutations relevant for diagnosis and therapy. Specifically, whole-exome sequencing using next-generation sequencing (NGS) technologies is gaining popularity in the human genetics community due to the moderate costs, manageable data amounts and straightforward interpretation of analysis results. While whole-exome and, in the near future, whole-genome sequencing are becoming commodities, data analysis still poses significant challenges and led to the development of a plethora of tools supporting specific parts of the analysis workflow or providing a complete solution. Here, we surveyed 205 tools for whole-genome/whole-exome sequencing data analysis supporting five distinct analytical steps: quality assessment, alignment, variant identification, variant annotation and visualization. We report an overview of the functionality, features and specific requirements of the individual tools. We then selected 32 programs for variant identification, variant annotation and visualization, which were subjected to hands-on evaluation using four data sets: one set of exome data from two patients with a rare disease for testing identification of germline mutations, two cancer data sets for testing variant callers for somatic mutations, copy number variations and structural variations, and one semi-synthetic data set for testing identification of copy number variations. Our comprehensive survey and evaluation of NGS tools provides a valuable guideline for human geneticists working on Mendelian disorders, complex diseases and cancers.
Affiliation(s)
- Stephan Pabinger
- Division for Bioinformatics, Innsbruck Medical University, Innrain 80, 6020 Innsbruck, Austria. Tel.: +43-512-9003-71401; Fax: +43-512-9003-73100;
|
55
|
Zhou X, Bao S, Wang B, Zhang X, Song YQ. Short read mapping for exome sequencing. Methods Mol Biol 2013; 1038:93-111. [PMID: 23872971 DOI: 10.1007/978-1-62703-514-9_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/29/2023]
Abstract
Mapping short reads to the reference genome is very often the prerequisite for applications utilizing the next-generation sequencing technologies. A dozen of software tools developed for this purpose have been widely used. But many practical issues remained when utilizing them to build a computational pipeline for downstream analyses. In this chapter, we describe the read mapping procedures adopted in our lab for the exome sequencing studies as an example to illustrate those practical details.
Affiliation(s)
- Xueya Zhou
- Bioinformatics Division, Tsinghua National Laboratory of Information Science and Technology, Beijing, China
|
56
|
Gullapalli RR, Desai KV, Santana-Santos L, Kant JA, Becich MJ. Next generation sequencing in clinical medicine: Challenges and lessons for pathology and biomedical informatics. J Pathol Inform 2012; 3:40. [PMID: 23248761 PMCID: PMC3519097 DOI: 10.4103/2153-3539.103013] [Citation(s) in RCA: 94] [Impact Index Per Article: 7.2] [Received: 03/15/2012] [Accepted: 07/19/2012] [Indexed: 11/25/2022]
Abstract
The Human Genome Project (HGP) provided the initial draft of mankind's DNA sequence in 2001. The HGP was produced by 23 collaborating laboratories using Sanger sequencing of mapped regions as well as shotgun sequencing techniques in a process that occupied 13 years at a cost of ~$3 billion. Today, Next Generation Sequencing (NGS) techniques represent the next phase in the evolution of DNA sequencing technology at dramatically reduced cost compared to traditional Sanger sequencing. A single laboratory today can sequence the entire human genome in a few days for a few thousand dollars in reagents and staff time. Routine whole exome or even whole genome sequencing of clinical patients is well within the realm of affordability for many academic institutions across the country. This paper reviews current sequencing technology methods and upcoming advancements in sequencing technology as well as challenges associated with data generation, data manipulation and data storage. Implementation of routine NGS data in cancer genomics is discussed along with potential pitfalls in the interpretation of the NGS data. The overarching importance of bioinformatics in the clinical implementation of NGS is emphasized.[7] We also review the issue of physician education which also is an important consideration for the successful implementation of NGS in the clinical workplace. NGS technologies represent a golden opportunity for the next generation of pathologists to be at the leading edge of the personalized medicine approaches coming our way. Often under-emphasized issues of data access and control as well as potential ethical implications of whole genome NGS sequencing are also discussed. Despite some challenges, it's hard not to be optimistic about the future of personalized genome sequencing and its potential impact on patient care and the advancement of knowledge of human biology and disease in the near future.
Affiliation(s)
- Rama R Gullapalli
- Department of Pathology, University of Pittsburgh Medical Centre, A701, Scaife Hall, 3550 Terrace Street, Pittsburgh, PA
|
57
|
Cheung M, Kwan H. Fighting Outbreaks with Bacterial Genomics: Case Review and Workflow Proposal. Public Health Genomics 2012; 15:341-51. [DOI: 10.1159/000342770] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 05/03/2012] [Accepted: 08/20/2012] [Indexed: 01/09/2023]
|
58
|
Moorthie S, Hall A, Wright CF. Informatics and clinical genome sequencing: opening the black box. Genet Med 2012; 15:165-71. [PMID: 22975759 DOI: 10.1038/gim.2012.116] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Indexed: 01/05/2023]
Abstract
Adoption of whole-genome sequencing as a routine biomedical tool is dependent not only on the availability of new high-throughput sequencing technologies, but also on the concomitant development of methods and tools for data collection, analysis, and interpretation. It would also be enormously facilitated by the development of decision support systems for clinicians and consideration of how such information can best be incorporated into care pathways. Here we present an overview of the data analysis and interpretation pipeline, the wider informatics needs, and some of the relevant ethical and legal issues.
|
59
|
Castellana S, Romani M, Valente EM, Mazza T. A solid quality-control analysis of AB SOLiD short-read sequencing data. Brief Bioinform 2012; 14:684-95. [PMID: 22877770 DOI: 10.1093/bib/bbs048] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022]
Abstract
Next generation sequencers have greatly improved our ability to mine polymorphisms and mutations out of entire (or portions of) genomes. The reliability of their outputs, though, showed to be very related to the sequencing chemistry and to deeply affect the quality of the downstream analyses. We focus here on the two-base color code chemistry of AB SOLiD sequencers and propose a comprehensive quality control methodological and software pipeline. We used existing and custom tools to detect and purge short-reads of some common flaws due to sequencing errors and chemical hitches. We apply them to a cohort of SOLiD 4 runs and measure their joint efficacy in terms of the resulting ability to detect the greatest possible number of true variants.
Affiliation(s)
- Stefano Castellana
- IRCCS Casa Sollievo della Sofferenza, Mendel. Tel.: +39 0644160526; Fax: +39 0644160538;
|
60
|
Wang Q, Xia J, Jia P, Pao W, Zhao Z. Application of next generation sequencing to human gene fusion detection: computational tools, features and perspectives. Brief Bioinform 2012; 14:506-19. [PMID: 22877769 DOI: 10.1093/bib/bbs044] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.5] [Indexed: 01/22/2023]
Abstract
Gene fusions are important genomic events in human cancer because their fusion gene products can drive the development of cancer and thus are potential prognostic tools or therapeutic targets in anti-cancer treatment. Major advancements have been made in computational approaches for fusion gene discovery over the past 3 years due to improvements and widespread applications of high-throughput next generation sequencing (NGS) technologies. To identify fusions from NGS data, existing methods typically leverage the strengths of both sequencing technologies and computational strategies. In this article, we review the NGS and computational features of existing methods for fusion gene detection and suggest directions for future development.
|
61
|
Stranneheim H, Lundeberg J. Stepping stones in DNA sequencing. Biotechnol J 2012; 7:1063-73. [PMID: 22887891 PMCID: PMC3472021 DOI: 10.1002/biot.201200153] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 04/05/2012] [Revised: 06/21/2012] [Accepted: 07/04/2012] [Indexed: 12/11/2022]
Abstract
In recent years there have been tremendous advances in our ability to rapidly and cost-effectively sequence DNA. This has revolutionized the fields of genetics and biology, leading to a deeper understanding of the molecular events in life processes. The rapid technological advances have enormously expanded sequencing opportunities and applications, but also imposed strains and challenges on steps prior to sequencing and in the downstream process of handling and analysis of these massive amounts of sequence data. Traditionally, sequencing has been limited to small DNA fragments of approximately one thousand bases (derived from the organism's genome) due to issues in maintaining a high sequence quality and accuracy for longer read lengths. Although many technological breakthroughs have been made, currently the commercially available massively parallel sequencing methods have not been able to resolve this issue. However, recent announcements in nanopore sequencing hold the promise of removing this read-length limitation, enabling sequencing of larger intact DNA fragments. The ability to sequence longer intact DNA with high accuracy is a major stepping stone towards greatly simplifying the downstream analysis and increasing the power of sequencing compared to today. This review covers some of the technical advances in sequencing that have opened up new frontiers in genomics.
Affiliation(s)
- Henrik Stranneheim
- Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden.
|
62
|
Fischer M, Snajder R, Pabinger S, Dander A, Schossig A, Zschocke J, Trajanoski Z, Stocker G. SIMPLEX: cloud-enabled pipeline for the comprehensive analysis of exome sequencing data. PLoS One 2012; 7:e41948. [PMID: 22870267 PMCID: PMC3411592 DOI: 10.1371/journal.pone.0041948] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Received: 02/23/2012] [Accepted: 06/28/2012] [Indexed: 01/24/2023]
Abstract
In recent studies, exome sequencing has proven to be a successful screening tool for the identification of candidate genes causing rare genetic diseases. Although underlying targeted sequencing methods are well established, necessary data handling and focused, structured analysis still remain demanding tasks. Here, we present a cloud-enabled autonomous analysis pipeline, which comprises the complete exome analysis workflow. The pipeline combines several in-house developed and published applications to perform the following steps: (a) initial quality control, (b) intelligent data filtering and pre-processing, (c) sequence alignment to a reference genome, (d) SNP and DIP detection, (e) functional annotation of variants using different approaches, and (f) detailed report generation during various stages of the workflow. The pipeline connects the selected analysis steps, exposes all available parameters for customized usage, performs required data handling, and distributes computationally expensive tasks either on a dedicated high-performance computing infrastructure or on the Amazon cloud environment (EC2). The presented application has already been used in several research projects including studies to elucidate the role of rare genetic diseases. The pipeline is continuously tested and is publicly available under the GPL as a VirtualBox or Cloud image at http://simplex.i-med.ac.at; additional supplementary data is provided at http://www.icbi.at/exome.
Affiliation(s)
- Maria Fischer
- Division for Bioinformatics, Biocenter, Innsbruck Medical University, Innsbruck, Austria
- Rene Snajder
- Division for Bioinformatics, Biocenter, Innsbruck Medical University, Innsbruck, Austria
- Oncotyrol, Center for Personalized Cancer Medicine, Innsbruck, Austria
- Stephan Pabinger
- Division for Bioinformatics, Biocenter, Innsbruck Medical University, Innsbruck, Austria
- Andreas Dander
- Division for Bioinformatics, Biocenter, Innsbruck Medical University, Innsbruck, Austria
- Oncotyrol, Center for Personalized Cancer Medicine, Innsbruck, Austria
- Anna Schossig
- Division of Human Genetics, Biocenter, Innsbruck Medical University, Innsbruck, Austria
- Johannes Zschocke
- Division of Human Genetics, Biocenter, Innsbruck Medical University, Innsbruck, Austria
- Zlatko Trajanoski
- Division for Bioinformatics, Biocenter, Innsbruck Medical University, Innsbruck, Austria
- Gernot Stocker
- Division for Bioinformatics, Biocenter, Innsbruck Medical University, Innsbruck, Austria
|
63
|
Optimization of de novo transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms. BMC Bioinformatics 2012; 13:170. [PMID: 22808927 PMCID: PMC3489510 DOI: 10.1186/1471-2105-13-170] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 02/25/2012] [Accepted: 06/26/2012] [Indexed: 01/14/2023]
Abstract
Background: The k-mer hash length is a key factor affecting the output of de novo transcriptome assembly packages using de Bruijn graph algorithms. Assemblies constructed with varying single k-mer choices might result in the loss of unique contiguous sequences (contigs) and relevant biological information. A common solution to this problem is the clustering of single k-mer assemblies. Even though annotation is one of the primary goals of a transcriptome assembly, the success of assembly strategies does not consider the impact of k-mer selection on the annotation output. This study provides an in-depth k-mer selection analysis that is focused on the degree of functional annotation achieved for a non-model organism where no reference genome information is available. Individual k-mers and clustered assemblies (CA) were considered using three representative software packages. Pair-wise comparison analyses (between individual k-mers and CAs) were produced to reveal missing Kyoto Encyclopedia of Genes and Genomes (KEGG) ortholog identifiers (KOIs), and to determine a strategy that maximizes the recovery of biological information in a de novo transcriptome assembly.
Results: Analyses of single k-mer assemblies resulted in the generation of various quantities of contigs and functional annotations within the selection window of k-mers (k-19 to k-63). For each k-mer in this window, generated assemblies contained certain unique contigs and KOIs that were not present in the other k-mer assemblies. Producing a non-redundant CA of k-mers 19 to 63 resulted in a more complete functional annotation than any single k-mer assembly. However, a fraction of unique annotations remained (~0.19 to 0.27% of total KOIs) in the assemblies of individual k-mers (k-19 to k-63) that were not present in the non-redundant CA. A workflow to recover these unique annotations is presented.
Conclusions: This study demonstrated that different k-mer choices result in various quantities of unique contigs per single k-mer assembly which affects biological information that is retrievable from the transcriptome. This undesirable effect can be minimized, but not eliminated, with clustering of multi-k assemblies with redundancy removal. The complete extraction of biological information in de novo transcriptomics studies requires both the production of a CA and efforts to identify unique contigs that are present in individual k-mer assemblies but not in the CA.
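The comparison this abstract describes, finding KOIs that appear in a single k-mer assembly but not in the clustered assembly, can be illustrated with plain set operations. The sketch below is not the authors' workflow: the function names, the input layout (a mapping from k to a set of KO identifiers) and the example identifiers are all hypothetical.

```python
def missing_from_clustered(per_k_kois, clustered_kois):
    """For each k, return the KO identifiers present in that single k-mer
    assembly but absent from the clustered assembly (CA).

    per_k_kois: dict mapping k (int) -> set of KO identifier strings (assumed layout).
    clustered_kois: set of KO identifiers annotated in the CA.
    """
    missing = {}
    for k, kois in per_k_kois.items():
        only_in_k = kois - clustered_kois
        if only_in_k:  # keep only k values that contribute something new
            missing[k] = only_in_k
    return missing


def recover_annotations(per_k_kois, clustered_kois):
    """Union of the CA annotations and every single k-mer annotation set."""
    recovered = set(clustered_kois)
    for kois in per_k_kois.values():
        recovered |= kois
    return recovered


# Hypothetical toy data: three single k-mer assemblies and one CA.
per_k = {
    19: {"K00001", "K00002"},
    31: {"K00002", "K00003"},
    63: {"K00004"},
}
clustered = {"K00001", "K00002", "K00003"}

missing = missing_from_clustered(per_k, clustered)
recovered = recover_annotations(per_k, clustered)
```

In this toy input only the k = 63 assembly holds an identifier that the CA lacks, so it is the only key in `missing`.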
|
64
|
Naidoo N, Pawitan Y, Soong R, Cooper DN, Ku CS. Human genetics and genomics a decade after the release of the draft sequence of the human genome. Hum Genomics 2012; 5:577-622. [PMID: 22155605 PMCID: PMC3525251 DOI: 10.1186/1479-7364-5-6-577] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.6] [Indexed: 02/06/2023]
Abstract
Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade.
Affiliation(s)
- Nasheen Naidoo
- Centre for Molecular Epidemiology, Department of Epidemiology and Public Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
|
65
|
Muyle A, Zemp N, Deschamps C, Mousset S, Widmer A, Marais GAB. Rapid de novo evolution of X chromosome dosage compensation in Silene latifolia, a plant with young sex chromosomes. PLoS Biol 2012; 10:e1001308. [PMID: 22529744 PMCID: PMC3328428 DOI: 10.1371/journal.pbio.1001308] [Citation(s) in RCA: 124] [Impact Index Per Article: 9.5] [Received: 09/22/2011] [Accepted: 03/01/2012] [Indexed: 11/18/2022]
Abstract
Silene latifolia is a dioecious plant with heteromorphic sex chromosomes that have originated only ∼10 million years ago and is a promising model organism to study sex chromosome evolution in plants. Previous work suggests that S. latifolia XY chromosomes have gradually stopped recombining and the Y chromosome is undergoing degeneration as in animal sex chromosomes. However, this work has been limited by the paucity of sex-linked genes available. Here, we used 35 Gb of RNA-seq data from multiple males (XY) and females (XX) of an S. latifolia inbred line to detect sex-linked SNPs and identified more than 1,700 sex-linked contigs (with X-linked and Y-linked alleles). Analyses using known sex-linked and autosomal genes, together with simulations indicate that these newly identified sex-linked contigs are reliable. Using read numbers, we then estimated expression levels of X-linked and Y-linked alleles in males and found an overall trend of reduced expression of Y-linked alleles, consistent with a widespread ongoing degeneration of the S. latifolia Y chromosome. By comparing expression intensities of X-linked alleles in males and females, we found that X-linked allele expression increases as Y-linked allele expression decreases in males, which makes expression of sex-linked contigs similar in both sexes. This phenomenon is known as dosage compensation and has so far only been observed in evolutionary old animal sex chromosome systems. Our results suggest that dosage compensation has evolved in plants and that it can quickly evolve de novo after the origin of sex chromosomes.
Affiliation(s)
- Aline Muyle
- Laboratoire de Biométrie et Biologie Evolutive (UMR 5558), CNRS/Université Lyon 1, Villeurbanne, France
- Niklaus Zemp
- Institute of Integrative Biology (IBZ), ETH Zurich, Zürich, Switzerland
- Sylvain Mousset
- Laboratoire de Biométrie et Biologie Evolutive (UMR 5558), CNRS/Université Lyon 1, Villeurbanne, France
- Alex Widmer
- Institute of Integrative Biology (IBZ), ETH Zurich, Zürich, Switzerland
- Gabriel A. B. Marais
- Laboratoire de Biométrie et Biologie Evolutive (UMR 5558), CNRS/Université Lyon 1, Villeurbanne, France
|
66
|
Archer J, Baillie G, Watson SJ, Kellam P, Rambaut A, Robertson DL. Analysis of high-depth sequence data for studying viral diversity: a comparison of next generation sequencing platforms using Segminator II. BMC Bioinformatics 2012; 13:47. [PMID: 22443413 PMCID: PMC3359224 DOI: 10.1186/1471-2105-13-47] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Received: 10/28/2011] [Accepted: 03/23/2012] [Indexed: 01/23/2023]
Abstract
Background: Next generation sequencing provides detailed insight into the variation present within viral populations, introducing the possibility of treatment strategies that are both reactive and predictive. Current software tools, however, need to be scaled up to accommodate for high-depth viral data sets, which are often temporally or spatially linked. In addition, due to the development of novel sequencing platforms and chemistries, each with implicit strengths and weaknesses, it will be helpful for researchers to be able to routinely compare and combine data sets from different platforms/chemistries. In particular, error associated with a specific sequencing process must be quantified so that true biological variation may be identified.
Results: Segminator II was developed to allow for the efficient comparison of data sets derived from different sources. We demonstrate its usage by comparing large data sets from 12 influenza H1N1 samples sequenced on both the 454 Life Sciences and Illumina platforms, permitting quantification of platform error. For mismatches median error rates at 0.10 and 0.12%, respectively, suggested that both platforms performed similarly. For insertions and deletions median error rates within the 454 data (at 0.3 and 0.2%, respectively) were significantly higher than those within the Illumina data (0.004 and 0.006%, respectively). In agreement with previous observations these higher rates were strongly associated with homopolymeric stretches on the 454 platform. Outside of such regions both platforms had similar indel error profiles. Additionally, we apply our software to the identification of low frequency variants.
Conclusion: We have demonstrated, using Segminator II, that it is possible to distinguish platform specific error from biological variation using data derived from two different platforms. We have used this approach to quantify the amount of error present within the 454 and Illumina platforms in relation to genomic location as well as location on the read. Given that next generation data is increasingly important in the analysis of drug-resistance and vaccine trials, this software will be useful to the pathogen research community. A zip file containing the source code and jar file is freely available for download from http://www.bioinf.manchester.ac.uk/segminator/.
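To make the "median error rate per platform" figures concrete, the arithmetic can be sketched as below. This is not Segminator II's code or interface: the record layout (platform label, error count, aligned base count), the function names and the sample numbers are all assumptions made for illustration.

```python
from statistics import median


def error_rate_percent(n_errors, n_bases):
    """Error rate as a percentage of aligned bases."""
    return 100.0 * n_errors / n_bases


def median_error_rates(samples):
    """Median per-sample error rate (%) for each platform.

    samples: iterable of (platform, n_errors, n_aligned_bases) tuples,
    one per sequenced sample (hypothetical layout).
    """
    rates_by_platform = {}
    for platform, n_errors, n_bases in samples:
        rate = error_rate_percent(n_errors, n_bases)
        rates_by_platform.setdefault(platform, []).append(rate)
    return {p: median(rates) for p, rates in rates_by_platform.items()}


# Hypothetical counts, not data from the paper.
samples = [
    ("454", 10, 10_000),
    ("454", 30, 10_000),
    ("454", 20, 10_000),
    ("Illumina", 1, 10_000),
    ("Illumina", 3, 10_000),
]
medians = median_error_rates(samples)
```

Taking the median rather than the mean keeps one unusually noisy sample from dominating a platform's summary figure.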
Affiliation(s)
- John Archer
- Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Manchester, UK.
|
67
|
Stein LD. An introduction to the informatics of "next-generation" sequencing. Curr Protoc Bioinformatics 2012; Chapter 11:11.1.1-11.1.9. [PMID: 22161566 DOI: 10.1002/0471250953.bi1101s36] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 01/21/2023]
Abstract
Next-generation sequencing (NGS) packs the sequencing throughput of a 2000's-era genome center into a single affordable machine. However, software developed for conventional sequencing technologies is often inadequate to deal with the nature of NGS technologies, which produce short, massively parallel reads. This unit surveys the software packages that are available for managing and analyzing NGS data.
Affiliation(s)
- Lincoln D Stein
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
|
68
|
Lin X, Tang W, Ahmad S, Lu J, Colby CC, Zhu J, Yu Q. Applications of targeted gene capture and next-generation sequencing technologies in studies of human deafness and other genetic disabilities. Hear Res 2012; 288:67-76. [PMID: 22269275 DOI: 10.1016/j.heares.2012.01.004] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.1] [Received: 10/01/2011] [Revised: 01/04/2012] [Accepted: 01/06/2012] [Indexed: 01/25/2023]
Abstract
The goal of sequencing the entire human genome for $1000 is almost in sight. However, the total costs including DNA sequencing, data management, and analysis to yield a clear data interpretation are unlikely to be lowered significantly any time soon to make studies on a population scale and daily clinical uses feasible. Alternatively, the targeted enrichment of specific groups of disease and biological pathway-focused genes and the capture of up to an entire human exome (~1% of the genome) allowing an unbiased investigation of the complete protein-coding regions in the genome are now routine. Targeted gene capture followed by sequencing with massively parallel next-generation sequencing (NGS) has the advantages of 1) significant cost saving, 2) higher sequencing accuracy because of deeper achievable coverage, 3) a significantly shorter turnaround time, and 4) a more feasible data set for a bioinformatic analysis outcome that is functionally interpretable. Gene capture combined with NGS has allowed a much greater number of samples to be examined than is currently practical with whole-genome sequencing. Such an approach promises to bring a paradigm shift to biomedical research of Mendelian disorders and their clinical diagnoses, ultimately enabling personalized medicine based on one's genetic profile. In this review, we describe major methodologies currently used for gene capture and detection of genetic variations by NGS. We will highlight applications of this technology in studies of genetic disorders and discuss issues pertaining to applications of this powerful technology in genetic screening and the discovery of genes implicated in syndromic and non-syndromic hearing loss.
Affiliation(s)
- Xi Lin
- Department of Otolaryngology, Emory University School of Medicine, 615 Michael Street, Atlanta, GA 30322-3030, USA.
|
69
|
Neveling K, den Hollander AI, Cremers FPM, Collin RWJ. Identification and analysis of inherited retinal disease genes. Methods Mol Biol 2012; 935:3-23. [PMID: 23150357 DOI: 10.1007/978-1-62703-080-9_1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 01/05/2023]
Abstract
Inherited retinal diseases display a very high degree of clinical and genetic heterogeneity, which poses challenges in identifying the underlying defects in known genes and in identifying novel retinal disease genes. Here, we outline the state-of-the-art techniques to find the causative DNA variants, with special attention to next-generation sequencing, which can combine molecular diagnostics and retinal disease gene identification.
Affiliation(s)
- Kornelia Neveling
- Department of Human Genetics, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
|
70
|
Ji Y, Shi Y, Ding G, Li Y. A new strategy for better genome assembly from very short reads. BMC Bioinformatics 2011; 12:493. [PMID: 22208765 PMCID: PMC3268122 DOI: 10.1186/1471-2105-12-493] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 08/25/2011] [Accepted: 12/30/2011] [Indexed: 11/29/2022]
Abstract
Background: With the rapid development of the next generation sequencing (NGS) technology, large quantities of genome sequencing data have been generated. Because of repetitive regions of genomes and some other factors, assembly of very short reads is still a challenging issue. Results: A novel strategy for improving genome assembly from very short reads is proposed. It can increase accuracies of assemblies by integrating de novo contigs, and produce comparative contigs by allowing multiple references without limiting to genomes of closely related strains. Comparative contigs are used to scaffold de novo contigs. Using simulated and real datasets, it is shown that our strategy can effectively improve qualities of assemblies of isolated microbial genomes and metagenomes. Conclusions: With more and more reference genomes available, our strategy will be useful to improve qualities of genome assemblies from very short reads. Some scripts are provided to make our strategy applicable at http://code.google.com/p/cd-hybrid/.
Affiliation(s)
- Yan Ji
- Bioinformatics Center, Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, PR China
|
71
|
Hu B, Xie G, Lo CC, Starkenburg SR, Chain PSG. Pathogen comparative genomics in the next-generation sequencing era: genome alignments, pangenomics and metagenomics. Brief Funct Genomics 2011; 10:322-33. [DOI: 10.1093/bfgp/elr042] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Indexed: 11/15/2022]
|
72
|
Abstract
PURPOSE OF REVIEW: The purpose of this review is to describe the new DNA sequencing technologies referred to as next-generation sequencing (NGS). These new methods are becoming central to research in human disease and are starting to be used in routine clinical care. RECENT FINDINGS: Advances in instrumentation have dramatically reduced the cost of DNA sequencing. An individual's entire genome can now be sequenced for $7500. In addition, the software needed to analyze and help interpret this data is rapidly improving. This technology has been used by researchers to discover new genetic disorders and new disease associations. In the clinic, it can define the etiology in patients with undiagnosed genetic disorders and identify mutations in a cancer to help guide chemotherapy. SUMMARY: Here we discuss how whole-exome sequencing and whole-genome sequencing are used in basic research and clinical care. These new techniques promise to speed research and affect how healthcare is delivered.
|
73
|
Jiménez-Gómez JM. Next generation quantitative genetics in plants. FRONTIERS IN PLANT SCIENCE 2011; 2:77. [PMID: 22645550 PMCID: PMC3355736 DOI: 10.3389/fpls.2011.00077] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 04/29/2011] [Accepted: 10/23/2011] [Indexed: 05/31/2023]
Abstract
Most characteristics in living organisms show continuous variation, which suggests that they are controlled by multiple genes. Quantitative trait loci (QTL) analysis can identify the genes underlying continuous traits by establishing associations between genetic markers and observed phenotypic variation in a segregating population. The new high-throughput sequencing (HTS) technologies greatly facilitate QTL analysis by providing genetic markers at genome-wide resolution in any species without previous knowledge of its genome. In addition, HTS serves to quantify molecular phenotypes, which helps to identify the loci responsible for QTLs and to understand the mechanisms underlying diversity. The constant improvements in price, experimental protocols, computational pipelines, and statistical frameworks are making the use of HTS feasible for any research group interested in quantitative genetics. In this review I discuss the application of HTS for molecular marker discovery, population genotyping, and expression profiling in QTL analysis.
Affiliation(s)
- José M. Jiménez-Gómez
- Department of Plant Breeding and Genetics, Max Planck Institute for Plant Breeding Research, Köln, Germany
|
74
|
Abstract
New diseases continue to emerge in both human and animal populations, and the importance of animals as reservoirs for viruses that can cause zoonoses is evident. Thus, increased knowledge of the viral flora in animals, in both healthy and diseased individuals, is important for both animal and human health. Viral metagenomics is a culture-independent approach that is used to investigate the complete viral genetic populations of a sample. This review describes and discusses the different possible steps of a viral metagenomic study utilizing sequence-independent amplification, high-throughput sequencing, and bioinformatics to identify viruses. With this technology, multiple viruses can be detected simultaneously and novel and highly divergent viruses can be discovered and genetically characterized for the first time. This review also briefly discusses the applications of viral metagenomics in veterinary science and lists some of the viruses discovered within this field.
Affiliation(s)
- Anne-Lie Blomström
- Section of Virology, Department of Biomedical Sciences and Veterinary Public Health, Swedish University of Agricultural Sciences, Uppsala, Sweden.
|