851
Shyr D, Liu Q. Next generation sequencing in cancer research and clinical application. Biol Proced Online 2013; 15:4. [PMID: 23406336 PMCID: PMC3599179 DOI: 10.1186/1480-9222-15-4] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Received: 02/06/2013] [Accepted: 02/09/2013] [Indexed: 01/29/2023] Open
Abstract
The wide application of next-generation sequencing (NGS), mainly through whole genome, exome and transcriptome sequencing, provides a high-resolution and global view of the cancer genome. Coupled with powerful bioinformatics tools, NGS promises to revolutionize cancer research, diagnosis and therapy. In this paper, we review the recent advances in NGS-based cancer genomic research as well as clinical application, summarize the current integrative oncogenomic projects, resources and computational algorithms, and discuss the challenges and future directions in the research and clinical application of cancer genomic sequencing.
Affiliation(s)
- Derek Shyr
- Washington University, 63130, St. Louis, MO, USA
- Qi Liu
- Center for Quantitative Sciences, Vanderbilt University School of Medicine, 37232, Nashville, TN, USA
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, 37232, Nashville, TN, USA
852
AMY-tree: an algorithm to use whole genome SNP calling for Y chromosomal phylogenetic applications. BMC Genomics 2013; 14:101. [PMID: 23405914 PMCID: PMC3583733 DOI: 10.1186/1471-2164-14-101] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Received: 07/10/2012] [Accepted: 12/19/2012] [Indexed: 12/02/2022] Open
Abstract
Background: Due to the rapid progress of next-generation sequencing (NGS) facilities, an explosion of human whole genome data will become available in the coming years. These data can be used to optimize and to increase the resolution of the phylogenetic Y chromosomal tree. Moreover, the exponential growth of known Y chromosomal lineages will require an automatic determination of the phylogenetic position of an individual based on whole genome SNP calling data and an up-to-date Y chromosomal tree.
Results: We present an automated approach, ‘AMY-tree’, which is able to determine the phylogenetic position of a Y chromosome using a whole genome SNP profile, independently of the NGS platform and SNP calling program, whereby mistakes in the SNP calling or phylogenetic Y chromosomal tree are taken into account. Moreover, AMY-tree indicates ambiguities within the present phylogenetic tree and points out new Y-SNPs which may be phylogenetically relevant. The AMY-tree software package was validated successfully on 118 whole genome SNP profiles of 109 males with different origins. Moreover, support was found for an unknown recurrent mutation, wrongly reported mutation conversions and a large number of new, interesting Y-SNPs.
Conclusions: Therefore, AMY-tree is a useful tool to determine the Y lineage of a sample based on SNP calling, to identify Y-SNPs with yet unknown phylogenetic position and to optimize the Y chromosomal phylogenetic tree in the future. AMY-tree will not add lineages to the existing phylogenetic tree of the Y-chromosome but it is the first step to analyse whole genome SNP profiles in a phylogenetic framework.
853
Kim SC, Jung Y, Park J, Cho S, Seo C, Kim J, Kim P, Park J, Seo J, Kim J, Park S, Jang I, Kim N, Yang JO, Lee B, Rho K, Jung Y, Keum J, Lee J, Han J, Kang S, Bae S, Choi SJ, Kim S, Lee JE, Kim W, Kim J, Lee S. A high-dimensional, deep-sequencing study of lung adenocarcinoma in female never-smokers. PLoS One 2013; 8:e55596. [PMID: 23405175 PMCID: PMC3566005 DOI: 10.1371/journal.pone.0055596] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.3] [Received: 10/04/2012] [Accepted: 12/27/2012] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND: Deep sequencing techniques provide a remarkable opportunity for comprehensive understanding of tumorigenesis at the molecular level. As omics studies become popular, integrative approaches need to be developed to move from a simple cataloguing of mutations and changes in gene expression to dissecting the molecular nature of carcinogenesis at the systemic level and understanding the complex networks that lead to cancer development.
RESULTS: Here, we describe a high-throughput, multi-dimensional sequencing study of primary lung adenocarcinoma tumors and adjacent normal tissues of six Korean female never-smoker patients. Our data encompass results from exome-seq, RNA-seq, small RNA-seq, and MeDIP-seq. We identified and validated novel genetic aberrations, including 47 somatic mutations and 19 fusion transcripts. One of the fusions involves the c-RET gene, which was recently reported to form fusion genes that may function as drivers of carcinogenesis in lung cancer patients. We also characterized gene expression profiles, which we integrated with genomic aberrations and gene regulations into functional networks. The most prominent gene network module that emerged indicates that disturbances in G2/M transition and mitotic progression are causally linked to tumorigenesis in these patients. Also, results from the analysis strongly suggest that several novel microRNA-target interactions represent key regulatory elements of the gene network.
CONCLUSIONS: Our study not only provides an overview of the alterations occurring in lung adenocarcinoma at multiple levels from genome to transcriptome and epigenome, but also offers a model for integrative genomics analysis and proposes potential target pathways for the control of lung adenocarcinoma.
Affiliation(s)
- Sang Cheol Kim
- Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea
- Yeonjoo Jung
- Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University, Seoul, Korea
- Division of Life and Pharmaceutical Sciences and the Center for Cell Signaling and Drug Discovery Research, Ewha Womans University, Seoul, Korea
- Jinah Park
- Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea
- Sooyoung Cho
- Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University, Seoul, Korea
- Division of Life and Pharmaceutical Sciences and the Center for Cell Signaling and Drug Discovery Research, Ewha Womans University, Seoul, Korea
- Chaehwa Seo
- Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea
- Jaesang Kim
- Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University, Seoul, Korea
- Division of Life and Pharmaceutical Sciences and the Center for Cell Signaling and Drug Discovery Research, Ewha Womans University, Seoul, Korea
- Pora Kim
- Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University, Seoul, Korea
- Division of Life and Pharmaceutical Sciences and the Center for Cell Signaling and Drug Discovery Research, Ewha Womans University, Seoul, Korea
- Jehwan Park
- Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea
- Jihae Seo
- Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea
- Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University, Seoul, Korea
- Division of Life and Pharmaceutical Sciences and the Center for Cell Signaling and Drug Discovery Research, Ewha Womans University, Seoul, Korea
- Jiwoong Kim
- Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea
- Seongjin Park
- Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea
- Insu Jang
- Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea
- Namshin Kim
- Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea
- Jin Ok Yang
- Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea
- Byungwook Lee
- Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea
- Kyoohyoung Rho
- Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea
- Yeonhwa Jung
- Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University, Seoul, Korea
- Division of Life and Pharmaceutical Sciences and the Center for Cell Signaling and Drug Discovery Research, Ewha Womans University, Seoul, Korea
- Juhee Keum
- Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University, Seoul, Korea
- Division of Life and Pharmaceutical Sciences and the Center for Cell Signaling and Drug Discovery Research, Ewha Womans University, Seoul, Korea
- Jinseon Lee
- Samsung Biomedical Research Institute (SBRI) and Cancer Research Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Jungho Han
- Department of Pathology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Sangeun Kang
- Samsung Biomedical Research Institute (SBRI) and Cancer Research Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Sujin Bae
- Samsung Biomedical Research Institute (SBRI) and Cancer Research Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- So-Jung Choi
- Samsung Biomedical Research Institute (SBRI) and Cancer Research Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Wankyu Kim
- Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University, Seoul, Korea
- Division of Life and Pharmaceutical Sciences and the Center for Cell Signaling and Drug Discovery Research, Ewha Womans University, Seoul, Korea
- Jhingook Kim
- Samsung Biomedical Research Institute (SBRI) and Cancer Research Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Department of Thoracic Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- * E-mail: (SL); (JK)
- Sanghyuk Lee
- Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea
- Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University, Seoul, Korea
- Division of Life and Pharmaceutical Sciences and the Center for Cell Signaling and Drug Discovery Research, Ewha Womans University, Seoul, Korea
- * E-mail: (SL); (JK)
854
McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol 2013; 66:526-38. [DOI: 10.1016/j.ympev.2011.12.007] [Citation(s) in RCA: 370] [Impact Index Per Article: 30.8] [Received: 09/13/2011] [Revised: 12/02/2011] [Accepted: 12/05/2011] [Indexed: 01/09/2023]
855
Hilbers FS, Meijers CM, Laros JFJ, van Galen M, Hoogerbrugge N, Vasen HFA, Nederlof PM, Wijnen JT, van Asperen CJ, Devilee P. Exome sequencing of germline DNA from non-BRCA1/2 familial breast cancer cases selected on the basis of aCGH tumor profiling. PLoS One 2013; 8:e55734. [PMID: 23383274 PMCID: PMC3561352 DOI: 10.1371/journal.pone.0055734] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 09/14/2012] [Accepted: 12/30/2012] [Indexed: 12/17/2022] Open
Abstract
The bulk of familial breast cancer risk (∼70%) cannot be explained by mutations in the known predisposition genes, primarily BRCA1 and BRCA2. Underlying genetic heterogeneity in these cases is the probable explanation for the failure of all attempts to identify further high-risk alleles. While exome sequencing of non-BRCA1/2 breast cancer cases is a promising strategy to detect new high-risk genes, rational approaches to the rigorous pre-selection of cases are needed to reduce heterogeneity. We selected six families in which the tumours of multiple cases showed a specific genomic profile on array comparative genomic hybridization (aCGH). Linkage analysis in these families revealed a region on chromosome 4 with a LOD score of 2.49 under homogeneity. We then analysed the germline DNA of two patients from each family using exome sequencing. Initially focusing on the linkage region, no potentially pathogenic variants could be identified in more than one family. Variants outside the linkage region were then analysed, and we detected multiple possibly pathogenic variants in genes that encode DNA integrity maintenance proteins. However, further analysis led to the rejection of all variants due to poor co-segregation or a relatively high allele frequency in a control population. We concluded that using CGH results to focus on a sub-set of families for sequencing analysis did not enable us to identify a common genetic change responsible for the aggregation of breast cancer in these families. Our data also support the emerging view that non-BRCA1/2 hereditary breast cancer families have a very heterogeneous genetic basis.
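A LOD score is the base-10 logarithm of a likelihood ratio (linkage versus no linkage), so the value of 2.49 quoted in this abstract can be converted to odds directly. The snippet below is a minimal sketch of that conversion; the function name is illustrative, and the threshold of 3 used for comparison is the conventional rule of thumb for significant linkage, not a figure taken from the abstract.

```python
def lod_to_odds(lod: float) -> float:
    """Convert a LOD score (log10 of a likelihood ratio) into an odds ratio."""
    return 10.0 ** lod

# LOD score reported for the chromosome 4 region in the abstract.
reported_odds = lod_to_odds(2.49)
# Conventional threshold for significant linkage (assumption, not from the abstract).
threshold_odds = lod_to_odds(3.0)

print(f"LOD 2.49 -> about {reported_odds:.0f}:1 in favour of linkage")
print(f"LOD 3.00 -> about {threshold_odds:.0f}:1")
```

Read this way, the reported region has odds of roughly 309:1, which is suggestive but falls short of the 1000:1 usually expected before linkage is declared.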
Affiliation(s)
- Florentine S Hilbers
- Department of Human Genetics, Leiden University Medical Centre, Leiden, The Netherlands.
856
Swart EC, Bracht JR, Magrini V, Minx P, Chen X, Zhou Y, Khurana JS, Goldman AD, Nowacki M, Schotanus K, Jung S, Fulton RS, Ly A, McGrath S, Haub K, Wiggins JL, Storton D, Matese JC, Parsons L, Chang WJ, Bowen MS, Stover NA, Jones TA, Eddy SR, Herrick GA, Doak TG, Wilson RK, Mardis ER, Landweber LF. The Oxytricha trifallax macronuclear genome: a complex eukaryotic genome with 16,000 tiny chromosomes. PLoS Biol 2013; 11:e1001473. [PMID: 23382650 PMCID: PMC3558436 DOI: 10.1371/journal.pbio.1001473] [Citation(s) in RCA: 162] [Impact Index Per Article: 13.5] [Received: 06/29/2012] [Accepted: 12/12/2012] [Indexed: 01/03/2023] Open
Abstract
With more chromosomes than any other sequenced genome, the macronuclear genome of Oxytricha trifallax has a unique and complex architecture, including alternative fragmentation and predominantly single-gene chromosomes.

The macronuclear genome of the ciliate Oxytricha trifallax displays an extreme and unique eukaryotic genome architecture with extensive genomic variation. During sexual genome development, the expressed, somatic macronuclear genome is whittled down to the genic portion of a small fraction (∼5%) of its precursor “silent” germline micronuclear genome by a process of “unscrambling” and fragmentation. The tiny macronuclear “nanochromosomes” typically encode single, protein-coding genes (a small portion, 10%, encode 2–8 genes), have minimal noncoding regions, and are differentially amplified to an average of ∼2,000 copies. We report the high-quality genome assembly of ∼16,000 complete nanochromosomes (∼50 Mb haploid genome size) that vary from 469 bp to 66 kb long (mean ∼3.2 kb) and encode ∼18,500 genes. Alternative DNA fragmentation processes ∼10% of the nanochromosomes into multiple isoforms that usually encode complete genes. Nucleotide diversity in the macronucleus is very high (SNP heterozygosity is ∼4.0%), suggesting that Oxytricha trifallax may have one of the largest known effective population sizes of eukaryotes. Comparison to other ciliates with nonscrambled genomes and long macronuclear chromosomes (on the order of 100 kb) suggests several candidate proteins that could be involved in genome rearrangement, including domesticated MULE and IS1595-like DDE transposases. The assembly of the highly fragmented Oxytricha macronuclear genome is the first completed genome with such an unusual architecture. This genome sequence provides tantalizing glimpses into novel molecular biology and evolution. For example, Oxytricha maintains tens of millions of telomeres per cell and has also evolved an intriguing expansion of telomere end-binding proteins. In conjunction with the micronuclear genome in progress, the O. trifallax macronuclear genome will provide an invaluable resource for investigating programmed genome rearrangements, complementing studies of rearrangements arising during evolution and disease.

The macronuclear genome of the ciliate Oxytricha trifallax, contained in its somatic nucleus, has a unique genome architecture. Unlike its diploid germline genome, which is transcriptionally inactive during normal cellular growth, the macronuclear genome is fragmented into at least 16,000 tiny (∼3.2 kb mean length) chromosomes, most of which encode single actively transcribed genes and are differentially amplified to a few thousand copies each. The smallest chromosome is just 469 bp, while the largest is 66 kb and encodes a single enormous protein. We found considerable variation in the genome, including frequent alternative fragmentation patterns, generating chromosome isoforms with shared sequence. We also found limited variation in chromosome amplification levels, though insufficient to explain mRNA transcript level variation. Another remarkable feature of Oxytricha's macronuclear genome is its inordinate fondness for telomeres. In conjunction with its possession of tens of millions of chromosome-ending telomeres per macronucleus, we show that Oxytricha has evolved multiple putative telomere-binding proteins. In addition, we identified two new domesticated transposase-like protein classes that we propose may participate in the process of genome rearrangement. The macronuclear genome now provides a crucial resource for ongoing studies of genome rearrangement processes that use Oxytricha as an experimental or comparative model.
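The headline figures in this abstract are mutually consistent, which a few lines of arithmetic make visible. The sketch below uses only the approximate values quoted in the text; the one added assumption is that each nanochromosome is linear and carries one telomere at each end.

```python
# Approximate figures quoted in the abstract.
n_chromosomes = 16_000   # complete nanochromosomes
mean_length_bp = 3_200   # mean nanochromosome length (~3.2 kb)
mean_copies = 2_000      # average amplification level per nanochromosome

# Assumption: linear chromosomes with one telomere per end.
telomeres_per_chromosome = 2

haploid_size_mb = n_chromosomes * mean_length_bp / 1e6
telomeres_per_macronucleus = n_chromosomes * mean_copies * telomeres_per_chromosome

print(f"Haploid macronuclear genome: ~{haploid_size_mb:.1f} Mb")
print(f"Telomeres per macronucleus: ~{telomeres_per_macronucleus / 1e6:.0f} million")
```

The product of chromosome count and mean length lands close to the stated ∼50 Mb haploid size, and the telomere estimate falls in the "tens of millions" range the authors describe.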
Affiliation(s)
- Estienne C. Swart
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
- John R. Bracht
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
- Vincent Magrini
- The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Patrick Minx
- The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Xiao Chen
- Department of Molecular Biology, Princeton University, Princeton, New Jersey, United States of America
- Yi Zhou
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
- Jaspreet S. Khurana
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
- Aaron D. Goldman
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
- Mariusz Nowacki
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
- Institute of Cell Biology, University of Bern, Bern, Switzerland
- Klaas Schotanus
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
- Seolkyoung Jung
- Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia, United States of America
- Robert S. Fulton
- The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Amy Ly
- The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Sean McGrath
- The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Kevin Haub
- The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Jessica L. Wiggins
- Sequencing Core Facility, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Donna Storton
- Sequencing Core Facility, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- John C. Matese
- Sequencing Core Facility, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Lance Parsons
- Bioinformatics Group, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Wei-Jen Chang
- Department of Biology, Hamilton College, Clinton, New York, United States of America
- Michael S. Bowen
- Biology Department, Bradley University, Peoria, Illinois, United States of America
- Nicholas A. Stover
- Biology Department, Bradley University, Peoria, Illinois, United States of America
- Thomas A. Jones
- Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia, United States of America
- Sean R. Eddy
- Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia, United States of America
- Glenn A. Herrick
- Biology Department, University of Utah, Salt Lake City, Utah, United States of America
- Thomas G. Doak
- Department of Biology, University of Indiana, Bloomington, Indiana, United States of America
- Richard K. Wilson
- The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Elaine R. Mardis
- The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Laura F. Landweber
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
857
Barrett MT, Lenkiewicz E, Evers L, Holley T, Ruiz C, Bubendorf L, Sekulic A, Ramanathan RK, Von Hoff DD. Clonal evolution and therapeutic resistance in solid tumors. Front Pharmacol 2013; 4:2. [PMID: 23372550 PMCID: PMC3556559 DOI: 10.3389/fphar.2013.00002] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 12/15/2012] [Accepted: 01/07/2013] [Indexed: 01/08/2023] Open
Abstract
Tumors frequently arise as a result of an acquired genomic instability and the subsequent evolution of neoplastic populations with variable genomes. A barrier to the study of the somatic genetics of human solid tumors in vivo is the presence of admixtures of non-neoplastic cells with normal genomes in patient samples. These can obscure the presence of somatic aberrations including mutations, homozygous deletions, and breakpoints in biopsies of interest. Furthermore, clinical samples frequently contain multiple neoplastic populations that cannot be distinguished by morphology. Consequently, it is difficult to determine whether mutations detected in a sample of interest are concurrent in a single clonal population or if they occur in distinct cell populations in the same sample. The advent of targeted therapies increases the selection for preexisting populations. However, the asymmetric distribution of therapeutic targets in clonal populations provides a mechanism for the rapid evolution of resistant disease. Thus, there is a need not only to isolate tumor from normal cells, but also to enrich distinct populations of clonal neoplastic cells in order to apply genome technologies to identify clinically relevant genomic aberrations that drive disease in patients in vivo. To address this, we have applied single and multiparameter DNA content-based flow assays to the study of solid tumors. Our work has identified examples of clonal resistance to effective therapies. This includes androgen withdrawal in advanced prostate cancer. In addition, we demonstrate examples of co-existing clonal populations with highly aberrant genomes and ploidies in a wide variety of solid tumors. We propose that clonal analysis of tumors, based on flow cytometry and high resolution genome analyses of purified neoplastic populations, provides a unique approach to the study of therapeutic responses and the evolution of resistance.
Affiliation(s)
- Michael T Barrett
- The Translational Genomics Research Institute, Scottsdale, AZ, USA; Mayo Clinic Arizona, Scottsdale, AZ, USA
858
Jan M, Snyder TM, Corces-Zimmerman MR, Vyas P, Weissman IL, Quake SR, Majeti R. Clonal evolution of preleukemic hematopoietic stem cells precedes human acute myeloid leukemia. Sci Transl Med 2012; 4:149ra118. [PMID: 22932223 DOI: 10.1126/scitranslmed.3004315] [Citation(s) in RCA: 576] [Impact Index Per Article: 48.0] [Indexed: 12/19/2022]
Abstract
Given that most bone marrow cells are short-lived, the accumulation of multiple leukemogenic mutations in a single clonal lineage has been difficult to explain. We propose that serial acquisition of mutations occurs in self-renewing hematopoietic stem cells (HSCs). We investigated this model through genomic analysis of HSCs from six patients with de novo acute myeloid leukemia (AML). Using exome sequencing, we identified mutations present in individual AML patients harboring the FLT3-ITD (internal tandem duplication) mutation. We then screened the residual HSCs and detected some of these mutations including mutations in the NPM1, TET2, and SMC1A genes. Finally, through single-cell analysis, we determined that a clonal progression of multiple mutations occurred in the HSCs of some AML patients. These preleukemic HSCs suggest the clonal evolution of AML genomes from founder mutations, revealing a potential mechanism contributing to relapse. Such preleukemic HSCs may constitute a cellular reservoir that should be targeted therapeutically for more durable remissions.
Affiliation(s)
- Max Jan
- Program in Cancer Biology, Cancer Institute, Institute for Stem Cell Biology and Regenerative Medicine, and Ludwig Center, Stanford University School of Medicine, Palo Alto, CA 94305, USA
859
Computational and bioinformatics frameworks for next-generation whole exome and genome sequencing. ScientificWorldJournal 2013; 2013:730210. [PMID: 23365548 PMCID: PMC3556895 DOI: 10.1155/2013/730210] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 10/28/2012] [Accepted: 11/22/2012] [Indexed: 12/28/2022] Open
Abstract
It has become increasingly apparent that one of the major hurdles in the genomic age will be the bioinformatics challenges of next-generation sequencing. We provide an overview of a general framework of bioinformatics analysis. For each of the three stages of (1) alignment, (2) variant calling, and (3) filtering and annotation, we describe the analysis required and survey the different software packages that are used. Furthermore, we discuss possible future developments as data sources grow and highlight opportunities for new bioinformatics tools to be developed.
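The three stages named in this abstract (alignment, variant calling, filtering and annotation) can be illustrated with a deliberately tiny sketch. Everything below is a toy written for this listing: the function names, the mismatch-counting aligner, the majority-vote caller, the depth filter, and the `toyGene` intervals are all hypothetical stand-ins, not the software packages the paper surveys.

```python
from collections import Counter, defaultdict

def align(read, reference):
    """Stage 1 (toy): place the read at the offset with the fewest mismatches."""
    offsets = range(len(reference) - len(read) + 1)
    return min(offsets, key=lambda o: sum(a != b for a, b in zip(read, reference[o:])))

def call_variants(reads, reference):
    """Stage 2 (toy): pile up aligned bases and report non-reference majority bases."""
    pileup = defaultdict(Counter)
    for read in reads:
        start = align(read, reference)
        for i, base in enumerate(read):
            pileup[start + i][base] += 1
    calls = []
    for pos in sorted(pileup):
        base, count = pileup[pos].most_common(1)[0]
        if base != reference[pos]:
            calls.append({"pos": pos, "ref": reference[pos], "alt": base,
                          "alt_reads": count, "depth": sum(pileup[pos].values())})
    return calls

def filter_and_annotate(calls, min_depth, genes):
    """Stage 3 (toy): drop low-depth calls and attach a gene label by interval lookup."""
    kept = []
    for call in calls:
        if call["depth"] < min_depth:
            continue
        gene = next((name for name, (start, end) in genes.items()
                     if start <= call["pos"] < end), None)
        kept.append({**call, "gene": gene})
    return kept

# Hypothetical 20 bp reference; three reads carry an A instead of C at position 11.
reference = "ACGTACGTTAGCCGATAGCT"
reads = ["ACGTACGT", "GTACGTTA", "CGTTAGAC", "TTAGACGA", "AGACGATA", "CGATAGCT"]
genes = {"toyGene1": (0, 10), "toyGene2": (10, 20)}  # half-open intervals

variants = filter_and_annotate(call_variants(reads, reference), min_depth=2, genes=genes)
print(variants)
```

Keeping each stage as a separate function mirrors the framework's point that tools can be swapped per stage; real pipelines replace each toy with dedicated software and standard file formats.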
860
Wang Y, Lu J, Yu J, Gibbs RA, Yu F. An integrative variant analysis pipeline for accurate genotype/haplotype inference in population NGS data. Genome Res 2013; 23:833-42. [PMID: 23296920 PMCID: PMC3638139 DOI: 10.1101/gr.146084.112] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Indexed: 12/12/2022]
Abstract
Next-generation sequencing is a powerful approach for discovering genetic variation. Sensitive variant calling and haplotype inference from population sequencing data remain challenging. We describe methods for high-quality discovery, genotyping, and phasing of SNPs for low-coverage (approximately 5×) sequencing of populations, implemented in a pipeline called SNPTools. Our pipeline contains several innovations that specifically address challenges caused by low-coverage population sequencing: (1) effective base depth (EBD), a nonparametric statistic that enables more accurate statistical modeling of sequencing data; (2) variance ratio scoring, a variance-based statistic that discovers polymorphic loci with high sensitivity and specificity; and (3) BAM-specific binomial mixture modeling (BBMM), a clustering algorithm that generates robust genotype likelihoods from heterogeneous sequencing data. Last, we develop an imputation engine that refines raw genotype likelihoods to produce high-quality phased genotypes/haplotypes. Designed for large population studies, SNPTools' input/output (I/O) and storage aware design leads to improved computing performance on large sequencing data sets. We apply SNPTools to the International 1000 Genomes Project (1000G) Phase 1 low-coverage data set and obtain genotyping accuracy comparable to that of SNP microarray.
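To see why low-coverage (approximately 5×) data make genotyping hard, it helps to compute genotype likelihoods for a single site. The sketch below uses a plain textbook binomial model with a fixed error rate and a flat prior; it is not the BBMM algorithm described in the abstract, whose mixture parameters are fitted per BAM file, and the function names and the 1% error rate are assumptions chosen for illustration.

```python
from math import comb

def genotype_likelihoods(alt_reads, depth, error_rate=0.01):
    """Binomial likelihood of the observed read counts under each diploid genotype."""
    # Expected alternate-allele fraction for hom-ref, het, and hom-alt genotypes.
    alt_fraction = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    return {gt: comb(depth, alt_reads) * p**alt_reads * (1 - p)**(depth - alt_reads)
            for gt, p in alt_fraction.items()}

def normalise(likelihoods):
    """Rescale likelihoods to sum to one (equivalent to assuming a flat prior)."""
    total = sum(likelihoods.values())
    return {gt: value / total for gt, value in likelihoods.items()}

# One alternate read out of five: typical of a site at ~5x coverage.
posteriors = normalise(genotype_likelihoods(alt_reads=1, depth=5))
best = max(posteriors, key=posteriors.get)
for genotype, prob in posteriors.items():
    print(f"{genotype}: {prob:.3f}")
```

With a single alternate read among five, the heterozygous genotype is favoured but the homozygous-reference genotype still keeps close to a quarter of the probability mass, which is the kind of uncertainty that a downstream imputation step across many samples is meant to resolve.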
Affiliation(s)
- Yi Wang
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
861
Chen L, Li Y, Lin CH, Chan THM, Chow RKK, Song Y, Liu M, Yuan YF, Fu L, Kong KL, Qi L, Li Y, Zhang N, Tong AHY, Kwong DLW, Man K, Lo CM, Lok S, Tenen DG, Guan XY. Recoding RNA editing of AZIN1 predisposes to hepatocellular carcinoma. Nat Med 2013; 19:209-16. [PMID: 23291631 DOI: 10.1038/nm.3043] [Citation(s) in RCA: 412] [Impact Index Per Article: 34.3] [Received: 12/02/2011] [Accepted: 11/21/2012] [Indexed: 01/14/2023]
Abstract
A better understanding of human hepatocellular carcinoma (HCC) pathogenesis at the molecular level will facilitate the discovery of tumor-initiating events. Transcriptome sequencing revealed that adenosine-to-inosine (A→I) RNA editing of AZIN1 (encoding antizyme inhibitor 1) is increased in HCC specimens. A→I editing of AZIN1 transcripts, specifically regulated by ADAR1 (encoding adenosine deaminase acting on RNA-1), results in a serine-to-glycine substitution at residue 367 of AZIN1, located in β-strand 15 (β15) and predicted to cause a conformational change. This substitution induced a cytoplasmic-to-nuclear translocation and conferred gain-of-function phenotypes that were manifested by augmented tumor-initiating potential and more aggressive behavior. Compared with wild-type AZIN1 protein, the edited form has a stronger affinity to antizyme, and the resultant higher AZIN1 protein stability promotes cell proliferation through the neutralization of antizyme-mediated degradation of ornithine decarboxylase (ODC) and cyclin D1 (CCND1). Collectively, A→I RNA editing of AZIN1 may be a potential driver in the pathogenesis of human cancers, particularly HCC.
Affiliation(s)
- Leilei Chen
- Department of Clinical Oncology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
|
862
|
Abstract
Our genome, the 6 billion bp of DNA that contain the blueprint of a human being, has become the focus of intense interest in medicine in the past two decades. Two developments have contributed to this situation: (1) the genetic basis of more and more diseases has been discovered, especially of malignant diseases, and (2) at the same time, our abilities to analyze our genome have increased exponentially through technological breakthroughs. We can expect genomics to become ever more relevant for day-to-day treatment decisions and patient management. It is therefore of great importance for physicians, especially those who are treating patients with malignant diseases, to become familiar with our genome and the technologies that are currently available for genomics analysis. This review provides a brief overview of the organization of our genome, high-throughput sequence analysis methods, and the analysis of leukemia genomes using next-generation sequencing (NGS) technologies.
Affiliation(s)
- Stefan K Bohlander
- 1Department of Molecular Medicine and Pathology, Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
|
863
|
Michaeli M, Noga H, Tabibian-Keissar H, Barshack I, Mehr R. Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing. Front Immunol 2012; 3:386. [PMID: 23293637 PMCID: PMC3531709 DOI: 10.3389/fimmu.2012.00386] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 07/31/2012] [Accepted: 11/30/2012] [Indexed: 01/10/2023] Open
Abstract
High-throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig High-Throughput Sequencing Cleaner (Ig-HTS-Cleaner), a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig Insertion-Deletion Identifier (Ig-Indel-Identifier), a program for identifying legitimate and artifact insertions and/or deletions (indels). Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.
Affiliation(s)
- Miri Michaeli
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University Ramat Gan, Israel
|
864
|
Bioinformatic perspectives in the neuronal ceroid lipofuscinoses. Biochim Biophys Acta Mol Basis Dis 2012; 1832:1831-41. [PMID: 23274885 DOI: 10.1016/j.bbadis.2012.12.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 11/07/2012] [Revised: 12/16/2012] [Accepted: 12/19/2012] [Indexed: 02/06/2023]
Abstract
The neuronal ceroid lipofuscinoses (NCLs) are a group of rare genetic diseases characterised clinically by the progressive deterioration of mental, motor and visual functions and histopathologically by the intracellular accumulation of autofluorescent lipopigment - ceroid - in affected tissues. The NCLs are clinically and genetically heterogeneous and more than 14 genetically distinct NCL subtypes have been described to date (CLN1-CLN14) (Haltia and Goebel, 2012 [1]). In this review we will chronologically summarise work which has led over the years to identification of NCL genes, and outline the potential of novel genomic techniques and related bioinformatic approaches for further genetic dissection and diagnosis of NCLs. This article is part of a Special Issue entitled: The Neuronal Ceroid Lipofuscinoses or Batten Disease.
|
865
|
Genotyping of fanconi anemia patients by whole exome sequencing: advantages and challenges. PLoS One 2012; 7:e52648. [PMID: 23285130 PMCID: PMC3527584 DOI: 10.1371/journal.pone.0052648] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 07/22/2012] [Accepted: 11/20/2012] [Indexed: 11/19/2022] Open
Abstract
Fanconi anemia (FA) is a rare genomic instability syndrome. The disease is caused by biallelic mutations in any one of at least 15 genes encoding members of the FA/BRCA pathway of DNA-interstrand crosslink repair. Patients are diagnosed based upon phenotypical manifestations and the diagnosis of FA is confirmed by the hypersensitivity of cells to DNA interstrand crosslinking agents. Customary molecular diagnostics has become increasingly cumbersome, time-consuming and expensive as more FA genes have been identified. We performed Whole Exome Sequencing (WES) in four FA patients in order to investigate the potential of this method for FA genotyping. In search of an optimal WES methodology we explored different enrichment and sequencing techniques. In each case we were able to identify the pathogenic mutations, so that WES provided both complementation group assignment and mutation detection in a single approach. The mutations included homozygous and heterozygous single base pair substitutions and a two-base-pair duplication in FANCJ, -D1, or -D2. Different WES strategies had no critical influence on the individual outcome. However, database errors and in particular pseudogenes impose obstacles that may prevent correct data perception and interpretation, and thus cause pitfalls. With these difficulties in mind, our results show that WES is a valuable tool for the molecular diagnosis of FA and a sufficiently safe technique, capable of engaging increasingly in competition with classical genetic approaches.
|
866
|
Exome-assistant: a rapid and easy detection of disease-related genes and genetic variations from exome sequencing. BMC Genomics 2012; 13:692. [PMID: 23231371 PMCID: PMC3539923 DOI: 10.1186/1471-2164-13-692] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 02/03/2012] [Accepted: 11/22/2012] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND Protein-coding regions in human genes harbor 85% of the mutations that are associated with disease-related traits. Compared with whole-genome sequencing of complex samples, exome sequencing serves as an alternative option because of its dramatically reduced cost. In fact, exome sequencing has been successfully applied to identify the cause of several Mendelian disorders, such as Miller and Schinzel-Giedion syndromes. However, there remain great challenges in handling the large volume of data generated by exome sequencing and in identifying potential disease-related genetic variations. RESULTS In this study, Exome-assistant (http://122.228.158.106/exomeassistant), a convenient tool for submitting and annotating single nucleotide polymorphisms (SNPs) and insertion/deletion variations (InDels), was developed to rapidly detect candidate disease-related genetic variations from exome sequencing projects. Versatile filter criteria are provided by Exome-assistant to meet different users' requirements. Exome-assistant consists of four modules: the single case module, the two cases module, the multiple cases module, and the reanalysis module. The two cases and multiple cases modules allow users to identify sample-specific and common variations. The multiple cases module also supports family-based studies and Mendelian filtering. The identified candidate disease-related genetic variations can be annotated according to their sample features. CONCLUSIONS In summary, by exploring exome sequencing data, Exome-assistant can provide researchers with detailed biological insights into genetic variation events and permits the identification of potential genetic causes of human diseases and related traits.
|
867
|
Holley T, Lenkiewicz E, Evers L, Tembe W, Ruiz C, Gsponer JR, Rentsch CA, Bubendorf L, Stapleton M, Amorese D, Legendre C, Cunliffe HE, McCullough AE, Pockaj B, Craig D, Carpten J, Von Hoff D, Iacobuzio-Donahue C, Barrett MT. Deep clonal profiling of formalin fixed paraffin embedded clinical samples. PLoS One 2012; 7:e50586. [PMID: 23226320 PMCID: PMC3511535 DOI: 10.1371/journal.pone.0050586] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 08/16/2012] [Accepted: 10/23/2012] [Indexed: 01/15/2023] Open
Abstract
Formalin fixed paraffin embedded (FFPE) tissues are a vast resource of annotated clinical samples. As such, they represent highly desirable and informative materials for the application of high definition genomics for improved patient management and to advance the development of personalized therapeutics. However, a limitation of FFPE tissues is the variable quality of DNA extracted for analyses. Furthermore, admixtures of non-tumor and polyclonal neoplastic cell populations limit the number of biopsies that can be studied and make it difficult to define cancer genomes in patient samples. To exploit these valuable tissues we applied flow cytometry-based methods to isolate pure populations of tumor cell nuclei from FFPE tissues and developed a methodology compatible with oligonucleotide array CGH and whole exome sequencing analyses. These were used to profile a variety of tumors (breast, brain, bladder, ovarian and pancreas) including the genomes and exomes of matching fresh frozen and FFPE pancreatic adenocarcinoma samples.
Affiliation(s)
- Tara Holley
- Clinical Translational Research Division, Translational Genomics Research Institute, Scottsdale, Arizona, United States of America
- Elizabeth Lenkiewicz
- Clinical Translational Research Division, Translational Genomics Research Institute, Scottsdale, Arizona, United States of America
- Lisa Evers
- Clinical Translational Research Division, Translational Genomics Research Institute, Scottsdale, Arizona, United States of America
- Waibhav Tembe
- Collaborative Bioinformatics Center, Translational Genomics Research Institute, Phoenix, Arizona, United States of America
- Christian Ruiz
- Institute for Pathology, University Hospital Basel, University of Basel, Basel, Switzerland
- Joel R. Gsponer
- Institute for Pathology, University Hospital Basel, University of Basel, Basel, Switzerland
- Cyrill A. Rentsch
- Institute for Pathology, University Hospital Basel, University of Basel, Basel, Switzerland
- Department of Urology, University Hospital Basel, University of Basel, Basel, Switzerland
- Lukas Bubendorf
- Institute for Pathology, University Hospital Basel, University of Basel, Basel, Switzerland
- Doug Amorese
- NuGEN, San Carlos, California, United States of America
- Christophe Legendre
- Collaborative Bioinformatics Center, Translational Genomics Research Institute, Phoenix, Arizona, United States of America
- Heather E. Cunliffe
- Computational Biology Division, Translational Genomics Research Institute, Phoenix, Arizona, United States of America
- Ann E. McCullough
- Department of Laboratory Medicine, Mayo Clinic, Scottsdale, Arizona, United States of America
- Barbara Pockaj
- Department of Surgery, Mayo Clinic, Scottsdale, Arizona, United States of America
- David Craig
- Neurogenomics Division, Translational Genomics Research Institute, Phoenix, Arizona, United States of America
- John Carpten
- Integrated Cancer Genomics Division, Translational Genomics Research Institute, Phoenix, Arizona, United States of America
- Daniel Von Hoff
- Clinical Translational Research Division, Translational Genomics Research Institute, Scottsdale, Arizona, United States of America
- Virginia G. Piper Cancer Center, Scottsdale Healthcare, Scottsdale, Arizona, United States of America
- Michael T. Barrett
- Clinical Translational Research Division, Translational Genomics Research Institute, Scottsdale, Arizona, United States of America
|
868
|
Aurrecoechea C, Barreto A, Brestelli J, Brunk BP, Cade S, Doherty R, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Hu S, Iodice J, Kissinger JC, Kraemer ET, Li W, Pinney DF, Pitts B, Roos DS, Srinivasamoorthy G, Stoeckert CJ, Wang H, Warrenfeltz S. EuPathDB: the eukaryotic pathogen database. Nucleic Acids Res 2012; 41:D684-91. [PMID: 23175615 PMCID: PMC3531183 DOI: 10.1093/nar/gks1113] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.2] [Indexed: 02/05/2023] Open
Abstract
EuPathDB (http://eupathdb.org) resources include 11 databases supporting eukaryotic pathogen genomic and functional genomic data, isolate data and phylogenomics. EuPathDB resources are built using the same infrastructure and provide a sophisticated search strategy system enabling complex interrogations of underlying data. Recent advances in EuPathDB resources include the design and implementation of a new data loading workflow, a new database supporting Piroplasmida (i.e. Babesia and Theileria), the addition of large amounts of new data and data types and the incorporation of new analysis tools. New data include genome sequences and annotation, strand-specific RNA-seq data, splice junction predictions (based on RNA-seq), phosphoproteomic data, high-throughput phenotyping data, single nucleotide polymorphism data based on high-throughput sequencing (HTS) and expression quantitative trait loci data. New analysis tools enable users to search for DNA motifs and define genes based on their genomic colocation, view results from searches graphically (i.e. genes mapped to chromosomes or isolates displayed on a map) and analyze data from columns in result tables (word cloud and histogram summaries of column content). The manuscript herein describes updates to EuPathDB since the previous report published in NAR in 2010.
Affiliation(s)
- Cristina Aurrecoechea
- Center for Tropical & Emerging Global Diseases, University of Georgia, Athens, GA 30602, USA
|
869
|
Xuan J, Yu Y, Qing T, Guo L, Shi L. Next-generation sequencing in the clinic: promises and challenges. Cancer Lett 2012; 340:284-95. [PMID: 23174106 DOI: 10.1016/j.canlet.2012.11.025] [Citation(s) in RCA: 215] [Impact Index Per Article: 16.5] [Received: 08/31/2012] [Revised: 11/13/2012] [Accepted: 11/13/2012] [Indexed: 02/06/2023]
Abstract
The advent of next generation sequencing (NGS) technologies has revolutionized the field of genomics, enabling fast and cost-effective generation of genome-scale sequence data with exquisite resolution and accuracy. Over the past years, rapid technological advances led by academic institutions and companies have continued to broaden NGS applications from research to the clinic. A recent crop of discoveries have highlighted the medical impact of NGS technologies on Mendelian and complex diseases, particularly cancer. However, the ever-increasing pace of NGS adoption presents enormous challenges in terms of data processing, storage, management and interpretation as well as sequencing quality control, which hinder the translation from sequence data into clinical practice. In this review, we first summarize the technical characteristics and performance of current NGS platforms. We further highlight advances in the applications of NGS technologies towards the development of clinical diagnostics and therapeutics. Common issues in NGS workflows are also discussed to guide the selection of NGS platforms and pipelines for specific research purposes.
Affiliation(s)
- Jiekun Xuan
- School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, China; National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
|
870
|
Yang X, Todd JA, Clayton D, Wallace C. Extra-binomial variation approach for analysis of pooled DNA sequencing data. Bioinformatics 2012; 28:2898-904. [PMID: 22976083 PMCID: PMC3496343 DOI: 10.1093/bioinformatics/bts553] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 11/16/2011] [Revised: 08/03/2012] [Accepted: 09/05/2012] [Indexed: 11/25/2022] Open
Abstract
MOTIVATION The invention of next-generation sequencing technology has made it possible to study the rare variants that are more likely to pinpoint causal disease genes. To make such experiments financially viable, DNA samples from several subjects are often pooled before sequencing. This induces large between-pool variation which, together with other sources of experimental error, creates over-dispersed data. Statistical analysis of pooled sequencing data needs to appropriately model this additional variance to avoid inflating the false-positive rate. RESULTS We propose a new statistical method based on an extra-binomial model to address the over-dispersion and apply it to pooled case-control data. We demonstrate that our model provides a better fit to the data than either a standard binomial model or a traditional extra-binomial model proposed by Williams and can analyse both rare and common variants with lower or more variable pool depths compared to the other methods. AVAILABILITY Package 'extraBinomial' is on http://cran.r-project.org/. CONTACT chris.wallace@cimr.cam.ac.uk. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics Online.
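To make the notion of over-dispersion concrete, the following is a minimal sketch using a generic beta-binomial distribution. It is neither the model proposed in the paper nor the API of the 'extraBinomial' package; the parameterisation by mean allele frequency `p` and over-dispersion `rho` (with 0 < rho < 1) is an assumption chosen for illustration.

```python
from math import exp, lgamma


def _log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, p, rho):
    """P(k alternate reads out of n) under a beta-binomial model.

    p is the mean allele frequency; rho in (0, 1) is the over-dispersion
    (intra-pool correlation). As rho approaches 0 the model approaches
    the ordinary binomial.
    """
    s = (1.0 - rho) / rho          # alpha + beta
    a, b = p * s, (1.0 - p) * s
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + _log_beta(k + a, n - k + b) - _log_beta(a, b))


def beta_binomial_variance(n, p, rho):
    """Binomial variance inflated by the factor 1 + (n - 1) * rho."""
    return n * p * (1.0 - p) * (1.0 + (n - 1) * rho)


if __name__ == "__main__":
    n, p, rho = 20, 0.1, 0.05
    print("binomial variance:     ", n * p * (1 - p))
    print("beta-binomial variance:", beta_binomial_variance(n, p, rho))
```

A test that assumes the plain binomial variance when the data follow the inflated variance will understate its standard errors, which is the mechanism behind the inflated false-positive rate mentioned in the abstract.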
Affiliation(s)
- Xin Yang
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Wellcome Trust/MRC Building, Addenbrooke's Hospital, Cambridge CB2 0XY, UK
|
871
|
Liu X, Wang J, Chen L. Whole-exome sequencing reveals recurrent somatic mutation networks in cancer. Cancer Lett 2012; 340:270-6. [PMID: 23153794 DOI: 10.1016/j.canlet.2012.11.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 08/30/2012] [Revised: 10/30/2012] [Accepted: 11/02/2012] [Indexed: 11/26/2022]
Abstract
Second-generation sequencing technologies have been extensively used to reveal the mechanism of tumorigenesis and to find critical genes in cancer progression that can be potential targets of clinical treatment. The exome is the part of the genome formed by exons, which are the protein-coding portions of genes. Whole-exome sequencing information can reflect the mutations of the protein-coding region in the genome and depict the causal relationship between the mutations and phenotypes. Many network-based methods have now been developed to identify cancer driver modules or pathways. These methods not only provide new insights into the molecular mechanism of disease progression at the network level but can also mitigate the low coverage or low recurrence across disease samples that affects individual driver genes. In this review, we focus on the recent advances in network-based methods for identifying cancer driver modules or pathways, including methods of whole-exome sequencing, somatic mutation detection, driver mutation identification, and mutation network reconstruction.
Affiliation(s)
- Xiaoping Liu
- Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
|
872
|
Systems genetics in "-omics" era: current and future development. Theory Biosci 2012; 132:1-16. [PMID: 23138757 DOI: 10.1007/s12064-012-0168-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 03/26/2012] [Accepted: 10/25/2012] [Indexed: 02/06/2023]
Abstract
Systems genetics is an emerging discipline that integrates high-throughput expression profiling technology and systems biology approaches to reveal the molecular mechanism of complex traits, and it will improve our understanding of gene functions in biochemical pathways and of genetic interactions between biological molecules. With the rapid advances of microarray analysis technologies, bioinformatics is extensively used in the studies of gene functions, SNP-SNP genetic interactions, LD block-block interactions, miRNA-mRNA interactions, DNA-protein interactions, protein-protein interactions, and functional mapping for LD blocks. Building on a bioinformatics panel that can integrate "-omics" datasets to extract systems knowledge and useful information for explaining the molecular mechanism of complex traits, systems genetics aims to enhance our understanding of biological processes. Systems biology has provided systems-level recognition of various biological phenomena and has laid the scientific background for the development of systems genetics. In addition, next-generation sequencing technology and post-genome-wide association studies empower the discovery of new genes and rare variants. The integration of different strategies will help to propose novel hypotheses and refine the theoretical framework of systems genetics, which will contribute to its future development and open up a whole new area of genetics.
|
873
|
Gullapalli RR, Desai KV, Santana-Santos L, Kant JA, Becich MJ. Next generation sequencing in clinical medicine: Challenges and lessons for pathology and biomedical informatics. J Pathol Inform 2012; 3:40. [PMID: 23248761 PMCID: PMC3519097 DOI: 10.4103/2153-3539.103013] [Citation(s) in RCA: 95] [Impact Index Per Article: 7.3] [Received: 03/15/2012] [Accepted: 07/19/2012] [Indexed: 11/25/2022] Open
Abstract
The Human Genome Project (HGP) provided the initial draft of mankind's DNA sequence in 2001. The HGP was produced by 23 collaborating laboratories using Sanger sequencing of mapped regions as well as shotgun sequencing techniques in a process that occupied 13 years at a cost of ~$3 billion. Today, Next Generation Sequencing (NGS) techniques represent the next phase in the evolution of DNA sequencing technology at dramatically reduced cost compared to traditional Sanger sequencing. A single laboratory today can sequence the entire human genome in a few days for a few thousand dollars in reagents and staff time. Routine whole exome or even whole genome sequencing of clinical patients is well within the realm of affordability for many academic institutions across the country. This paper reviews current sequencing technology methods and upcoming advancements in sequencing technology as well as challenges associated with data generation, data manipulation and data storage. Implementation of routine NGS data in cancer genomics is discussed along with potential pitfalls in the interpretation of the NGS data. The overarching importance of bioinformatics in the clinical implementation of NGS is emphasized.[7] We also review the issue of physician education which also is an important consideration for the successful implementation of NGS in the clinical workplace. NGS technologies represent a golden opportunity for the next generation of pathologists to be at the leading edge of the personalized medicine approaches coming our way. Often under-emphasized issues of data access and control as well as potential ethical implications of whole genome NGS sequencing are also discussed. Despite some challenges, it's hard not to be optimistic about the future of personalized genome sequencing and its potential impact on patient care and the advancement of knowledge of human biology and disease in the near future.
Affiliation(s)
- Rama R Gullapalli
- Department of Pathology, University of Pittsburgh Medical Centre, A701, Scaife Hall, 3550 Terrace Street, Pittsburgh, PA
|
874
|
Wilm A, Aw PPK, Bertrand D, Yeo GHT, Ong SH, Wong CH, Khor CC, Petric R, Hibberd ML, Nagarajan N. LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res 2012; 40:11189-201. [PMID: 23066108 PMCID: PMC3526318 DOI: 10.1093/nar/gks918] [Citation(s) in RCA: 962] [Impact Index Per Article: 74.0] [Indexed: 02/06/2023] Open
Abstract
The study of cell-population heterogeneity in a range of biological systems, from viruses to bacterial isolates to tumor samples, has been transformed by recent advances in sequencing throughput. While the high-coverage afforded can be used, in principle, to identify very rare variants in a population, existing ad hoc approaches frequently fail to distinguish true variants from sequencing errors. We report a method (LoFreq) that models sequencing run-specific error rates to accurately call variants occurring in <0.05% of a population. Using simulated and real datasets (viral, bacterial and human), we show that LoFreq has near-perfect specificity, with significantly improved sensitivity compared with existing methods and can efficiently analyze deep Illumina sequencing datasets without resorting to approximations or heuristics. We also present experimental validation for LoFreq on two different platforms (Fluidigm and Sequenom) and its application to call rare somatic variants from exome sequencing datasets for gastric cancer. Source code and executables for LoFreq are freely available at http://sourceforge.net/projects/lofreq/.
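One way to formalise quality-aware calling of rare variants is to ask how likely the observed number of non-reference bases would be if every one of them were a sequencing error. The sketch below illustrates that idea only and is not the LoFreq code: it assumes independent errors with per-base probabilities derived from Phred scores, and computes the exact tail of the resulting Poisson-binomial distribution by dynamic programming. All function names are hypothetical.

```python
def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10.0 ** (-q / 10.0)


def poisson_binomial_tail(error_probs, k):
    """P(X >= k), where X counts errors among independent bases.

    Each base i errs with its own probability error_probs[i]; the
    distribution of X is built up one base at a time.
    """
    dist = [1.0]  # dist[j] = P(exactly j errors among bases seen so far)
    for p in error_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1.0 - p)   # this base is correct
            nxt[j + 1] += mass * p       # this base is an error
        dist = nxt
    return sum(dist[k:])


if __name__ == "__main__":
    # Hypothetical column: 1000 bases at Q30, 10 of them non-reference.
    probs = [phred_to_error(30)] * 1000
    print(poisson_binomial_tail(probs, 10))
```

A small tail probability suggests that errors alone are an unlikely explanation for the observed variant bases. The sketch simplifies in one respect that should be stated: it treats any error as producing the variant base, whereas an error lands on one specific alternate base only some of the time.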
Affiliation(s)
- Andreas Wilm
- Genome Institute of Singapore, 60 Biopolis Street, Genome, #02-01, Singapore 138672, Singapore
|
875
|
Welch JS, Ley TJ, Link DC, Miller CA, Larson DE, Koboldt DC, Wartman LD, Lamprecht TL, Liu F, Xia J, Kandoth C, Fulton RS, McLellan MD, Dooling DJ, Wallis JW, Chen K, Harris CC, Schmidt HK, Kalicki-Veizer JM, Lu C, Zhang Q, Lin L, O'Laughlin MD, McMichael JF, Delehaunty KD, Fulton LA, Magrini VJ, McGrath SD, Demeter RT, Vickery TL, Hundal J, Cook LL, Swift GW, Reed JP, Alldredge PA, Wylie TN, Walker JR, Watson MA, Heath SE, Shannon WD, Varghese N, Nagarajan R, Payton JE, Baty JD, Kulkarni S, Klco JM, Tomasson MH, Westervelt P, Walter MJ, Graubert TA, DiPersio JF, Ding L, Mardis ER, Wilson RK. The origin and evolution of mutations in acute myeloid leukemia. Cell 2012; 150:264-78. [PMID: 22817890 DOI: 10.1016/j.cell.2012.06.023] [Citation(s) in RCA: 1248] [Impact Index Per Article: 96.0] [Received: 12/22/2011] [Revised: 04/27/2012] [Accepted: 06/24/2012] [Indexed: 10/28/2022]
Abstract
Most mutations in cancer genomes are thought to be acquired after the initiating event, which may cause genomic instability and drive clonal evolution. However, for acute myeloid leukemia (AML), normal karyotypes are common, and genomic instability is unusual. To better understand clonal evolution in AML, we sequenced the genomes of M3-AML samples with a known initiating event (PML-RARA) versus the genomes of normal karyotype M1-AML samples and the exomes of hematopoietic stem/progenitor cells (HSPCs) from healthy people. Collectively, the data suggest that most of the mutations found in AML genomes are actually random events that occurred in HSPCs before they acquired the initiating mutation; the mutational history of that cell is "captured" as the clone expands. In many cases, only one or two additional, cooperating mutations are needed to generate the malignant founding clone. Cells from the founding clone can acquire additional cooperating mutations, yielding subclones that can contribute to disease progression and/or relapse.
Affiliation(s)
- John S Welch
- Department of Medicine, Washington University, St. Louis, MO 63110, USA
|
876
|
Sifrim A, Van Houdt JKJ, Tranchevent LC, Nowakowska B, Sakai R, Pavlopoulos GA, Devriendt K, Vermeesch JR, Moreau Y, Aerts J. Annotate-it: a Swiss-knife approach to annotation, analysis and interpretation of single nucleotide variation in human disease. Genome Med 2012; 4:73. [PMID: 23013645 PMCID: PMC3580443 DOI: 10.1186/gm374] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 08/31/2012] [Revised: 09/14/2012] [Accepted: 09/26/2012] [Indexed: 12/18/2022] Open
Abstract
The increasing size and complexity of exome/genome sequencing data requires new tools for clinical geneticists to discover disease-causing variants. Bottlenecks in identifying the causative variation include poor cross-sample querying, constantly changing functional annotation and not considering existing knowledge concerning the phenotype. We describe a methodology that facilitates exploration of patient sequencing data towards identification of causal variants under different genetic hypotheses. Annotate-it facilitates handling, analysis and interpretation of high-throughput single nucleotide variant data. We demonstrate our strategy using three case studies. Annotate-it is freely available and test data are accessible to all users at http://www.annotate-it.org.
Affiliation(s)
- Alejandro Sifrim
  - KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
  - IBBT Future Health Department, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- Jeroen KJ Van Houdt
  - KU Leuven, Centre for Human Genetics, University Hospital Gasthuisberg, Herestraat 49, 3000 Leuven, Belgium
- Leon-Charles Tranchevent
  - KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
  - IBBT Future Health Department, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- Beata Nowakowska
  - KU Leuven, Centre for Human Genetics, University Hospital Gasthuisberg, Herestraat 49, 3000 Leuven, Belgium
- Ryo Sakai
  - KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
  - IBBT Future Health Department, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- Georgios A Pavlopoulos
  - KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
  - IBBT Future Health Department, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- Koen Devriendt
  - KU Leuven, Centre for Human Genetics, University Hospital Gasthuisberg, Herestraat 49, 3000 Leuven, Belgium
- Joris R Vermeesch
  - KU Leuven, Centre for Human Genetics, University Hospital Gasthuisberg, Herestraat 49, 3000 Leuven, Belgium
- Yves Moreau
  - KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
  - IBBT Future Health Department, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- Jan Aerts
  - KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
  - IBBT Future Health Department, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
877
Raineri E, Ferretti L, Esteve-Codina A, Nevado B, Heath S, Pérez-Enciso M. SNP calling by sequencing pooled samples. BMC Bioinformatics 2012; 13:239. [PMID: 22992255 PMCID: PMC3475117 DOI: 10.1186/1471-2105-13-239] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Received: 11/21/2011] [Accepted: 09/06/2012] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Performing high throughput sequencing on samples pooled from different individuals is a strategy to characterize genetic variability at a small fraction of the cost required for individual sequencing. In certain circumstances some variability estimators have even lower variance than those obtained with individual sequencing. SNP calling and estimating the frequency of the minor allele from pooled samples, though, is a subtle exercise for at least three reasons. First, sequencing errors may have a much larger relevance than in individual SNP calling: while their impact in individual sequencing can be reduced by setting a restriction on a minimum number of reads per allele, this would have a strong and undesired effect in pools because it is unlikely that alleles at low frequency in the pool will be read many times. Second, the prior allele frequency for heterozygous sites in individuals is usually 0.5 (assuming one is not analyzing sequences coming from, e.g. cancer tissues), but this is not true in pools: in fact, under the standard neutral model, singletons (i.e. alleles of minimum frequency) are the most common class of variants because P(f) ∝ 1/f and they occur more often as the sample size increases. Third, an allele appearing only once in the reads from a pool does not necessarily correspond to a singleton in the set of individuals making up the pool, and vice versa, there can be more than one read - or, more likely, none - from a true singleton. RESULTS To improve upon existing theory and software packages, we have developed a Bayesian approach for minor allele frequency (MAF) computation and SNP calling in pools (and implemented it in a program called snape): the approach takes into account sequencing errors and allows users to choose different priors. We also set up a pipeline which can simulate the coalescence process giving rise to the SNPs, the pooling procedure and the sequencing. 
We used it to compare the performance of snape to that of other packages. CONCLUSIONS We present software that helps in calling SNPs in pooled samples: it has good power while retaining a low false discovery rate (FDR). The method also provides the posterior probability that a SNP is segregating and the full posterior distribution of f for every SNP. In order to test the behaviour of our software, we generated (through simulated coalescence) artificial genomes and computed the effect of a pooled sequencing protocol, followed by SNP calling. In this setting, snape has better power and FDR than the comparable packages samtools, PoPoolation and Varscan: for N = 50 chromosomes, snape has power ≈ 35% and FDR ≈ 2.5%. snape is available at http://code.google.com/p/snape-pooled/ (source code and precompiled binaries).
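The Bayesian idea described in this abstract can be illustrated with a small grid posterior over the number of alternate alleles in the pool. This is only a sketch under stated assumptions and not snape's actual implementation: the function name, the symmetric error model, and the `theta` parameter (a hypothetical per-site diversity used to scale the 1/f prior) are all introduced here for illustration.

```python
from math import comb


def pooled_allele_count_posterior(alt_reads, depth, n_chrom, error_rate=0.01, theta=0.001):
    """Posterior over k, the number of alternate alleles among n_chrom pooled chromosomes."""
    # Prior: P(k) = theta / k for segregating sites (the 1/f shape quoted in the abstract),
    # with the remaining probability mass on k = 0 (monomorphic site).
    prior = [0.0] + [theta / k for k in range(1, n_chrom)]
    prior[0] = 1.0 - sum(prior)

    weights = []
    for k, p_k in enumerate(prior):
        f = k / n_chrom
        # Probability that one read shows the alternate base under a simplified
        # symmetric error model (an assumption of this sketch).
        p_alt = f * (1.0 - error_rate) + (1.0 - f) * error_rate
        likelihood = comb(depth, alt_reads) * p_alt**alt_reads * (1.0 - p_alt) ** (depth - alt_reads)
        weights.append(p_k * likelihood)

    total = sum(weights)
    return [w / total for w in weights]


# Ten alternate reads out of 100 at a site sequenced from a pool of 50 chromosomes.
posterior = pooled_allele_count_posterior(alt_reads=10, depth=100, n_chrom=50)
p_segregating = 1.0 - posterior[0]
```

Here `p_segregating` plays the role of the posterior probability that the SNP is segregating, and `posterior` is the full distribution over allele counts; a real caller would also use per-base quality scores rather than a single error rate.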
Affiliation(s)
- Emanuele Raineri
- Centro Nacional de Análisis Genómico, Parc Científic de Barcelona, Barcelona, 08028, Spain.
878
High throughput sequencing approaches to mutation discovery in the mouse. Mamm Genome 2012; 23:499-513. [PMID: 22991087 DOI: 10.1007/s00335-012-9424-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/27/2012] [Accepted: 07/19/2012] [Indexed: 12/19/2022]
Abstract
Phenotype-driven approaches in mice are powerful strategies for the discovery of genes and gene functions and for unravelling complex biological mechanisms. Traditional methods for mutation discovery are reliable and robust, but they can also be laborious and time consuming. Recently, high-throughput sequencing (HTS) technologies have revolutionised the process of forward genetics in mice by paving the way to rapid mutation discovery. However, successful application of HTS for mutation discovery relies heavily on the sequencing approach employed and strategies for data analysis. Here we review current HTS applications and resources for mutation discovery and provide an overview of the practical considerations for HTS implementation and data analysis.
879
Yang X, Charlebois P, Gnerre S, Coole MG, Lennon NJ, Levin JZ, Qu J, Ryan EM, Zody MC, Henn MR. De novo assembly of highly diverse viral populations. BMC Genomics 2012; 13:475. [PMID: 22974120 PMCID: PMC3469330 DOI: 10.1186/1471-2164-13-475] [Citation(s) in RCA: 152] [Impact Index Per Article: 11.7] [Received: 05/21/2012] [Accepted: 09/06/2012] [Indexed: 01/14/2023] Open
Abstract
Background Extensive genetic diversity in viral populations within infected hosts and the divergence of variants from existing reference genomes impede the analysis of deep viral sequencing data. A de novo population consensus assembly is valuable both as a single linear representation of the population and as a backbone on which intra-host variants can be accurately mapped. The availability of consensus assemblies and robustly mapped variants is crucial to the genetic study of viral disease progression, transmission dynamics, and viral evolution. Existing de novo assembly techniques fail to robustly assemble ultra-deep sequence data from genetically heterogeneous populations such as viruses into full-length genomes due to the presence of extensive genetic variability, contaminants, and variable sequence coverage. Results We present VICUNA, a de novo assembly algorithm suitable for generating consensus assemblies from genetically heterogeneous populations. We demonstrate its effectiveness on Dengue, Human Immunodeficiency and West Nile viral populations, representing a range of intra-host diversity. Compared to state-of-the-art assemblers designed for haploid or diploid systems, VICUNA recovers full-length consensus and captures insertion/deletion polymorphisms in diverse samples. Final assemblies maintain a high base calling accuracy. The VICUNA program is publicly available at http://www.broadinstitute.org/scientific-community/science/projects/viral-genomics/viral-genomics-analysis-software. Conclusions We developed VICUNA, a publicly available software tool that enables consensus assembly of ultra-deep sequence data derived from diverse viral populations. While VICUNA was developed for the analysis of viral populations, its application to other heterogeneous sequence data sets such as metagenomic or tumor cell population samples may prove beneficial in these fields of research.
Affiliation(s)
- Xiao Yang
- The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
880
|
High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic. BMC Genomics 2012; 13:468. [PMID: 22963323 PMCID: PMC3473251 DOI: 10.1186/1471-2164-13-468] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 05/08/2012] [Accepted: 08/28/2012] [Indexed: 11/16/2022] Open
Abstract
Background Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Results Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Conclusions Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.
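The depth-of-coverage figures in this abstract follow the usual relation coverage = reads × read length / genome size. The arithmetic can be sketched as below; the 4 Mb genome size is an approximate value assumed here for V. cholerae and is not taken from the abstract, and the function names are illustrative.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Average depth of coverage: total sequenced bases divided by genome length."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


GENOME_SIZE = 4_000_000  # assumed approximate V. cholerae genome size in bp
READ_LENGTH = 100        # read length in bp, as in the abstract

n_reads_50x = reads_needed(50, READ_LENGTH, GENOME_SIZE)
coverage_check = mean_coverage(n_reads_50x, READ_LENGTH, GENOME_SIZE)
```

Under these assumptions, 50x coverage corresponds to about two million 100 bp reads (one million read pairs), far fewer than the reads behind the >2000x data sets the authors generated.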
881
Forster M, Forster P, Elsharawy A, Hemmrich G, Kreck B, Wittig M, Thomsen I, Stade B, Barann M, Ellinghaus D, Petersen BS, May S, Melum E, Schilhabel MB, Keller A, Schreiber S, Rosenstiel P, Franke A. From next-generation sequencing alignments to accurate comparison and validation of single-nucleotide variants: the pibase software. Nucleic Acids Res 2012; 41:e16. [PMID: 22965131 PMCID: PMC3592472 DOI: 10.1093/nar/gks836] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 01/15/2023] Open
Abstract
Scientists working with single-nucleotide variants (SNVs), inferred by next-generation sequencing software, often need further information regarding true variants, artifacts and sequence coverage gaps. In clinical diagnostics, for example, SNVs must usually be validated by visual inspection or by several independent SNV callers. Here we demonstrate that 0.5-60% of relevant SNVs might not be detected due to coverage gaps, or might be misidentified. Even low error rates can overwhelm the true biological signal, especially in clinical diagnostics, in research comparing healthy with affected cells, in archaeogenetic dating or in forensics. For these reasons, we have developed a package called pibase, which is applicable to diploid and haploid genome, exome or targeted enrichment data. pibase extracts details on nucleotides from alignment files at user-specified coordinates and identifies reproducible genotypes, if present. In test cases pibase identifies genotypes at 99.98% specificity, 10-fold better than other tools. pibase also provides pair-wise comparisons between healthy and affected cells using nucleotide signals (10-fold more accurately than a genotype-based approach, as we show in our case study of monozygotic twins). This comparison tool also solves the problem of detecting allelic imbalance within heterozygous SNVs in copy number variation loci, or in heterogeneous tumor sequences.
Affiliation(s)
- Michael Forster
- Institute of Clinical Molecular Biology, Christian-Albrechts-University Kiel, D-24105 Kiel, Germany.
882
Ma S, Bao JYJ, Kwan PS, Chan YP, Tong CM, Fu L, Zhang N, Tong AHY, Qin YR, Tsao SW, Chan KW, Lok S, Guan XY. Identification of PTK6, via RNA sequencing analysis, as a suppressor of esophageal squamous cell carcinoma. Gastroenterology 2012; 143:675-686.e12. [PMID: 22705009 DOI: 10.1053/j.gastro.2012.06.007] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.6] [Received: 07/24/2011] [Revised: 05/31/2012] [Accepted: 06/04/2012] [Indexed: 12/20/2022]
Abstract
BACKGROUND & AIMS Esophageal squamous cell carcinoma (ESCC) is the most commonly observed histologic subtype of esophageal cancer. ESCC is believed to develop via accumulation of numerous genetic alterations, including inactivation of tumor suppressor genes and activation of oncogenes. We searched for transcripts that were altered in human ESCC samples compared with nontumor tissues. METHODS We performed integrative transcriptome sequencing (RNA-Seq) analysis using ESCC samples from 3 patients and adjacent nontumor tissues to identify transcripts that were altered in ESCC tissue. We performed molecular and functional studies of the transcripts identified and investigated the mechanisms of alteration. RESULTS We identified protein tyrosine kinase 6 (PTK6) as a transcript that was significantly down-regulated in ESCC tissues and cell lines compared with nontumor tissues or immortalized normal esophageal cell lines. The promoter of the PTK6 gene was inactivated in ESCC tissues at least in part via hypermethylation and histone deacetylation. Knockdown of PTK6 in KYSE30 ESCC cells using small hairpin RNAs increased their ability to form foci, migrate, and invade extracellular matrix in culture and form tumors in nude mice. Overexpression of PTK6 in these cells reduced their proliferation in culture and tumor formation in mice. PTK6 reduced phosphorylation of Akt and glycogen synthase kinase (GSK)3β, leading to activation of β-catenin. CONCLUSIONS PTK6 was identified as a transcript that is down-regulated in human ESCC tissues via epigenetic modification at the PTK6 locus. Its product appears to regulate cell proliferation by reducing phosphorylation of Akt and GSK3β, leading to activation of β-catenin. Reduced levels of PTK6 promote growth of xenograft tumors in mice; it might be developed as a marker of ESCC.
Affiliation(s)
- Stephanie Ma
  - Department of Pathology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Jessie Y J Bao
  - Genome Research Centre, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Pak Shing Kwan
  - Department of Pathology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Yuen Piu Chan
  - Department of Pathology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Carol M Tong
  - Department of Pathology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Li Fu
  - Department of Clinical Oncology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Na Zhang
  - Genome Research Centre, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Amy H Y Tong
  - Genome Research Centre, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Yan-Ru Qin
  - Department of Clinical Oncology, First Affiliated Hospital, Zhengzhou University, Zhengzhou, China
- Sai Wah Tsao
  - Department of Anatomy, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Kwok Wah Chan
  - Department of Pathology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Si Lok
  - Genome Research Centre, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Xin-Yuan Guan
  - Department of Clinical Oncology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
883
Spans L, Atak ZK, Van Nieuwerburgh F, Deforce D, Lerut E, Aerts S, Claessens F. Variations in the exome of the LNCaP prostate cancer cell line. Prostate 2012; 72:1317-27. [PMID: 22213130 DOI: 10.1002/pros.22480] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 10/13/2011] [Accepted: 12/04/2011] [Indexed: 11/09/2022]
Abstract
BACKGROUND The LNCaP cell line is widely used as a model for prostate cancer. However, information on protein-changing mutations, genetic heterogeneity and genetic (in)stability is largely lacking for these cells. METHODS Next-generation sequencing of the LNCaP exome revealed many single nucleotide variants (SNVs). To help identify the mutations that are most likely drivers of the oncogenic process, we developed an in silico protocol, which can be adapted for other exome analyses. RESULTS We detected 1,802 non-synonymous SNVs and 218 small insertions and deletions in the LNCaP exome. We confirm the known mutations in the androgen receptor and the PTEN gene, but most other mutations remained undescribed until now. The presence of 38 out of 42 SNVs was confirmed in monoclonal as well as in polyclonal LNCaP derivatives. Moreover, most variants were also detectable in LNCaP mRNA. CONCLUSIONS We provide an extensive database of genetic variations in the protein-coding part of the genome of LNCaP cells, which should be taken into consideration when using LNCaP cells or its derivatives as models for prostate cancer. From the analysis of several LNCaP-derived cultures and clones, we can confirm that the cell line is heterozygous for a large number of variants and that both the variant and the wild-type allele can be simultaneously expressed as mRNA. The fact that the SNVs in the E-cadherin, CDK4, Notch1, and PlexinB1 genes are absent in some of the subclones strongly indicates a degree of genetic instability.
Affiliation(s)
- Lien Spans
- Molecular Endocrinology Laboratory, Department of Molecular Cell Biology, University of Leuven, 3000 Leuven, Belgium
884
Vidal RO, do Nascimento LC, Mondego JMC, Pereira GAG, Carazzolle MF. Identification of SNPs in RNA-seq data of two cultivars of Glycine max (soybean) differing in drought resistance. Genet Mol Biol 2012; 35:331-4. [PMID: 22802718 PMCID: PMC3392885 DOI: 10.1590/s1415-47572012000200014] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Indexed: 12/30/2022] Open
Abstract
The legume Glycine max (soybean) plays an important economic role in the international commodities market, with a world production of almost 260 million tons for the 2009/2010 harvest. The increase in drought events in the last decade has caused production losses in recent harvests. This fact compels us to understand the drought tolerance mechanisms in soybean, taking into account its variability among commercial and developing cultivars. In order to identify single nucleotide polymorphisms (SNPs) in genes up-regulated during drought stress, we evaluated suppression subtractive libraries (SSH) from two contrasting cultivars upon water deprivation: sensitive (BR 16) and tolerant (Embrapa 48). A total of 2,222 soybean genes were up-regulated in both cultivars. Our method identified more than 6,000 SNPs in tolerant and sensitive Brazilian cultivars in those drought stress related genes. Among these SNPs, 165 (in 127 genes) are positioned at soybean chromosome ends, including transcription factors (MYB, WRKY) related to tolerance to abiotic stress.
Affiliation(s)
- Ramon Oliveira Vidal
- Laboratório de Genômica e Expressão, Universidade Estadual de Campinas, Campinas, SP, Brazil
885
Huang J, Deng Q, Wang Q, Li KY, Dai JH, Li N, Zhu ZD, Zhou B, Liu XY, Liu RF, Fei QL, Chen H, Cai B, Zhou B, Xiao HS, Qin LX, Han ZG. Exome sequencing of hepatitis B virus-associated hepatocellular carcinoma. Nat Genet 2012; 44:1117-21. [PMID: 22922871 DOI: 10.1038/ng.2391] [Citation(s) in RCA: 307] [Impact Index Per Article: 23.6] [Received: 12/13/2011] [Accepted: 08/01/2012] [Indexed: 02/07/2023]
Abstract
Hepatocellular carcinoma (HCC) is one of the most common cancers worldwide and shows a propensity to metastasize and infiltrate adjacent and more distant tissues. HCC is associated with multiple risk factors, including hepatitis B virus (HBV) infection, which is especially prevalent in China. Here, we used exome sequencing to identify somatic mutations in ten HBV-positive individuals with HCC with portal vein tumor thromboses (PVTTs), intrahepatic metastases. Both C:G>A:T and T:A>A:T transversions were frequently found among the 331 non-silent mutations. Notably, ARID1A, which encodes a component of the SWI/SNF chromatin remodeling complex, was mutated in 14 of 110 (13%) HBV-associated HCC specimens. We used RNA interference to assess the roles of 91 of the confirmed mutated genes in cellular survival. The results suggest that seven of these genes, including VCAM1 and CDK14, may confer growth and infiltration capacity to HCC cells. This study provides a view of the landscape of somatic mutations that may be implicated in advanced HCC.
Affiliation(s)
- Jian Huang
- Human Genome Center of Rui-Jin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China
886
Zhou B. An empirical Bayes mixture model for SNP detection in pooled sequencing data. Bioinformatics 2012; 28:2569-75. [PMID: 22914221 DOI: 10.1093/bioinformatics/bts501] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Detecting single-nucleotide polymorphism (SNP) in pooled sequencing data is more challenging than in individual sequencing because of sampling variations across pools. To effectively differentiate SNP signal from sequencing error, appropriate estimation of the sequencing error is necessary. In this article, we propose an empirical Bayes mixture (EBM) model for SNP detection and allele frequency estimation in pooled sequencing data. RESULTS The proposed model reliably learns the error distribution by pooling information across pools and genomic positions. In addition, the proposed EBM model builds in characteristics unique to the pooled sequencing data, boosting the sensitivity of SNP detection. For large-scale inference in SNP detection, the EBM model provides a flexible and robust way for estimation and control of local false discovery rate. We demonstrate the performance of the proposed method through simulation studies and real data application. AVAILABILITY Implementation of this method is available at https://sites.google.com/site/zhouby98.
Affiliation(s)
- Baiyu Zhou
- Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, NY 10461, USA.
887
Aguilar C, Escalante A, Flores N, de Anda R, Riveros-McKay F, Gosset G, Morett E, Bolívar F. Genetic changes during a laboratory adaptive evolution process that allowed fast growth in glucose to an Escherichia coli strain lacking the major glucose transport system. BMC Genomics 2012; 13:385. [PMID: 22884033 PMCID: PMC3469383 DOI: 10.1186/1471-2164-13-385] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 04/10/2012] [Accepted: 08/02/2012] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND Escherichia coli strains lacking the phosphoenolpyruvate:carbohydrate phosphotransferase system (PTS), which is the major bacterial component involved in glucose transport and its phosphorylation, accumulate high amounts of phosphoenolpyruvate that can be diverted to the synthesis of commercially relevant products. However, these strains grow slowly in glucose as sole carbon source due to its inefficient transport and metabolism. Strain PB12, with 400% increased growth rate, was isolated after a 120-hour adaptive laboratory evolution process for the selection of faster growing derivatives in glucose. Analysis of the genetic changes that occurred in the PB12 strain that lacks PTS will allow a better understanding of the basis of its growth adaptation and, therefore, aid the design of improved metabolic engineering strategies for enhancing carbon diversion into the aromatic pathways. RESULTS Whole genome analyses using two different sequencing methodologies (the Roche NimbleGen Inc. comparative genome sequencing technique and high throughput sequencing with Illumina Inc. GAIIx) allowed the identification of the genetic changes that occurred in the PB12 strain. Both methods detected 23 non-synonymous and 22 synonymous point mutations. Several non-synonymous mutations mapped in regulatory genes (arcB, barA, rpoD, rna) and in other putative regulatory loci (yjjU, rssA and ypdA). In addition, a chromosomal deletion of 10,328 bp was detected that removed 12 genes, among them the rppH, mutH and galR genes. Some of these mutated and deleted genes are characterized, together with their known and possible functions. CONCLUSIONS The simultaneous deletion of the contiguous rppH, mutH and galR genes is apparently the main reason for the faster growth of the evolved PB12 strain. In support of this interpretation is the fact that inactivation of the rppH gene in the parental PB11 strain substantially increased its growth rate, very likely by increasing the stability of glycolytic gene mRNAs. Furthermore, galR inactivation allowed glucose transport by GalP into the cell. The deletion of mutH in an already stressed strain that lacks PTS is apparently responsible for the very high mutation rate observed.
Affiliation(s)
- César Aguilar
- Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México (UNAM), Cuernavaca, Morelos 62210, México
888
Thumma BR, Sharma N, Southerton SG. Transcriptome sequencing of Eucalyptus camaldulensis seedlings subjected to water stress reveals functional single nucleotide polymorphisms and genes under selection. BMC Genomics 2012; 13:364. [PMID: 22853646 PMCID: PMC3472208 DOI: 10.1186/1471-2164-13-364] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Received: 03/19/2012] [Accepted: 07/20/2012] [Indexed: 12/03/2022] Open
Abstract
Background Water stress limits plant survival and production in many parts of the world. Identification of genes and alleles responding to water stress conditions is important in breeding plants better adapted to drought. Currently there are no studies examining the transcriptome wide gene and allelic expression patterns under water stress conditions. We used RNA sequencing (RNA-seq) to identify the candidate genes and alleles and to explore the evolutionary signatures of selection. Results We studied the effect of water stress on gene expression in Eucalyptus camaldulensis seedlings derived from three natural populations. We used reference-guided transcriptome mapping to study gene expression. Several genes showed differential expression between control and stress conditions. Gene ontology (GO) enrichment tests revealed up-regulation of 140 stress-related gene categories and down-regulation of 35 metabolic and cell wall organisation gene categories. More than 190,000 single nucleotide polymorphisms (SNPs) were detected and 2737 of these showed differential allelic expression. Allelic expression of 52% of these variants was correlated with differential gene expression. Signatures of selection patterns were studied by estimating the proportion of nonsynonymous to synonymous substitution rates (Ka/Ks). The average Ka/Ks ratio among the 13,719 genes was 0.39 indicating that most of the genes are under purifying selection. Among the positively selected genes (Ka/Ks > 1.5) apoptosis and cell death categories were enriched. Of the 287 positively selected genes, ninety genes showed differential expression and 27 SNPs from 17 positively selected genes showed differential allelic expression between treatments. Conclusions Correlation of allelic expression of several SNPs with total gene expression indicates that these variants may be the cis-acting variants or in linkage disequilibrium with such variants. 
Enrichment of apoptosis and cell death gene categories among the positively selected genes reveals the past selection pressures experienced by the populations used in this study.
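The Ka/Ks reasoning in this abstract can be written out as a small classification rule. This is a minimal sketch, not the authors' pipeline: the function name and labels are illustrative, the 1.5 cutoff for positive selection is the one quoted in the abstract, and treating ratios between 1 and 1.5 as ambiguous is an assumption made here.

```python
def classify_selection(ka, ks, positive_threshold=1.5):
    """Return (Ka/Ks ratio, label) for one gene.

    ka: nonsynonymous substitution rate; ks: synonymous substitution rate.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    ratio = ka / ks
    if ratio > positive_threshold:
        return ratio, "positive selection"
    if ratio < 1.0:
        return ratio, "purifying selection"
    # Between 1 and the threshold the signal is weak; labelled ambiguous here.
    return ratio, "neutral or ambiguous"


# Hypothetical rates chosen so the ratio matches the reported average of 0.39.
average_ratio, average_label = classify_selection(ka=0.039, ks=0.1)
```

A gene with the reported average ratio of 0.39 falls in the purifying-selection class, which is how the abstract reads that figure.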
Affiliation(s)
- Bala R Thumma
- CSIRO Plant Industry, Clunies Ross Street, Acton, ACT, Australia.
889
Fischer M, Snajder R, Pabinger S, Dander A, Schossig A, Zschocke J, Trajanoski Z, Stocker G. SIMPLEX: cloud-enabled pipeline for the comprehensive analysis of exome sequencing data. PLoS One 2012; 7:e41948. [PMID: 22870267 PMCID: PMC3411592 DOI: 10.1371/journal.pone.0041948] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Received: 02/23/2012] [Accepted: 06/28/2012] [Indexed: 01/24/2023] Open
Abstract
In recent studies, exome sequencing has proven to be a successful screening tool for the identification of candidate genes causing rare genetic diseases. Although underlying targeted sequencing methods are well established, necessary data handling and focused, structured analysis still remain demanding tasks. Here, we present a cloud-enabled autonomous analysis pipeline, which comprises the complete exome analysis workflow. The pipeline combines several in-house developed and published applications to perform the following steps: (a) initial quality control, (b) intelligent data filtering and pre-processing, (c) sequence alignment to a reference genome, (d) SNP and DIP detection, (e) functional annotation of variants using different approaches, and (f) detailed report generation during various stages of the workflow. The pipeline connects the selected analysis steps, exposes all available parameters for customized usage, performs required data handling, and distributes computationally expensive tasks either on a dedicated high-performance computing infrastructure or on the Amazon cloud environment (EC2). The presented application has already been used in several research projects including studies to elucidate the role of rare genetic diseases. The pipeline is continuously tested and is publicly available under the GPL as a VirtualBox or Cloud image at http://simplex.i-med.ac.at; additional supplementary data is provided at http://www.icbi.at/exome.
Affiliation(s)
- Maria Fischer
- Division for Bioinformatics, Biocenter, Innsbruck Medical University, Innsbruck, Austria
| | - Rene Snajder
- Division for Bioinformatics, Biocenter, Innsbruck Medical University, Innsbruck, Austria
- Oncotyrol, Center for Personalized Cancer Medicine, Innsbruck, Austria
| | - Stephan Pabinger
- Division for Bioinformatics, Biocenter, Innsbruck Medical University, Innsbruck, Austria
| | - Andreas Dander
- Division for Bioinformatics, Biocenter, Innsbruck Medical University, Innsbruck, Austria
- Oncotyrol, Center for Personalized Cancer Medicine, Innsbruck, Austria
| | - Anna Schossig
- Division of Human Genetics, Biocenter, Innsbruck Medical University, Innsbruck, Austria
| | - Johannes Zschocke
- Division of Human Genetics, Biocenter, Innsbruck Medical University, Innsbruck, Austria
| | - Zlatko Trajanoski
- Division for Bioinformatics, Biocenter, Innsbruck Medical University, Innsbruck, Austria
| | - Gernot Stocker
- Division for Bioinformatics, Biocenter, Innsbruck Medical University, Innsbruck, Austria
| |
|
890
|
Furney SJ, Turajlic S, Fenwick K, Lambros MB, MacKay A, Ricken G, Mitsopoulos C, Kozarewa I, Hakas J, Zvelebil M, Lord CJ, Ashworth A, Reis-Filho JS, Herlyn M, Murata H, Marais R. Genomic characterisation of acral melanoma cell lines. Pigment Cell Melanoma Res 2012; 25:488-92. [PMID: 22578220 DOI: 10.1111/j.1755-148x.2012.01016.x] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 11/27/2022]
Abstract
Acral melanoma is a rare melanoma subtype with distinct epidemiological, clinical and genetic features. To determine if acral melanoma cell lines are representative of this melanoma subtype, six lines were analysed by whole-exome sequencing and array comparative genomic hybridisation. We demonstrate that the cell lines display a mutation rate that is comparable to that of published primary and metastatic acral melanomas and observe a mutational signature suggestive of UV-induced mutagenesis in two of the cell lines. Mutations were identified in oncogenes and tumour suppressors previously linked to melanoma including BRAF, NRAS, KIT, PTEN and TP53, in cancer genes not previously linked to melanoma and in genes linked to DNA repair such as BRCA1 and BRCA2. Our findings provide strong circumstantial evidence to suggest that acral melanoma cell lines and acral tumours share genetic features in common and that these cells are therefore valuable tools to investigate the biology of this aggressive melanoma subtype. Data are available at: http://rock.icr.ac.uk/collaborations/Furney_et_al_2012/.
Affiliation(s)
- Simon J Furney
- Signal Transduction Team, Division of Cancer Biology, Institute of Cancer Research, London, UK
|
891
|
Single Nucleotide Polymorphism (SNP) Detection and Genotype Calling from Massively Parallel Sequencing (MPS) Data. Statistics in Biosciences 2012; 5:3-25. [PMID: 24489615 DOI: 10.1007/s12561-012-9067-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 10/28/2022]
Abstract
Massively parallel sequencing (MPS), since its debut in 2005, has transformed the field of genomic studies. These new sequencing technologies have resulted in the successful identification of causal variants for several rare Mendelian disorders. They have also begun to deliver on their promise to explain some of the missing heritability from genome-wide association studies (GWAS) of complex traits. We anticipate a rapidly growing number of MPS-based studies for a diverse range of applications in the near future. One crucial and nearly inevitable step is to detect SNPs and call genotypes at the detected polymorphic sites from the sequencing data. Here, we review statistical methods that have been proposed in the past five years for this purpose. In addition, we discuss emerging issues and future directions related to SNP detection and genotype calling from MPS data.
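As a rough illustration of the genotype-calling step this abstract refers to, the sketch below scores the three diploid genotypes at one site with a simple binomial read-count model and picks the most likely one. The error rate, function names and example counts are assumptions made for illustration; they are not taken from any of the reviewed methods.

```python
import math

def genotype_log_likelihoods(ref_count, alt_count, error_rate=0.01):
    """Log-likelihood of the read counts under each diploid genotype.

    Assumes independent reads and one symmetric error rate (an illustrative
    simplification). The binomial coefficient is omitted because it is the
    same for every genotype and does not change the ranking.
    """
    # Probability that a single read shows the alternate allele.
    p_alt = {
        "ref/ref": error_rate,
        "ref/alt": 0.5,
        "alt/alt": 1.0 - error_rate,
    }
    return {
        genotype: alt_count * math.log(p) + ref_count * math.log(1.0 - p)
        for genotype, p in p_alt.items()
    }

def call_genotype(ref_count, alt_count, error_rate=0.01):
    """Return the maximum-likelihood genotype for one site."""
    scores = genotype_log_likelihoods(ref_count, alt_count, error_rate)
    return max(scores, key=scores.get)
```

For example, `call_genotype(10, 10)` selects `"ref/alt"`, while `call_genotype(20, 0)` selects `"ref/ref"`.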
|
892
|
Medugorac I, Seichter D, Graf A, Russ I, Blum H, Göpel KH, Rothammer S, Förster M, Krebs S. Bovine polledness--an autosomal dominant trait with allelic heterogeneity. PLoS One 2012; 7:e39477. [PMID: 22737241 PMCID: PMC3380827 DOI: 10.1371/journal.pone.0039477] [Citation(s) in RCA: 95] [Impact Index Per Article: 7.3] [Received: 02/21/2012] [Accepted: 05/25/2012] [Indexed: 11/18/2022] Open
Abstract
Persistent horns are an important trait of speciation for the family Bovidae, with complex morphogenesis taking place shortly after birth. Polledness is highly favourable in modern cattle breeding systems, but serious animal welfare issues call for a way of producing hornless cattle other than dehorning. Although the dominant inhibition of horn morphogenesis was discovered more than 70 years ago, and the causative mutation was mapped almost 20 years ago, its molecular nature remained unknown. Here, we report allelic heterogeneity of the POLLED locus. First, we mapped the POLLED locus to a ∼381-kb interval in a multi-breed case-control design. Targeted re-sequencing of an enlarged candidate interval (547 kb) in 16 sires with known POLLED genotype did not detect a common allele associated with polled status. In eight sires of Alpine and Scottish origin (four polled versus four horned), we identified a single candidate mutation, a complex 202 bp insertion-deletion event that showed perfect association with the polled phenotype in various European cattle breeds, except Holstein-Friesian. The analysis of the same candidate interval in eight Holsteins identified five candidate variants which segregate as a 260 kb haplotype also perfectly associated with the POLLED gene without recombination or interference with the 202 bp insertion-deletion. We further identified bulls which are progeny tested as homozygous polled but bear both the 202 bp insertion-deletion and the Friesian haplotype. The distribution of genotypes of the two putative POLLED alleles in a large semi-random sample (1,261 animals) supports the hypothesis of two independent mutations.
|
893
|
Xiong D, Li G, Li K, Xu Q, Pan Z, Ding F, Vedell P, Liu P, Cui P, Hua X, Jiang H, Yin Y, Zhu Z, Li X, Zhang B, Ma D, Wang Y, You M. Exome sequencing identifies MXRA5 as a novel cancer gene frequently mutated in non-small cell lung carcinoma from Chinese patients. Carcinogenesis 2012; 33:1797-805. [PMID: 22696596 DOI: 10.1093/carcin/bgs210] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Indexed: 12/23/2022] Open
Abstract
Lung cancer has become the leading cause of death among malignant tumors in China and is significantly associated with somatic genetic alterations. We performed exome sequencing of 14 non-small cell lung carcinomas (NSCLCs) with matched adjacent normal lung tissues extracted from Chinese patients. In addition to the lung cancer-related genes (TP53, EGFR, KRAS, PIK3CA, and ROS1), this study revealed "novel" genes not previously implicated in NSCLC. Notably, matrix-remodeling associated 5 (MXRA5) was the second most frequently mutated gene in NSCLC, after TP53. Subsequent Sanger sequencing of MXRA5 in an additional sample set consisting of 52 paired tumor-normal DNA samples revealed that 15% of Chinese NSCLCs contained somatic mutations in MXRA5. These findings, together with the results from pathway analysis, strongly indicate that altered extracellular matrix-remodeling may be involved in the etiology of NSCLC.
Affiliation(s)
- Donghai Xiong
- Department of Pharmacology and Toxicology and Cancer Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA
|
894
|
Kalender Atak Z, De Keersmaecker K, Gianfelici V, Geerdens E, Vandepoel R, Pauwels D, Porcu M, Lahortiga I, Brys V, Dirks WG, Quentmeier H, Cloos J, Cuppens H, Uyttebroeck A, Vandenberghe P, Cools J, Aerts S. High accuracy mutation detection in leukemia on a selected panel of cancer genes. PLoS One 2012; 7:e38463. [PMID: 22675565 PMCID: PMC3366948 DOI: 10.1371/journal.pone.0038463] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Received: 12/28/2011] [Accepted: 05/05/2012] [Indexed: 01/12/2023] Open
Abstract
With the advent of whole-genome and whole-exome sequencing, high-quality catalogs of recurrently mutated cancer genes are becoming available for many cancer types. Increasing access to sequencing technology, including bench-top sequencers, provides the opportunity to re-sequence a limited set of cancer genes across a patient cohort with limited processing time. Here, we re-sequenced a set of cancer genes in T-cell acute lymphoblastic leukemia (T-ALL) using Nimblegen sequence capture coupled with Roche/454 technology. First, we investigated how maximal sensitivity and specificity of mutation detection can be achieved through a benchmark study. We tested nine combinations of different mapping and variant-calling methods, varied the variant calling parameters, and compared the predicted mutations with a large independent validation set obtained by capillary re-sequencing. We found that the combination of two mapping algorithms, namely BWA-SW and SSAHA2, coupled with the variant calling algorithm Atlas-SNP2 yields the highest sensitivity (95%) and the highest specificity (93%). Next, we applied this analysis pipeline to identify mutations in a set of 58 cancer genes, in a panel of 18 T-ALL cell lines and 15 T-ALL patient samples. We confirmed mutations in known T-ALL drivers, including PHF6, NF1, FBXW7, NOTCH1, KRAS, NRAS, PIK3CA, and PTEN. Interestingly, we also found mutations in several cancer genes that had not been linked to T-ALL before, including JAK3. Finally, we re-sequenced a small set of 39 candidate genes and identified recurrent mutations in TET1, SPRY3 and SPRY4. In conclusion, we established an optimized analysis pipeline for Roche/454 data that can be applied to accurately detect gene mutations in cancer, which led to the identification of several new candidate T-ALL driver mutations.
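A benchmark of this kind compares predicted calls against a validated set. The sketch below uses the standard definitions of sensitivity and specificity; the abstract does not state exactly how the authors computed them, and the function name, argument names and site labels are hypothetical.

```python
def sensitivity_specificity(predicted, validated_true, validated_false):
    """Compare predicted variant calls with a validation set.

    predicted:       set of sites called as variant by the pipeline
    validated_true:  set of sites confirmed as variant by validation
    validated_false: set of sites confirmed as non-variant by validation
    """
    tp = len(predicted & validated_true)    # confirmed variants that were called
    fn = len(validated_true - predicted)    # confirmed variants that were missed
    fp = len(predicted & validated_false)   # called sites shown to be non-variant
    tn = len(validated_false - predicted)   # non-variant sites correctly not called
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Made-up example sites, for illustration only.
predicted = {"chr1:100", "chr1:200", "chr2:50"}
validated_true = {"chr1:100", "chr1:200", "chr3:7", "chr4:9"}
validated_false = {"chr2:50", "chr2:60", "chr2:70", "chr2:80"}
sens, spec = sensitivity_specificity(predicted, validated_true, validated_false)
```

With these made-up sets, two of four confirmed variants are recovered (sensitivity 0.5) and three of four confirmed non-variant sites are correctly left uncalled (specificity 0.75).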
Affiliation(s)
| | - Kim De Keersmaecker
- Center for Human Genetics, KU Leuven, Leuven, Belgium
- Center for the Biology of Disease, VIB, Leuven, Belgium
| | - Valentina Gianfelici
- Center for Human Genetics, KU Leuven, Leuven, Belgium
- Center for the Biology of Disease, VIB, Leuven, Belgium
| | - Ellen Geerdens
- Center for Human Genetics, KU Leuven, Leuven, Belgium
- Center for the Biology of Disease, VIB, Leuven, Belgium
| | - Roel Vandepoel
- Center for Human Genetics, KU Leuven, Leuven, Belgium
- Center for the Biology of Disease, VIB, Leuven, Belgium
| | - Daphnie Pauwels
- Center for Human Genetics, KU Leuven, Leuven, Belgium
- Center for the Biology of Disease, VIB, Leuven, Belgium
| | - Michaël Porcu
- Center for Human Genetics, KU Leuven, Leuven, Belgium
- Center for the Biology of Disease, VIB, Leuven, Belgium
| | - Idoya Lahortiga
- Center for Human Genetics, KU Leuven, Leuven, Belgium
- Center for the Biology of Disease, VIB, Leuven, Belgium
| | - Vanessa Brys
- Genomics Core Facility, University Hospitals Leuven, Leuven, Belgium
| | - Willy G. Dirks
- Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH, Braunschweig, Germany
| | - Hilmar Quentmeier
- Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH, Braunschweig, Germany
| | - Jacqueline Cloos
- Pediatric Oncology/Hematology and Hematology, VU Medical Center, Amsterdam, The Netherlands
| | - Harry Cuppens
- Genomics Core Facility, University Hospitals Leuven, Leuven, Belgium
| | - Anne Uyttebroeck
- Pediatric Hemato-oncology, University Hospitals Leuven, Leuven, Belgium
| | | | - Jan Cools
- Center for Human Genetics, KU Leuven, Leuven, Belgium
- Center for the Biology of Disease, VIB, Leuven, Belgium
| | - Stein Aerts
- Center for Human Genetics, KU Leuven, Leuven, Belgium
| |
|
895
|
Diagnosis of Fanconi anemia: mutation analysis by next-generation sequencing. Anemia 2012; 2012:132856. [PMID: 22720145 PMCID: PMC3374947 DOI: 10.1155/2012/132856] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Received: 12/23/2011] [Accepted: 03/21/2012] [Indexed: 12/31/2022] Open
Abstract
Fanconi anemia (FA) is a rare genetic instability syndrome characterized by developmental defects, bone marrow failure, and a high cancer risk. Fifteen genetic subtypes have been distinguished. The majority of patients (≈85%) belong to the subtypes A (≈60%), C (≈15%) or G (≈10%), while a minority (≈15%) is distributed over the remaining 12 subtypes. All subtypes seem to fit within the “classical” FA phenotype, except for D1 and N patients, who have more severe clinical symptoms. Since FA patients need special clinical management, the diagnosis should be firmly established, to exclude conditions with overlapping phenotypes. A valid FA diagnosis requires the detection of pathogenic mutations in a FA gene and/or a positive result from a chromosomal breakage test. Identification of the pathogenic mutations is also important for adequate genetic counselling and to facilitate prenatal or preimplantation genetic diagnosis. Here we describe and validate a comprehensive protocol for the molecular diagnosis of FA, based on massively parallel sequencing. We used this approach to identify BRCA2, FANCD2, FANCI and FANCL mutations in novel unclassified FA patients.
|
896
|
|
897
|
GATA2 zinc finger 1 mutations associated with biallelic CEBPA mutations define a unique genetic entity of acute myeloid leukemia. Blood 2012; 120:395-403. [PMID: 22649106 DOI: 10.1182/blood-2012-01-403220] [Citation(s) in RCA: 123] [Impact Index Per Article: 9.5] [Indexed: 01/23/2023] Open
Abstract
Cytogenetically normal acute myeloid leukemia (CN-AML) with biallelic CEBPA gene mutations (biCEBPA) represents a distinct disease entity with a favorable clinical outcome. So far, it is not known whether other genetic alterations cooperate with biCEBPA mutations during leukemogenesis. To identify additional mutations, we performed whole exome sequencing of 5 biCEBPA patients and detected somatic GATA2 zinc finger 1 (ZF1) mutations in 2 of 5 cases. Both GATA2 and CEBPA are transcription factors crucial for hematopoietic development. Inherited or acquired mutations in both genes have been associated with leukemogenesis. Further mutational screening detected novel GATA2 ZF1 mutations in 13 of 33 biCEBPA-positive CN-AML patients (13/33, 39.4%). No GATA2 mutations were found in 38 CN-AML patients with a monoallelic CEBPA mutation and in 89 CN-AML patients with wild-type CEBPA status. The presence of additional GATA2 mutations (n=10) did not significantly influence the clinical outcome of 26 biCEBPA-positive patients. In reporter gene assays, all tested GATA2 ZF1 mutants showed reduced capacity to enhance CEBPA-mediated activation of transcription, suggesting that the GATA2 ZF1 mutations may collaborate with biCEBPA mutations to deregulate target genes during malignant transformation. We thus provide evidence for a genetically distinct subgroup of CN-AML. The German AML cooperative group trials 1999 and 2008 are registered with the identifiers NCT00266136 and NCT01382147 at www.clinicaltrials.gov.
|
898
|
Li M, Stoneking M. A new approach for detecting low-level mutations in next-generation sequence data. Genome Biol 2012; 13:R34. [PMID: 22621726 PMCID: PMC3446287 DOI: 10.1186/gb-2012-13-5-r34] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.6] [Received: 11/28/2011] [Revised: 05/14/2012] [Accepted: 05/23/2012] [Indexed: 01/01/2023] Open
Abstract
We propose a new method that incorporates population re-sequencing data, distribution of reads, and strand bias in detecting low-level mutations. The method can accurately identify low-level mutations down to a level of 2.3%, with an average coverage of 500×, and with a false discovery rate of less than 1%. In addition, we also discuss other problems in detecting low-level mutations, including chimeric reads and sample cross-contamination, and provide possible solutions to them.
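One ingredient named in this abstract, strand bias, can be illustrated with a very simple filter: keep a low-level candidate only if the minor allele is seen at a minimum fraction on both the forward and the reverse strand. This sketch is not the authors' method; the function names and the `min_fraction` default are assumptions chosen for illustration.

```python
def minor_allele_fraction(alt_reads, depth):
    """Fraction of reads carrying the minor allele (0.0 when depth is zero)."""
    return alt_reads / depth if depth else 0.0

def passes_strand_check(fwd_alt, fwd_depth, rev_alt, rev_depth, min_fraction=0.02):
    """Keep a candidate only if both strands independently support it.

    min_fraction is a hypothetical threshold, not a value from the paper.
    """
    return (
        minor_allele_fraction(fwd_alt, fwd_depth) >= min_fraction
        and minor_allele_fraction(rev_alt, rev_depth) >= min_fraction
    )
```

At a total depth of 500, a candidate with 8 of 250 forward reads and 7 of 250 reverse reads passes, whereas one with 15 forward reads and no reverse reads is rejected even though the overall minor allele fraction is the same.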
Affiliation(s)
- Mingkun Li
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, D04103, Leipzig, Germany.
|
899
|
Kwok H, Tong AHY, Lin CH, Lok S, Farrell PJ, Kwong DLW, Chiang AKS. Genomic sequencing and comparative analysis of Epstein-Barr virus genome isolated from primary nasopharyngeal carcinoma biopsy. PLoS One 2012; 7:e36939. [PMID: 22590638 PMCID: PMC3349645 DOI: 10.1371/journal.pone.0036939] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.6] [Received: 01/29/2012] [Accepted: 04/16/2012] [Indexed: 12/15/2022] Open
Abstract
Whether certain Epstein-Barr virus (EBV) strains are associated with pathogenesis of nasopharyngeal carcinoma (NPC) is still an unresolved question. In the present study, the EBV genome contained in a primary NPC tumor biopsy was amplified by Polymerase Chain Reaction (PCR), and sequenced using next-generation (Illumina) and conventional dideoxy-DNA sequencing. The EBV genome, designated HKNPC1 (GenBank accession number JQ009376), is a type 1 EBV of approximately 171.5 kb. The virus appears to be a uniform strain, in line with the accepted monoclonal nature of EBV in NPC, but is heterogeneous at 172 nucleotide positions. Phylogenetic analysis with the four published EBV strains, B95-8, AG876, GD1, and GD2, indicated HKNPC1 was more closely related to the Chinese NPC patient-derived strains, GD1 and GD2. HKNPC1 contains 1,589 single nucleotide variations (SNVs) and 132 insertions or deletions (indels) in comparison to the reference EBV sequence (accession number NC_007605). When compared to AG876, a strain derived from Ghanaian Burkitt's lymphoma, we found 322 SNVs, of which 76 were non-synonymous SNVs and were shared amongst the Chinese GD1, GD2 and HKNPC1 isolates. We observed 88 non-synonymous SNVs shared only by HKNPC1 and GD2, the only other NPC tumor-derived strain reported thus far. Non-synonymous SNVs were mainly found in the latent, tegument and glycoprotein genes. The same point mutations were found in glycoprotein (BLLF1 and BALF4) genes of GD1, GD2 and HKNPC1 strains and might affect cell type specific binding. Variations in LMP1 and EBNA3B epitopes and mutations in Cp (11404 C>T) and Qp (50134 G>C) found in GD1, GD2 and HKNPC1 could potentially affect CD8+ T cell recognition and latent gene expression pattern in NPC, respectively. In conclusion, we showed that whole genome sequencing of EBV in NPC may facilitate discovery of previously unknown variations of pathogenic significance.
Affiliation(s)
- Hin Kwok
- Department of Paediatrics and Adolescent Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
| | - Amy H. Y. Tong
- Genome Research Centre, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
| | - Chi Ho Lin
- Genome Research Centre, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
| | - Si Lok
- Genome Research Centre, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
| | - Paul J. Farrell
- Section of Virology, Imperial College Faculty of Medicine, London, United Kingdom
| | - Dora L. W. Kwong
- Department of Clinical Oncology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
| | - Alan K. S. Chiang
- Department of Paediatrics and Adolescent Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
| |
|
900
|
Saunders CT, Wong WSW, Swamy S, Becq J, Murray LJ, Cheetham RK. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 2012; 28:1811-7. [PMID: 22581179 DOI: 10.1093/bioinformatics/bts271] [Citation(s) in RCA: 1207] [Impact Index Per Article: 92.8] [Indexed: 02/07/2023]
Abstract
MOTIVATION: Whole genome and exome sequencing of matched tumor-normal sample pairs is becoming routine in cancer research. The consequent increased demand for somatic variant analysis of paired samples requires methods specialized to model this problem so as to sensitively call variants at any practical level of tumor impurity. RESULTS: We describe Strelka, a method for somatic SNV and small indel detection from sequencing data of matched tumor-normal samples. The method uses a novel Bayesian approach which represents continuous allele frequencies for both tumor and normal samples, while leveraging the expected genotype structure of the normal. This is achieved by representing the normal sample as a mixture of germline variation with noise, and representing the tumor sample as a mixture of the normal sample with somatic variation. A natural consequence of the model structure is that sensitivity can be maintained at high tumor impurity without requiring purity estimates. We demonstrate that the method has superior accuracy and sensitivity on impure samples compared with approaches based on either diploid genotype likelihoods or general allele-frequency tests. AVAILABILITY: The Strelka workflow source code is available at ftp://strelka@ftp.illumina.com/. CONTACT: csaunders@illumina.com
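The idea of treating a tumor sample as a mixture of the normal sample with somatic variation can be illustrated with a generic two-component mixture of allele fractions. This is only a sketch of the mixture concept; it is not Strelka's Bayesian model, and the function and parameter names are hypothetical.

```python
def expected_tumor_alt_fraction(normal_alt_fraction, somatic_alt_fraction, somatic_weight):
    """Alt-allele fraction in a tumor sample modelled as a two-part mixture.

    somatic_weight is the share of reads drawn from the somatic component;
    the remaining reads behave like the normal sample.
    """
    if not 0.0 <= somatic_weight <= 1.0:
        raise ValueError("somatic_weight must be between 0 and 1")
    return (
        (1.0 - somatic_weight) * normal_alt_fraction
        + somatic_weight * somatic_alt_fraction
    )

# Illustrative case: site absent from the normal, heterozygous in the somatic
# component, with 30% of reads coming from that component.
fraction = expected_tumor_alt_fraction(0.0, 0.5, 0.3)
```

In this made-up case the expected alternate-allele fraction is about 0.15, which shows why a caller tuned only for diploid fractions near 0.5 can miss variants in impure samples.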
|