901
Pfeffer G, Elliott HR, Griffin H, Barresi R, Miller J, Marsh J, Evilä A, Vihola A, Hackman P, Straub V, Dick DJ, Horvath R, Santibanez-Koref M, Udd B, Chinnery PF. Titin mutation segregates with hereditary myopathy with early respiratory failure. Brain 2012; 135:1695-713. [PMID: 22577215 DOI: 10.1093/brain/aws102] [Citation(s) in RCA: 101] [Impact Index Per Article: 7.8] [Indexed: 12/13/2022]
Abstract
In 2001, we described an autosomal dominant myopathy characterized by neuromuscular ventilatory failure in ambulant patients. Here we describe the underlying genetic basis for the disorder, and we define the neuromuscular, respiratory and radiological phenotype in a study of 31 mutation carriers followed for up to 31 years. A combination of genome-wide linkage and whole exome sequencing revealed the likely causal genetic variant in the titin (TTN) gene (g.274375T>C; p.Cys30071Arg) within a shared haplotype of 2.93 Mbp on chromosome 2. This segregated with the phenotype in 21 individuals from the original family, nine subjects in a second family with the same highly selective pattern of muscle involvement on magnetic resonance imaging and a third familial case with a similar phenotype. Comparing the mutation carriers revealed novel features not apparent in our original report. The clinical presentation included predominant distal, proximal or respiratory muscle weakness. The age of onset was highly variable, ranging from early adulthood onwards and including a mild phenotype in advanced age. In nearly all patients, muscle weakness had an earlier onset and was more severe in the lower extremities. Seven patients also had axial muscle weakness. Respiratory function studies demonstrated a gradual deterioration over time, reflecting the progressive nature of this condition. Cardiomyopathy was not present in any of our patients despite up to 31 years of follow-up. Magnetic resonance muscle imaging was performed in 21 affected patients and revealed characteristic abnormalities with semitendinosus involvement in 20 of 21 patients studied, including 3 patients who were presymptomatic. Diagnostic muscle histopathology most frequently revealed eosinophilic inclusions (inclusion bodies) and rimmed vacuoles, but was non-specific in a minority of patients. These findings have important clinical implications.
This disease should be considered in patients with adult-onset proximal or distal myopathy and early respiratory failure, even in the presence of non-specific muscle pathology. Muscle magnetic resonance imaging findings are characteristic and should be considered as an initial investigation, and if positive should prompt screening for mutations in TTN. With 363 exons, screening TTN presented a major challenge until recently. However, whole exome sequencing provides a reliable cost-effective approach, providing the gene of interest is adequately captured.
Affiliation(s)
- Gerald Pfeffer
- Institute of Genetic Medicine, Central Parkway, Newcastle, NE1 3BZ, UK
902
Stone EA. Joint genotyping on the fly: identifying variation among a sequenced panel of inbred lines. Genome Res 2012; 22:966-74. [PMID: 22367192 PMCID: PMC3337441 DOI: 10.1101/gr.129122.111] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 07/15/2011] [Accepted: 02/21/2012] [Indexed: 02/03/2023]
Abstract
High-throughput sequencing is enabling remarkably deep surveys of genomic variation. It is now possible to completely sequence multiple individuals from a single species, yet the identification of variation among them remains an evolving computational challenge. This challenge is compounded for experimental organisms when strains are studied instead of individuals. In response, we present the Joint Genotyper for Inbred Lines (JGIL) as a method for obtaining genotypes and identifying variation among a large panel of inbred strains or lines. JGIL inputs the sequence reads from each line after their alignment to a common reference. Its probabilistic model includes site-specific parameters common to all lines that describe the frequency of nucleotides segregating in the population from which the inbred panel was derived. The distribution of line genotypes is conditional on these parameters and reflects the experimental design. Site-specific error probabilities, also common to all lines, parameterize the distribution of reads conditional on line genotype and realized coverage. Both sets of parameters are estimated per site from the aggregate read data, and posterior probabilities are calculated to decode the genotype of each line. We present an application of JGIL to 162 inbred Drosophila melanogaster lines from the Drosophila Genetic Reference Panel. We explore by simulation the effect of varying coverage, sequencing error, mapping error, and the number of lines. In doing so, we illustrate how JGIL is robust to moderate levels of error. Supported by these analyses, we advocate the importance of modeling the data and the experimental design when possible.
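As an illustration of the kind of per-site model described above, the following minimal Python sketch estimates a site's alternate-allele frequency across a panel of inbred lines by expectation-maximization and then decodes each line's genotype from its posterior probability. It is not JGIL: it assumes a biallelic site, lines that are fully homozygous, and a single fixed, symmetric error rate `err`, whereas JGIL also estimates site-specific error probabilities and reflects the experimental design. All function names and read counts are hypothetical.

```python
import math


def log_likelihoods(k, n, err):
    """Log-likelihoods of k alternate reads out of n under the two homozygous
    genotypes (reference, alternate); the shared binomial coefficient is dropped."""
    log_ref = k * math.log(err) + (n - k) * math.log(1 - err)
    log_alt = k * math.log(1 - err) + (n - k) * math.log(err)
    return log_ref, log_alt


def posterior_alt(k, n, p, err):
    """Posterior probability that a line is homozygous for the alternate allele."""
    log_ref, log_alt = log_likelihoods(k, n, err)
    a = math.log(p) + log_alt
    r = math.log(1 - p) + log_ref
    m = max(a, r)  # subtract the maximum to avoid underflow
    return math.exp(a - m) / (math.exp(a - m) + math.exp(r - m))


def em_site(counts, err=0.01, p=0.5, iters=50):
    """Estimate the site's alternate-allele frequency p from (alt_reads, depth)
    pairs, one per line, and return p with each line's posterior."""
    for _ in range(iters):
        # E-step: posterior genotype of each line under the current p
        post = [posterior_alt(k, n, p, err) for k, n in counts]
        # M-step: p is the mean posterior, clamped away from 0 and 1
        p = min(max(sum(post) / len(post), 1e-6), 1 - 1e-6)
    post = [posterior_alt(k, n, p, err) for k, n in counts]
    return p, post


counts = [(0, 10), (1, 12), (9, 9), (15, 16), (0, 5)]
p_hat, posteriors = em_site(counts)
print(round(p_hat, 3), [round(x, 3) for x in posteriors])
```

Fixing `err` keeps the example short; letting the M-step also update a per-site error rate would bring it closer to the joint estimation described in the abstract.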
Affiliation(s)
- Eric A Stone
- Department of Genetics and Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27603, USA.
903
Gerstung M, Beisel C, Rechsteiner M, Wild P, Schraml P, Moch H, Beerenwinkel N. Reliable detection of subclonal single-nucleotide variants in tumour cell populations. Nat Commun 2012; 3:811. [PMID: 22549840 DOI: 10.1038/ncomms1814] [Citation(s) in RCA: 180] [Impact Index Per Article: 13.8] [Received: 12/23/2011] [Accepted: 03/30/2012] [Indexed: 01/06/2023]
904
Determination of RET Sequence Variation in an MEN2 Unaffected Cohort Using Multiple-Sample Pooling and Next-Generation Sequencing. J Thyroid Res 2012; 2012:318232. [PMID: 22545224 PMCID: PMC3321559 DOI: 10.1155/2012/318232] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/07/2011] [Accepted: 01/23/2012] [Indexed: 11/30/2022]
Abstract
Multisample, nonindexed pooling combined with next-generation sequencing (NGS) was used to discover RET proto-oncogene sequence variation within a cohort known to be unaffected by multiple endocrine neoplasia type 2 (MEN2). DNA samples (113 Caucasians, 23 persons of other ethnicities) were amplified for RET intron 9 to intron 16 and then divided into 5 pools of <30 samples each before library prep and NGS. Two controls were included in this study, a single sample and a pool of 50 samples that had been previously sequenced by the same NGS methods. All 59 variants previously detected in the 50-pool control were present. Of the 61 variants detected in the unaffected cohort, 20 variants were novel changes. Several variants were validated by high-resolution melting analysis and Sanger sequencing, and their allelic frequencies correlated well with those determined by NGS. The results from this unaffected cohort will be added to the RET MEN2 database.
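Pooling trades per-sample genotypes for allele frequencies. The arithmetic below is a hedged sketch assuming equimolar diploid pools and unbiased amplification and sequencing (neither is stated in the abstract). It shows why pool size matters: a single heterozygous carrier in a pool of n samples contributes only 1/(2n) of the reads at that position. The function names, the pool size of 29 and the read counts are hypothetical.

```python
def pooled_allele_estimate(alt_reads, depth, n_samples):
    """Estimate the alternate-allele frequency in a pool of diploid samples and
    convert it to an approximate number of alternate chromosomes."""
    if depth <= 0 or n_samples <= 0:
        raise ValueError("depth and n_samples must be positive")
    freq = alt_reads / depth
    n_chromosomes = 2 * n_samples  # diploid: two copies per sample
    return freq, round(freq * n_chromosomes)


def singleton_frequency(n_samples):
    """Expected read fraction of one heterozygous carrier in an equimolar pool."""
    return 1 / (2 * n_samples)


# One heterozygote among 29 pooled samples is expected at about 1.7% of reads.
print(round(singleton_frequency(29), 4))
print(pooled_allele_estimate(alt_reads=52, depth=1000, n_samples=29))
```

Under these assumptions, smaller pools raise the expected fraction of a rare allele relative to the sequencing error rate, which makes singletons easier to distinguish from noise.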
905
Swaminathan K, Chae WB, Mitros T, Varala K, Xie L, Barling A, Glowacka K, Hall M, Jezowski S, Ming R, Hudson M, Juvik JA, Rokhsar DS, Moose SP. A framework genetic map for Miscanthus sinensis from RNAseq-based markers shows recent tetraploidy. BMC Genomics 2012; 13:142. [PMID: 22524439 PMCID: PMC3355032 DOI: 10.1186/1471-2164-13-142] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Received: 01/03/2012] [Accepted: 04/24/2012] [Indexed: 11/24/2022]
Abstract
Background Miscanthus (subtribe Saccharinae, tribe Andropogoneae, family Poaceae) is a genus of temperate perennial C4 grasses whose high biomass production makes it, along with its close relatives sugarcane and sorghum, attractive as a biofuel feedstock. The base chromosome number of Miscanthus (x = 19) is different from that of other Saccharinae and approximately twice that of the related Sorghum bicolor (x = 10), suggesting large-scale duplications may have occurred in recent ancestors of Miscanthus. Owing to the complexity of the Miscanthus genome and the complications of self-incompatibility, a complete genetic map with a high density of markers has not yet been developed. Results We used deep transcriptome sequencing (RNAseq) from two M. sinensis accessions to define 1536 single nucleotide variants (SNVs) for a GoldenGate™ genotyping array, and found that simple sequence repeat (SSR) markers defined in sugarcane are often informative in M. sinensis. A total of 658 SNP and 210 SSR markers were validated via segregation in a full sibling F1 mapping population. Using 221 progeny from this mapping population, we constructed a genetic map for M. sinensis that resolves into 19 linkage groups, the haploid chromosome number expected from cytological evidence. Comparative genomic analysis documents a genome-wide duplication in Miscanthus relative to Sorghum bicolor, with subsequent insertional fusion of a pair of chromosomes. The utility of the map is confirmed by the identification of two paralogous C4-pyruvate, phosphate dikinase (C4-PPDK) loci in Miscanthus, at positions syntenic to the single orthologous gene in Sorghum. Conclusions The genus Miscanthus experienced an ancestral tetraploidy and chromosome fusion prior to its diversification, but after its divergence from the closely related sugarcane clade. 
The recent timing of this tetraploidy complicates discovery and mapping of genetic markers for Miscanthus species, since alleles and fixed differences between paralogs are comparable. These difficulties can be overcome by careful analysis of segregation patterns in a mapping population and genotyping of doubled haploids. The genetic map for Miscanthus will be useful in biological discovery and breeding efforts to improve this emerging biofuel crop, and also provide a valuable resource for understanding genomic responses to tetraploidy and chromosome fusion.
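Marker validation by segregation usually means checking whether the progeny genotype classes fit the ratio expected from the parental genotypes, for example 1:1 for a marker heterozygous in only one parent of a full-sib F1 population. The following minimal chi-square goodness-of-fit sketch uses the standard upper 5% critical values for 1 and 2 degrees of freedom. The counts are hypothetical (they merely sum to the 221 progeny mentioned above), and the study's own validation criteria are not described in the abstract.

```python
def chi_square_gof(observed, ratio):
    """Pearson chi-square statistic for observed class counts versus an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Upper 5% critical values of the chi-square distribution, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991}


def fits_ratio(observed, ratio):
    """Return (statistic, True if the counts are consistent with the ratio at alpha = 0.05)."""
    if len(observed) != len(ratio):
        raise ValueError("observed and ratio must have the same number of classes")
    stat = chi_square_gof(observed, ratio)
    return stat, stat < CRITICAL_05[len(ratio) - 1]


# Hypothetical counts for markers heterozygous in one parent (expected 1:1).
print(fits_ratio((118, 103), (1, 1)))  # consistent with 1:1
print(fits_ratio((150, 71), (1, 1)))   # distorted segregation
```

A 1:2:1 ratio (codominant marker heterozygous in both parents) uses the same functions with three classes and two degrees of freedom.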
Affiliation(s)
- Kankshita Swaminathan
- Energy Biosciences Institute, Institute for Genomic Biology, University of Illinois Urbana, Urbana, IL 61801, USA
906
Abstract
Short tandem repeats (STRs) have a wide range of applications, including medical genetics, forensics, and genetic genealogy. High-throughput sequencing (HTS) has the potential to profile hundreds of thousands of STR loci. However, mainstream bioinformatics pipelines are inadequate for the task. These pipelines treat STR mapping as gapped alignment, which results in cumbersome processing times and a biased sampling of STR alleles. Here, we present lobSTR, a novel method for profiling STRs in personal genomes. lobSTR harnesses concepts from signal processing and statistical learning to avoid gapped alignment and to address the specific noise patterns in STR calling. The speed and reliability of lobSTR exceed the performance of current mainstream algorithms for STR profiling. We validated lobSTR's accuracy by measuring its consistency in calling STRs from whole-genome sequencing of two biological replicates from the same individual, by tracing Mendelian inheritance patterns in STR alleles in whole-genome sequencing of a HapMap trio, and by comparing lobSTR results to traditional molecular techniques. Encouraged by the speed and accuracy of lobSTR, we used the algorithm to conduct a comprehensive survey of STR variations in a deeply sequenced personal genome. We traced the mutation dynamics of close to 100,000 STR loci and observed more than 50,000 STR variations in a single genome. lobSTR's implementation is an end-to-end solution. The package accepts raw sequencing reads and provides the user with the genotyping results. It is written in C/C++, includes multi-threading capabilities, and is compatible with the BAM format.
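The trio-based validation rests on a simple Mendelian rule: the child's two STR alleles must be explainable as one allele from each parent. Below is a minimal sketch of that check and of a per-trio concordance rate, with alleles represented as repeat lengths. That representation, the function names and the genotypes are hypothetical assumptions; lobSTR's own output format is not described here.

```python
def mendelian_consistent(child, mother, father):
    """True if the child's diploid STR genotype can be explained by inheriting
    one allele from each parent (alleles given as repeat lengths)."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def concordance(trio_calls):
    """Fraction of loci whose (child, mother, father) calls are Mendelian-consistent."""
    if not trio_calls:
        return 0.0
    ok = sum(mendelian_consistent(c, m, f) for c, m, f in trio_calls)
    return ok / len(trio_calls)


trio_calls = [
    ((12, 15), (12, 13), (15, 15)),  # consistent
    ((10, 10), (10, 11), (9, 10)),   # consistent
    ((12, 16), (12, 13), (15, 15)),  # allele 16 is absent from both parents
]
print(concordance(trio_calls))
```

Because a true de novo STR mutation also produces an inconsistency, such a check bounds the genotyping error rate rather than measuring it exactly.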
Affiliation(s)
- Melissa Gymrek
- Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
907
Liu P, Morrison C, Wang L, Xiong D, Vedell P, Cui P, Hua X, Ding F, Lu Y, James M, Ebben JD, Xu H, Adjei AA, Head K, Andrae JW, Tschannen MR, Jacob H, Pan J, Zhang Q, Van den Bergh F, Xiao H, Lo KC, Patel J, Richmond T, Watt MA, Albert T, Selzer R, Anderson M, Wang J, Wang Y, Starnes S, Yang P, You M. Identification of somatic mutations in non-small cell lung carcinomas using whole-exome sequencing. Carcinogenesis 2012; 33:1270-6. [PMID: 22510280 DOI: 10.1093/carcin/bgs148] [Citation(s) in RCA: 170] [Impact Index Per Article: 13.1] [Indexed: 11/12/2022]
Abstract
Lung cancer is the leading cause of cancer-related death, with non-small cell lung cancer (NSCLC) being the predominant form of the disease. Most lung cancer is caused by the accumulation of genomic alterations due to tobacco exposure. To uncover its mutational landscape, we performed whole-exome sequencing in 31 NSCLCs and their matched normal tissue samples. We identified both common and unique mutation spectra and pathway activation in lung adenocarcinomas and squamous cell carcinomas, two major histologies in NSCLC. In addition to identifying previously known lung cancer genes (TP53, KRAS, EGFR, CDKN2A and RB1), the analysis revealed many genes not previously implicated in this malignancy. Notably, CSMD3, a gene not previously linked to lung cancer, was identified as the second most frequently mutated gene (after TP53). We further demonstrated that loss of CSMD3 results in increased proliferation of airway epithelial cells. The study provides unprecedented insights into mutational processes, cellular pathways and gene networks associated with lung cancer. Of potential immediate clinical relevance, several highly mutated genes identified in our study are promising druggable targets in cancer therapy, including ALK, CTNNA3, DCC, MLL3, PCDH11X, PIK3C2B, PIK3CG and ROCK2.
Affiliation(s)
- Pengyuan Liu
- Department of Physiology and Cancer Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA
908
Homolka A, Eder T, Kopecky D, Berenyi M, Burg K, Fluch S. Allele discovery of ten candidate drought-response genes in Austrian oak using a systematically informatics approach based on 454 amplicon sequencing. BMC Res Notes 2012; 5:175. [PMID: 22472016 PMCID: PMC3420255 DOI: 10.1186/1756-0500-5-175] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 09/26/2011] [Accepted: 04/03/2012] [Indexed: 12/01/2022]
Abstract
Background Rising temperatures and decreasing water availability as a result of predicted climate change will impose significant pressure on long-lived forest tree species. Discovering the allelic variation present in drought-related genes of two Austrian oak species can be key to understanding mechanisms of natural selection and can provide forestry with tools to cope with future challenges. Results In the present study we used Roche 454 sequencing and developed a bioinformatic pipeline to process multiplexed tagged amplicons in order to identify single nucleotide polymorphisms and allelic sequences of ten candidate genes related to drought/osmotic stress from pedunculate oak (Quercus robur) and sessile oak (Q. petraea) individuals. Eight of these genes were detected in 336 oak individuals growing in Austria, with a total of 158 polymorphic sites. Allele numbers ranged from ten to 52, with observed heterozygosity ranging from 0.115 to 0.640. All loci deviated from Hardy-Weinberg equilibrium, and linkage disequilibrium was found among six combinations of loci. Conclusions We have characterized 183 alleles of drought-related genes from oak species and detected first evidence of natural selection. Besides the potential for marker development, we have created an expandable bioinformatic pipeline for the analysis of next-generation sequencing data.
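Observed heterozygosity can be compared with its Hardy-Weinberg expectation to see whether a locus shows a heterozygote deficit or excess. The sketch below computes both quantities at one locus from diploid genotypes. It uses the simple expected-heterozygosity estimator 1 - sum(p_i^2) without a small-sample correction; the genotypes are invented for illustration, and this is not the authors' pipeline.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Observed (Ho) and expected (He) heterozygosity at one locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    He is the Hardy-Weinberg expectation 1 - sum(p_i^2) over allele frequencies.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes")
    ho = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * n  # two allele copies per individual
    he = 1 - sum((c / total) ** 2 for c in counts.values())
    return ho, he


genotypes = [("A", "A"), ("A", "B"), ("B", "B"), ("A", "A"), ("A", "C")]
ho, he = heterozygosity(genotypes)
print(round(ho, 3), round(he, 3))
```

Here Ho = 0.4 falls below He = 0.54, a heterozygote deficit; a formal test of Hardy-Weinberg equilibrium (for example an exact test) would still be needed before drawing conclusions.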
Affiliation(s)
- Andreas Homolka
- Health and Environment Department, AIT Austrian Institute of Technology, Tulln, A-3430, Austria.
909
Neuman JA, Isakov O, Shomron N. Analysis of insertion-deletion from deep-sequencing data: software evaluation for optimal detection. Brief Bioinform 2012; 14:46-55. [PMID: 22707752 DOI: 10.1093/bib/bbs013] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.2] [Indexed: 11/15/2022]
Abstract
Insertion and deletion (indel) mutations, the most common type of structural variation in the human genome, affect a multitude of human traits and diseases. New sequencing technologies, such as deep sequencing, allow massive throughput of sequence data and contribute greatly to the detection of disease-causing mutations in general, and of indels specifically. To infer the presence of indels (indel calling), deep-sequencing data must undergo comprehensive computational analysis. The choice of indel-calling software can skew the results, and inherent tool limitations may affect downstream analysis. To better understand these inter-software differences, we evaluated the performance of several indel-calling tools for short indel (1-10 nt) detection. We compared the tools' sensitivity and predictive values under varying parameters such as read depth (coverage), read length, indel size and frequency. We pinpoint several key features that assist successful experimental design and appropriate tool selection. Our study may also serve as a basis for future evaluation of additional indel-calling methods.
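Sensitivity and positive predictive value, the metrics used to compare the indel callers, can be computed by matching calls against a truth set, here restricted to the 1-10 nt size range studied. This minimal sketch assumes that indels are already represented identically in both sets. In practice they are usually normalized (for example left-aligned) first, because the same indel inside a repeat can be written at several positions. The function names and variant tuples are hypothetical.

```python
def indel_size(variant):
    """Indel length from a (chrom, pos, ref, alt) tuple."""
    _, _, ref, alt = variant
    return abs(len(alt) - len(ref))


def sensitivity_ppv(truth, calls, min_size=1, max_size=10):
    """Sensitivity and positive predictive value for indels within a size range,
    using exact matching of (chrom, pos, ref, alt) tuples."""
    truth = {v for v in truth if min_size <= indel_size(v) <= max_size}
    calls = {v for v in calls if min_size <= indel_size(v) <= max_size}
    tp = len(truth & calls)  # true indels that were called
    fn = len(truth - calls)  # true indels that were missed
    fp = len(calls - truth)  # calls absent from the truth set
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, ppv


truth = {("chr1", 100, "A", "AT"), ("chr1", 250, "GC", "G"), ("chr2", 40, "T", "TTA")}
calls = {("chr1", 100, "A", "AT"), ("chr2", 40, "T", "TTA"),
         ("chr2", 90, "CA", "C"), ("chr3", 5, "G", "GG")}
print(sensitivity_ppv(truth, calls))
```

Repeating the calculation on data simulated at different coverages or read lengths gives the kind of parameter sweep the abstract describes.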
Affiliation(s)
- Joseph A Neuman
- Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
910
Izutsu M, Zhou J, Sugiyama Y, Nishimura O, Aizu T, Toyoda A, Fujiyama A, Agata K, Fuse N. Genome features of "Dark-fly", a Drosophila line reared long-term in a dark environment. PLoS One 2012; 7:e33288. [PMID: 22432011 PMCID: PMC3303825 DOI: 10.1371/journal.pone.0033288] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 09/28/2011] [Accepted: 02/08/2012] [Indexed: 11/22/2022]
Abstract
Organisms are remarkably adapted to diverse environments by specialized metabolisms, morphology, or behaviors. To address the molecular mechanisms underlying environmental adaptation, we have utilized a Drosophila melanogaster line, termed "Dark-fly", which has been maintained in constant dark conditions for 57 years (1400 generations). We found that Dark-fly exhibited higher fecundity in dark than in light conditions, indicating that Dark-fly possesses some traits advantageous in darkness. Using next-generation sequencing technology, we determined the whole genome sequence of Dark-fly and identified approximately 220,000 single nucleotide polymorphisms (SNPs) and 4,700 insertions or deletions (InDels) in the Dark-fly genome compared to the genome of the Oregon-R-S strain, a control strain. Of these SNPs, 1.8% were classified as non-synonymous SNPs (nsSNPs, i.e., they alter the amino acid sequence of gene products). Among them, we detected 28 nonsense mutations (i.e., they produce a stop codon in the protein sequence) in the Dark-fly genome. These included genes encoding an olfactory receptor and a light receptor. We also searched for runs of homozygosity (ROH) as putative regions selected during the population history, and found 21 ROH regions in the Dark-fly genome. We identified 241 genes carrying nsSNPs or InDels in the ROH regions. These include a cluster of alpha-esterase genes that are involved in detoxification processes. Furthermore, analysis of structural variants in the Dark-fly genome showed the deletion of a gene related to fatty acid metabolism. Our results revealed unique features of the Dark-fly genome and provided a list of potential candidate genes involved in environmental adaptation.
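Runs of homozygosity can be located by scanning ordered variant sites for stretches that are not interrupted by heterozygous calls. The sketch below is a deliberately simple version. It assumes per-site homozygous/heterozygous calls are already available and uses a hypothetical minimum span `min_span` in base pairs. Unlike most published ROH detectors, it tolerates no heterozygous sites or missing data, and it is not the method used in the study.

```python
def runs_of_homozygosity(sites, min_span):
    """Return (start, end) positions of maximal runs of consecutive homozygous sites
    spanning at least min_span bp.

    sites: list of (position, is_homozygous) pairs sorted by position.
    """
    runs = []
    start = end = None
    for pos, is_hom in sites:
        if is_hom:
            if start is None:
                start = pos  # open a new run
            end = pos
        else:
            # a heterozygous site closes the current run
            if start is not None and end - start >= min_span:
                runs.append((start, end))
            start = None
    if start is not None and end - start >= min_span:
        runs.append((start, end))  # run reaching the last site
    return runs


sites = [(100, True), (5000, True), (20000, True), (21000, False),
         (30000, True), (31000, True), (90000, True), (95000, False)]
print(runs_of_homozygosity(sites, min_span=10000))
print(runs_of_homozygosity(sites, min_span=50000))
```

Genes carrying nsSNPs or InDels could then be intersected with the returned intervals to produce a candidate list like the one described above.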
Affiliation(s)
- Minako Izutsu
- Laboratory for Biodiversity, Global COE Program, Graduate School of Science, Kyoto University, Kyoto, Japan
- Laboratory for Molecular Developmental Biology, Graduate School of Science, Kyoto University, Kyoto, Japan
- Jun Zhou
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Yuzo Sugiyama
- Laboratory for Biodiversity, Global COE Program, Graduate School of Science, Kyoto University, Kyoto, Japan
- Osamu Nishimura
- Laboratory for Biodiversity, Global COE Program, Graduate School of Science, Kyoto University, Kyoto, Japan
- Tomoyuki Aizu
- Comparative Genomics Laboratory, National Institute of Genetics, Mishima, Japan
- Atsushi Toyoda
- Comparative Genomics Laboratory, National Institute of Genetics, Mishima, Japan
- Asao Fujiyama
- Comparative Genomics Laboratory, National Institute of Genetics, Mishima, Japan
- Kiyokazu Agata
- Laboratory for Biodiversity, Global COE Program, Graduate School of Science, Kyoto University, Kyoto, Japan
- Laboratory for Molecular Developmental Biology, Graduate School of Science, Kyoto University, Kyoto, Japan
- Naoyuki Fuse
- Laboratory for Biodiversity, Global COE Program, Graduate School of Science, Kyoto University, Kyoto, Japan
911
Simon UK, Trajanoski S, Kroneis T, Sedlmayr P, Guelly C, Guttenberger H. Accession-Specific Haplotypes of the Internal Transcribed Spacer Region in Arabidopsis thaliana--A Means for Barcoding Populations. Mol Biol Evol 2012; 29:2231-9. [DOI: 10.1093/molbev/mss093] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 12/29/2022]
912
McElroy KE, Luciani F, Thomas T. GemSIM: general, error-model based simulator of next-generation sequencing data. BMC Genomics 2012; 13:74. [PMID: 22336055 PMCID: PMC3305602 DOI: 10.1186/1471-2164-13-74] [Citation(s) in RCA: 107] [Impact Index Per Article: 8.2] [Received: 09/21/2011] [Accepted: 02/15/2012] [Indexed: 01/09/2023]
Abstract
BACKGROUND GemSIM, or General Error-Model based SIMulator, is a next-generation sequencing simulator capable of generating single or paired-end reads for any sequencing technology compatible with the generic formats SAM and FASTQ (including Illumina and Roche/454). GemSIM creates and uses empirically derived, sequence-context based error models to realistically emulate individual sequencing runs and/or technologies. Empirical fragment length and quality score distributions are also used. Reads may be drawn from one or more genomes or haplotype sets, facilitating simulation of deep sequencing, metagenomic, and resequencing projects. RESULTS We demonstrate GemSIM's value by deriving error models from two different Illumina sequencing runs and one Roche/454 run, and comparing and contrasting the resulting error profiles of each run. Overall error rates varied dramatically between individual Illumina runs, between the first and second reads in each pair, and between datasets from Illumina and Roche/454 technologies. Indels were markedly more frequent in Roche/454 than in Illumina data, and both technologies showed increased error rates near the end of each read. The effects of these different profiles on low-frequency SNP-calling accuracy were investigated by analysing simulated sequencing data for a mixture of bacterial haplotypes. In general, SNP-calling using VarScan was only accurate for SNPs with frequency > 3%, independent of which error model was used to simulate the data. Variation between error profiles interacted strongly with VarScan's 'minimum average quality' parameter, resulting in different optimal settings for different sequencing runs. CONCLUSIONS Next-generation sequencing has unprecedented potential for assessing genetic diversity; however, analysis is complicated because error profiles can vary noticeably even between different runs of the same technology.
Simulation with GemSIM can help overcome this problem, by providing insights into the error profiles of individual sequencing runs and allowing researchers to assess the effects of these errors on downstream data analysis.
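To make the idea of an error-model-based simulator concrete, here is a minimal sketch that samples reads from a reference and introduces substitutions at a rate that rises toward the end of the read, mirroring the end-of-read error increase reported above. Unlike GemSIM, which derives empirical, sequence-context-based models (including indels and quality scores), this sketch assumes a hypothetical linear, position-only substitution model. All names and rates are illustrative.

```python
import random

BASES = "ACGT"


def error_rate_at(cycle, read_len, base_rate=0.002, end_rate=0.02):
    """Hypothetical linear model: the substitution rate rises from base_rate at the
    first cycle to end_rate at the last cycle."""
    if read_len == 1:
        return base_rate
    return base_rate + (end_rate - base_rate) * cycle / (read_len - 1)


def simulate_read(genome, read_len, rng, base_rate=0.002, end_rate=0.02):
    """Sample one read from a random position and apply substitution errors.

    Returns (start, read). Indels and quality scores are not modelled.
    """
    start = rng.randrange(len(genome) - read_len + 1)
    template = genome[start:start + read_len]
    read = []
    for cycle, base in enumerate(template):
        if rng.random() < error_rate_at(cycle, read_len, base_rate, end_rate):
            # substitute one of the three other bases, chosen uniformly
            read.append(rng.choice([b for b in BASES if b != base]))
        else:
            read.append(base)
    return start, "".join(read)


rng = random.Random(42)  # fixed seed for reproducibility
genome = "".join(rng.choice(BASES) for _ in range(1000))
reads = [simulate_read(genome, 100, rng) for _ in range(5)]
for start, read in reads:
    mismatches = sum(r != g for r, g in zip(read, genome[start:start + 100]))
    print(start, mismatches)
```

Swapping `error_rate_at` for a table of per-cycle rates estimated from a real run would be a step toward the empirical, run-specific profiles the abstract argues for.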
Affiliation(s)
- Kerensa E McElroy
- Centre for Marine Bio-Innovation and School of Biotechnology and Biomolecular Sciences, UNSW, Sydney, NSW, Australia
913
Cancer exome analysis reveals a T-cell-dependent mechanism of cancer immunoediting. Nature 2012; 482:400-4. [PMID: 22318521 PMCID: PMC3874809 DOI: 10.1038/nature10755] [Citation(s) in RCA: 976] [Impact Index Per Article: 75.1] [Received: 08/17/2011] [Accepted: 12/02/2011] [Indexed: 01/03/2023]
Abstract
Cancer immunoediting, the process whereby the immune system controls tumour outgrowth and shapes tumour immunogenicity, comprises three phases: elimination, equilibrium and escape [1–5]. Although many immune components that participate in this process are known, its underlying mechanisms remain poorly defined. A central tenet of cancer immunoediting is that T cell recognition of tumour antigens drives the immunologic destruction or sculpting of a developing cancer. However, our current understanding of tumour antigens comes largely from analyses of cancers that develop in immunocompetent hosts and thus may have already been edited. Little is known about the antigens expressed in nascent tumour cells, whether they are sufficient to induce protective anti-tumour immune responses or whether their expression is modulated by the immune system. Here, using massively parallel sequencing, we characterize expressed mutations in highly immunogenic methylcholanthrene-induced sarcomas derived from immunodeficient Rag2−/− mice, which phenotypically resemble nascent primary tumour cells [1,3,5]. Employing class I prediction algorithms, we identify mutant spectrin-β2 as a potential rejection antigen of the d42m1 sarcoma and validate this prediction by conventional antigen expression cloning and detection. We also demonstrate that cancer immunoediting of d42m1 occurs via a T cell-dependent immunoselection process that promotes outgrowth of pre-existing tumour cell clones lacking highly antigenic mutant spectrin-β2 and other potential strong antigens. These results demonstrate that the strong immunogenicity of an unedited tumour can be ascribed to expression of highly antigenic mutant proteins and show that outgrowth of tumour cells that lack these strong antigens via a T cell-dependent immunoselection process represents one mechanism of cancer immunoediting.
914
VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res 2012; 22:568-76. [PMID: 22300766 DOI: 10.1101/gr.129684.111] [Citation(s) in RCA: 3562] [Impact Index Per Article: 274.0] [Indexed: 11/25/2022]
Abstract
Cancer is a disease driven by genetic variation and mutation. Exome sequencing can be utilized for discovering these variants and mutations across hundreds of tumors. Here we present an analysis tool, VarScan 2, for the detection of somatic mutations and copy number alterations (CNAs) in exome data from tumor-normal pairs. Unlike most current approaches, our algorithm reads data from both samples simultaneously; a heuristic and statistical algorithm detects sequence variants and classifies them by somatic status (germline, somatic, or LOH); while a comparison of normalized read depth delineates relative copy number changes. We apply these methods to the analysis of exome sequence data from 151 high-grade ovarian tumors characterized as part of the Cancer Genome Atlas (TCGA). We validated some 7790 somatic coding mutations, achieving 93% sensitivity and 85% precision for single nucleotide variant (SNV) detection. Exome-based CNA analysis identified 29 large-scale alterations and 619 focal events per tumor on average. As in our previous analysis of these data, we observed frequent amplification of oncogenes (e.g., CCNE1, MYC) and deletion of tumor suppressors (NF1, PTEN, and CDKN2A). We searched for additional recurrent focal CNAs using the correlation matrix diagonal segmentation (CMDS) algorithm, which identified 424 significant events affecting 582 genes. Taken together, our results demonstrate the robust performance of VarScan 2 for somatic mutation and CNA detection and shed new light on the landscape of genetic alterations in ovarian cancer.
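The somatic-status classification can be illustrated by comparing variant-read counts between a tumor and its matched normal. In this hypothetical, much-simplified sketch, the statistical component is a one-sided Fisher's exact test computed from the hypergeometric distribution. The variant-allele-fraction thresholds (`het_vaf`, `hom_vaf`) and `p_cutoff` are illustrative assumptions, and the sketch does not reproduce VarScan 2's actual rules or defaults.

```python
from math import comb


def fisher_greater(tumor_ref, tumor_alt, normal_ref, normal_alt):
    """One-sided Fisher's exact test p-value that the tumor has a higher
    variant-read fraction than the normal (hypergeometric upper tail)."""
    n_tumor = tumor_ref + tumor_alt
    k_alt = tumor_alt + normal_alt
    total = n_tumor + normal_ref + normal_alt
    upper = min(k_alt, n_tumor)
    tail = sum(comb(k_alt, x) * comb(total - k_alt, n_tumor - x)
               for x in range(tumor_alt, upper + 1))
    return tail / comb(total, n_tumor)


def classify(tumor_ref, tumor_alt, normal_ref, normal_alt,
             het_vaf=0.2, hom_vaf=0.8, p_cutoff=0.05):
    """Hypothetical somatic-status call for one site from read counts."""
    t_depth = tumor_ref + tumor_alt
    n_depth = normal_ref + normal_alt
    if t_depth == 0 or n_depth == 0:
        raise ValueError("both samples need coverage")
    t_vaf = tumor_alt / t_depth
    n_vaf = normal_alt / n_depth
    if het_vaf <= n_vaf < hom_vaf and t_vaf >= hom_vaf:
        return "LOH"  # heterozygous in normal, (near-)homozygous in tumor
    if n_vaf >= het_vaf:
        return "Germline"  # variant already present in the normal
    if t_vaf >= het_vaf and fisher_greater(
            tumor_ref, tumor_alt, normal_ref, normal_alt) < p_cutoff:
        return "Somatic"  # tumor-only variant, significantly enriched
    return "Reference"


print(classify(30, 28, 40, 0))   # tumor-only variant
print(classify(25, 25, 22, 20))  # shared with the normal
print(classify(2, 48, 24, 26))   # heterozygous normal, homozygous tumor
```

Copy number (the second half of the abstract) is handled separately by comparing normalized read depth, which this sketch does not attempt.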
915
Turajlic S, Furney SJ, Lambros MB, Mitsopoulos C, Kozarewa I, Geyer FC, MacKay A, Hakas J, Zvelebil M, Lord CJ, Ashworth A, Thomas M, Stamp G, Larkin J, Reis-Filho JS, Marais R. Whole genome sequencing of matched primary and metastatic acral melanomas. Genome Res 2012; 22:196-207. [PMID: 22183965 PMCID: PMC3266028 DOI: 10.1101/gr.125591.111] [Citation(s) in RCA: 125] [Impact Index Per Article: 9.6] [Received: 05/01/2011] [Accepted: 11/29/2011] [Indexed: 12/25/2022]
Abstract
Next generation sequencing has enabled systematic discovery of mutational spectra in cancer samples. Here, we used whole genome sequencing to characterize somatic mutations and structural variation in a primary acral melanoma and its lymph node metastasis. Our data show that the somatic mutational rates in this acral melanoma sample pair were more comparable to the rates reported in cancer genomes not associated with mutagenic exposure than in the genome of a melanoma cell line or the transcriptome of melanoma short-term cultures. Despite the perception that acral skin is sun-protected, the dominant mutational signature in these samples is compatible with damage due to ultraviolet light exposure. A nonsense mutation in ERCC5 discovered in both the primary and metastatic tumors could also have contributed to the mutational signature through accumulation of unrepaired dipyrimidine lesions. However, evidence of transcription-coupled repair was suggested by the lower mutational rate in the transcribed regions and expressed genes. The primary and the metastasis are highly similar at the level of global gene copy number alterations, loss of heterozygosity and single nucleotide variation (SNV). Furthermore, the majority of the SNVs in the primary tumor were propagated in the metastasis and one nonsynonymous coding SNV and one splice site mutation appeared to arise de novo in the metastatic lesion.
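The ultraviolet signature mentioned above is conventionally recognized as C>T transitions at dipyrimidine sites (a C next to a C or T), which appear as G>A when read on the opposite strand. The sketch below computes the fraction of SNVs matching that pattern, given a reference sequence. It is a simplified illustration under stated assumptions (single-base substitutions only, either flanking base counted, zero-based positions) with hypothetical names and data, not the authors' analysis.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PYRIMIDINES = {"C", "T"}


def is_uv_like(ref_seq, pos, ref, alt):
    """True if a C>T (or G>A, i.e. C>T on the other strand) lies at a dipyrimidine."""
    if ref_seq[pos] != ref:
        raise ValueError(f"reference mismatch at position {pos}")
    left = ref_seq[pos - 1] if pos > 0 else ""
    right = ref_seq[pos + 1] if pos + 1 < len(ref_seq) else ""
    if ref == "C" and alt == "T":
        return left in PYRIMIDINES or right in PYRIMIDINES
    if ref == "G" and alt == "A":
        # on the opposite strand the neighbours are the complements
        return (COMPLEMENT.get(left) in PYRIMIDINES
                or COMPLEMENT.get(right) in PYRIMIDINES)
    return False


def uv_fraction(ref_seq, snvs):
    """Fraction of (pos, ref, alt) SNVs with the UV-like pattern."""
    if not snvs:
        return 0.0
    return sum(is_uv_like(ref_seq, *snv) for snv in snvs) / len(snvs)


ref_seq = "GACAGTTCCA"
snvs = [(2, "C", "T"), (7, "C", "T"), (4, "G", "A"), (5, "T", "C")]
print(uv_fraction(ref_seq, snvs))
```

Comparing this fraction between transcribed and non-transcribed regions is one way to look for the transcription-coupled repair effect the abstract mentions.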
Affiliation(s)
- Samra Turajlic
- Signal Transduction Team, Division of Cancer Biology, Institute of Cancer Research, London SW3 6JB, United Kingdom
- Simon J. Furney
- Signal Transduction Team, Division of Cancer Biology, Institute of Cancer Research, London SW3 6JB, United Kingdom
- Maryou B. Lambros
- Molecular Pathology Team, The Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London SW3 6JB, United Kingdom
- Costas Mitsopoulos
- Cancer Informatics, The Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London SW3 6JB, United Kingdom
- Iwanka Kozarewa
- Division of Breast Cancer Research, The Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London SW3 6JB, United Kingdom
- Felipe C. Geyer
- Molecular Pathology Team, The Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London SW3 6JB, United Kingdom
- Alan MacKay
- Molecular Pathology Team, The Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London SW3 6JB, United Kingdom
- Jarle Hakas
- Cancer Informatics, The Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London SW3 6JB, United Kingdom
- Marketa Zvelebil
- Cancer Informatics, The Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London SW3 6JB, United Kingdom
- Christopher J. Lord
- Division of Breast Cancer Research, The Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London SW3 6JB, United Kingdom
- Alan Ashworth
- Division of Breast Cancer Research, The Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London SW3 6JB, United Kingdom
- Meirion Thomas
- Department of Surgery, Royal Marsden Hospital, London SW3 6JJ, United Kingdom
- Gordon Stamp
- Department of Histopathology, Royal Marsden Hospital, London SW3 6JJ, United Kingdom
- James Larkin
- Melanoma Unit, Royal Marsden Hospital, London SW3 6JJ, United Kingdom
- Jorge S. Reis-Filho
- Molecular Pathology Team, The Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London SW3 6JB, United Kingdom
- Richard Marais
- Signal Transduction Team, Division of Cancer Biology, Institute of Cancer Research, London SW3 6JB, United Kingdom
916
Vermaat JS, Nijman IJ, Koudijs MJ, Gerritse FL, Scherer SJ, Mokry M, Roessingh WM, Lansu N, de Bruijn E, van Hillegersberg R, van Diest PJ, Cuppen E, Voest EE. Primary colorectal cancers and their subsequent hepatic metastases are genetically different: implications for selection of patients for targeted treatment. Clin Cancer Res 2012; 18:688-99. [PMID: 22173549 DOI: 10.1158/1078-0432.ccr-11-1965]
Abstract
PURPOSE: In the era of DNA-guided personalized cancer treatment, it is essential to conduct predictive analysis on the tissue that matters. Here, we analyzed genetic differences between primary colorectal adenocarcinomas (CRC) and their respective hepatic metastases. EXPERIMENTAL DESIGN: The primary CRC and the subsequent hepatic metastasis of 21 patients with CRC were analyzed using targeted deep-sequencing of DNA isolated from formalin-fixed, paraffin-embedded archived material. RESULTS: We have interrogated the genetic constitution of a designed "Cancer Mini-Genome" consisting of all exons of 1,264 genes associated with pathways relevant to cancer. In total, 6,696 known and 1,305 novel variations were identified in 1,174 and 667 genes, respectively, including 817 variants that potentially altered protein function. On average, 83 (SD = 69) potentially function-impairing variations were gained in the metastasis and 70 (SD = 48) variations were lost, showing that the primary tumor and hepatic metastasis are genetically significantly different. Besides novel and known variations in genes such as KRAS, BRAF, KDR, FLT1, PTEN, and PIK3CA, aberrations in the up/downstream genes of EGFR/PI3K/VEGF-pathways and other pathways (mTOR, TGFβ, etc.) were also detected, potentially influencing therapeutic responsiveness. Chemotherapy between removal of the primary tumor and the metastasis (N = 11) did not further increase the amount of genetic variation. CONCLUSION: Our study indicates that the genetic characteristics of the hepatic metastases are different from those of the primary CRC tumor. As a consequence, the choice of treatment in studies investigating targeted therapies should ideally be based on the genetic properties of the metastasis rather than on those of the primary tumor.
Affiliation(s)
- Joost S Vermaat
- Department of Medical Oncology, University Medical Center Utrecht, The Netherlands
917
Strickler SR, Bombarely A, Mueller LA. Designing a transcriptome next-generation sequencing project for a nonmodel plant species. Am J Bot 2012; 99:257-66. [PMID: 22268224 DOI: 10.3732/ajb.1100292]
Abstract
The application of next-generation sequencing (NGS) to transcriptomics, commonly called RNA-seq, allows the nearly complete characterization of transcriptomic events occurring in a specific tissue. It has proven particularly useful in nonmodel species, which often lack the resources available for sequenced organisms. Most importantly, RNA-seq does not require a reference genome to gain useful transcriptomic information. In this review, the application of RNA-seq to nonmodel plant species will be addressed. Important experimental considerations from presequencing issues to postsequencing analysis, including sample and platform selection, and useful bioinformatics tools for assembly and data analysis, are covered. Methods of assembling RNA-seq data and analyses commonly performed with RNA-seq data, including single nucleotide polymorphism detection and analysis of differential expression, are explored. In addition, studies that have used RNA-seq to elucidate nonmodel plant transcriptomics are highlighted.
Affiliation(s)
- Susan R Strickler
- Boyce Thompson Institute for Plant Research, Tower Road, Ithaca, New York 14853, USA
918
Roth A, Ding J, Morin R, Crisan A, Ha G, Giuliany R, Bashashati A, Hirst M, Turashvili G, Oloumi A, Marra MA, Aparicio S, Shah SP. JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics 2012; 28:907-13. [PMID: 22285562 PMCID: PMC3315723 DOI: 10.1093/bioinformatics/bts053]
Abstract
MOTIVATION: Identification of somatic single nucleotide variants (SNVs) in tumour genomes is a necessary step in defining the mutational landscapes of cancers. Experimental designs for genome-wide ascertainment of somatic mutations now routinely include next-generation sequencing (NGS) of tumour DNA and matched constitutional DNA from the same individual. This allows investigators to control for germline polymorphisms and distinguish somatic mutations that are unique to the tumour, thus reducing the burden of labour-intensive and expensive downstream experiments needed to verify initial predictions. In order to make full use of such paired datasets, computational tools for simultaneous analysis of tumour-normal paired sequence data are required, but are currently under-developed and under-represented in the bioinformatics literature. RESULTS: In this contribution, we introduce two novel probabilistic graphical models called JointSNVMix1 and JointSNVMix2 for jointly analysing paired tumour-normal digital allelic count data from NGS experiments. In contrast to independent analysis of the tumour and normal data, our method allows statistical strength to be borrowed across the samples and therefore amplifies the statistical power to identify and distinguish both germline and somatic events in a unified probabilistic framework. AVAILABILITY: The JointSNVMix models and four other models discussed in the article are part of the JointSNVMix software package available for download at http://compbio.bccrc.ca CONTACT: sshah@bccrc.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Andrew Roth
- Department of Molecular Oncology, BC Cancer Agency, BC, Canada
919
Abstract
The issue of heterozygosity continues to be a challenge in the analysis of genome sequences. In this article, we describe the use of allele ratios to distinguish biologically significant single-nucleotide variants from background noise. An application of this approach is the identification of lethal mutations in Caenorhabditis elegans essential genes, which must be maintained by the presence of a wild-type allele on a balancer. The h448 allele of let-504 is rescued by the duplication balancer sDp2. We readily identified the extent of the duplication when the percentage of read support for the lesion was between 70 and 80%. Examination of the EMS-induced changes throughout the genome revealed that these mutations exist in contiguous blocks. During early embryonic division in self-fertilizing C. elegans, alkylated guanines pair with thymines. As a result, EMS-induced changes become fixed as either G→A or C→T changes along the length of the chromosome. Thus, examination of the distribution of EMS-induced changes revealed the mutational and recombinational history of the chromosome, even generations later. We identified the mutational change responsible for the h448 mutation and sequenced PCR products for an additional four alleles, correlating let-504 with the DNA-coding region for an ortholog of an NFκB-activating protein, NKAP. Our results confirm that whole-genome sequencing is an efficient and inexpensive way of identifying nucleotide alterations responsible for lethal phenotypes and can be applied on a large scale to identify the molecular basis of essential genes.
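The allele-ratio reasoning in this abstract can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' pipeline: the `Site` record, the function names, and every cutoff other than the 70-80% read-support band quoted above (`hom_min`, `noise_max`) are hypothetical. Each candidate is scored by the fraction of reads supporting the alternate allele, only the G→A and C→T changes expected from EMS are kept, and contiguous blocks of in-band sites are reported per chromosome.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Site:
    """One candidate SNV with read counts (hypothetical record layout)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    ref_reads: int
    alt_reads: int


def allele_ratio(site: Site) -> float:
    """Fraction of reads supporting the alternate allele (0.0 if uncovered)."""
    total = site.ref_reads + site.alt_reads
    return site.alt_reads / total if total else 0.0


def is_ems_transition(site: Site) -> bool:
    """True for the G->A / C->T changes expected from EMS mutagenesis."""
    return (site.ref, site.alt) in {("G", "A"), ("C", "T")}


def classify(site: Site, band=(0.70, 0.80), hom_min=0.95, noise_max=0.30) -> str:
    """Bin a site by allele ratio; hom_min and noise_max are assumed cutoffs."""
    r = allele_ratio(site)
    if r >= hom_min:
        return "homozygous"
    if band[0] <= r <= band[1]:
        return "in_band"
    if r <= noise_max:
        return "noise"
    return "other"


def in_band_blocks(sites, **cutoffs):
    """Return (chrom, first_pos, last_pos, n_sites) for runs of in-band EMS sites.

    Non-EMS changes are dropped before grouping, so they never split a run;
    an EMS site in any other class ends the current run.
    """
    ems = sorted((s for s in sites if is_ems_transition(s)), key=lambda s: (s.chrom, s.pos))
    blocks = []
    for chrom, group in groupby(ems, key=lambda s: s.chrom):
        run = []
        for s in group:
            if classify(s, **cutoffs) == "in_band":
                run.append(s)
            elif run:
                blocks.append((chrom, run[0].pos, run[-1].pos, len(run)))
                run = []
        if run:
            blocks.append((chrom, run[0].pos, run[-1].pos, len(run)))
    return blocks
```

With these hypothetical cutoffs, a homozygous or low-ratio EMS site closes a block, whereas a non-EMS change such as A→G is ignored entirely; a real analysis would also filter on depth and mapping quality.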
920
You N, Murillo G, Su X, Zeng X, Xu J, Ning K, Zhang S, Zhu J, Cui X. SNP calling using genotype model selection on high-throughput sequencing data. Bioinformatics 2012; 28:643-50. [PMID: 22253293 DOI: 10.1093/bioinformatics/bts001]
Abstract
MOTIVATION: A review of the available single nucleotide polymorphism (SNP) calling procedures for Illumina high-throughput sequencing (HTS) platform data reveals that most rely mainly on base-calling and mapping qualities as sources of error when calling SNPs. Thus, errors not involved in base-calling or alignment, such as those in genomic sample preparation, are not accounted for. RESULTS: A novel method of consensus and SNP calling, Genotype Model Selection (GeMS), is given which accounts for the errors that occur during the preparation of the genomic sample. Simulations and real data analyses indicate that GeMS has the best performance balance of sensitivity and positive predictive value among the tested SNP callers. AVAILABILITY: The GeMS package can be downloaded from https://sites.google.com/a/bioinformatics.ucr.edu/xinping-cui/home/software or http://computationalbioenergy.org/software.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Na You
- Department of Statistical Science, School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou 510275, China
921
Challis D, Yu J, Evani US, Jackson AR, Paithankar S, Coarfa C, Milosavljevic A, Gibbs RA, Yu F. An integrative variant analysis suite for whole exome next-generation sequencing data. BMC Bioinformatics 2012; 13:8. [PMID: 22239737 PMCID: PMC3292476 DOI: 10.1186/1471-2105-13-8] [Received: 08/01/2011] [Accepted: 01/12/2012]
Abstract
Background: Whole exome capture sequencing allows researchers to cost-effectively sequence the coding regions of the genome. Although the exome capture sequencing methods have become routine and well established, there is currently a lack of tools specialized for variant calling in this type of data. Results: Using statistical models trained on validated whole-exome capture sequencing data, the Atlas2 Suite is an integrative variant analysis pipeline optimized for variant discovery on all three of the widely used next generation sequencing platforms (SOLiD, Illumina, and Roche 454). The suite employs logistic regression models in conjunction with user-adjustable cutoffs to accurately separate true SNPs and INDELs from sequencing and mapping errors with high sensitivity (96.7%). Conclusion: We have implemented the Atlas2 Suite and applied it to 92 whole exome samples from the 1000 Genomes Project. The Atlas2 Suite is available for download at http://sourceforge.net/projects/atlas2/. In addition to a command line version, the suite has been integrated into the Genboree Workbench, allowing biomedical scientists with minimal informatics expertise to remotely call, view, and further analyze variants through a simple web interface. The existing genomic databases displayed via the Genboree browser also streamline the process from variant discovery to functional genomics analysis, resulting in an off-the-shelf toolkit for the broader community.
Affiliation(s)
- Danny Challis
- The Human Genome Sequencing Center, Baylor College of Medicine, Houston, USA
922
Ding L, Ley TJ, Larson DE, Miller CA, Koboldt DC, Welch JS, Ritchey JK, Young MA, Lamprecht T, McLellan MD, McMichael JF, Wallis JW, Lu C, Shen D, Harris CC, Dooling DJ, Fulton RS, Fulton LL, Chen K, Schmidt H, Kalicki-Veizer J, Magrini VJ, Cook L, McGrath SD, Vickery TL, Wendl MC, Heath S, Watson MA, Link DC, Tomasson MH, Shannon WD, Payton JE, Kulkarni S, Westervelt P, Walter MJ, Graubert TA, Mardis ER, Wilson RK, DiPersio JF. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature 2012; 481:506-10. [PMID: 22237025 PMCID: PMC3267864 DOI: 10.1038/nature10738] [Received: 03/29/2011] [Accepted: 11/29/2011]
Abstract
Most patients with acute myeloid leukemia (AML) die from progressive disease after relapse, which is associated with clonal evolution at the cytogenetic level. To determine the mutational spectrum associated with relapse, we sequenced the primary tumor and relapse genomes from 8 AML patients, and validated hundreds of somatic mutations using deep sequencing; this allowed us to precisely define clonality and clonal evolution patterns at relapse. Besides discovering novel, recurrently mutated genes (e.g. WAC, SMC3, DIS3, DDX41, and DAXX) in AML, we found two major clonal evolution patterns during AML relapse: 1) the founding clone in the primary tumor gained mutations and evolved into the relapse clone, or 2) a subclone of the founding clone survived initial therapy, gained additional mutations, and expanded at relapse. In all cases, chemotherapy failed to eradicate the founding clone. The comparison of relapse-specific vs. primary tumor mutations in all 8 cases revealed an increase in transversions, probably due to DNA damage caused by cytotoxic chemotherapy. These data demonstrate that AML relapse is associated with the addition of new mutations and clonal evolution, which is shaped in part by the chemotherapy that the patients receive to establish and maintain remissions.
Affiliation(s)
- Li Ding
- The Genome Institute, Washington University, St Louis, Missouri 63108, USA
923
Gundry M, Vijg J. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants. Mutat Res 2012; 729:1-15. [PMID: 22016070 PMCID: PMC3237897 DOI: 10.1016/j.mrfmmm.2011.10.001] [Received: 06/03/2011] [Revised: 09/23/2011] [Accepted: 10/05/2011]
Abstract
DNA mutations are the source of genetic variation within populations. The majority of mutations with observable effects are deleterious. In humans mutations in the germ line can cause genetic disease. In somatic cells multiple rounds of mutations and selection lead to cancer. The study of genetic variation has progressed rapidly since the completion of the draft sequence of the human genome. Recent advances in sequencing technology, most importantly the introduction of massively parallel sequencing (MPS), have resulted in more than a hundred-fold reduction in the time and cost required for sequencing nucleic acids. These improvements have greatly expanded the use of sequencing as a practical tool for mutation analysis. While in the past the high cost of sequencing limited mutation analysis to selectable markers or small forward mutation targets assumed to be representative of the genome overall, current platforms allow whole genome sequencing for less than $5000. This has already given rise to direct estimates of germline mutation rates in multiple organisms including humans by comparing whole genome sequences between parents and offspring. Here we present a brief history of the field of mutation research, with a focus on classical tools for the measurement of mutation rates. We then review MPS, how it is currently applied and the new insight into human and animal mutation frequencies and spectra that has been obtained from whole genome sequencing. While great progress has been made, we note that the single most important limitation of current MPS approaches for mutation analysis is the inability to address low-abundance mutations that turn somatic tissues into mosaics of cells. Such mutations are at the basis of intra-tumor heterogeneity, with important implications for clinical diagnosis, and could also contribute to somatic diseases other than cancer, including aging. Some possible approaches to gain access to low-abundance mutations are discussed, with a brief overview of new sequencing platforms that are currently waiting in the wings to advance this exploding field even further.
Affiliation(s)
- Michael Gundry
- Albert Einstein College of Medicine, Department of Genetics, New York, NY 10461, United States
924
Bayés M, Heath S, Gut IG. Applications of second generation sequencing technologies in complex disorders. Curr Top Behav Neurosci 2012; 12:321-343. [PMID: 22331695 DOI: 10.1007/7854_2011_196]
Abstract
Second generation sequencing (2ndGS) technologies generate unprecedented amounts of sequence data very rapidly and at relatively limited costs, allowing the sequence of a human genome to be completed in a few weeks. The principle is based on generating millions of relatively short reads from amplified single DNA fragments using iterative cycles of nucleotide extension. However, data generated on this scale present new challenges in interpretation, data analysis and data management. 2ndGS technologies are becoming widespread and are profoundly impacting biomedical research. Common applications include whole-genome sequencing, targeted resequencing, characterization of structural and copy number variation, profiling of epigenetic modifications, transcriptome sequencing and identification of infectious agents. New methodologies and instruments that will make it possible to sequence the complete human genome in less than a day at a cost of less than $1,000 are currently in development.
Affiliation(s)
- Mònica Bayés
- Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028, Barcelona, Spain
925
Marroni F, Pinosio S, Morgante M. The quest for rare variants: pooled multiplexed next generation sequencing in plants. Front Plant Sci 2012; 3:133. [PMID: 22754557 PMCID: PMC3384946 DOI: 10.3389/fpls.2012.00133] [Received: 03/07/2012] [Accepted: 06/04/2012]
Abstract
Next generation sequencing (NGS) instruments produce an unprecedented amount of sequence data at modest cost. This gives researchers the possibility of designing studies with adequate power to identify rare variants at a fraction of the economic and labor resources required by individual Sanger sequencing. To date, a few research groups working in plant sciences have exploited this potential, showing that pooled NGS provides results in excellent agreement with those obtained by individual Sanger sequencing. The aim of this review is to convey to the reader the general ideas underlying the use of pooled NGS for the identification of rare variants. To facilitate a thorough understanding of the possibilities of the method, we will explain in detail the possible experimental and analytical approaches and discuss their advantages and disadvantages. We will show that information on allele frequency obtained by pooled NGS can be used to accurately compute basic population genetics indexes such as allele frequency, nucleotide diversity, and Tajima's D. Finally, we will discuss applications and future perspectives of the multiplexed NGS approach.
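To make the link between allele frequencies and the summary statistics named in this abstract concrete, the following Python sketch computes nucleotide diversity and the standard Tajima's D from per-site allele frequencies. It assumes the frequencies are those among `n` sampled chromosomes; the function names are illustrative. Estimators designed for pooled reads usually add corrections for read sampling and sequencing error, which this sketch deliberately omits, so it is not the review's method.

```python
import math


def harmonic_sums(n: int) -> tuple[float, float]:
    """a1 = sum(1/i) and a2 = sum(1/i^2) for i = 1 .. n-1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    return a1, a2


def nucleotide_diversity(freqs, n: int) -> float:
    """Mean pairwise differences summed over sites: sum of 2p(1-p) * n/(n-1).

    This is a total for the region, not a per-base value.
    """
    return sum(2 * p * (1 - p) * n / (n - 1) for p in freqs)


def tajimas_d(freqs, n: int) -> float:
    """Standard Tajima's D from allele frequencies in a sample of n chromosomes."""
    if n < 4:
        # For n = 2 or 3 the variance terms are zero and D is undefined.
        raise ValueError("need at least 4 sampled chromosomes")
    seg = [p for p in freqs if 0.0 < p < 1.0]  # segregating sites only
    s = len(seg)
    if s == 0:
        raise ValueError("no segregating sites")
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    pi = nucleotide_diversity(seg, n)
    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants (for example, all singletons) pulls nucleotide diversity below Watterson's estimate and makes D negative, while an excess of intermediate-frequency variants makes it positive.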
Affiliation(s)
- Fabio Marroni
- Istituto di Genomica Applicata, Udine, Italy
- *Correspondence: Fabio Marroni, Istituto di Genomica Applicata, Via J. Linussio 51, 33100 Udine, Italy. e-mail:
- Sara Pinosio
- Istituto di Genomica Applicata, Udine, Italy
- CNR, Istituto di Genetica Vegetale, Sezione di Firenze, Firenze, Italy
- Michele Morgante
- Istituto di Genomica Applicata, Udine, Italy
- Dipartimento di Scienze Agrarie e Ambientali, Università di Udine, Udine, Italy
926
Next-generation sequencing reveals phylogeographic structure and a species tree for recent bird divergences. Mol Phylogenet Evol 2012; 62:397-406. [DOI: 10.1016/j.ympev.2011.10.012] [Received: 06/01/2011] [Revised: 09/20/2011] [Accepted: 10/15/2011]
927
Abstract
Norovirus (NoV) is an emerging RNA virus that has been associated with global epidemics of gastroenteritis. Each global epidemic arises with the emergence of novel antigenic variants. While the majority of NoV infections are mild and self-limiting, in the young, elderly, and immunocompromised, severe and prolonged illness can result. As yet, there is no vaccine or therapeutic treatment to prevent or control infection. In order to design effective control strategies, it is important to understand the mechanisms and source of the new antigenic variants. In this study, we used next-generation sequencing (NGS) technology to investigate genetic diversification in three contexts: the impact of a NoV transmission event on viral diversity and the contribution to diversity of intrahost evolution over both a short period of time (10 days), consistent with a typical acute NoV infection, and a prolonged period of time (288 days), as observed for NoV chronic infections of immunocompromised individuals. Investigations of the transmission event revealed that minor variants at frequencies as low as 0.01% were successfully transmitted, indicating that transmission is an important source of diversity at the interhost level of NoV evolution. Our results also suggest that chronically infected immunocompromised subjects represent a potential reservoir for the emergence of new viral variants. In contrast, in a typical acute NoV infection, the viral population was highly homogeneous and relatively stable. These results indicate that the evolution of NoV occurs through multiple mechanisms.
928
Recurrent mutations in the U2AF1 splicing factor in myelodysplastic syndromes. Nat Genet 2011; 44:53-7. [PMID: 22158538 PMCID: PMC3247063 DOI: 10.1038/ng.1031] [Received: 08/03/2011] [Accepted: 11/09/2011]
Abstract
Myelodysplastic syndromes (MDS) are hematopoietic stem cell disorders that often progress to chemotherapy-resistant secondary acute myeloid leukemia (sAML). We used whole-genome sequencing to perform an unbiased comprehensive screen to discover the somatic mutations in a sample from an individual with sAML and genotyped the loci containing these mutations in the matched MDS sample. Here we show that a missense mutation affecting the serine at codon 34 (Ser34) in U2AF1 was recurrently present in 13 out of 150 (8.7%) subjects with de novo MDS, and we found suggestive evidence of an increased risk of progression to sAML associated with this mutation. U2AF1 is a U2 auxiliary factor protein that recognizes the AG splice acceptor dinucleotide at the 3' end of introns, and the alterations in U2AF1 are located in highly conserved zinc fingers of this protein. Mutant U2AF1 promotes enhanced splicing and exon skipping in reporter assays in vitro. This previously unidentified, recurrent mutation in U2AF1 implicates altered pre-mRNA splicing as a potential mechanism for MDS pathogenesis.
929
Larson DE, Harris CC, Chen K, Koboldt DC, Abbott TE, Dooling DJ, Ley TJ, Mardis ER, Wilson RK, Ding L. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics 2011; 28:311-7. [PMID: 22155872 DOI: 10.1093/bioinformatics/btr665]
Abstract
MOTIVATION: The sequencing of tumors and their matched normals is frequently used to study the genetic composition of cancer. Despite this fact, there remains a dearth of available software tools designed to compare sequences in pairs of samples and identify sites that are likely to be unique to one sample. RESULTS: In this article, we describe the mathematical basis of our SomaticSniper software for comparing tumor and normal pairs. We estimate its sensitivity and precision, and present several common sources of error resulting in miscalls. AVAILABILITY AND IMPLEMENTATION: Binaries are freely available for download at http://gmt.genome.wustl.edu/somatic-sniper/current/, implemented in C and supported on Linux and Mac OS X. CONTACT: delarson@wustl.edu; lding@wustl.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- David E Larson
- The Genome Institute, Washington University, St Louis, MO 63108, USA.
930
McElroy K, Luciani F, Hui J, Rice S, Thomas T. Bacteriophage evolution drives Pseudomonas aeruginosa PAO1 biofilm diversification. BMC Bioinformatics 2011. [PMCID: PMC3277246 DOI: 10.1186/1471-2105-12-s11-a2]
931
Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat Rev Genet 2011; 13:36-46. [PMID: 22124482 DOI: 10.1038/nrg3117]
Abstract
Repetitive DNA sequences are abundant in a broad range of species, from bacteria to mammals, and they cover nearly half of the human genome. Repeats have always presented technical challenges for sequence alignment and assembly programs. Next-generation sequencing projects, with their short read lengths and high data volumes, have made these challenges more difficult. From a computational perspective, repeats create ambiguities in alignment and assembly, which, in turn, can produce biases and errors when interpreting results. Simply ignoring repeats is not an option, as this creates problems of its own and may mean that important biological phenomena are missed. We discuss the computational problems surrounding repeats and describe strategies used by current bioinformatics systems to solve them.
932
Williams LE, Wernegreen JJ. Purifying selection, sequence composition, and context-specific indel mutations shape intraspecific variation in a bacterial endosymbiont. Genome Biol Evol 2011; 4:44-51. [PMID: 22117087 PMCID: PMC3268670 DOI: 10.1093/gbe/evr128]
Abstract
Comparative genomics of closely related bacterial strains can clarify mutational processes and selective forces that impact genetic variation. Among primary bacterial endosymbionts of insects, such analyses have revealed ongoing genome reduction, raising questions about the ultimate evolutionary fate of these partnerships. Here, we explored genomic variation within Blochmannia vafer, an obligate mutualist of the ant Camponotus vafer. Polymorphism analysis of the Illumina data set used previously for de novo assembly revealed a second Bl. vafer genotype. To determine why a single ant colony contained two symbiont genotypes, we examined polymorphisms in 12 C. vafer mitochondrial sequences assembled from the Illumina data; the spectrum of variants suggests that the colony contained two maternal lineages, each harboring a distinct Bl. vafer genotype. Comparing the two Bl. vafer genotypes revealed that purifying selection purged most indels and nonsynonymous differences from protein-coding genes. We also discovered that indels occur frequently in multimeric simple sequence repeats, which are relatively abundant in Bl. vafer and may play a more substantial role in generating variation in this ant mutualist than in the aphid endosymbiont Buchnera. Finally, we explored how an apparent relocation of the origin of replication in Bl. vafer and the resulting shift in strand-associated mutational pressures may have caused accelerated gene loss and an elevated rate of indel polymorphisms in the region spanning the origin relocation. Combined, these results point to significant impacts of purifying selection on genomic polymorphisms as well as distinct patterns of indels associated with unusual genomic features of Blochmannia.
Affiliation(s)
- Jennifer J. Wernegreen
- Institute for Genome Sciences and Policy, Duke University
- Nicholas School of the Environment, Duke University
- Corresponding author
933
Depledge DP, Palser AL, Watson SJ, Lai IYC, Gray ER, Grant P, Kanda RK, Leproust E, Kellam P, Breuer J. Specific capture and whole-genome sequencing of viruses from clinical samples. PLoS One 2011; 6:e27805. [PMID: 22125625 PMCID: PMC3220689 DOI: 10.1371/journal.pone.0027805] [Citation(s) in RCA: 167] [Impact Index Per Article: 11.9] [Received: 06/14/2011] [Accepted: 10/25/2011] [Indexed: 11/29/2022]
Abstract
Whole-genome sequencing of viruses directly from clinical samples is integral to understanding the genetics of host-virus interactions. Here, we report the use of sample-sparing target enrichment (by hybridisation) for viral nucleic acid separation and deep sequencing of herpesvirus genomes directly from a range of clinical samples, including saliva, blood, virus vesicles, cerebrospinal fluid, and tumour cell lines. We demonstrate the effectiveness of the method by deep-sequencing 13 highly cell-associated human herpesvirus genomes and generating full-length genome alignments at high read depth. Moreover, we show that the specificity of the method enables the study of viral population structures and their diversity within a range of clinical sample types.
Affiliation(s)
- Daniel P Depledge
- Division of Infection and Immunity, University College London, London, United Kingdom.
934
Angeloni F, Wagemaker N, Vergeer P, Ouborg J. Genomic toolboxes for conservation biologists. Evol Appl 2011; 5:130-43. [PMID: 25568036 PMCID: PMC3353346 DOI: 10.1111/j.1752-4571.2011.00217.x] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Received: 10/12/2011] [Accepted: 10/18/2011] [Indexed: 12/01/2022]
Abstract
Conservation genetics is expanding its research horizon with a genomic approach by incorporating the modern techniques of next-generation sequencing (NGS). Application of NGS overcomes many limitations of conservation genetics. First, NGS allows genome-wide screening of markers, which may lead to a more representative estimation of genetic variation within and between populations. Second, NGS allows neutral and non-neutral markers to be distinguished: by screening populations for thousands of single nucleotide polymorphism markers, signals of selection can be found for some markers. Variation in these markers will give insight into functional rather than neutral genetic variation. Third, NGS facilitates the study of gene expression. Conservation genomics will increase our insight into how the environment and genes interact to affect phenotype and fitness. In addition, the NGS approach opens a way to study processes such as inbreeding depression and local adaptation mechanistically. Conservation genetics programs are directed toward a fundamental understanding of the processes involved in conservation genetics and should preferably be started in species for which large databases on ecology, demography and genetics are available. Here, we describe and illustrate the connection between the application of NGS technologies and the research questions in conservation. The perspectives of conservation genomics programs are also discussed.
Affiliation(s)
- Francesco Angeloni
- Institute for Water and Wetland Research (IWWR), Department of Molecular Ecology, Radboud University Nijmegen AJ Nijmegen, The Netherlands
- Niels Wagemaker
- Institute for Water and Wetland Research (IWWR), Department of Molecular Ecology, Radboud University Nijmegen AJ Nijmegen, The Netherlands
- Philippine Vergeer
- Institute for Water and Wetland Research (IWWR), Department of Molecular Ecology, Radboud University Nijmegen AJ Nijmegen, The Netherlands
- Joop Ouborg
- Institute for Water and Wetland Research (IWWR), Department of Molecular Ecology, Radboud University Nijmegen AJ Nijmegen, The Netherlands
935
Gundry M, Li W, Maqbool SB, Vijg J. Direct, genome-wide assessment of DNA mutations in single cells. Nucleic Acids Res 2011; 40:2032-40. [PMID: 22086961 PMCID: PMC3300019 DOI: 10.1093/nar/gkr949] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.4] [Indexed: 01/01/2023]
Abstract
DNA mutations are the inevitable consequences of errors that arise during replication and repair of DNA damage. Because of their random and infrequent occurrence, quantification and characterization of DNA mutations in the genome of somatic cells has been difficult. Random, low-abundance mutations are currently inaccessible to standard high-throughput sequencing approaches because they cannot be distinguished from sequencing errors. One way to circumvent this problem and simultaneously account for the mutational heterogeneity within tissues is whole genome sequencing of a representative number of single cells. Here, we show elevated mutation levels in single cells from Drosophila melanogaster S2 and mouse embryonic fibroblast populations after treatment with the powerful mutagen N-ethyl-N-nitrosourea. This method can be applied as a direct measure of exposure to mutagenic agents and for assessing genotypic heterogeneity within tissues or cell populations.
Affiliation(s)
- Michael Gundry
- Department of Genetics, Albert Einstein College of Medicine, New York, NY 10461, USA
936
Ding J, Bashashati A, Roth A, Oloumi A, Tse K, Zeng T, Haffari G, Hirst M, Marra MA, Condon A, Aparicio S, Shah SP. Feature-based classifiers for somatic mutation detection in tumour-normal paired sequencing data. Bioinformatics 2011; 28:167-75. [PMID: 22084253 PMCID: PMC3259434 DOI: 10.1093/bioinformatics/btr629] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.5] [Indexed: 11/16/2022]
Abstract
Motivation: The study of cancer genomes now routinely involves using next-generation sequencing (NGS) technology to profile tumours for single nucleotide variant (SNV) somatic mutations. However, surprisingly few published bioinformatics methods exist for the specific purpose of identifying somatic mutations from NGS data, and existing tools are often inaccurate, yielding intolerably high false prediction rates. As such, the computational problem of accurately inferring somatic mutations from paired tumour/normal NGS data remains an unsolved challenge. Results: We present a comparison of four standard supervised machine learning algorithms for the purpose of somatic SNV prediction in tumour/normal NGS experiments. To evaluate these approaches (random forest, Bayesian additive regression tree, support vector machine and logistic regression), we constructed 106 features representing 3369 candidate somatic SNVs from 48 breast cancer genomes, originally predicted with naive methods and subsequently revalidated to establish ground truth labels. We trained the classifiers on these data (consisting of 1015 true somatic mutations and 2354 non-somatic mutation positions) and conducted a rigorous evaluation of these methods using a cross-validation framework and hold-out test NGS data from both exome capture and whole genome shotgun platforms. All learning algorithms employing predictive discriminative approaches with feature selection improved the predictive accuracy over standard approaches by statistically significant margins. In addition, using unsupervised clustering of the ground truth ‘false positive’ predictions, we noted several distinct classes and present evidence suggesting non-overlapping sources of technical artefacts, illuminating important directions for future study. Availability: Software called MutationSeq and datasets are available from http://compbio.bccrc.ca.
Contact: saparicio@bccrc.ca Supplementary information: Supplementary data are available at Bioinformatics online.
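Logistic regression is one of the four classifiers compared in this abstract. The following is only a rough sketch of that idea: it trains logistic regression by batch gradient descent on per-candidate feature vectors. The two features (tumour and normal variant allele fraction) and the toy labels are hypothetical stand-ins for the 106 features the authors built. This is not MutationSeq's implementation.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability that candidate x is a true somatic SNV."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def log_loss(w: Sequence[float], b: float, X: List[List[float]], y: List[int]) -> float:
    """Mean negative log-likelihood of the labels (clipped to avoid log(0))."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict_proba(w, b, xi), eps), 1 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)


def train_logistic(X: List[List[float]], y: List[int],
                   lr: float = 1.0, epochs: int = 3000) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log loss, starting from all-zero weights."""
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(loss)/d(logit) for one example
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


# Hypothetical candidates: [tumour VAF, normal VAF]; 1 = somatic, 0 = not somatic.
X = [[0.45, 0.00], [0.30, 0.01], [0.50, 0.02], [0.40, 0.00],
     [0.48, 0.50], [0.05, 0.00], [0.50, 0.45], [0.03, 0.01]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_logistic(X, y)
print(predict_proba(w, b, [0.45, 0.0]), predict_proba(w, b, [0.48, 0.50]))
```

With all-zero weights the mean loss starts at log 2. At this learning rate and feature scale, gradient descent lowers it on every step. A candidate seen only in the tumour ends up ranked above one seen at similar allele fractions in both tumour and normal.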
Affiliation(s)
- Jiarui Ding
- Department of Molecular Oncology, BC Cancer Agency, Vancouver, BC, Canada
937
Renaud G, Neves P, Folador EL, Ferreira CG, Passetti F. Segtor: rapid annotation of genomic coordinates and single nucleotide variations using segment trees. PLoS One 2011; 6:e26715. [PMID: 22069465 PMCID: PMC3206052 DOI: 10.1371/journal.pone.0026715] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/15/2011] [Accepted: 10/03/2011] [Indexed: 12/28/2022]
Abstract
Many research projects involve determining the position of genomic coordinates, intervals, single nucleotide variations (SNVs), insertions, deletions and translocations relative to genes, and their potential impact on protein translation. Owing to the tremendous increase in throughput brought by next-generation sequencing, investigators routinely need to annotate very large datasets. We present Segtor, a tool to annotate large sets of genomic coordinates, intervals, SNVs, indels and translocations. Instead of storing the genomic features in a database management system, our tool builds segment trees from the start and end coordinates of the features the user wishes to use. The software also produces annotation statistics that let users see how many coordinates fall within various portions of genes. Our system can currently be used with any species available on the UCSC Genome Browser. Segtor is suitable for groups, especially those with limited access to programmers or interested in analyzing large numbers of individual genomes, who wish to determine the relative position of very large sets of mapped reads and subsequently annotate observed mutations between the reads and the reference. Segtor (http://lbbc.inca.gov.br/segtor/) is an open-source tool that can be freely downloaded for non-profit use. We also provide a web interface for testing purposes.
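The abstract says only that Segtor builds segment trees from feature start and end coordinates. As a minimal sketch of that data structure, the code below builds a static segment tree that answers stabbing queries: which gene intervals contain a given coordinate? Closed 1-based intervals are assumed. The gene names and coordinates are hypothetical, and Segtor's internal representation may differ.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

# (start, end, label) with closed, 1-based coordinates (an assumption for this sketch).
Interval = Tuple[int, int, str]


class SegmentTree:
    """Static segment tree answering 'which intervals contain position p?'."""

    def __init__(self, intervals: List[Interval]) -> None:
        # Boundaries of elementary half-open segments [points[i], points[i + 1]).
        self.points = sorted({s for s, _, _ in intervals} | {e + 1 for _, e, _ in intervals})
        self.size = max(len(self.points) - 1, 1)
        self.labels: List[List[str]] = [[] for _ in range(4 * self.size)]
        for start, end, label in intervals:
            lo = bisect_left(self.points, start)
            hi = bisect_left(self.points, end + 1)
            self._insert(1, 0, self.size, lo, hi, label)

    def _insert(self, node: int, l: int, r: int, lo: int, hi: int, label: str) -> None:
        # This node covers elementary segments [l, r); the interval covers [lo, hi).
        if hi <= l or r <= lo:
            return
        if lo <= l and r <= hi:
            self.labels[node].append(label)  # store only at fully covered nodes
            return
        mid = (l + r) // 2
        self._insert(2 * node, l, mid, lo, hi, label)
        self._insert(2 * node + 1, mid, r, lo, hi, label)

    def query(self, pos: int) -> List[str]:
        """Labels of all intervals containing pos: one root-to-leaf walk, O(log n + k)."""
        i = bisect_right(self.points, pos) - 1
        if i < 0 or i >= len(self.points) - 1:
            return []
        found: List[str] = []
        node, l, r = 1, 0, self.size
        while True:
            found.extend(self.labels[node])
            if r - l == 1:
                return found
            mid = (l + r) // 2
            if i < mid:
                node, r = 2 * node, mid
            else:
                node, l = 2 * node + 1, mid


genes = [(100, 500, "geneA"), (300, 800, "geneB"), (1000, 1200, "geneC")]
tree = SegmentTree(genes)
print(tree.query(350))  # ['geneA', 'geneB']
```

Each interval is stored at O(log n) nodes, so the tree can be built once and then queried for millions of coordinates without a database round trip.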
Affiliation(s)
- Gabriel Renaud
- Bioinformatics Unit, Clinical Research Coordination, Instituto Nacional de Cancer (INCA), Centro, Rio de Janeiro, Brazil
- Pedro Neves
- Bioinformatics Unit, Clinical Research Coordination, Instituto Nacional de Cancer (INCA), Centro, Rio de Janeiro, Brazil
- Edson Luiz Folador
- Bioinformatics Unit, Clinical Research Coordination, Instituto Nacional de Cancer (INCA), Centro, Rio de Janeiro, Brazil
- Carlos Gil Ferreira
- Clinical Research Coordination, Instituto Nacional de Cancer (INCA), Centro, Rio de Janeiro, Brazil
- Fabio Passetti
- Bioinformatics Unit, Clinical Research Coordination, Instituto Nacional de Cancer (INCA), Centro, Rio de Janeiro, Brazil
938
Day-Williams AG, McLay K, Drury E, Edkins S, Coffey AJ, Palotie A, Zeggini E. An evaluation of different target enrichment methods in pooled sequencing designs for complex disease association studies. PLoS One 2011; 6:e26279. [PMID: 22069447 PMCID: PMC3206031 DOI: 10.1371/journal.pone.0026279] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 07/05/2011] [Accepted: 09/23/2011] [Indexed: 01/27/2023]
Abstract
Pooled sequencing can be a cost-effective approach to disease variant discovery, but its applicability in association studies remains unclear. We compare sequence enrichment methods coupled to next-generation sequencing in non-indexed pools of 1, 2, 10, 20 and 50 individuals and assess their ability to discover variants and to estimate their allele frequencies. We find that pooled resequencing is most usefully applied as a variant discovery tool due to limitations in estimating allele frequency with high enough accuracy for association studies, and that in-solution hybrid-capture performs best among the enrichment methods examined regardless of pool size.
Affiliation(s)
- Aaron G. Day-Williams
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Kirsten McLay
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- The Genome Analysis Centre, Norwich, United Kingdom
- Eleanor Drury
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Sarah Edkins
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Alison J. Coffey
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Aarno Palotie
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Program in Medical and Population Genetics and Genetic Analysis Platform, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Department of Medical Genetics, University of Helsinki and University Central Hospital, Helsinki, Finland
- Eleftheria Zeggini
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
939
Altmann A, Weber P, Quast C, Rex-Haffner M, Binder EB, Müller-Myhsok B. vipR: variant identification in pooled DNA using R. Bioinformatics 2011; 27:i77-84. [PMID: 21685105 PMCID: PMC3117388 DOI: 10.1093/bioinformatics/btr205] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 12/04/2022]
Abstract
Motivation: High-throughput sequencing (HTS) technologies are the method of choice for screening the human genome for rare sequence variants causing susceptibility to complex diseases. Unfortunately, preparing samples for a large number of individuals is still very cost- and labor-intensive. Thus, screens for rare sequence variants have recently been carried out on pooled DNA samples, in which equimolar amounts of DNA from multiple individuals are mixed prior to HTS. The resulting sequence data, however, pose a bioinformatics challenge: discriminating sequencing errors from real sequence variants present at low frequency in the DNA pool. Results: Our method vipR uses data from multiple DNA pools to compensate for differences in sequencing error rates along the sequenced region. More precisely, instead of aiming to discriminate sequence variants from sequencing errors, vipR uses the Skellam distribution to identify sequence positions that exhibit significantly different minor allele frequencies in at least two DNA pools. The performance of vipR was compared with three other models on data from a targeted resequencing study of the TMEM132D locus in 600 individuals distributed over four DNA pools. Performance of the methods was computed on SNPs that were also genotyped individually using a MALDI-TOF technique. On a set of 82 sequence variants, vipR achieved an average sensitivity of 0.80 at an average specificity of 0.92, thus outperforming the reference methods by at least 0.17 in specificity at comparable sensitivity. Availability: The code of vipR is freely available via: http://sourceforge.net/projects/htsvipr/ Contact: altmann@mpipsykl.mpg.de
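The Skellam distribution is the distribution of the difference K1 - K2 of two independent Poisson counts. The sketch below is a simplified, single-pair version of the idea in this abstract and is not vipR's code. It assumes that, under the null, two pools share one minor-allele rate at a position, so position-specific sequencing error affects both pools alike. It then asks whether the observed count difference is surprising. The rate pooling, the two-sided p-value, and the truncation bound are all assumptions of this sketch.

```python
import math


def _log_poisson(x: int, mu: float) -> float:
    """log P(X = x) for X ~ Poisson(mu), mu > 0."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)


def _bound(mu: float) -> int:
    """Generous truncation point; Poisson mass beyond it is negligible here."""
    return int(math.ceil(mu + 10.0 * math.sqrt(mu) + 10.0))


def skellam_pmf(d: int, mu1: float, mu2: float) -> float:
    """P(K1 - K2 = d) for K1 ~ Pois(mu1), K2 ~ Pois(mu2), by direct summation over K2."""
    total = 0.0
    for k2 in range(max(0, -d), _bound(mu2) + 1):
        total += math.exp(_log_poisson(k2 + d, mu1) + _log_poisson(k2, mu2))
    return total


def pool_difference_pvalue(k1: int, n1: int, k2: int, n2: int) -> float:
    """Two-sided p-value that pools 1 and 2 differ in minor-allele rate at one position.

    k1, k2: minor-allele read counts; n1, n2: read depths in each pool.
    """
    if k1 + k2 == 0:
        return 1.0
    rate = (k1 + k2) / (n1 + n2)  # shared rate under the null hypothesis
    mu1, mu2 = rate * n1, rate * n2
    pmf = {d: skellam_pmf(d, mu1, mu2) for d in range(-_bound(mu2), _bound(mu1) + 1)}
    d_obs = k1 - k2
    upper = sum(p for d, p in pmf.items() if d >= d_obs)
    lower = sum(p for d, p in pmf.items() if d <= d_obs)
    return min(1.0, 2.0 * min(upper, lower))


print(pool_difference_pvalue(5, 1000, 5, 1000))   # no difference between pools
print(pool_difference_pvalue(40, 1000, 2, 1000))  # strong excess in pool 1
```

vipR itself works across several pools and compares pool pairs. Extending this sketch would mean applying the pairwise test to every pair at each position and then correcting for multiple testing.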
Affiliation(s)
- Andre Altmann
- Department of Statistical Genetics, Max Planck Institute of Psychiatry, Munich, Germany.
940
Flaherty P, Natsoulis G, Muralidharan O, Winters M, Buenrostro J, Bell J, Brown S, Holodniy M, Zhang N, Ji HP. Ultrasensitive detection of rare mutations using next-generation targeted resequencing. Nucleic Acids Res 2011; 40:e2. [PMID: 22013163 PMCID: PMC3245950 DOI: 10.1093/nar/gkr861] [Citation(s) in RCA: 103] [Impact Index Per Article: 7.4] [Indexed: 12/18/2022]
Abstract
With next-generation DNA sequencing technologies, one can interrogate a specific genomic region of interest at very high depth of coverage and identify less prevalent, rare mutations in heterogeneous clinical samples. However, mutation detection levels are limited by the error rate of the sequencing technology as well as by the availability of variant-calling algorithms with high statistical power and low false positive rates. We demonstrate that we can robustly detect mutations at 0.1% fractional representation, that is, one mutant allele per 1000 wild-type alleles. To achieve this level of sensitivity, we integrate a high-accuracy indexing strategy and reference replication for estimating sequencing error variance. We employ a statistical model to estimate the error rate at each position of the reference and to quantify the fraction of variant base in the sample. When validated on a known 0.1% admixture of two synthetic DNA samples, our method is highly specific (99%) and sensitive (100%). As a clinical application of this method, we analyzed nine clinical samples of H1N1 influenza A and detected an oseltamivir (antiviral therapy) resistance mutation in the H1N1 neuraminidase gene at a sample fraction of 0.18%.
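The abstract describes two steps: estimating a per-position error rate and variance from replicate sequencing of a reference sample, then quantifying the variant fraction in a test sample. The sketch below is a deliberately simplified stand-in for that statistical model, not the authors' method. It assumes a one-sided z-score threshold (`z`) and a floor on the replicate standard deviation (`min_sd`). Both parameters and all numbers in the example are hypothetical.

```python
import statistics
from typing import Dict, List


def call_low_fraction_variants(
    reference_replicates: List[List[float]],
    sample: List[float],
    z: float = 5.0,
    min_sd: float = 1e-4,
) -> Dict[int, float]:
    """Flag positions whose non-reference fraction exceeds a per-position error model.

    reference_replicates[r][i]: non-reference base fraction at position i in replicate r
    of a wild-type reference; sample[i]: the same quantity in the test sample.
    Returns {position: estimated variant fraction above the background error}.
    """
    calls: Dict[int, float] = {}
    for i, observed in enumerate(sample):
        errors = [rep[i] for rep in reference_replicates]
        mean = statistics.fmean(errors)
        # Floor the SD so positions with near-identical replicates are not over-called.
        sd = max(statistics.stdev(errors), min_sd)
        if (observed - mean) / sd > z:
            calls[i] = observed - mean
    return calls


# Hypothetical error fractions at three positions across four reference replicates.
replicates = [
    [0.0010, 0.0020, 0.0005],
    [0.0012, 0.0018, 0.0006],
    [0.0009, 0.0022, 0.0004],
    [0.0011, 0.0020, 0.0005],
]
sample = [0.0011, 0.0021, 0.0025]
print(call_low_fraction_variants(replicates, sample))  # position 2, about 0.2% above background
```

The point of replicating the reference is to measure how much each position's error fluctuates. A 0.2% signal can then be called at a position where the background error is stable, and rejected at one where replicates scatter widely.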
Affiliation(s)
- Patrick Flaherty
- Stanford Genome Technology Center, Stanford University, Palo Alto, CA 94304, USA
941
Gundry M, Vijg J. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants. Mutat Res 2011; 729:1-15. [PMID: 22016070 DOI: 10.1016/j.mrfmmm.2011.10.001] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.8] [Received: 06/03/2011] [Revised: 09/23/2011] [Accepted: 10/05/2011] [Indexed: 12/20/2022]
Abstract
DNA mutations are the source of genetic variation within populations. The majority of mutations with observable effects are deleterious. In humans, germline mutations can cause genetic disease. In somatic cells, multiple rounds of mutation and selection lead to cancer. The study of genetic variation has progressed rapidly since the completion of the draft sequence of the human genome. Recent advances in sequencing technology, most importantly the introduction of massively parallel sequencing (MPS), have resulted in a more than hundred-fold reduction in the time and cost required for sequencing nucleic acids. These improvements have greatly expanded the use of sequencing as a practical tool for mutation analysis. While in the past the high cost of sequencing limited mutation analysis to selectable markers or small forward mutation targets assumed to be representative of the genome overall, current platforms allow whole genome sequencing for less than $5000. This has already given rise to direct estimates of germline mutation rates in multiple organisms, including humans, by comparing whole genome sequences between parents and offspring. Here we present a brief history of the field of mutation research, with a focus on classical tools for the measurement of mutation rates. We then review MPS, how it is currently applied, and the new insight into human and animal mutation frequencies and spectra that has been obtained from whole genome sequencing. While great progress has been made, we note that the single most important limitation of current MPS approaches for mutation analysis is the inability to address low-abundance mutations that turn somatic tissues into mosaics of cells. Such mutations underlie intra-tumor heterogeneity, with important implications for clinical diagnosis, and could also contribute to somatic diseases other than cancer, including aging. Some possible approaches to gain access to low-abundance mutations are discussed, with a brief overview of new sequencing platforms that are currently waiting in the wings to advance this exploding field even further.
Affiliation(s)
- Michael Gundry
- Albert Einstein College of Medicine, Department of Genetics, New York, NY 10461, United States
942
Hamada M, Wijaya E, Frith MC, Asai K. Probabilistic alignments with quality scores: an application to short-read mapping toward accurate SNP/indel detection. Bioinformatics 2011; 27:3085-92. [PMID: 21976422 DOI: 10.1093/bioinformatics/btr537] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 01/20/2023]
Abstract
MOTIVATION Recent studies have revealed the importance of considering the quality scores of reads generated by next-generation sequencing (NGS) platforms in various downstream analyses. It is also known that probabilistic alignments based on marginal probabilities (e.g. aligned-column and/or gap probabilities) provide more accurate alignments than conventional maximum score-based alignment. However, no study has yet addressed probabilistic alignment that considers quality scores explicitly, although such a method is expected to be useful in SNP/indel callers and bisulfite mapping, because accurate estimation of aligned columns or gaps is important in those analyses. RESULTS In this study, we propose methods of probabilistic alignment that consider the quality scores of (one of) the sequences as well as a usual score matrix. The method is based on posterior decoding techniques in which various marginal probabilities are computed from a probabilistic model of alignments with quality scores, and it can arbitrarily trade off sensitivity and positive predictive value (PPV) of prediction (aligned columns and gaps). The method is directly applicable to read mapping (alignment) toward accurate detection of SNPs and indels. Several computational experiments indicated that probabilistic alignments can estimate aligned columns and gaps more accurately than other mapping algorithms, e.g. SHRiMP2, Stampy, BWA and Novoalign. The study also suggested that our approach yields favorable precision for SNP/indel calling.
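The paper's model is a gapped probabilistic alignment with posterior decoding. The sketch below shows only the two ingredients named in the abstract, in a heavily simplified, ungapped form, and is not the authors' algorithm. The first ingredient is quality-aware emission probabilities built from Phred scores. The second is a posterior distribution over read placements rather than a single best-scoring hit. The uniform prior over offsets, the even split of error probability across the three other bases, and the cap at 0.75 are assumptions of this sketch.

```python
import math
from typing import List


def phred_to_error(q: int) -> float:
    """Phred quality Q -> base-call error probability 10^(-Q/10), capped at 0.75."""
    return min(10.0 ** (-q / 10.0), 0.75)


def placement_log_likelihood(read: str, quals: List[int], ref: str, offset: int) -> float:
    """log P(read | read placed without gaps at ref[offset:]), quality-aware."""
    total = 0.0
    for base, q, ref_base in zip(read, quals, ref[offset:offset + len(read)]):
        e = phred_to_error(q)
        # Correct call with prob 1 - e; an error is spread evenly over the 3 other bases.
        total += math.log(1.0 - e) if base == ref_base else math.log(e / 3.0)
    return total


def placement_posteriors(read: str, quals: List[int], ref: str) -> List[float]:
    """Posterior over ungapped offsets under a uniform prior (log-sum-exp normalised)."""
    logs = [placement_log_likelihood(read, quals, ref, o)
            for o in range(len(ref) - len(read) + 1)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    z = sum(weights)
    return [w / z for w in weights]


ref = "ACGTACGGTTCA"
post = placement_posteriors("CGGT", [30, 30, 30, 30], ref)
print(max(range(len(post)), key=post.__getitem__))  # 5
```

A mismatch at a low-quality base costs much less log-likelihood than one at a high-quality base. This is why quality-aware marginals can separate true SNPs from likely sequencing errors.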
Affiliation(s)
- Michiaki Hamada
- Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8562, Japan.
943
Wei Z, Wang W, Hu P, Lyon GJ, Hakonarson H. SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data. Nucleic Acids Res 2011; 39:e132. [PMID: 21813454 PMCID: PMC3201884 DOI: 10.1093/nar/gkr599] [Citation(s) in RCA: 187] [Impact Index Per Article: 13.4] [Received: 04/05/2011] [Revised: 06/30/2011] [Accepted: 07/06/2011] [Indexed: 11/12/2022]
Abstract
We develop a statistical tool, SNVer, for calling common and rare variants in the analysis of pooled or individual next-generation sequencing (NGS) data. We formulate variant calling as a hypothesis testing problem and employ a binomial-binomial model to test the significance of the observed allele frequency against sequencing error. SNVer reports a single overall P-value for evaluating the significance of a candidate locus being a variant, on which multiplicity control can be based. This is particularly desirable because tens of thousands of loci are examined simultaneously in typical NGS experiments. Each user can choose the false-positive error rate threshold he or she considers appropriate, instead of relying on the dichotomous 'accept or reject the candidates' decisions provided by most existing methods. We use both simulated and real data to demonstrate the superior performance of our program compared with existing methods. SNVer runs very fast and can test 300K loci within an hour. This scalability makes it feasible to analyze whole-exome sequencing data, or even whole-genome sequencing data using a high-performance computing cluster. SNVer is freely available at http://snver.sourceforge.net/.
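The abstract frames variant calling as a hypothesis test: one p-value per locus, followed by multiplicity control across many loci. The sketch below illustrates that workflow with a plain binomial test against an assumed per-base error rate, followed by the Benjamini-Hochberg false discovery rate procedure. This is a simplification. SNVer's actual binomial-binomial model and its specific multiplicity-control procedure are not reproduced here. The depth, error rate, and read counts are hypothetical.

```python
import math
from typing import List


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), 0 < p < 1, with terms computed in log space."""
    if k <= 0:
        return 1.0
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def benjamini_hochberg(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Rejection flag per p-value from the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank  # largest rank whose p-value meets its threshold
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= cutoff_rank
    return rejected


# Hypothetical loci sequenced to depth 100 with an assumed 1% per-base error rate.
depth, error_rate = 100, 0.01
alt_counts = [20, 1, 3, 10]
pvals = [binomial_upper_tail(k, depth, error_rate) for k in alt_counts]
print(benjamini_hochberg(pvals))  # [True, False, False, True]
```

Reporting a p-value per locus, rather than a yes/no call, is what lets the user pick the FDR level (`alpha`) afterwards.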
Affiliation(s)
- Zhi Wei
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 08540, USA.
944
Glusman G, Caballero J, Mauldin DE, Hood L, Roach JC. Kaviar: an accessible system for testing SNV novelty. Bioinformatics 2011; 27:3216-7. [PMID: 21965822 DOI: 10.1093/bioinformatics/btr540] [Citation(s) in RCA: 168] [Impact Index Per Article: 12.0] [Indexed: 11/14/2022]
Abstract
SUMMARY With the rapidly expanding availability of data from personal genomes, exomes and transcriptomes, medical researchers will frequently need to test whether observed genomic variants are novel or known. This task requires downloading and handling large and diverse datasets from a variety of sources, and processing them with bioinformatics tools and pipelines. Alternatively, researchers can upload data to online tools, which may conflict with privacy requirements. We present here Kaviar, a tool that greatly simplifies the assessment of novel variants. Kaviar includes: (i) an integrated and growing database of genomic variation from diverse sources, including over 55 million variants from personal genomes, family genomes, transcriptomes, SNV databases and population surveys; and (ii) software for querying the database efficiently.
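At its core, testing whether an observed variant is novel or known is a membership query against an index of known variants. The sketch below illustrates that query and is not Kaviar's query interface. It assumes variants are keyed by (chromosome, position, ref, alt), and that chromosome names are normalised by dropping a leading "chr". The known and observed variants are hypothetical. A production tool would also need indel left-normalisation and consistent reference builds, both omitted here.

```python
from typing import Iterable, List, Set, Tuple

# (chromosome, 1-based position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def normalise(v: Variant) -> Variant:
    """Canonical key: drop a leading 'chr' and upper-case the alleles."""
    chrom, pos, ref, alt = v
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return (chrom, pos, ref.upper(), alt.upper())


def build_index(known: Iterable[Variant]) -> Set[Variant]:
    """Hash set of known variants for average O(1) membership tests."""
    return {normalise(v) for v in known}


def novel_variants(observed: Iterable[Variant], index: Set[Variant]) -> List[Variant]:
    """Observed variants whose canonical key is absent from the index."""
    return [v for v in observed if normalise(v) not in index]


index = build_index([("chr1", 1000, "A", "G"), ("2", 500, "C", "T")])
observed = [("1", 1000, "a", "g"), ("chr2", 500, "C", "A"), ("chrX", 42, "G", "T")]
print(novel_variants(observed, index))  # [('chr2', 500, 'C', 'A'), ('chrX', 42, 'G', 'T')]
```

Running the lookup locally against a downloaded index avoids uploading variant lists to an online service, which addresses the privacy concern raised in the abstract.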
945
Saintenac C, Jiang D, Akhunov ED. Targeted analysis of nucleotide and copy number variation by exon capture in allotetraploid wheat genome. Genome Biol 2011; 12:R88. [PMID: 21917144 PMCID: PMC3308051 DOI: 10.1186/gb-2011-12-9-r88] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.9] [Received: 05/07/2011] [Revised: 08/01/2011] [Accepted: 09/14/2011] [Indexed: 11/30/2022]
Abstract
Background The ability of grass species to adapt to various habitats is attributed to the dynamic nature of their genomes, which have been shaped by multiple rounds of ancient and recent polyploidization. To gain a better understanding of the nature and extent of variation in functionally relevant regions of a polyploid genome, we developed a sequence capture assay to compare exonic sequences of allotetraploid wheat accessions. Results A sequence capture assay was designed for the targeted re-sequencing of 3.5 Mb exon regions that surveyed a total of 3,497 genes from allotetraploid wheat. These data were used to describe SNPs, copy number variation and homoeologous sequence divergence in coding regions. A procedure for variant discovery in the polyploid genome was developed and experimentally validated. About 1% and 24% of discovered SNPs were loss-of-function and non-synonymous mutations, respectively. Under-representation of replacement mutations was identified in several groups of genes involved in translation and metabolism. Gene duplications were predominant in a cultivated wheat accession, while more gene deletions than duplications were identified in wild wheat. Conclusions We demonstrate that, even though the level of sequence similarity between targeted polyploid genomes and capture baits can bias enrichment efficiency, exon capture is a powerful approach for variant discovery in polyploids. Our results suggest that allopolyploid wheat can accumulate new variation in coding regions at a high rate. This process has the potential to broaden functional diversity and generate new phenotypic variation that eventually can play a critical role in the origin of new adaptations and important agronomic traits.
Affiliation(s)
- Cyrille Saintenac
- Throckmorton Plant Sciences Center, Kansas State University, Manhattan, KS 66506, USA
947
Dickinson RE, Griffin H, Bigley V, Reynard LN, Hussain R, Haniffa M, Lakey JH, Rahman T, Wang XN, McGovern N, Pagan S, Cookson S, McDonald D, Chua I, Wallis J, Cant A, Wright M, Keavney B, Chinnery PF, Loughlin J, Hambleton S, Santibanez-Koref M, Collin M. Exome sequencing identifies GATA-2 mutation as the cause of dendritic cell, monocyte, B and NK lymphoid deficiency. Blood 2011; 118:2656-8. [PMID: 21765025 PMCID: PMC5137783 DOI: 10.1182/blood-2011-06-360313] [Citation(s) in RCA: 341] [Impact Index Per Article: 24.4] [Indexed: 11/20/2022]
Abstract
The human syndrome of dendritic cell, monocyte, B and natural killer lymphoid deficiency presents as a sporadic or autosomal dominant trait causing susceptibility to mycobacterial and other infections, predisposition to myelodysplasia and leukemia, and, in some cases, pulmonary alveolar proteinosis. Seeking a genetic cause, we sequenced the exomes of 4 unrelated persons, 3 with sporadic disease, looking for novel, heterozygous, and probably deleterious variants. A number of genes harbored novel variants in each person, but only one gene, GATA2, was mutated in all 4 persons. Each person harbored a different mutation, but all were predicted to be highly deleterious and to cause loss or mutation of the C-terminal zinc finger domain. Because GATA2 is the only gene mutated in all 4 unrelated persons, it is highly likely to be the cause of dendritic cell, monocyte, B, and natural killer lymphoid deficiency. This disorder therefore constitutes a new genetic form of heritable immunodeficiency and leukemic transformation.
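The filtering logic in this abstract has two steps. First, keep novel, heterozygous, probably deleterious variants. Second, look for a gene hit in every person. This can be sketched as a set intersection at the gene level. Intersecting genes rather than exact variants matters here, because each person carried a different GATA2 mutation. The `Call` record and all example calls below are hypothetical. In practice the novelty and deleteriousness flags would come from database lookups and prediction tools that the abstract does not specify.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class Call(NamedTuple):
    gene: str
    variant: str        # free-text label, used only for reporting
    novel: bool         # absent from the variant databases consulted
    heterozygous: bool
    deleterious: bool   # predicted damaging by some in-silico tool


def qualifying_genes(calls: Iterable[Call]) -> Set[str]:
    """Genes carrying at least one novel, heterozygous, predicted-deleterious variant."""
    return {c.gene for c in calls if c.novel and c.heterozygous and c.deleterious}


def shared_candidate_genes(calls_by_person: Dict[str, List[Call]]) -> Set[str]:
    """Genes with a qualifying variant in every person (the variants themselves may differ)."""
    gene_sets = [qualifying_genes(calls) for calls in calls_by_person.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


# Hypothetical calls for four unrelated persons.
calls_by_person = {
    "P1": [Call("GATA2", "variant-1", True, True, True), Call("GENE_X", "variant-2", True, True, True)],
    "P2": [Call("GATA2", "variant-3", True, True, True), Call("GENE_Y", "variant-4", True, True, True)],
    "P3": [Call("GATA2", "variant-5", True, True, True), Call("GENE_X", "variant-6", True, False, True)],
    "P4": [Call("GATA2", "variant-7", True, True, True), Call("GENE_X", "variant-8", True, True, True)],
}
print(shared_candidate_genes(calls_by_person))  # {'GATA2'}
```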
Affiliation(s)
- Rachel Emma Dickinson
- Institute of Cellular Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- Helen Griffin
- Institute of Genetic Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- Venetia Bigley
- Institute of Cellular Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- Newcastle upon Tyne Hospitals National Health Service Foundation Trust, Newcastle upon Tyne, United Kingdom
- Louise N. Reynard
- Institute of Cellular Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- Rafiqul Hussain
- Institute of Genetic Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- Muzlifah Haniffa
- Institute of Cellular Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- Newcastle upon Tyne Hospitals National Health Service Foundation Trust, Newcastle upon Tyne, United Kingdom
- Jeremy H. Lakey
- Institute of Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, United Kingdom
- Thahira Rahman
- Institute of Genetic Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- Xiao-Nong Wang
- Institute of Cellular Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- Naomi McGovern
- Institute of Cellular Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- Sarah Pagan
- Institute of Cellular Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- Sharon Cookson
- Institute of Cellular Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- David McDonald
- Institute of Cellular Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- Ignatius Chua
- University College London Centre for Immunodeficiency, Royal Free Hospital, London, United Kingdom
- Jonathan Wallis
- Newcastle upon Tyne Hospitals National Health Service Foundation Trust, Newcastle upon Tyne, United Kingdom
- Andrew Cant
- Institute of Cellular Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- Newcastle upon Tyne Hospitals National Health Service Foundation Trust, Newcastle upon Tyne, United Kingdom
- Michael Wright
- Institute of Genetic Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- Newcastle upon Tyne Hospitals National Health Service Foundation Trust, Newcastle upon Tyne, United Kingdom
- Bernard Keavney
- Institute of Genetic Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- Patrick F. Chinnery
- Institute of Genetic Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- John Loughlin
- Institute of Cellular Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- Sophie Hambleton
- Institute of Cellular Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- Newcastle upon Tyne Hospitals National Health Service Foundation Trust, Newcastle upon Tyne, United Kingdom
- Mauro Santibanez-Koref
- Institute of Genetic Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- Matthew Collin
- Institute of Cellular Medicine, International Centre for Life, Newcastle University, Newcastle upon Tyne, United Kingdom
- Newcastle upon Tyne Hospitals National Health Service Foundation Trust, Newcastle upon Tyne, United Kingdom
948
Sequential bottlenecks drive viral evolution in early acute hepatitis C virus infection. PLoS Pathog 2011; 7:e1002243. [PMID: 21912520 PMCID: PMC3164670 DOI: 10.1371/journal.ppat.1002243] [Citation(s) in RCA: 180] [Impact Index Per Article: 12.9] [Received: 04/28/2011] [Accepted: 07/12/2011] [Indexed: 02/07/2023]
Abstract
Hepatitis C virus is a pandemic human RNA virus that commonly causes chronic infection and liver disease. Characterizing the viral populations that successfully initiate infection, and those that drive progression to chronicity, is instrumental for understanding pathogenesis and for vaccine design. A comprehensive, longitudinal analysis of the viral population was conducted in four subjects followed from very early acute infection to resolution of disease outcome. Using next generation sequencing (NGS) and standard cloning/Sanger sequencing, genetic diversity and viral variants were quantified over the course of infection at frequencies as low as 0.1%. Phylogenetic analysis of reassembled viral variants revealed that acute infection was dominated by two sequential bottleneck events, irrespective of subsequent chronicity or clearance. The first bottleneck was associated with transmission, with one to two viral variants successfully establishing infection. The second occurred approximately 100 days post-infection and was characterized by a decline in viral diversity. In the two subjects who developed chronic infection, this second bottleneck was followed by the emergence of a new viral population, which evolved from the founder variants via a selective sweep, with fixation at a small number of mutated sites. Diversity at sites with non-synonymous mutations was higher in predicted cytotoxic T cell epitopes, suggesting immune-driven evolution. These results provide the first detailed analysis of early within-host evolution of HCV, indicating that strong selective forces limit viral evolution in the acute phase of infection.
Primary hepatitis C virus (HCV) infection is typically asymptomatic and commonly results in persistent infection. The characteristics of early infection remain undefined. Four subjects were studied longitudinally, from within a few weeks of transmission until resolution of outcome, using full-genome analysis of viral evolution. In the acute phase (<100 days post-infection) there were two periods with a major reduction in genetic diversity (i.e. a bottleneck), irrespective of subsequent clearance (n = 2) or chronic infection (n = 2). The first bottleneck was associated with transmission, with generally only one ‘founder’ virus successfully establishing infection. The second occurred following the primary peak in viraemia, concomitant with seroconversion, approximately 100 days post-infection. In the subjects who became chronically infected, the second bottleneck was followed by the emergence of a new cluster of variants, which evolved from the founder(s) and carried only a small number of mutated residues that reached fixation. Some fixations occurred in known targets of CD8 cytotoxic T cell and neutralizing antibody responses. These results indicate a common evolutionary pattern in the acute phase of HCV infection, independent of disease outcome, with strong signatures of selective pressure driving the transition into chronic infection. These novel data will inform preventative vaccine strategies.
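A bottleneck is defined above as a major reduction in genetic diversity. One common way to put a number on that diversity is the Shannon entropy of variant frequencies. The sketch below is illustrative only: it assumes variant frequencies have already been estimated from reads, discards those below the 0.1% detection limit quoted in the abstract, and renormalizes the rest. The 50% drop used by `is_bottleneck` is a hypothetical cutoff, and the abstract does not state that the authors used this particular measure.

```python
import math

DETECTION_LIMIT = 0.001  # 0.1%, the lowest variant frequency reported in the abstract


def shannon_diversity(freqs: list[float], limit: float = DETECTION_LIMIT) -> float:
    """Shannon entropy (natural log) of variant frequencies at or above `limit`."""
    kept = [f for f in freqs if f >= limit]
    total = sum(kept)
    if total == 0:
        return 0.0
    # Renormalize so the retained variants sum to 1 before computing entropy.
    return -sum((f / total) * math.log(f / total) for f in kept)


def is_bottleneck(before: list[float], after: list[float], drop: float = 0.5) -> bool:
    """Flag a bottleneck when diversity falls by at least `drop` (hypothetical cutoff)."""
    h_before = shannon_diversity(before)
    return h_before > 0 and shannon_diversity(after) <= (1 - drop) * h_before


# Invented frequencies: four equally common variants collapse to one dominant variant.
print(round(shannon_diversity([0.25] * 4), 3))  # ln 4, prints 1.386
print(is_bottleneck([0.25] * 4, [0.95, 0.05]))  # prints True
```

Entropy is used here because it rises with both the number of variants and the evenness of their frequencies; a founder event that leaves one dominant variant drives it toward zero.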
949
Pérez-Enciso M, Ferretti L. Massive parallel sequencing in animal genetics: wherefroms and wheretos. Anim Genet 2011; 41:561-9. [PMID: 20477787 DOI: 10.1111/j.1365-2052.2010.02057.x] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Indexed: 02/04/2023]
Abstract
Next generation sequencing (NGS) has revolutionized genomics research, and its impact on biology is difficult to overstate. NGS will immediately allow researchers working on non-mainstream species to obtain complete genomes together with a comprehensive catalogue of variants. In addition, RNA-seq will be a decisive tool for annotating genes that cannot be predicted purely by computational or comparative approaches. Future applications include whole-genome sequence association studies, as opposed to classical SNP-based association, and the incorporation of this new source of information into breeding programmes. For these purposes, one of the main advantages of sequencing over genotyping is the possibility of identifying copy number variants. Experimental design is currently a topic of great interest, and here we discuss some of the options available, including pools and reduced representation libraries. Although bioinformatics is still an important bottleneck, this limitation is only transient and should not deter animal geneticists from embracing these technologies.
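The abstract names copy number variant detection as a key advantage of sequencing over genotyping. One widely used idea, read-depth analysis, is sketched below as an assumption-laden illustration rather than the approach the authors discuss. It presumes coverage is uniform apart from copy number (no GC or mappability correction), a diploid baseline, and a `baseline_depth` taken from regions known to be at normal copy number. The function name and inputs are hypothetical.

```python
from statistics import median


def estimate_copy_number(region_depths: list[float], baseline_depth: float, ploidy: int = 2) -> int:
    """Read-depth copy-number estimate for one region.

    Assumes coverage is uniform apart from copy number and that `baseline_depth`
    is the typical depth of regions present at the normal `ploidy`.
    """
    if baseline_depth <= 0:
        raise ValueError("baseline_depth must be positive")
    if not region_depths:
        raise ValueError("region_depths must not be empty")
    # Median is less sensitive than the mean to a few spuriously deep positions.
    ratio = median(region_depths) / baseline_depth
    return round(ploidy * ratio)  # nearest integer copy number


# Invented depths: about 1.5x the 30x baseline suggests a one-copy duplication.
print(estimate_copy_number([44, 46, 45, 47], baseline_depth=30))  # prints 3
```

Real CNV callers segment depth along the genome and model noise explicitly; this sketch only shows the core depth-ratio reasoning for a single region.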
Affiliation(s)
- M Pérez-Enciso
- Departament de Ciència Animal i dels Aliments, Facultat de Veterinària, Universitat Autònoma de Barcelona, Bellaterra, Spain.
950
Frequent mutations of chromatin remodeling genes in transitional cell carcinoma of the bladder. Nat Genet 2011; 43:875-8. [PMID: 21822268 DOI: 10.1038/ng.907] [Citation(s) in RCA: 590] [Impact Index Per Article: 42.1] [Received: 05/13/2011] [Accepted: 07/15/2011] [Indexed: 12/13/2022]
Abstract
Transitional cell carcinoma (TCC) is the most common type of bladder cancer. Here we sequenced the exomes of nine individuals with TCC and screened all the somatically mutated genes in a prevalence set of 88 additional individuals with TCC of different tumor stages and grades. We discovered a variety of genes not previously known to be mutated in TCC. Notably, we identified genetic aberrations in chromatin remodeling genes (UTX, MLL-MLL3, CREBBP-EP300, NCOR1, ARID1A and CHD6) in 59% of our 97 subjects with TCC. Of these genes, UTX was altered substantially more frequently in tumors of low stage and grade, highlighting its potential role in the classification and diagnosis of bladder cancer. Our results provide an overview of the genetic basis of TCC and suggest that aberrant chromatin regulation might be a hallmark of bladder cancer.
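The 59% figure is a per-subject prevalence: the share of the 97 subjects carrying at least one aberration in any of the listed genes. The UTX finding is the same count stratified by stage and grade. The sketch below shows that counting under stated assumptions. The cohort dictionary, the `fraction_with_hit` and `fraction_by_group` helpers and the toy subjects are hypothetical. Splitting "MLL-MLL3" and "CREBBP-EP300" into individual genes is also an assumption. The abstract does not say how aberrations were called or which statistical test supported the stage and grade comparison.

```python
# Gene list from the abstract; the paired entries are split into single genes (assumption).
CHROMATIN_GENES = {"UTX", "MLL", "MLL3", "CREBBP", "EP300", "NCOR1", "ARID1A", "CHD6"}


def fraction_with_hit(cohort: dict[str, set[str]], gene_set: set[str] = CHROMATIN_GENES) -> float:
    """Fraction of subjects with an aberration in at least one gene of `gene_set`."""
    if not cohort:
        return 0.0
    hits = sum(1 for genes in cohort.values() if genes & gene_set)
    return hits / len(cohort)


def fraction_by_group(cohort: dict[str, set[str]], group_of: dict[str, str],
                      gene_set: set[str]) -> dict[str, float]:
    """Per-group hit fraction, e.g. low- vs high-grade tumors for {"UTX"}."""
    grouped: dict[str, dict[str, set[str]]] = {}
    for subject, genes in cohort.items():
        grouped.setdefault(group_of[subject], {})[subject] = genes
    return {g: fraction_with_hit(members, gene_set) for g, members in grouped.items()}


# Toy cohort (invented): subject -> set of genes with somatic aberrations.
cohort = {"s1": {"UTX", "TP53"}, "s2": {"TP53"}, "s3": {"ARID1A"}, "s4": set()}
grade = {"s1": "low", "s2": "high", "s3": "low", "s4": "high"}
print(fraction_with_hit(cohort))                     # prints 0.5
print(fraction_by_group(cohort, grade, {"UTX"}))     # prints {'low': 0.5, 'high': 0.0}
```

Counting subjects rather than mutations keeps a tumor with several hits in the gene set from inflating the prevalence.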