1
Yuan Y, Bayer PE, Batley J, Edwards D. Current status of structural variation studies in plants. Plant Biotechnol J 2021; 19:2153-2163. [PMID: 34101329 PMCID: PMC8541774 DOI: 10.1111/pbi.13646] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 02/18/2021] [Revised: 05/31/2021] [Accepted: 06/03/2021] [Indexed: 05/23/2023]
Abstract
Structural variations (SVs) including gene presence/absence variations and copy number variations are a common feature of genomes in plants and, together with single nucleotide polymorphisms and epigenetic differences, are responsible for the heritable phenotypic diversity observed within and between species. Understanding the contribution of SVs to plant phenotypic variation is important for plant breeders to assist in producing improved varieties. The low resolution of early genetic technologies and inefficient methods have previously limited our understanding of SVs in plants. However, with the rapid expansion in genomic technologies, it is possible to assess SVs with an ever-greater resolution and accuracy. Here, we review the current status of SV studies in plants, examine the roles that SVs play in phenotypic traits, compare current technologies and assess future challenges for SV studies.
Affiliation(s)
- Yuxuan Yuan
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
- School of Life Sciences and State Key Laboratory for Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong SAR, China
- Philipp E. Bayer
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
- Jacqueline Batley
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
- David Edwards
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
2
Signatures of TSPAN8 variants associated with human metabolic regulation and diseases. iScience 2021; 24:102893. [PMID: 34401672 PMCID: PMC8355918 DOI: 10.1016/j.isci.2021.102893] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/08/2021] [Revised: 06/18/2021] [Accepted: 07/20/2021] [Indexed: 02/08/2023]
Abstract
Here, with the example of common copy number variation (CNV) in the TSPAN8 gene, we present an important piece of work in the field of CNV detection, that is, CNV association with complex human traits such as ¹H NMR metabolomic phenotypes and an example of functional characterization of CNVs among human induced pluripotent stem cells (HipSci). We report TSPAN8 exon 11 (ENSE00003720745) as a pleiotropic locus associated with metabolomic regulation and show that its biology is associated with several metabolic diseases such as type 2 diabetes (T2D) and cancer. Our results further demonstrate the power of multivariate association models over univariate methods and define metabolomic signatures for variants in TSPAN8.
3
Detection of False-Positive Deletions from the Database of Genomic Variants. Biomed Res Int 2019; 2019:8420547. [PMID: 31080831 PMCID: PMC6475568 DOI: 10.1155/2019/8420547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2018] [Revised: 02/24/2019] [Accepted: 03/04/2019] [Indexed: 11/24/2022]
Abstract
Next generation sequencing is an emerging technology that has been widely used in the detection of genomic variants. However, because depth of coverage, a main signature used for variant calling, is strongly affected by biases such as GC content and mappability, some calls are false positives. In this study, we utilized paired-end read mapping, another signature that is not affected by the aforementioned biases, to detect false-positive deletions in the Database of Genomic Variants. We first identified 1923 suspicious variants that may be false positives and then conducted validation studies on each suspicious variant, which detected 583 false-positive deletions. Finally, we analysed the distribution of these false positives by chromosome, sample, and size. Correcting these false positives in public repositories should help downstream studies avoid incorrect documentation and annotations.
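The paired-end signature described above can be illustrated with a small Python sketch. It assumes read pairs have already been reduced to three hypothetical fields (end of the left read, start of its mate, observed insert size) and uses an assumed mean + 3 SD insert-size cutoff, an assumed flanking window (`slack`) and an assumed minimum of two supporting pairs; none of these thresholds come from the study. A called deletion that too few discordant pairs span is flagged as a candidate false positive.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class ReadPair:
    left_end: int     # end coordinate of the leftmost read of the pair
    right_start: int  # start coordinate of its mate
    insert_size: int  # observed insert size against the reference


def insert_size_cutoff(concordant_sizes, k=3.0):
    """Assumed rule (not from the study): a pair is discordant above mean + k * SD."""
    return mean(concordant_sizes) + k * stdev(concordant_sizes)


def count_supporting_pairs(start, end, pairs, cutoff, slack=500):
    """Count discordant pairs whose two reads flank the deletion interval."""
    support = 0
    for pair in pairs:
        if pair.insert_size <= cutoff:
            continue  # concordant pair: no evidence for a deletion
        left_ok = start - slack <= pair.left_end <= start
        right_ok = end <= pair.right_start <= end + slack
        if left_ok and right_ok:
            support += 1
    return support


def is_suspicious(deletion, pairs, cutoff, min_support=2):
    """Flag a called deletion as a possible false positive if too few pairs support it."""
    start, end = deletion
    return count_supporting_pairs(start, end, pairs, cutoff) < min_support


# Usage with made-up numbers
cutoff = insert_size_cutoff([290, 300, 310, 295, 305])
pairs = [ReadPair(900, 3100, 2400), ReadPair(950, 3050, 2300), ReadPair(4000, 4200, 300)]
print(is_suspicious((1000, 3000), pairs, cutoff))  # False: two flanking discordant pairs
print(is_suspicious((5000, 6000), pairs, cutoff))  # True: no supporting pairs
```

In a real pipeline the pairs would come from aligned reads and the cutoff from the library's empirical insert-size distribution; this toy version only shows why paired-end evidence is independent of coverage biases.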
4
Roca I, González-Castro L, Fernández H, Couce ML, Fernández-Marmiesse A. Free-access copy-number variant detection tools for targeted next-generation sequencing data. Mutat Res Rev Mutat Res 2019; 779:114-125. [DOI: 10.1016/j.mrrev.2019.02.005] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 04/13/2018] [Revised: 12/25/2018] [Accepted: 02/22/2019] [Indexed: 01/23/2023]
5
Guggisberg A, Liu X, Suter L, Mansion G, Fischer MC, Fior S, Roumet M, Kretzschmar R, Koch MA, Widmer A. The genomic basis of adaptation to calcareous and siliceous soils in Arabidopsis lyrata. Mol Ecol 2018; 27:5088-5103. [PMID: 30411828 DOI: 10.1111/mec.14930] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 06/11/2018] [Revised: 10/03/2018] [Accepted: 10/04/2018] [Indexed: 12/27/2022]
Abstract
Edaphic conditions are important determinants of plant fitness. While much has been learnt in recent years about plant adaptation to heavy metal contaminated soils, the genomic basis underlying adaptation to calcareous and siliceous substrates remains largely unknown. We performed a reciprocal germination experiment and whole-genome resequencing in natural calcareous and siliceous populations of diploid Arabidopsis lyrata to test for edaphic adaptation and detect signatures of selection at loci associated with soil-mediated divergence. In parallel, genome scans on respective diploid ecotypes from the Arabidopsis arenosa species complex were undertaken, to search for shared patterns of adaptive genetic divergence. Soil ecotypes of A. lyrata display significant genotype-by-treatment responses for seed germination. Sequence (SNPs) and copy-number variants (CNVs) point towards loci involved in ion transport as the main targets of adaptive genetic divergence. Two genes exhibiting high differentiation among soil types in A. lyrata further share trans-specific single nucleotide polymorphisms with A. arenosa. This work applies experimental and genomic approaches to study edaphic adaptation in A. lyrata and suggests that physiological response to elemental toxicity and deficiency underlies the evolution of calcareous and siliceous ecotypes. The discovery of shared adaptive variation between sister species indicates that ancient polymorphisms contribute to adaptive evolution.
Affiliation(s)
- Xuanyu Liu
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Léonie Suter
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Guilhem Mansion
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Martin C Fischer
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Simone Fior
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Marie Roumet
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Ruben Kretzschmar
- Institute of Biogeochemistry and Pollutant Dynamics, ETH Zurich, Zurich, Switzerland
- Marcus A Koch
- Centre for Organismal Studies Heidelberg, Heidelberg University, Heidelberg, Germany
- Alex Widmer
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
6
Bianconi ME, Dunning LT, Moreno-Villena JJ, Osborne CP, Christin PA. Gene duplication and dosage effects during the early emergence of C4 photosynthesis in the grass genus Alloteropsis. J Exp Bot 2018; 69:1967-1980. [PMID: 29394370 PMCID: PMC6018922 DOI: 10.1093/jxb/ery029] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/25/2017] [Accepted: 01/17/2018] [Indexed: 05/04/2023]
Abstract
The importance of gene duplication for evolutionary diversification has been mainly discussed in terms of genetic redundancy allowing neofunctionalization. In the case of C4 photosynthesis, which evolved via the co-option of multiple enzymes to boost carbon fixation in tropical conditions, the importance of genetic redundancy has not been consistently supported by genomic studies. Here, we test for a different role for gene duplication in the early evolution of C4 photosynthesis, via dosage effects creating rapid step changes in expression levels. Using genome-wide data for accessions of the grass genus Alloteropsis that recently diversified into different photosynthetic types, we estimate gene copy numbers and demonstrate that recurrent duplications in two important families of C4 genes coincided with increases in transcript abundance along the phylogeny, in some cases via a pure dosage effect. While increased gene copy number during the initial emergence of C4 photosynthesis probably offered a rapid route to enhanced expression, we also find losses of duplicates following the acquisition of genes encoding better-suited isoforms. The dosage effect of gene duplication might therefore act as a transient process during the evolution of a C4 biochemistry, rendered obsolete by the fixation of regulatory mutations increasing expression levels.
Affiliation(s)
- Matheus E Bianconi
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
- Luke T Dunning
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
- Colin P Osborne
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
7
Dharanipragada P, Vogeti S, Parekh N. iCopyDAV: Integrated platform for copy number variations-Detection, annotation and visualization. PLoS One 2018; 13:e0195334. [PMID: 29621297 PMCID: PMC5886540 DOI: 10.1371/journal.pone.0195334] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 12/29/2017] [Accepted: 03/20/2018] [Indexed: 12/14/2022]
Abstract
Discovery of copy number variations (CNVs), a major category of structural variations, has dramatically changed our understanding of differences between individuals and provides an alternate paradigm for the genetic basis of human diseases. CNVs include both copy gain and copy loss events, and their genome-wide detection is now possible using high-throughput, low-cost next generation sequencing (NGS) methods. However, accurate detection of CNVs from NGS data is not straightforward because various systematic biases make read coverage non-uniform. We have developed an integrated platform, iCopyDAV, to handle some of these issues in CNV detection in whole genome NGS data. It has a modular framework comprising five major modules: data pre-treatment, segmentation, variant calling, annotation and visualization. An important feature of iCopyDAV is the functional annotation module that enables the user to identify and prioritize CNVs encompassing various functional elements, genomic features and disease-associations. Parallelization of the segmentation algorithms makes the iCopyDAV platform accessible even on a desktop. Here we show the effect of sequencing coverage, read length, bin size, data pre-treatment and segmentation approaches on accurate detection of the complete spectrum of CNVs. Performance of iCopyDAV is evaluated on both simulated data and real data for different sequencing depths. It is an open-source integrated pipeline available at https://github.com/vogetihrsh/icopydav and as a Docker image at http://bioinf.iiit.ac.in/icopydav/.
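Read-depth pre-treatment of the kind mentioned above commonly includes correcting for GC-content bias. The sketch below shows one widely used median-based correction, in which each bin's depth is scaled by the overall median divided by the median depth of bins sharing its GC percentage, followed by a naive per-bin copy-number label. It is not taken from iCopyDAV; the integer GC percentages, diploid baseline and 2.5/1.5 copy thresholds are assumptions for illustration, and real tools run segmentation before calling.

```python
from collections import defaultdict
from statistics import median


def gc_correct(depths, gc_percent):
    """Median-based GC correction: rd_i * overall_median / median(bins with the same GC%)."""
    overall = median(depths)
    groups = defaultdict(list)
    for rd, gc in zip(depths, gc_percent):
        groups[gc].append(rd)
    gc_median = {gc: median(values) for gc, values in groups.items()}
    return [rd * overall / gc_median[gc] if gc_median[gc] > 0 else 0.0
            for rd, gc in zip(depths, gc_percent)]


def call_bins(corrected, ploidy=2, gain_cn=2.5, loss_cn=1.5):
    """Label each bin from an estimated copy number (thresholds are assumptions)."""
    baseline = median(corrected)
    calls = []
    for rd in corrected:
        cn = ploidy * rd / baseline if baseline > 0 else 0.0
        if cn >= gain_cn:
            calls.append("gain")
        elif cn <= loss_cn:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Toy input: GC-rich bins (60%) are over-covered; the last bin carries a one-copy loss.
depths = [100, 100, 100, 150, 150, 150, 50]
gc = [40, 40, 40, 60, 60, 60, 40]
corrected = gc_correct(depths, gc)
print(call_bins(corrected))  # six 'neutral' bins, then 'loss'
```

Without the correction, the three over-covered GC-rich bins in the toy input would be labelled as gains, which is the kind of bias-driven false call the pre-treatment step is meant to suppress.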
Affiliation(s)
- Prashanthi Dharanipragada
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Sriharsha Vogeti
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Nita Parekh
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
8
Yuan X, Zhang J, Yang L, Bai J, Fan P. Detection of Significant Copy Number Variations From Multiple Samples in Next-Generation Sequencing Data. IEEE Trans Nanobioscience 2018; 17:12-20. [PMID: 29570071 DOI: 10.1109/tnb.2017.2783910] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 11/10/2022]
Abstract
Analyzing copy number variations (CNVs) from next-generation sequencing (NGS) data has become a common approach to detect disease susceptibility genes. The main challenge is how to utilize NGS data with limited coverage depth to detect significant CNVs. Here, we introduce a new statistical method, the derivative of correlation coefficient (DCC), to detect significant CNVs that recurrently occur in multiple samples using read depth signals. We use a sliding window to calculate a correlation coefficient for each genome bin, and compute corresponding derivatives by fitting curves to the correlation coefficient. The detection of significant CNVs is thereby transformed into a problem of detecting significant derivatives reflecting genome breakpoints, which can be solved using statistical hypothesis testing. We tested and compared the performance of DCC against several peer methods using a large number of simulation data sets, and validated DCC using several real sequencing data sets derived from the European Genome-Phenome archive, DNA Data Bank of Japan, and the 1000 Genomes Project. Experimental results suggest that DCC is an effective approach for identifying CNVs, outperforming peer methods in terms of detection power and accuracy. DCC can be used to detect significant or recurrent CNVs in various NGS data sets, thus providing useful information to study genomic mutations and find disease susceptibility genes.
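The abstract does not spell out DCC's exact correlation statistic, curve-fitting model or test, so the following is only a loose sketch of the final idea: finding bins where a per-bin signal changes abruptly. It assumes the sliding-window correlation signal has already been computed (the example values are made up), replaces curve fitting with finite differences, and uses a modified z-score based on the median absolute deviation (cutoff 3.5) as a stand-in for the paper's hypothesis test. It is not the published DCC method.

```python
from statistics import median


def derivative(signal):
    """Finite-difference derivative: central differences inside, one-sided at the ends.

    Assumption: used here as a simple stand-in for DCC's curve fitting.
    """
    n = len(signal)
    if n < 2:
        return [0.0] * n
    d = [0.0] * n
    d[0] = signal[1] - signal[0]
    d[-1] = signal[-1] - signal[-2]
    for i in range(1, n - 1):
        d[i] = (signal[i + 1] - signal[i - 1]) / 2
    return d


def breakpoint_bins(signal, cutoff=3.5):
    """Indices whose derivative is an outlier under a MAD-based modified z-score.

    Assumption: this outlier rule substitutes for the paper's hypothesis test.
    """
    d = derivative(signal)
    center = median(d)
    mad = median(abs(x - center) for x in d)
    if mad == 0:
        # Perfectly flat derivatives: report any bin that deviates at all.
        return [i for i, x in enumerate(d) if x != center]
    return [i for i, x in enumerate(d)
            if abs(0.6745 * (x - center) / mad) > cutoff]


# Made-up per-bin signal (a hypothetical sliding-window statistic) with one abrupt drop.
signal = [0.9, 0.91, 0.9, 0.89, 0.9, 0.2, 0.21, 0.2, 0.19, 0.2]
print(breakpoint_bins(signal))  # [4, 5]
```

Both bins adjacent to the drop are reported because a central difference spreads a single step over two positions; merging adjacent flagged bins into one breakpoint would be a natural next step.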
9
Liang Y, Qiu K, Liao B, Zhu W, Huang X, Li L, Chen X, Li K. Seeksv: an accurate tool for somatic structural variation and virus integration detection. Bioinformatics 2016; 33:184-191. [DOI: 10.1093/bioinformatics/btw591] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Received: 01/21/2016] [Revised: 07/29/2016] [Accepted: 09/06/2016] [Indexed: 01/31/2023]
10
Poznik GD, Xue Y, Mendez FL, Willems TF, Massaia A, Wilson Sayres MA, Ayub Q, McCarthy SA, Narechania A, Kashin S, Chen Y, Banerjee R, Rodriguez-Flores JL, Cerezo M, Shao H, Gymrek M, Malhotra A, Louzada S, Desalle R, Ritchie GRS, Cerveira E, Fitzgerald TW, Garrison E, Marcketta A, Mittelman D, Romanovitch M, Zhang C, Zheng-Bradley X, Abecasis GR, McCarroll SA, Flicek P, Underhill PA, Coin L, Zerbino DR, Yang F, Lee C, Clarke L, Auton A, Erlich Y, Handsaker RE, Bustamante CD, Tyler-Smith C. Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences. Nat Genet 2016; 48:593-9. [PMID: 27111036 PMCID: PMC4884158 DOI: 10.1038/ng.3559] [Citation(s) in RCA: 198] [Impact Index Per Article: 24.8] [Received: 11/08/2015] [Accepted: 04/01/2016] [Indexed: 12/21/2022]
Abstract
We report the sequences of 1,244 human Y chromosomes randomly ascertained from 26 worldwide populations by the 1000 Genomes Project. We discovered more than 65,000 variants, including single-nucleotide variants, multiple-nucleotide variants, insertions and deletions, short tandem repeats, and copy number variants. Of these, copy number variants contribute the greatest predicted functional impact. We constructed a calibrated phylogenetic tree on the basis of binary single-nucleotide variants and projected the more complex variants onto it, estimating the number of mutations for each class. Our phylogeny shows bursts of extreme expansion in male numbers that have occurred independently among each of the five continental superpopulations examined, at times of known migrations and technological innovations.
Affiliation(s)
- G David Poznik
- Program in Biomedical Informatics, Stanford University, Stanford, California, USA
- Department of Genetics, Stanford University, Stanford, California, USA
- Yali Xue
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Fernando L Mendez
- Department of Genetics, Stanford University, Stanford, California, USA
- Thomas F Willems
- Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- New York Genome Center, New York, New York, USA
- Andrea Massaia
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Melissa A Wilson Sayres
- School of Life Sciences, Arizona State University, Tempe, Arizona, USA
- Center for Evolution and Medicine, Biodesign Institute, Arizona State University, Tempe, Arizona, USA
- Qasim Ayub
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Shane A McCarthy
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Apurva Narechania
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, USA
- Seva Kashin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Yuan Chen
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Ruby Banerjee
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Maria Cerezo
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Haojing Shao
- Institute for Molecular Bioscience, University of Queensland, St Lucia, Queensland, Australia
- Melissa Gymrek
- New York Genome Center, New York, New York, USA
- Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Ankit Malhotra
- Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, USA
- Sandra Louzada
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Rob Desalle
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, USA
- Graham R S Ritchie
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Eliza Cerveira
- Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, USA
- Erik Garrison
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Anthony Marcketta
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, USA
- David Mittelman
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, USA
- Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, USA
- Chengsheng Zhang
- Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, USA
- Xiangqun Zheng-Bradley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Gonçalo R Abecasis
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Steven A McCarroll
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Peter A Underhill
- Department of Genetics, Stanford University, Stanford, California, USA
- Lachlan Coin
- Institute for Molecular Bioscience, University of Queensland, St Lucia, Queensland, Australia
- Daniel R Zerbino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Fengtang Yang
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Charles Lee
- Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, USA
- Department of Life Sciences, Ewha Womans University, Seoul, Republic of Korea
- Laura Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Adam Auton
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, USA
- Yaniv Erlich
- New York Genome Center, New York, New York, USA
- Department of Computer Science, Fu Foundation School of Engineering, Columbia University, New York, New York, USA
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, USA
- Robert E Handsaker
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Carlos D Bustamante
- Department of Genetics, Stanford University, Stanford, California, USA
- Department of Biomedical Data Science, Stanford University, Stanford, California, USA
- Chris Tyler-Smith
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
11
Li J, Woods SL, Healey S, Beesley J, Chen X, Lee JS, Sivakumaran H, Wayte N, Nones K, Waterfall JJ, Pearson J, Patch AM, Senz J, Ferreira MA, Kaurah P, Mackenzie R, Heravi-Moussavi A, Hansford S, Lannagan TRM, Spurdle AB, Simpson PT, da Silva L, Lakhani SR, Clouston AD, Bettington M, Grimpen F, Busuttil RA, Di Costanzo N, Boussioutas A, Jeanjean M, Chong G, Fabre A, Olschwang S, Faulkner GJ, Bellos E, Coin L, Rioux K, Bathe OF, Wen X, Martin HC, Neklason DW, Davis SR, Walker RL, Calzone KA, Avital I, Heller T, Koh C, Pineda M, Rudloff U, Quezado M, Pichurin PN, Hulick PJ, Weissman SM, Newlin A, Rubinstein WS, Sampson JE, Hamman K, Goldgar D, Poplawski N, Phillips K, Schofield L, Armstrong J, Kiraly-Borri C, Suthers GK, Huntsman DG, Foulkes WD, Carneiro F, Lindor NM, Edwards SL, French JD, Waddell N, Meltzer PS, Worthley DL, Schrader KA, Chenevix-Trench G. Point Mutations in Exon 1B of APC Reveal Gastric Adenocarcinoma and Proximal Polyposis of the Stomach as a Familial Adenomatous Polyposis Variant. Am J Hum Genet 2016; 98:830-842. [PMID: 27087319 DOI: 10.1016/j.ajhg.2016.03.001] [Citation(s) in RCA: 139] [Impact Index Per Article: 17.4] [Received: 01/25/2016] [Accepted: 03/02/2016] [Indexed: 12/15/2022]
Abstract
Gastric adenocarcinoma and proximal polyposis of the stomach (GAPPS) is an autosomal-dominant cancer-predisposition syndrome with a significant risk of gastric, but not colorectal, adenocarcinoma. We mapped the gene to 5q22 and found loss of the wild-type allele on 5q in fundic gland polyps from affected individuals. Whole-exome and -genome sequencing failed to find causal mutations but, through Sanger sequencing, we identified point mutations in APC promoter 1B that co-segregated with disease in all six families. The mutations reduced binding of the YY1 transcription factor and impaired activity of the APC promoter 1B in luciferase assays. Analysis of blood and saliva from carriers showed allelic imbalance of APC, suggesting that these mutations lead to decreased allele-specific expression in vivo. Similar mutations in APC promoter 1B occur in rare families with familial adenomatous polyposis (FAP). Promoter 1A is methylated in GAPPS and sporadic FGPs and in normal stomach, which suggests that 1B transcripts are more important than 1A in gastric mucosa. This might explain why all known GAPPS-affected families carry promoter 1B point mutations but only rare FAP-affected families carry similar mutations, the colonic cells usually being protected by the expression of the 1A isoform. Gastric polyposis and cancer have been previously described in some FAP-affected individuals with large deletions around promoter 1B. Our finding that GAPPS is caused by point mutations in the same promoter suggests that families with mutations affecting the promoter 1B are at risk of gastric adenocarcinoma, regardless of whether or not colorectal polyps are present.
Affiliation(s)
- Jun Li
- Department of Genetics and Computational Biology, QIMR Berghofer, Herston, QLD 4029, Australia
- Susan L Woods
- School of Medicine, University of Adelaide and Cancer Theme, SAHMRI, Adelaide, SA 5000, Australia
- Sue Healey
- Department of Genetics and Computational Biology, QIMR Berghofer, Herston, QLD 4029, Australia
- Jonathan Beesley
- Department of Genetics and Computational Biology, QIMR Berghofer, Herston, QLD 4029, Australia
- Xiaoqing Chen
- Department of Genetics and Computational Biology, QIMR Berghofer, Herston, QLD 4029, Australia
- Jason S Lee
- Department of Genetics and Computational Biology, QIMR Berghofer, Herston, QLD 4029, Australia
- Haran Sivakumaran
- Department of Genetics and Computational Biology, QIMR Berghofer, Herston, QLD 4029, Australia
- Nicci Wayte
- Department of Genetics and Computational Biology, QIMR Berghofer, Herston, QLD 4029, Australia
- Katia Nones
- Department of Genetics and Computational Biology, QIMR Berghofer, Herston, QLD 4029, Australia
- Joshua J Waterfall
- Genetics Branch, Center for Cancer Research (CCR), National Cancer Institute (NCI), NIH, Bethesda, MD 20892, USA
- John Pearson
- Department of Genetics and Computational Biology, QIMR Berghofer, Herston, QLD 4029, Australia; Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD 4072, Australia
- Anne-Marie Patch
- Department of Genetics and Computational Biology, QIMR Berghofer, Herston, QLD 4029, Australia
- Janine Senz
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC V6T 2B5, Canada
- Manuel A Ferreira
- Department of Genetics and Computational Biology, QIMR Berghofer, Herston, QLD 4029, Australia
- Pardeep Kaurah
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V6H 3N1, Canada
- Robertson Mackenzie
- Department of Molecular Oncology, BC Cancer Research Centre, Vancouver, BC V5Z 1L3, Canada
- Samantha Hansford
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC V6T 2B5, Canada
- Tamsin R M Lannagan
- School of Medicine, University of Adelaide and Cancer Theme, SAHMRI, Adelaide, SA 5000, Australia
- Amanda B Spurdle
- Department of Genetics and Computational Biology, QIMR Berghofer, Herston, QLD 4029, Australia
- Peter T Simpson
- UQ Centre for Clinical Research, The University of Queensland, Brisbane, QLD 4029, Australia; School of Medicine, The University of Queensland, Brisbane, QLD 4006, Australia
- Leonard da Silva
- UQ Centre for Clinical Research, The University of Queensland, Brisbane, QLD 4029, Australia; School of Medicine, The University of Queensland, Brisbane, QLD 4006, Australia
- Sunil R Lakhani
- UQ Centre for Clinical Research, The University of Queensland, Brisbane, QLD 4029, Australia; School of Medicine, The University of Queensland, Brisbane, QLD 4006, Australia; Anatomical Pathology, Pathology Queensland, Royal Brisbane and Women's Hospital, Brisbane, QLD 4029, Australia
- Andrew D Clouston
- Centre for Liver Disease Research, TRI Building, University of Queensland, Woolloongabba, QLD 4102, Australia; Envoi Specialist Pathologists, Bishop Street, Kelvin Grove, QLD 4059, Australia
- Mark Bettington
- School of Medicine, The University of Queensland, Brisbane, QLD 4006, Australia; Envoi Specialist Pathologists, Bishop Street, Kelvin Grove, QLD 4059, Australia; The Conjoint Gastroenterology Laboratory, QIMR Berghofer, Herston, QLD 4029, Australia
- Florian Grimpen
- Departments of Gastroenterology and Hepatology, Royal Brisbane and Women's Hospital, Brisbane, QLD 4006, Australia
- Rita A Busuttil
- Cancer Genetics and Genomics Laboratory, Peter MacCallum Cancer Centre, Locked Bag 1, Melbourne, VIC 8006, Australia; Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC 3010, Australia; Department of Medicine, Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3010, Australia
- Natasha Di Costanzo
- Cancer Genetics and Genomics Laboratory, Peter MacCallum Cancer Centre, Locked Bag 1, Melbourne, VIC 8006, Australia
- Alex Boussioutas
- Cancer Genetics and Genomics Laboratory, Peter MacCallum Cancer Centre, Locked Bag 1, Melbourne, VIC 8006, Australia; Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC 3010, Australia; Department of Medicine, Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3010, Australia; Department of Gastroenterology, Royal Melbourne Hospital, Parkville, VIC 3010, Australia
- Marie Jeanjean
- Lady Davis Institute, Segal Cancer Centre, Jewish General Hospital, Montreal, QC H3T 1E2, Canada
- George Chong
- Molecular Pathology Centre, Department of Pathology, Jewish General Hospital - McGill University, Montreal, QC H3T 1E2, Canada
- Aurélie Fabre
- AP-HM Timone, Medical Genetics Department, 13385 Marseille, France; Aix Marseille Université, INSERM, GMGF UMR_S 910, 13385 Marseille, France; Oncology Unit, Generale de Sante, Clairval Hospital, 13009 Marseille, France
- Sylviane Olschwang
- AP-HM Timone, Medical Genetics Department, 13385 Marseille, France; Aix Marseille Université, INSERM, GMGF UMR_S 910, 13385 Marseille, France; Oncology Unit, Generale de Sante, Clairval Hospital, 13009 Marseille, France
- Geoffrey J Faulkner
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
- Evangelos Bellos
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD 4072, Australia; Department of Genomics of Common Disease, Imperial College London, London W12 0NN, UK
- Lachlan Coin
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD 4072, Australia
- Kevin Rioux
- Department of Medicine, Division of Gastroenterology, Department of Microbiology and Infectious Diseases, Gastrointestinal Research Group, University of Calgary, Calgary, AB T2N 4N1, Canada
- Oliver F Bathe
- Departments of Surgery and Oncology, University of Calgary, Calgary, AB T2N 4N1, Canada; Division of Surgical Oncology, Tom Baker Cancer Centre, 1331 29th St NW, Calgary, AB T2N 4N1, Canada
- Xiaogang Wen
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP)/Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto 4200-135, Portugal; Centro Hospitalar Vila Nova de Gaia/Espinho, Porto 4430-027, Portugal
- Hilary C Martin
- Wellcome Trust Centre for Human Genetics, Oxford OX3 7BN, UK
- Deborah W Neklason
- Department of Internal Medicine, Huntsman Cancer Institute at University of Utah, Salt Lake City, UT 84112, USA
- Sean R Davis
- Genetics Branch, Center for Cancer Research (CCR), National Cancer Institute (NCI), NIH, Bethesda, MD 20892, USA
- Robert L Walker
- Genetics Branch, Center for Cancer Research (CCR), National Cancer Institute (NCI), NIH, Bethesda, MD 20892, USA
- Kathleen A Calzone
- Genetics Branch, Center for Cancer Research (CCR), National Cancer Institute (NCI), NIH, Bethesda, MD 20892, USA
- Itzhak Avital
- Department of Surgery, Saint Peter's University Hospital, Rutgers University, New Brunswick, NJ 08901, USA
- Theo Heller
- Liver Diseases Branch, National Institute of Diabetes and Digestive and Kidney Disease (NIDDK), NIH, Bethesda, MD 20892, USA
- Christopher Koh
- Liver Diseases Branch, National Institute of Diabetes and Digestive and Kidney Disease (NIDDK), NIH, Bethesda, MD 20892, USA
- Marbin Pineda
- Genetics Branch, Center for Cancer Research (CCR), National Cancer Institute (NCI), NIH, Bethesda, MD 20892, USA
- Udo Rudloff
- Thoracic and Gastrointestinal Oncology Branch, Center for Cancer Research (CCR), National Cancer Institute (NCI), NIH, Bethesda, MD 20892, USA
- Martha Quezado
- Laboratory of Pathology, Center for Cancer Research (CCR), National Cancer Institute (NCI), NIH, Bethesda, MD 20892, USA
- Pavel N Pichurin
- Department of Medical Genetics, Mayo Clinic, Rochester, MN 55905, USA
- Peter J Hulick
- Center for Medical Genetics, NorthShore University HealthSystem, Evanston, IL 60201, USA
- Anna Newlin
- Center for Medical Genetics, NorthShore University HealthSystem, Evanston, IL 60201, USA
- Wendy S Rubinstein
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), NIH, Bethesda, MD 20892, USA
- Jone E Sampson
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR 97239, USA
| | - Kelly Hamman
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR 97239, USA
| | - David Goldgar
- Department of Dermatology and Huntsman Cancer Institute, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
| | - Nicola Poplawski
- Adult Genetics Unit, SA Pathology at the Women's and Children's Hospital, North Adelaide, SA 5006, Australia; University Department of Paediatrics, University of Adelaide, Adelaide, SA 5005, Australia
| | - Kerry Phillips
- Adult Genetics Unit, SA Pathology at the Women's and Children's Hospital, North Adelaide, SA 5006, Australia; University Department of Paediatrics, University of Adelaide, Adelaide, SA 5005, Australia
| | - Lyn Schofield
- Genetic Services of Western Australia, King Edward Memorial Hospital, Subiaco, WA 6008, Australia
| | - Jacqueline Armstrong
- Adult Genetics Unit, SA Pathology at the Women's and Children's Hospital, North Adelaide, SA 5006, Australia
| | - Cathy Kiraly-Borri
- Genetic Services of Western Australia, King Edward Memorial Hospital, Subiaco, WA 6008, Australia
| | - Graeme K Suthers
- University Department of Paediatrics, University of Adelaide, Adelaide, SA 5005, Australia
| | - David G Huntsman
- Department of Molecular Oncology, BC Cancer Research Centre, Vancouver, BC V5Z 1L3, Canada; Department of Pathology and Obstetrics and Gynaecology, University of British Columbia, Vancouver, BC V6Z 2K5, Canada
| | - William D Foulkes
- Lady Davis Institute, Segal Cancer Centre, Jewish General Hospital, Montreal, QC H3T 1E2, Canada; Program in Cancer Genetics, Departments of Oncology and Human Genetics, McGill University, Montreal, QC H3A 1B1, Canada
| | - Fatima Carneiro
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP)/Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto 4200-135, Portugal; Medical Faculty of the University of Porto/Centro Hospitalar São João, Porto 4200-319, Portugal
| | - Noralane M Lindor
- Department of Health Sciences Research, Mayo Clinic, Scottsdale, AZ 85259, USA
| | - Stacey L Edwards
- Department of Genetics and Computational Biology, QIMR Berghofer, Herston, QLD 4029, Australia
| | - Juliet D French
- Department of Genetics and Computational Biology, QIMR Berghofer, Herston, QLD 4029, Australia
| | - Nicola Waddell
- Department of Genetics and Computational Biology, QIMR Berghofer, Herston, QLD 4029, Australia; Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD 4072, Australia
| | - Paul S Meltzer
- Genetics Branch, Center for Cancer Research (CCR), National Cancer Institute (NCI), NIH, Bethesda, MD 20892, USA
| | - Daniel L Worthley
- School of Medicine, University of Adelaide and Cancer Theme, SAHMRI, Adelaide, SA 5000, Australia
| | - Kasmintan A Schrader
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V6H 3N1, Canada; Department of Molecular Oncology, BC Cancer Research Centre, Vancouver, BC V5Z 1L3, Canada
| | - Georgia Chenevix-Trench
- Department of Genetics and Computational Biology, QIMR Berghofer, Herston, QLD 4029, Australia.
| |
Collapse
|
12
|
Pirooznia M, Goes FS, Zandi PP. Whole-genome CNV analysis: advances in computational approaches. Front Genet 2015; 6:138. [PMID: 25918519 PMCID: PMC4394692 DOI: 10.3389/fgene.2015.00138] [Citation(s) in RCA: 123] [Impact Index Per Article: 13.7] [Received: 01/26/2015] [Accepted: 03/23/2015] [Indexed: 01/04/2023]
Abstract
Accumulating evidence indicates that DNA copy number variation (CNV) is likely to make a significant contribution to human diversity and also play an important role in disease susceptibility. Recent advances in genome sequencing technologies have enabled the characterization of a variety of genomic features, including CNVs. This has led to the development of several bioinformatics approaches to detect CNVs from next-generation sequencing data. Here, we review recent advances in CNV detection from whole genome sequencing. We discuss the informatics approaches and current computational tools that have been developed as well as their strengths and limitations. This review will assist researchers and analysts in choosing the most suitable tools for CNV analysis as well as provide suggestions for new directions in future development.
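The read-depth strategy discussed in reviews of this kind can be illustrated with a minimal Python sketch. It assumes reads have already been counted in fixed-size genomic bins with a known GC fraction per bin; it applies a simple median-per-GC-stratum correction (one common way to handle GC bias, not necessarily the one used by any reviewed tool) and converts corrected depth to copy number by assuming the genome-wide median bin is diploid. The function names `gc_correct` and `copy_number` and the `n_strata` parameter are hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_correct(counts, gc_fractions, n_strata=20):
    """Scale each bin by (overall median / median of bins in the same GC stratum)."""
    overall = median(counts)
    # Assign each bin to one of n_strata equal-width GC strata (GC fraction in [0, 1]).
    keys = [min(int(gc * n_strata), n_strata - 1) for gc in gc_fractions]
    strata = defaultdict(list)
    for key, count in zip(keys, counts):
        strata[key].append(count)
    stratum_median = {key: median(values) for key, values in strata.items()}
    return [
        count * overall / stratum_median[key] if stratum_median[key] > 0 else 0.0
        for key, count in zip(keys, counts)
    ]


def copy_number(corrected, ploidy=2):
    """Per-bin copy-number estimate, assuming the median bin carries `ploidy` copies."""
    med = median(corrected)
    return [ploidy * value / med for value in corrected]


# Hypothetical bins: a one-bin deletion (50) and a one-bin gain (150) at uniform GC.
counts = [100, 100, 50, 100, 150, 100]
print(copy_number(gc_correct(counts, [0.4] * len(counts))))
# → [2.0, 2.0, 1.0, 2.0, 3.0, 2.0]
```

Real pipelines add mappability filtering and a segmentation step on top of these per-bin estimates.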
Affiliation(s)
- Mehdi Pirooznia
- Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Fernando S Goes
- Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Peter P Zandi
- Mood Disorders Center, Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, USA; Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
13
Glusman G, Severson A, Dhankani V, Robinson M, Farrah T, Mauldin DE, Stittrich AB, Ament SA, Roach JC, Brunkow ME, Bodian DL, Vockley JG, Shmulevich I, Niederhuber JE, Hood L. Identification of copy number variants in whole-genome data using Reference Coverage Profiles. Front Genet 2015; 6:45. [PMID: 25741365 PMCID: PMC4330915 DOI: 10.3389/fgene.2015.00045] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 11/30/2014] [Accepted: 01/30/2015] [Indexed: 12/20/2022]
Abstract
The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150–1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1–100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation.
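The two steps named in this abstract, normalizing coverage to a reference profile and segmenting with a hidden Markov model, can be illustrated with a minimal Python sketch. It is not the authors' RCP implementation: it assumes the reference profile is already available as expected diploid depth per bin, rescales by the median observed/expected ratio, and runs Viterbi decoding over three hypothetical states (1, 2 or 3 copies) with Gaussian emissions centred at copies/2. The `sigma` and `p_stay` values are illustrative assumptions.

```python
import math
from statistics import median

STATES = (1, 2, 3)  # hemizygous deletion, diploid, single-copy gain


def normalized_ratios(observed, reference):
    """Observed/expected depth per bin, rescaled so the median ratio is 1 (diploid)."""
    raw = [o / r for o, r in zip(observed, reference)]
    med = median(raw)
    return [x / med for x in raw]


def viterbi_copy_number(ratios, sigma=0.15, p_stay=0.99):
    """Most likely copy-number path under a simple 3-state HMM."""
    p_switch = (1.0 - p_stay) / (len(STATES) - 1)
    log_trans = {(a, b): math.log(p_stay if a == b else p_switch)
                 for a in STATES for b in STATES}

    def log_emit(x, copies):
        # Gaussian log-density up to a constant shared by all states; mean ratio = copies / 2.
        return -((x - copies / 2.0) ** 2) / (2.0 * sigma ** 2)

    # Uniform start distribution, so its constant log(1/3) can be dropped.
    scores = {c: log_emit(ratios[0], c) for c in STATES}
    backpointers = []
    for x in ratios[1:]:
        new_scores, pointers = {}, {}
        for c in STATES:
            prev = max(STATES, key=lambda a: scores[a] + log_trans[(a, c)])
            new_scores[c] = scores[prev] + log_trans[(prev, c)] + log_emit(x, c)
            pointers[c] = prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace the best path backwards from the highest-scoring final state.
    state = max(STATES, key=lambda c: scores[c])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


reference = [30.0] * 10
observed = [30, 30, 30, 15, 15, 15, 30, 30, 30, 30]  # hypothetical 3-bin hemizygous deletion
print(viterbi_copy_number(normalized_ratios(observed, reference)))
# → [2, 2, 2, 1, 1, 1, 2, 2, 2, 2]
```

The transition penalty is what lets a run of consistently low bins be called as one deletion while isolated noisy bins stay diploid.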
Affiliation(s)
- Dale L Bodian
- Inova Translational Medicine Institute, Inova Health System, Falls Church, VA, USA
- Joseph G Vockley
- Inova Translational Medicine Institute, Inova Health System, Falls Church, VA, USA
- John E Niederhuber
- Inova Translational Medicine Institute, Inova Health System, Falls Church, VA, USA
- Leroy Hood
- Institute for Systems Biology, Seattle, WA, USA
14
Large multiallelic copy number variations in humans. Nat Genet 2015; 47:296-303. [PMID: 25621458 PMCID: PMC4405206 DOI: 10.1038/ng.3200] [Citation(s) in RCA: 265] [Impact Index Per Article: 29.4] [Received: 07/28/2014] [Accepted: 12/31/2014] [Indexed: 12/14/2022]
Abstract
Thousands of genomic segments appear to be present in widely varying copy numbers in different human genomes. We developed ways to use increasingly abundant whole-genome sequence data to identify the copy numbers, alleles and haplotypes present at most large multiallelic CNVs (mCNVs). We analyzed 849 genomes sequenced by the 1000 Genomes Project to identify most large (>5-kb) mCNVs, including 3,878 duplications, of which 1,356 appear to have 3 or more segregating alleles. We find that mCNVs give rise to most human variation in gene dosage (seven times the combined contribution of deletions and biallelic duplications) and that this variation in gene dosage generates abundant variation in gene expression. We describe 'runaway duplication haplotypes' in which genes, including HPR and ORM1, have mutated to high copy number on specific haplotypes. We also describe partially successful initial strategies for analyzing mCNVs via imputation and provide an initial data resource to support such analyses.
15
Bellos E, Coin LJM. cnvOffSeq: detecting intergenic copy number variation using off-target exome sequencing data. Bioinformatics 2015; 30:i639-45. [PMID: 25161258 PMCID: PMC4147927 DOI: 10.1093/bioinformatics/btu475] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 01/03/2023]
Abstract
MOTIVATION Exome sequencing technologies have transformed the field of Mendelian genetics and allowed for efficient detection of genomic variants in protein-coding regions. The target enrichment process that is intrinsic to exome sequencing is inherently imperfect, generating large amounts of unintended off-target sequence. Off-target data are characterized by very low and highly heterogeneous coverage and are usually discarded by exome analysis pipelines. We posit that off-target read depth is a rich, but overlooked, source of information that could be mined to detect intergenic copy number variation (CNV). We propose cnvOffSeq, a novel normalization framework for off-target read depth that is based on local adaptive singular value decomposition (SVD). This method is designed to address the heterogeneity of the underlying data and allows for accurate and precise CNV detection and genotyping in off-target regions. RESULTS cnvOffSeq was benchmarked on whole-exome sequencing samples from the 1000 Genomes Project. In a set of 104 gold standard intergenic deletions, our method achieved a sensitivity of 57.5% and a specificity of 99.2%, while maintaining a low FDR of 5%. For gold standard deletions longer than 5 kb, cnvOffSeq achieves a sensitivity of 90.4% without increasing the FDR. cnvOffSeq outperforms both whole-genome and whole-exome CNV detection methods considerably and is shown to offer a substantial improvement over naïve local SVD. AVAILABILITY AND IMPLEMENTATION cnvOffSeq is available at http://sourceforge.net/p/cnvoffseq/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
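cnvOffSeq's contribution is a local, adaptive SVD normalization. The general idea it builds on, subtracting the leading singular components of a samples-by-windows read-depth matrix because they tend to capture systematic variation shared across samples, can be sketched in pure Python. This is the simpler global variant, not cnvOffSeq's local adaptive method; the function name `remove_top_components`, the choice of `k`, and the use of power iteration with deflation (instead of a library SVD) are assumptions made to keep the example dependency-free.

```python
import math


def _unit(vector):
    """Return (vector / ||vector||, ||vector||); a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return ([x / norm for x in vector], norm) if norm > 0 else (vector, 0.0)


def remove_top_components(depth, k=1, iterations=200):
    """Centre each window across samples, then subtract the k leading singular
    components (found by power iteration with deflation) from the matrix.

    depth: list of rows, one per sample; each row holds read depth per window.
    """
    n_samples, n_windows = len(depth), len(depth[0])
    means = [sum(row[j] for row in depth) / n_samples for j in range(n_windows)]
    resid = [[row[j] - means[j] for j in range(n_windows)] for row in depth]
    for _ in range(k):
        v, _ = _unit([1.0 + 0.01 * j for j in range(n_windows)])  # deterministic start
        u, s = [0.0] * n_samples, 0.0
        for _ in range(iterations):
            # u is proportional to A v; v is proportional to A^T u; s = ||A^T u||.
            u, _ = _unit([sum(a * b for a, b in zip(row, v)) for row in resid])
            v, s = _unit([sum(resid[i][j] * u[i] for i in range(n_samples))
                          for j in range(n_windows)])
        # Deflation: subtract the rank-one component s * u * v^T.
        resid = [[resid[i][j] - s * u[i] * v[j] for j in range(n_windows)]
                 for i in range(n_samples)]
    return resid
```

The residual matrix is what a downstream caller would scan for sample-specific depth changes. How many components to remove is a data-dependent choice: removing too many can also erase genuine CNV signal.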
Affiliation(s)
- Evangelos Bellos
- Department of Genomics of Common Disease, Imperial College London, London W12 0NN, UK and Institute for Molecular Bioscience, University of Queensland, St Lucia, QLD 4072, Australia
- Lachlan J M Coin
- Department of Genomics of Common Disease, Imperial College London, London W12 0NN, UK and Institute for Molecular Bioscience, University of Queensland, St Lucia, QLD 4072, Australia
16
Lin K, Smit S, Bonnema G, Sanchez-Perez G, de Ridder D. Making the difference: integrating structural variation detection tools. Brief Bioinform 2014; 16:852-64. [PMID: 25504367 DOI: 10.1093/bib/bbu047] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 08/20/2014] [Indexed: 01/01/2023]
Abstract
From prokaryotes to eukaryotes, phenotypic variation, adaptation and speciation have been associated with structural variation between genomes of individuals within the same species. Many computer algorithms for detecting such variations (callers) have recently been developed, spurred by the advent of next-generation sequencing technology. Such callers mainly exploit split-read mapping or paired-end read mapping. However, as different callers are geared towards different types of structural variation, there is still no single caller that can be considered a community standard; instead, the various callers are increasingly combined in integrated pipelines. In this article, we review a wide range of callers, discuss challenges in the integration step and present a survey of pipelines used in population genomics studies. Based on our findings, we provide general recommendations on how to set up such pipelines. Finally, we present an outlook on future challenges in structural variation detection.
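One common way to integrate callers is to cluster calls by reciprocal overlap and keep clusters supported by several callers. The Python sketch below illustrates that idea under stated assumptions: calls are half-open `(chrom, start, end)` intervals of the same SV type, 50% reciprocal overlap and support from two callers are used as example defaults, and each merged call takes the middle start and middle end of its members. `merge_calls` and these defaults are hypothetical and are not taken from the article.

```python
from typing import List, Tuple

Call = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def reciprocal_overlap(a: Call, b: Call) -> float:
    """Overlap length divided by the length of the longer call (0.0 if disjoint)."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def merge_calls(callsets: List[List[Call]], min_ro: float = 0.5,
                min_callers: int = 2) -> List[Call]:
    """Greedily cluster calls from all callers; keep clusters seen by >= min_callers."""
    pooled = sorted((call, caller) for caller, calls in enumerate(callsets) for call in calls)
    clusters = []  # each entry: (first call in cluster, set of caller ids, member calls)
    for call, caller in pooled:
        for first, callers, members in clusters:
            if reciprocal_overlap(first, call) >= min_ro:
                callers.add(caller)
                members.append(call)
                break
        else:
            clusters.append((call, {caller}, [call]))
    merged = []
    for first, callers, members in clusters:
        if len(callers) >= min_callers:
            starts = sorted(m[1] for m in members)
            ends = sorted(m[2] for m in members)
            merged.append((first[0], starts[len(starts) // 2], ends[len(ends) // 2]))
    return merged


caller_a = [("chr1", 1000, 2000), ("chr2", 500, 800)]
caller_b = [("chr1", 1100, 2050)]
caller_c = [("chr1", 990, 1950), ("chr3", 10, 20)]
print(merge_calls([caller_a, caller_b, caller_c]))  # → [('chr1', 1000, 2000)]
```

Requiring agreement between callers trades sensitivity for precision; lowering `min_callers` to 1 turns the same code into a union of all call sets.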
17
Chu C, Zhang J, Wu Y. GINDEL: accurate genotype calling of insertions and deletions from low coverage population sequence reads. PLoS One 2014; 9:e113324. [PMID: 25423315 PMCID: PMC4244156 DOI: 10.1371/journal.pone.0113324] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 07/01/2014] [Accepted: 10/24/2014] [Indexed: 12/25/2022]
Abstract
Insertions and deletions (indels) are important types of structural variations. Obtaining accurate genotypes of indels may facilitate further genetic study. There are a few existing methods for calling indel genotypes from sequence reads. However, none of these tools can accurately call indel genotypes for indels of all lengths, especially for low coverage sequence data. In this paper, we present GINDEL, an approach for calling genotypes of both insertions and deletions from sequence reads. GINDEL uses a machine learning approach which combines multiple features extracted from next generation sequencing data. We test our approach on both simulated and real data and compare with existing tools, including Genome STRiP, Pindel and Clever-sv. Results show that GINDEL works well for deletions larger than 50 bp on both high and low coverage data. Also, GINDEL performs well for insertion genotyping on both simulated and real data. For comparison, Genome STRiP performs less well for shorter deletions (50-200 bp) on both simulated and real sequence data from the 1000 Genomes Project. Clever-sv performs well for intermediate deletions (200-1500 bp) but is less accurate when coverage is low. Pindel only works well for high coverage data, but does not perform well at low coverage. To summarize, we show that GINDEL not only can call genotypes of insertions and deletions (both short and long) for high and low coverage population sequence data, but also is more accurate and efficient than other approaches. The program GINDEL can be downloaded at: http://sourceforge.net/p/gindel.
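GINDEL combines many read features with machine learning. The underlying decision it makes, choosing among the genotypes 0/0, 0/1 and 1/1 from reads supporting the reference or the variant allele, can be illustrated with a much simpler binomial likelihood model, swapped in here purely for illustration and not GINDEL's method. The function names and the 1% per-read error rate are assumptions.

```python
import math


def genotype_log_likelihoods(ref_reads: int, alt_reads: int, error: float = 0.01) -> dict:
    """Binomial log-likelihood of the read counts under each diploid genotype."""
    n = ref_reads + alt_reads
    # Probability that a single read supports the variant allele under each genotype.
    p_alt = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    log_choose = math.lgamma(n + 1) - math.lgamma(alt_reads + 1) - math.lgamma(ref_reads + 1)
    return {g: log_choose + alt_reads * math.log(p) + ref_reads * math.log(1.0 - p)
            for g, p in p_alt.items()}


def call_genotype(ref_reads: int, alt_reads: int, error: float = 0.01) -> str:
    """Return the maximum-likelihood genotype."""
    ll = genotype_log_likelihoods(ref_reads, alt_reads, error)
    return max(ll, key=ll.get)


print(call_genotype(12, 0), call_genotype(6, 5), call_genotype(0, 9))  # → 0/0 0/1 1/1
```

The model also shows why low coverage is hard: with a single variant-supporting read, 0/1 and 1/1 differ in log-likelihood by less than one unit, so the call is weak.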
Affiliation(s)
- Chong Chu
- Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Jin Zhang
- The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Yufeng Wu
- Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, United States of America
18
Bellos E, Kumar V, Lin C, Maggi J, Phua ZY, Cheng CY, Cheung CMG, Hibberd ML, Wong TY, Coin LJM, Davila S. cnvCapSeq: detecting copy number variation in long-range targeted resequencing data. Nucleic Acids Res 2014; 42:e158. [PMID: 25228465 PMCID: PMC4227763 DOI: 10.1093/nar/gku849] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 12/30/2022]
Abstract
Targeted resequencing technologies have allowed for efficient and cost-effective detection of genomic variants in specific regions of interest. Although capture sequencing has been primarily used for investigating single nucleotide variants and indels, it has the potential to elucidate a broader spectrum of genetic variation, including copy number variants (CNVs). Various methods exist for detecting CNV in whole-genome and exome sequencing datasets. However, no algorithms have been specifically designed for contiguous target sequencing, despite its increasing importance in clinical and research applications. We have developed cnvCapSeq, a novel method for accurate and sensitive CNV discovery and genotyping in long-range targeted resequencing. cnvCapSeq was benchmarked using a simulated contiguous capture sequencing dataset comprising 21 genomic loci of various lengths. cnvCapSeq was shown to outperform the best existing exome CNV method by a wide margin both in terms of sensitivity (92.0 versus 48.3%) and specificity (99.8 versus 70.5%). We also applied cnvCapSeq to a real capture sequencing cohort comprising a contiguous 358 kb region that contains the Complement Factor H gene cluster. In this dataset, cnvCapSeq identified 41 samples with CNV, including two with duplications, with a genotyping accuracy of 99%, as ascertained by quantitative real-time PCR.
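The sensitivity and specificity figures quoted in these benchmarks follow the standard confusion-matrix definitions, as does the false discovery rate (FDR) that some of them also report. A small Python helper makes the relationship explicit. What counts as a true negative (for example, a non-CNV window or locus) depends on each study's benchmark design, so the function name and the counts below are purely hypothetical.

```python
def benchmark_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix summaries used to benchmark CNV callers."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true CNVs that were called
        "specificity": tn / (tn + fp),  # fraction of non-CNV units correctly left uncalled
        "fdr": fp / (tp + fp),          # fraction of calls that are false
    }


print(benchmark_metrics(tp=90, fp=10, tn=990, fn=10))
# → {'sensitivity': 0.9, 'specificity': 0.99, 'fdr': 0.1}
```

Because true negatives usually far outnumber calls, specificity can sit near 100% while the FDR is still several percent, which is why studies report both.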
Affiliation(s)
- Evangelos Bellos
- Department of Genomics of Common Disease, School of Public Health, Imperial College London, London W12 0NN, UK
- Vikrant Kumar
- Genome Institute of Singapore, 60 Biopolis St., 138672, Singapore
- Clarabelle Lin
- Genome Institute of Singapore, 60 Biopolis St., 138672, Singapore
- Jordi Maggi
- Institute of Medical Molecular Genetics, University of Zurich, Wagistrasse 12, 8952 Schlieren, Switzerland
- Zai Yang Phua
- Genome Institute of Singapore, 60 Biopolis St., 138672, Singapore
- Ching-Yu Cheng
- Singapore Eye Research Institute, Singapore National Eye Center, 11 Third Hospital Avenue, 168751, Singapore; Department of Ophthalmology, National University of Singapore, 1E Kent Ridge Road, 119228, Singapore
- Chui Ming Gemmy Cheung
- Singapore Eye Research Institute, Singapore National Eye Center, 11 Third Hospital Avenue, 168751, Singapore; Department of Ophthalmology, National University of Singapore, 1E Kent Ridge Road, 119228, Singapore
- Martin L Hibberd
- Genome Institute of Singapore, 60 Biopolis St., 138672, Singapore; Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK
- Tien Yin Wong
- Singapore Eye Research Institute, Singapore National Eye Center, 11 Third Hospital Avenue, 168751, Singapore; Department of Ophthalmology, National University of Singapore, 1E Kent Ridge Road, 119228, Singapore
- Lachlan J M Coin
- Department of Genomics of Common Disease, School of Public Health, Imperial College London, London W12 0NN, UK; Institute for Molecular Bioscience, University of Queensland, St Lucia, QLD 4072, Australia
- Sonia Davila
- Genome Institute of Singapore, 60 Biopolis St., 138672, Singapore
19
Connolly JJ, Glessner JT, Almoguera B, Crosslin DR, Jarvik GP, Sleiman PM, Hakonarson H. Copy number variation analysis in the context of electronic medical records and large-scale genomics consortium efforts. Front Genet 2014; 5:51. [PMID: 24672537 PMCID: PMC3957100 DOI: 10.3389/fgene.2014.00051] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 12/13/2013] [Accepted: 02/18/2014] [Indexed: 12/18/2022]
Abstract
The goal of this paper is to review recent research on copy number variations (CNVs) and their association with complex and rare diseases. In the latter part of this paper, we focus on how large biorepositories such as the electronic medical record and genomics (eMERGE) consortium may be best leveraged to systematically mine for potentially pathogenic CNVs, and we end with a discussion of how such variants might be reported back for inclusion in electronic medical records as part of medical history.
Affiliation(s)
- John J Connolly
- The Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Joseph T Glessner
- The Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Berta Almoguera
- The Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- David R Crosslin
- Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington Medical Center, Seattle, WA, USA
- Gail P Jarvik
- Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington Medical Center, Seattle, WA, USA
- Patrick M Sleiman
- The Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Hakon Hakonarson
- The Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
20
Abstract
Common copy number variations (CNVs) are small regions of genomic variation at the same loci across multiple samples, which can be detected with high resolution using next-generation sequencing (NGS) techniques. Multiple sequencing data samples are often available from genomic studies; examples include sequences from multiple platforms and sequences from multiple individuals. By integrating complementary information from multiple data samples, detection power can potentially be improved. However, most current CNV detection methods process a single sequence sample, or two samples in an abnormal versus matched-normal study; research on detecting common CNVs across multiple samples has been very limited but is much needed. In this paper, we propose a novel method to detect common CNVs from multiple sequencing samples by exploiting the concurrency of genomic variations in read depth signals derived from multiple NGS data sets. We use a penalized sparse regression model to fit multiple read depth profiles, based on which common CNV identification is formulated as a change-point detection problem. Finally, we validate the proposed method on both simulated and real data, showing that it gives both higher detection power and better breakpoint estimation than several published CNV detection methods.
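The idea of exploiting concurrency, that is, requiring change-points to be shared by all samples, can be illustrated with a simpler technique than the paper's penalized sparse regression: L0-penalized least-squares segmentation solved exactly by optimal partitioning (dynamic programming), with each segment's cost summed over samples. This Python sketch is a stand-in under stated assumptions, not the authors' method; `segment_multisample` and the `penalty` value are hypothetical, and the profiles are assumed to be already normalized read-depth signals of equal length.

```python
def segment_multisample(profiles, penalty):
    """Shared change-points minimising (sum over samples of within-segment squared
    error) + penalty * (number of change-points), via optimal partitioning.

    profiles: equal-length numeric sequences, one per sample.
    Returns sorted change-point indices (each is the start of a new segment).
    """
    m, n = len(profiles), len(profiles[0])
    # Per-sample prefix sums of y and y^2 give O(1) segment costs.
    s1 = [[0.0] * (n + 1) for _ in range(m)]
    s2 = [[0.0] * (n + 1) for _ in range(m)]
    for k, y in enumerate(profiles):
        for t, value in enumerate(y):
            s1[k][t + 1] = s1[k][t] + value
            s2[k][t + 1] = s2[k][t] + value * value

    def cost(a, b):
        """Squared error of y[a:b] around its own mean, summed over all samples."""
        length = b - a
        return sum(s2[k][b] - s2[k][a] - (s1[k][b] - s1[k][a]) ** 2 / length
                   for k in range(m))

    best = [-penalty] + [float("inf")] * n  # best[0] = -penalty: first segment is free
    last = [0] * (n + 1)
    for b in range(1, n + 1):
        for a in range(b):
            candidate = best[a] + cost(a, b) + penalty
            if candidate < best[b]:
                best[b], last[b] = candidate, a

    change_points, b = [], n
    while b > 0:  # backtrack through the optimal segment starts
        a = last[b]
        if a > 0:
            change_points.append(a)
        b = a
    return sorted(change_points)


profiles = [
    [0, 0, 0, 0, 1.0, 1.0, 1.0, 0, 0, 0],  # sample 1 (hypothetical)
    [0, 0, 0, 0, 0.8, 0.8, 0.8, 0, 0, 0],  # sample 2: same event, weaker signal
]
print(segment_multisample(profiles, penalty=0.5))  # → [4, 7]
```

Summing the cost over samples is what pools evidence: a change that is weak in each sample alone can still outweigh the per-change-point penalty when it occurs at the same position in all of them.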
Affiliation(s)
- Junbo Duan
- Department of Biomedical Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- Hong-Wen Deng
- Department of Biomedical Engineering and Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA 70118, USA
- Yu-Ping Wang
- Department of Biomedical Engineering and Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA 70118, USA