1
Abstract
Differences between genomes can be due to single nucleotide variants (SNPs), translocations, inversions and copy number variants (CNVs, gain or loss of DNA). The latter can range from sub-microscopic events to complete chromosomal aneuploidies. Small CNVs are often benign but those larger than 250 kb are strongly associated with morbid consequences such as developmental disorders and cancer. Detecting CNVs within and between populations is essential to better understand the plasticity of our genome and to elucidate its possible contribution to disease or phenotypic traits. While the link between SNPs and disease susceptibility has been well studied, to date there are still very few published CNV genome-wide association studies, probably because CNV analysis remains a slightly more complex task than SNP analysis (both in terms of bioinformatics workflow and uncertainty in the CNV calling, leading to high false positive rates and unknown false negative rates). This chapter aims to explain computational methods for the analysis of CNVs, ranging from study design, data processing and quality control, up to genome-wide association study with clinical traits.
Affiliation(s)
- Aurélien Macé
- Institute of Social and Preventive Medicine, University Hospital of Lausanne, Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Zoltán Kutalik
- Institute of Social and Preventive Medicine, University Hospital of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
2
Liu M, Moon S, Wang L, Kim S, Kim YJ, Hwang MY, Kim YJ, Elston RC, Kim BJ, Won S. On the association analysis of CNV data: a fast and robust family-based association method. BMC Bioinformatics 2017; 18:217. [PMID: 28420343 PMCID: PMC5395793 DOI: 10.1186/s12859-017-1622-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/15/2016] [Accepted: 03/31/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Copy number variation (CNV) is known to play an important role in the genetics of complex diseases, and several methods have been proposed to detect association of CNV with phenotypes of interest. Statistical methods for CNV association analysis can be categorized into two different strategies. First, the copy number is estimated by maximum likelihood and association of the expected copy number with the phenotype is tested. Second, the observed probe intensity measurements can be directly used to detect association of CNV with the phenotypes of interest. RESULTS: For each strategy we provide a statistic that can be applied to extended families. The computational efficiency of the proposed methods enables genome-wide association analysis, and we show with simulation studies that the proposed methods outperform other existing approaches. In particular, we found that the first strategy is always more efficient than the second strategy, whether or not the copy numbers for each individual are well identified. With the proposed methods, we performed genome-wide CNV association analyses of a hematological trait, hematocrit, on 521 Korean family samples. CONCLUSIONS: We found that statistical analysis with the expected copy number is more powerful than the statistic with the probe intensity measurements, regardless of the accuracy of the estimation of copy numbers.
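The first strategy described in this abstract (testing the expected copy number against the phenotype) can be illustrated with a minimal Python sketch. It is not the authors' family-based statistic: it assumes unrelated individuals, a Gaussian mixture over copy-number classes whose means, standard deviations and priors are already known (in practice they would be estimated, for example by maximum likelihood), and a plain linear-regression t statistic. The function names `expected_copy_number` and `slope_t` and all numeric values are hypothetical.

```python
import numpy as np

def expected_copy_number(intensity, means, sds, priors):
    """Posterior-mean copy number per sample under a Gaussian mixture.

    Class k (k = 0, 1, 2, ...) stands for copy number k. The mixture
    parameters are ASSUMED known here; they are illustrative only.
    """
    x = np.asarray(intensity, dtype=float)[:, None]      # shape (n, 1)
    mu = np.asarray(means, dtype=float)[None, :]         # shape (1, K)
    sd = np.asarray(sds, dtype=float)[None, :]
    pi = np.asarray(priors, dtype=float)[None, :]
    # Log of prior times Gaussian density, dropping the shared constant.
    log_post = -0.5 * ((x - mu) / sd) ** 2 - np.log(sd) + np.log(pi)
    log_post -= log_post.max(axis=1, keepdims=True)      # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)              # rows sum to 1
    return post @ np.arange(post.shape[1])               # sum_k k * P(k | x)

def slope_t(x, y):
    """t statistic (n - 2 df) for the slope of a simple regression of y on x.

    Assumes independent samples, unlike the family-based setting of the paper.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    beta = (xc @ (y - y.mean())) / (xc @ xc)
    resid = (y - y.mean()) - beta * xc
    se = np.sqrt((resid @ resid) / (len(x) - 2) / (xc @ xc))
    return beta / se

# Hypothetical usage: intensities for six samples and a quantitative trait.
intensity = [-1.0, 0.05, 0.0, 0.55, 0.6, -0.9]
trait = [38.0, 41.5, 40.9, 44.2, 43.8, 38.6]
ecn = expected_copy_number(intensity, means=[-1.0, 0.0, 0.6],
                           sds=[0.1, 0.1, 0.1], priors=[0.1, 0.8, 0.1])
t_stat = slope_t(ecn, trait)
```

The second strategy would pass the raw `intensity` values to the same regression in place of `ecn`. The abstract reports that the expected-copy-number approach was the more powerful of the two in the authors' simulations.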
Affiliation(s)
- Meiling Liu
- Department of Applied Statistics, Chung-Ang University, Seoul, 156-756, South Korea; Department of Bioinformatics and Computational Biology, Iowa State University, Ames, IA, 50011, USA
- Sanghoon Moon
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Cheongju-si, Chungcheongbuk-do, 363-951, South Korea
- Longfei Wang
- Interdisciplinary Program of Bioinformatics, Seoul National University, Seoul, 151-742, South Korea
- Sulgi Kim
- Naver Labs, 235 Pangyoyeok-ro, Bundang-gu, Seongnam-si, Gyeonggi-do, 13494, South Korea
- Yeon-Jung Kim
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Cheongju-si, Chungcheongbuk-do, 363-951, South Korea
- Mi Yeong Hwang
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Cheongju-si, Chungcheongbuk-do, 363-951, South Korea
- Young Jin Kim
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Cheongju-si, Chungcheongbuk-do, 363-951, South Korea
- Robert C Elston
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, 44106, USA
- Bong-Jo Kim
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Cheongju-si, Chungcheongbuk-do, 363-951, South Korea
- Sungho Won
- Interdisciplinary Program of Bioinformatics, Seoul National University, Seoul, 151-742, South Korea; Department of Public Health Science, Seoul National University, Seoul, 151-742, South Korea; Institute of Health and Environment, Seoul National University, Seoul, 151-742, South Korea
3
Valsesia A, Macé A, Jacquemont S, Beckmann JS, Kutalik Z. The Growing Importance of CNVs: New Insights for Detection and Clinical Interpretation. Front Genet 2013; 4:92. [PMID: 23750167 PMCID: PMC3667386 DOI: 10.3389/fgene.2013.00092] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Received: 02/10/2013] [Accepted: 05/04/2013] [Indexed: 02/03/2023]
Abstract
Differences between genomes can be due to single nucleotide variants, translocations, inversions, and copy number variants (CNVs, gain or loss of DNA). The latter can range from sub-microscopic events to complete chromosomal aneuploidies. Small CNVs are often benign but those larger than 500 kb are strongly associated with morbid consequences such as developmental disorders and cancer. Detecting CNVs within and between populations is essential to better understand the plasticity of our genome and to elucidate its possible contribution to disease. Hence there is a need for better-tailored and more robust tools for the detection and genome-wide analyses of CNVs. While a link between a given CNV and a disease may have often been established, the relative CNV contribution to disease progression and impact on drug response is not necessarily understood. In this review we discuss the progress, challenges, and limitations that occur at different stages of CNV analysis from the detection (using DNA microarrays and next-generation sequencing) and identification of recurrent CNVs to the association with phenotypes. We emphasize the importance of germline CNVs and propose strategies to aid clinicians to better interpret structural variations and assess their clinical implications.
Affiliation(s)
- Armand Valsesia
- Genetics Core, Nestlé Institute of Health Sciences Lausanne, Switzerland
4
Zhang Z, Wang JC, Howells W, Lin P, Agrawal A, Edenberg HJ, Tischfield JA, Schuckit MA, Bierut LJ, Goate A, Rice JP. Dosage transmission disequilibrium test (dTDT) for linkage and association detection. PLoS One 2013; 8:e63526. [PMID: 23691058 PMCID: PMC3653954 DOI: 10.1371/journal.pone.0063526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2012] [Accepted: 04/06/2013] [Indexed: 11/26/2022]
Abstract
Both linkage and association studies have been successfully applied to identify disease susceptibility genes with genetic markers such as microsatellites and Single Nucleotide Polymorphisms (SNPs). As one of the traditional family-based studies, the Transmission/Disequilibrium Test (TDT) measures the over-transmission of an allele in a trio from its heterozygous parents to the affected offspring and can be potentially useful to identify genetic determinants for complex disorders. However, there is reduced information when complete trio information is unavailable. In this study, we developed a novel approach to "infer" the transmission of SNPs by combining both the linkage and association data, which uses microsatellite markers from families informative for linkage together with SNP markers from the offspring who are genotyped for both linkage and a Genome-Wide Association Study (GWAS). We generalized the traditional TDT to process these inferred dosage probabilities, which we name the dosage-TDT (dTDT). For evaluation purposes, we developed a simulation procedure to assess its operating characteristics. We applied the dTDT to the simulated data and documented the power of the dTDT under a number of different realistic scenarios. Finally, we applied our methods to a family study of alcohol dependence (COGA) and performed individual genotyping on complete families for the top signals. One SNP (rs4903712 on chromosome 14) remained significant after correcting for multiple testing. Methods developed in this study can be adapted to other platforms and will have widespread applicability in genomic research when case-control GWAS data are collected in families with existing linkage data.
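For orientation, the traditional TDT that this abstract generalizes can be written in a few lines. The sketch below implements only the classic McNemar-type statistic on transmission counts from heterozygous parents; it is not the dTDT itself, whose handling of inferred dosage probabilities is not specified in the abstract. The function names and the example counts are hypothetical.

```python
import math

def tdt_statistic(transmitted, untransmitted):
    """Classic TDT chi-square statistic (1 degree of freedom).

    transmitted:   count of heterozygous parents who passed the allele of
                   interest to an affected offspring.
    untransmitted: count of heterozygous parents who passed the other allele.
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    return (transmitted - untransmitted) ** 2 / informative

def chi2_1df_pvalue(statistic):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
stat = tdt_statistic(60, 40)
p_value = chi2_1df_pvalue(stat)
```

Under the null hypothesis of no linkage or no association, each heterozygous parent transmits either allele with probability 1/2, which is why only the two counts enter the statistic.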
Affiliation(s)
- Zhehao Zhang
- Washington University School of Medicine, Department of Psychiatry, St. Louis, Missouri, United States of America
- Jen-Chyong Wang
- Washington University School of Medicine, Department of Psychiatry, St. Louis, Missouri, United States of America
- William Howells
- Washington University School of Medicine, Department of Psychiatry, St. Louis, Missouri, United States of America
- Peng Lin
- Washington University School of Medicine, Department of Psychiatry, St. Louis, Missouri, United States of America
- Arpana Agrawal
- Washington University School of Medicine, Department of Psychiatry, St. Louis, Missouri, United States of America
- Howard J. Edenberg
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- Jay A. Tischfield
- LSB 136, Rutgers University, Piscataway, New Jersey, United States of America
- Marc A. Schuckit
- Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
- Laura J. Bierut
- Washington University School of Medicine, Department of Psychiatry, St. Louis, Missouri, United States of America
- Alison Goate
- Washington University School of Medicine, Department of Psychiatry, St. Louis, Missouri, United States of America
- John P. Rice
- Washington University School of Medicine, Department of Psychiatry, St. Louis, Missouri, United States of America
5
Abstract
Association mapping has successfully identified common SNPs associated with many diseases. However, the inability of this class of variation to account for most of the supposed heritability has led to a renewed interest in methods - primarily linkage analysis - to detect rare variants. Family designs allow for control of population stratification, investigations of questions such as parent-of-origin effects and other applications that are imperfectly or not readily addressed in case-control association studies. This article guides readers through the interface between linkage and association analysis, reviews the new methodologies and provides useful guidelines for applications. Just as effective SNP-genotyping tools helped to realize the potential of association studies, next-generation sequencing tools will benefit genetic studies by improving the power of family-based approaches.