1
Hsieh AR, Chen DP, Chattopadhyay AS, Li YJ, Chang CC, Fann CSJ. A non-threshold region-specific method for detecting rare variants in complex diseases. PLoS One 2017; 12:e0188566. [PMID: 29190701 PMCID: PMC5708778 DOI: 10.1371/journal.pone.0188566] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 03/07/2017] [Accepted: 11/09/2017] [Indexed: 11/23/2022] Open
Abstract
A region-specific method, the non-threshold rare (NTR) variant detection method, was developed. It does not use a threshold to define rare variants, and it accounts for the directions of effects. NTR also considers linkage disequilibrium within the region and accommodates common and rare variants simultaneously. NTR weights variants according to minor allele frequency and odds ratio to combine the effects of common and rare variants on disease occurrence into a single score, and it provides a test statistic to assess the significance of the score. In simulations under different effect sizes, the power of NTR increased as the effect size increased, and the type I error was well controlled. Moreover, NTR was compared with several existing methods, including the combined multivariate and collapsing method (CMC), the weighted sum statistic method (WSS), the sequence kernel association test (SKAT), and its modification, SKAT-O. NTR yielded comparable or better power in simulations, especially when the effects of linkage disequilibrium between variants were at least moderate. In an analysis of diabetic nephropathy data, NTR detected more confirmed disease-related genes than the other aforementioned methods. NTR can thus be used as a complementary tool to help dissect the etiology of complex diseases.
Affiliation(s)
- Ai-Ru Hsieh
- Graduate Institute of Biostatistics, China Medical University, Taichung, Taiwan
- Dao-Peng Chen
- Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei, Taiwan
- Ying-Ju Li
- Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei, Taiwan
- Chien-Ching Chang
- Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei, Taiwan
- Cathy S. J. Fann
- Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei, Taiwan
2
Maresso K, Broeckel U. Genotyping platforms for mass-throughput genotyping with SNPs, including human genome-wide scans. Adv Genet 2008; 60:107-39. [PMID: 18358318 DOI: 10.1016/s0065-2660(07)00405-1] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 12/23/2022]
Abstract
The completion of the Human Genome Project (HGP) in 2003 brought the scientific community one step closer to identifying the genes underlying common, polygenic diseases. Prior to this achievement, the goal of identifying the genetic factors responsible for diseases presenting substantial public health burdens was elusive. Although the theoretical foundation for disease association studies had been discussed before the completion of the HGP, obstacles remained at that time before such studies could be considered feasible. One of these obstacles was the identification and mapping of numerous polymorphisms that could be easily and inexpensively typed. However, this challenge was overcome with the sequencing of the human genome and the subsequent cataloging of single-nucleotide polymorphisms (SNPs). The challenge then became how to rapidly and cost-effectively assay a dense set of these SNPs in the large number of samples required for disease association studies of complex traits. This challenge has been recently met as well, with the commercial offering of mass-throughput oligonucleotide array-based genotyping platforms at affordable prices. These platforms have made genome-wide association scans a reality and bring us closer than ever to elucidating the genetic mechanisms of complex disease. Here, we discuss the need for mass-throughput genotyping and then review and evaluate various platforms now available to investigators wishing to undertake high-throughput genotyping projects with SNPs, particularly genome-wide association scans.
Affiliation(s)
- Karen Maresso
- Department of Pediatrics, Children's Hospital of Wisconsin, Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA
3
Albers CA, Kappen HJ. Modeling linkage disequilibrium in exact linkage computations: a comparison of first-order Markov approaches and the clustered-markers approach. BMC Proc 2007; 1 Suppl 1:S159. [PMID: 18466504 PMCID: PMC2367570 DOI: 10.1186/1753-6561-1-s1-s159] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/04/2022] Open
Abstract
Recent studies have shown that linkage disequilibrium (LD) between single-nucleotide polymorphism (SNP) markers is widespread. Assuming linkage equilibrium has been shown to cause false positives in linkage studies where parental genotypes are not available. Therefore, linkage analysis methods that can deal with LD are required to accurately analyze SNP marker data sets. We compared three approaches to deal with LD between markers: 1) The clustered-markers approach implemented in the computer program MERLIN; 2) The standard hidden Markov model (HMM) multipoint model augmented with a first-order Markov model for the allele frequencies of the founders, in which we considered both a Bayesian and a maximum-likelihood implementation of this approach; 3) The 'independent' SNPs approach, i.e., removing SNPs from the data set until the remaining SNPs have low levels of LD. We evaluated these approaches on the Illumina 6K SNP data set of affected sib-pairs of Problem 2. We found that the first-order Markov model was able to account for most of the strong LD in this data set. The difference between the Bayesian and maximum-likelihood implementation was small. An advantage of the first-order Markov model is that it does not require the user to specify parameters.
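The 'independent' SNPs approach described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure or MERLIN's: it assumes genotypes coded as 0/1/2 allele counts, uses the squared Pearson correlation between genotype vectors as a stand-in for haplotype-based r², and applies a greedy rule with a hypothetical threshold `max_r2`. The function names and the toy data are illustrative only.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker: correlation undefined, treated as no LD
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def prune_by_ld(genotypes, max_r2=0.1):
    """Greedy pruning: keep a SNP only if its r^2 with every kept SNP is <= max_r2.

    `genotypes` is a list of SNPs, each a list of per-individual allele counts.
    Returns the indices of the retained SNPs in their original order.
    """
    kept = []
    for idx, snp in enumerate(genotypes):
        if all(r_squared(snp, genotypes[k]) <= max_r2 for k in kept):
            kept.append(idx)
    return kept


# Toy data (assumed, not from the paper): SNP 1 duplicates SNP 0,
# and SNP 2 is uncorrelated with SNP 0.
snps = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],
    [1, 0, 1, 2, 1, 1],
]
retained = prune_by_ld(snps, max_r2=0.1)
```

The greedy rule depends on marker order, so a real analysis would typically scan markers in map order within a window rather than comparing all pairs.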
Affiliation(s)
- Cornelis A Albers
- Department of Biophysics, Radboud University, 126 Geert Grooteplein 21, Nijmegen, Gelderland 6525EZ The Netherlands.
4
Albers CA, Heskes T, Kappen HJ. Haplotype inference in general pedigrees using the cluster variation method. Genetics 2007; 177:1101-16. [PMID: 17660564 PMCID: PMC2034616 DOI: 10.1534/genetics.107.074047] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 03/30/2007] [Accepted: 07/14/2007] [Indexed: 12/19/2022] Open
Abstract
We present CVMHAPLO, a probabilistic method for haplotyping in general pedigrees with many markers. CVMHAPLO reconstructs the haplotypes by assigning in every iteration a fixed number of the ordered genotypes with the highest marginal probability, conditioned on the marker data and ordered genotypes assigned in previous iterations. CVMHAPLO makes use of the cluster variation method (CVM) to efficiently estimate the marginal probabilities. We focused on single-nucleotide polymorphism (SNP) markers in the evaluation of our approach. In simulated data sets where exact computation was feasible, we found that the accuracy of CVMHAPLO was high and similar to that of maximum-likelihood methods. In simulated data sets where exact computation of the maximum-likelihood haplotype configuration was not feasible, the accuracy of CVMHAPLO was similar to that of state-of-the-art Markov chain Monte Carlo (MCMC) maximum-likelihood approximations when all ordered genotypes were assigned and higher when only a subset of the ordered genotypes was assigned. CVMHAPLO was faster than the MCMC approach and provided more detailed information about the uncertainty in the inferred haplotypes. We conclude that CVMHAPLO is a practical tool for the inference of haplotypes in large complex pedigrees.
Affiliation(s)
- Cornelis A Albers
- Department of Cognitive Neuroscience/Biophysics, Institute for Computing and Information Sciences, Radboud University, 6525 EZ Nijmegen, The Netherlands.
5
Imai K, Ogai Y, Nishizawa D, Kasai S, Ikeda K, Koga H. A novel SNP detection technique utilizing a multiple primer extension (MPEX) on a phospholipid polymer-coated surface. Mol Biosyst 2007; 3:547-53. [PMID: 17639130 DOI: 10.1039/b701645j] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022]
Abstract
Conventional methods for detecting single nucleotide polymorphisms (SNPs), including direct DNA sequencing, pyrosequencing, and melting curve analysis, are to a great extent limited by their requirement for particular detection instruments. To overcome this limitation, we established a novel SNP detection technique utilizing multiple primer extension (MPEX) on a phospholipid polymer-coated surface. This technique is based on the development of a new plastic S-BIO PrimeSurface with a biocompatible polymer; its surface chemistry offers extraordinarily stable thermal properties, as well as chemical properties advantageous for enzymatic reactions on the surface. To visualize allele-specific PCR products on the surface, biotin-dUTP was incorporated into newly synthesized PCR products during the extension reaction. The products were ultimately detected by carrying out a colorimetric reaction with substrate solution containing 4-nitro-blue tetrazolium chloride (NBT) and 5-bromo-4-chloro-3-indolyl phosphate (BCIP). We demonstrated the significance of this novel SNP detection technique by analyzing representative SNPs on 4 LD blocks of the μ-opioid receptor gene. We immobilized 20 allele-specific oligonucleotides on this substrate, and substantially reproduced the results previously obtained by other methods.
Affiliation(s)
- Kazuhide Imai
- Laboratory of Medical Genomics, Department of Human Genome Technology, Kazusa DNA Research Institute, 2-6-7 Kazusa-Kamatari, Kisarazu, Chiba, Japan
6
Warren DM, Dyer TD, Peterson CP, Mahaney MC, Blangero J, Almasy L. A comparison of univariate, bivariate, and trivariate whole-genome linkage screens of genetically correlated electrophysiological endophenotypes. BMC Genet 2005; 6 Suppl 1:S117. [PMID: 16451574 PMCID: PMC1866821 DOI: 10.1186/1471-2156-6-s1-s117] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/25/2022] Open
Abstract
We used a maximum-likelihood based multipoint linkage approach implemented in SOLAR to examine simultaneously linkage for three electrophysiological endophenotypes from the Collaborative Study of the Genetics of Alcoholism: TTTH1, TTTH2, and TTTH3. These endophenotypes have been identified as markers of alcohol dependence susceptibility. Data were from 905 individuals in 143 families. Measured covariates considered included sex, age at electrophysiology data collection, habitual smoking status, and the maximum number of drinks consumed in a 24-hour period. Comparisons were made among genome-wide univariate, bivariate, and trivariate linkage analyses using genotypes based on microsatellite markers supplied by the Center for Inherited Disease Research, and genotypes based on single-nucleotide polymorphism markers provided by Illumina. All LODs were corrected to a standard equivalent to 1 degree of freedom. Using the trivariate approach and the microsatellite-based genotypes, we estimated a maximum multipoint linkage signal of LOD = 2.66 on chromosome 7q at 157 cM. Analyses using the Illumina SNP genotypes produced similar results, yielding a maximum multipoint LOD of 2.95 on 7q at 174 cM. These regions of interest correspond to those identified in the univariate and bivariate linkage screens. Our results suggest that trivariate multipoint linkage analyses have utility in the further characterization of chromosomal regions potentially containing genes influencing the phenotypes being examined. Based on a comparison of the number of LOD scores achieving statistical significance, our results suggest that the microsatellite- and Illumina SNP-based genotypes have similar utility for detecting genomic regions of interest.
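As a rough aid to reading LOD scores expressed on a 1-degree-of-freedom scale, the sketch below converts a LOD score to a pointwise p-value. It is not the correction procedure used in SOLAR or by the authors. It assumes the common convention that 2 ln(10) × LOD is a likelihood-ratio statistic whose null distribution is a 50:50 mixture of a point mass at zero and a chi-square with 1 degree of freedom. The function name is hypothetical.

```python
import math


def lod_to_pvalue(lod):
    """Pointwise p-value for a 1-df LOD score under a 1/2*chi2_0 + 1/2*chi2_1 null.

    Assumption: the likelihood-ratio statistic is 2*ln(10)*LOD, and the test is
    one-sided, so the chi-square(1) tail probability is halved.
    """
    if lod <= 0:
        return 0.5  # no evidence for linkage under the mixture null
    lr_stat = 2.0 * math.log(10.0) * lod
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    chi2_tail = math.erfc(math.sqrt(lr_stat / 2.0))
    return 0.5 * chi2_tail


# The two maximum multipoint LOD scores quoted in the abstract above.
p_microsatellite = lod_to_pvalue(2.66)
p_snp = lod_to_pvalue(2.95)
```

Under this convention a LOD of 3 corresponds to a pointwise p-value of about 1e-4, which is why it is often quoted as the classical threshold for significant linkage.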
Affiliation(s)
- Diane M Warren
- Department of Genetics, Southwest Foundation for Biomedical Research, P.O. Box 760549, San Antonio, Texas 78245-0549 USA
- Thomas D Dyer
- Department of Genetics, Southwest Foundation for Biomedical Research, P.O. Box 760549, San Antonio, Texas 78245-0549 USA
- Charles P Peterson
- Department of Genetics, Southwest Foundation for Biomedical Research, P.O. Box 760549, San Antonio, Texas 78245-0549 USA
- Michael C Mahaney
- Department of Genetics, Southwest Foundation for Biomedical Research, P.O. Box 760549, San Antonio, Texas 78245-0549 USA
- John Blangero
- Department of Genetics, Southwest Foundation for Biomedical Research, P.O. Box 760549, San Antonio, Texas 78245-0549 USA
- Laura Almasy
- Department of Genetics, Southwest Foundation for Biomedical Research, P.O. Box 760549, San Antonio, Texas 78245-0549 USA
7
Goode EL, Jarvik GP. Assessment and implications of linkage disequilibrium in genome-wide single-nucleotide polymorphism and microsatellite panels. Genet Epidemiol 2005; 29 Suppl 1:S72-6. [PMID: 16342185 DOI: 10.1002/gepi.20112] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/06/2022]
Abstract
Linkage disequilibrium (LD) between markers is more likely to exist in dense genome-wide single-nucleotide polymorphism (SNP) panels than in microsatellite panels. As part of Genetic Analysis Workshop 14 (GAW14), the extent of LD in the Illumina linkage panel III and the Affymetrix Genechip 10 K mapping array was assessed, using data from the Collaborative Study on the Genetics of Alcoholism (COGA). The impact of LD on linkage results was examined in COGA and simulated data, and characteristics of SNPs were assessed for their ability to detect population substructure and predict haplotypes. The authors of the papers summarized here observed greater LD in the Affymetrix than in the Illumina panel, possibly due to increased marker density in the Affymetrix panel, and found greater LD on chromosome X than on the autosomes. Simulation analyses suggest that intermarker LD can cause an upward bias in linkage statistics; however, the impact of LD on linkage analysis depends on the proportion of ungenotyped founders and the extent of LD. No large effect of LD on linkage peaks was observed in COGA analyses. In addition, the papers summarized here found that SNPs with high minor allele frequencies were the most informative compared with microsatellites for the detection of population substructure, and that SNPs in higher LD, and small numbers of SNPs, were the most reliable for haplotype prediction. As ease of genotyping continues to increase, study design and SNP selection for linkage and association studies (including genome-wide association studies) will be improved with consideration of LD in the particular populations studied.
Affiliation(s)
- Ellen L Goode
- Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, Minnesota 55905, USA.