1
Diao G, Lin DY. Statistically efficient association analysis of quantitative traits with haplotypes and untyped SNPs in family studies. BMC Genet 2020; 21:99. [PMID: 32894040 PMCID: PMC7487716 DOI: 10.1186/s12863-020-00902-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/10/2019] [Accepted: 08/17/2020] [Indexed: 11/23/2022]
Abstract
BACKGROUND Associations between haplotypes and quantitative traits provide valuable information about the genetic basis of complex human diseases. Haplotypes also provide an effective way to deal with untyped SNPs. Two major challenges arise in haplotype-based association analysis of family data. First, haplotypes may not be inferred with certainty from genotype data. Second, the trait values within a family tend to be correlated because of common genetic and environmental factors. RESULTS To address these challenges, we present an efficient likelihood-based approach to analyzing associations of quantitative traits with haplotypes or untyped SNPs. This approach properly accounts for within-family trait correlations and can handle general pedigrees with arbitrary patterns of missing genotypes. We characterize the genetic effects on the quantitative trait by a linear regression model with random effects and develop efficient likelihood-based inference procedures. Extensive simulation studies are conducted to examine the performance of the proposed methods. An application to family data from the Childhood Asthma Management Program Ancillary Genetic Study is provided. A computer program is freely available. CONCLUSIONS Results from extensive simulation studies show that the proposed methods for testing the haplotype effects on quantitative traits have correct type I error rates and are more powerful than some existing methods.
Affiliation(s)
- Guoqing Diao
- Department of Biostatistics and Bioinformatics, The George Washington University, Washington, District of Columbia, USA.
- Dan-Yu Lin
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
2
Wu J, Chen GB, Zhi D, Liu N, Zhang K. A hidden Markov model for haplotype inference for present-absent data of clustered genes using identified haplotypes and haplotype patterns. Front Genet 2014; 5:267. [PMID: 25161663 PMCID: PMC4129397 DOI: 10.3389/fgene.2014.00267] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/22/2014] [Accepted: 07/21/2014] [Indexed: 11/21/2022]
Abstract
The majority of killer cell immunoglobulin-like receptor (KIR) genes are detected as either present or absent using locus-specific genotyping technology. Ambiguity arises from the presence of a specific KIR gene since the exact copy number (one or two) of that gene is unknown. Therefore, haplotype inference for these genes is becoming more challenging due to such a large portion of missing information. Meanwhile, many haplotypes and partial haplotype patterns have been previously identified due to tight linkage disequilibrium (LD) among these clustered genes and thus can be incorporated to facilitate haplotype inference. In this paper, we developed a hidden Markov model (HMM) based method that can incorporate identified haplotypes or partial haplotype patterns for haplotype inference from present-absent data of clustered genes (e.g., KIR genes). We compared its performance with a previously developed expectation maximization (EM) based method in terms of haplotype assignments and haplotype frequency estimation through extensive simulations for KIR genes. The simulation results showed that the new HMM based method outperformed the previous method when some incorrect haplotypes were included as identified haplotypes and/or the standard deviation of haplotype frequencies was small. We also compared the performance of our method with two methods that do not use previously identified haplotypes and haplotype patterns, including an EM based method, HAPLORE, and an HMM based method, MaCH. Our simulation results showed that the incorporation of identified haplotypes and partial haplotype patterns can improve accuracy for haplotype inference. The new software package HaploHMM is available and can be downloaded at http://www.soph.uab.edu/ssg/files/People/KZhang/HaploHMM/haplohmm-index.html.
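The copy-number ambiguity described in this abstract can be illustrated with a minimal sketch. It is not the HaploHMM method: it only enumerates the haplotype pairs compatible with one present/absent observation, assuming each gene has either zero or one copy per haplotype. The function name `compatible_diplotypes` and the 0/1 coding are assumptions made for illustration.

```python
from itertools import product


def compatible_diplotypes(presence):
    """Enumerate unordered haplotype pairs consistent with present/absent data.

    presence: tuple of 0/1 flags, one per gene (1 = gene detected).
    Assumption: each haplotype carries 0 or 1 copy of each gene, so a
    detected gene may sit on the first haplotype, the second, or both.
    """
    per_gene = [
        [(0, 0)] if flag == 0 else [(1, 0), (0, 1), (1, 1)]
        for flag in presence
    ]
    pairs = set()
    for combo in product(*per_gene):
        h1 = tuple(c[0] for c in combo)
        h2 = tuple(c[1] for c in combo)
        # Sort so that (h1, h2) and (h2, h1) count as the same diplotype.
        pairs.add(tuple(sorted((h1, h2))))
    return pairs


if __name__ == "__main__":
    for pair in sorted(compatible_diplotypes((1, 1, 0))):
        print(pair)
```

The number of compatible pairs grows quickly with the number of detected genes, which is why prior knowledge of identified haplotypes can help narrow the candidates.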
Affiliation(s)
- Jihua Wu
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- Guo-Bo Chen
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA; Queensland Brain Institute, The University of Queensland, St. Lucia, QLD, Australia
- Degui Zhi
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- Nianjun Liu
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- Kui Zhang
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
3
Lin WY, Tiwari HK, Gao G, Zhang K, Arcaroli JJ, Abraham E, Liu N. Similarity-based multimarker association tests for continuous traits. Ann Hum Genet 2012; 76:246-60. [PMID: 22497480 DOI: 10.1111/j.1469-1809.2012.00706.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Abstract
Testing multiple markers simultaneously not only can capture the linkage disequilibrium patterns but also can decrease the number of tests and thus alleviate the multiple-testing penalty. If a gene is associated with a phenotype, subjects with similar genotypes in this gene should also have similar phenotypes. Based on this concept, we have developed a general framework that is applicable to continuous traits. Two similarity-based tests (namely, SIMc and SIMp tests) were derived as special cases of the general framework. In our simulation study, we compared the power of the two tests with that of the single-marker analysis, a standard haplotype regression, and a popular and powerful kernel machine regression. Our SIMc test outperforms other tests when the average R² (a measure of linkage disequilibrium) between the causal variant and the surrounding markers is larger than 0.3 or when the causal allele is common (say, frequency = 0.3). Our SIMp test outperforms other tests when the causal variant is introduced at common haplotypes (the maximum frequency of risk haplotypes >0.4). We also applied our two tests to an adiposity data set to show their utility.
Affiliation(s)
- Wan-Yu Lin
- Department of Biostatistics, University of Alabama at Birmingham, USA
4
Lin WY, Liu N. Reducing bias of allele frequency estimates by modeling SNP genotype data with informative missingness. Front Genet 2012; 3:107. [PMID: 22719749 PMCID: PMC3376470 DOI: 10.3389/fgene.2012.00107] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/23/2012] [Accepted: 05/25/2012] [Indexed: 01/30/2023]
Abstract
The presence of missing single-nucleotide polymorphism (SNP) genotypes is common in genetic studies. For studies with low-density SNPs, the most commonly used approach to dealing with genotype missingness is to simply remove the observations with missing genotypes from the analyses. This naïve method is straightforward but is valid only when the missingness is random. However, a given assay often has a different capability in genotyping heterozygotes and homozygotes, causing the phenomenon of "differential dropout" in the sense that the missing rates of heterozygotes and homozygotes are different. In practice, differential dropout among genotypes exists in even carefully designed studies, such as the data from the HapMap project and the Wellcome Trust Case Control Consortium. Under the assumption of Hardy-Weinberg equilibrium and no genotyping error, we here propose a statistical method to model the differential dropout among different genotypes. Compared with the naïve method, our method provides more accurate allele frequency estimates when the differential dropout is present. To demonstrate its practical use, we further apply our method to the HapMap data and a scleroderma data set.
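To make the "differential dropout" bias concrete, the sketch below computes the value the naïve complete-case allele frequency estimate would converge to under Hardy-Weinberg equilibrium when heterozygotes and homozygotes are missing at different rates. It illustrates only the bias of the naïve method, not the authors' proposed model; the function name and the two missing-rate parameters are assumptions for illustration.

```python
def naive_allele_freq(p, miss_hom, miss_het):
    """Large-sample value of the complete-case estimate of allele A's frequency.

    p        : true frequency of allele A (Hardy-Weinberg equilibrium assumed)
    miss_hom : probability that a homozygous genotype is missing (assumed
               equal for AA and BB in this sketch)
    miss_het : probability that a heterozygous genotype is missing
    """
    # Proportion of each genotype that is both present and observed.
    obs_aa = p * p * (1 - miss_hom)
    obs_ab = 2 * p * (1 - p) * (1 - miss_het)
    obs_bb = (1 - p) ** 2 * (1 - miss_hom)
    observed = obs_aa + obs_ab + obs_bb
    # Allele counting among observed genotypes: AA carries two A alleles,
    # AB carries one, so the A share is (AA + AB / 2) / total.
    return (obs_aa + 0.5 * obs_ab) / observed


if __name__ == "__main__":
    print(naive_allele_freq(0.2, 0.05, 0.05))  # equal dropout rates
    print(naive_allele_freq(0.2, 0.02, 0.20))  # heterozygotes drop out more
```

When the two missing rates are equal, the estimate equals the true frequency; when heterozygotes drop out more often, the minor allele frequency is underestimated, because heterozygotes carry a disproportionate share of minor alleles.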
Affiliation(s)
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
5
Liu N, Bucala R, Zhao H. Modeling Informatively Missing Genotypes in Haplotype Analysis. Commun Stat Theory Methods 2009; 38:3445-3460. [PMID: 20052310 DOI: 10.1080/03610920802696588] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/03/2023]
Abstract
It is common to have missing genotypes in practical genetic studies. The majority of the existing statistical methods, including those on haplotype analysis, assume that genotypes are missing at random, that is, at a given marker, different genotypes and different alleles are missing with the same probability. In our previous work, we demonstrated that the violation of this assumption may lead to serious bias in haplotype frequency estimates and haplotype association analysis. We proposed a general missing data model to simultaneously characterize missing data patterns across a set of two or more biallelic markers. We proved that, under the general missing data model, haplotype frequencies and missing data probabilities are identifiable if and only if there is linkage disequilibrium between these markers. In this study, we extend our work to multi-allelic markers and observe a similar finding. Simulation studies on the analysis of haplotypes consisting of two markers illustrate that our proposed model can reduce the bias for haplotype frequency estimates due to incorrect assumptions on the missing data mechanism. Finally, we illustrate the utility of our method through its application to a real data set from a study of scleroderma.
Affiliation(s)
- Nianjun Liu
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL
6
Abstract
Association methods based on linkage disequilibrium (LD) offer a promising approach for detecting genetic variations that are responsible for complex human diseases. Although methods based on individual single nucleotide polymorphisms (SNPs) may lead to significant findings, methods based on haplotypes comprising multiple SNPs on the same inherited chromosome may provide additional power for mapping disease genes and also provide insight on factors influencing the dependency among genetic markers. Such insights may provide information essential for understanding human evolution and also for identifying cis-interactions between two or more causal variants. Because obtaining haplotype information directly from experiments can be cost prohibitive in most studies, especially in large scale studies, haplotype analysis presents many unique challenges. In this chapter, we focus on two main issues: haplotype inference and haplotype-association analysis. We first provide a detailed review of methods for haplotype inference using unrelated individuals as well as related individuals from pedigrees. We then cover a number of statistical methods that employ haplotype information in association analysis. In addition, we discuss the advantages and limitations of different methods.
Affiliation(s)
- Nianjun Liu
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
7
Yu Z, Schaid DJ. Methods to impute missing genotypes for population data. Hum Genet 2007; 122:495-504. [PMID: 17851696 DOI: 10.1007/s00439-007-0427-y] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Received: 06/19/2007] [Accepted: 08/30/2007] [Indexed: 01/23/2023]
Abstract
For large-scale genotyping studies, it is common for most subjects to have some missing genetic markers, even if the missing rate per marker is low. This compromises association analyses, with varying numbers of subjects contributing to analyses when performing single-marker or multi-marker analyses. In this paper, we consider eight methods to infer missing genotypes, including two haplotype reconstruction methods (local expectation maximization, EM, and fastPHASE), two k-nearest neighbor methods (original k-nearest neighbor, KNN, and a weighted k-nearest neighbor, wtKNN), three linear regression methods (backward variable selection, LM.back, least angle regression, LM.lars, and singular value decomposition, LM.svd), and a regression tree, Rtree. We evaluate their accuracy using single nucleotide polymorphism (SNP) data from the HapMap project, under a variety of conditions and parameters. We find that fastPHASE has the lowest error rates across different analysis panels and marker densities. LM.lars gives slightly less accurate estimates of missing genotypes than fastPHASE, but performs better than the other methods.
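As a rough illustration of the k-nearest neighbor idea mentioned in this abstract, the sketch below imputes one missing genotype by majority vote among the most similar subjects. It is a simplified stand-in, not the KNN or wtKNN procedure evaluated in the paper: the 0/1/2 genotype coding, the mean absolute difference distance, and the name `knn_impute` are all assumptions.

```python
from collections import Counter


def knn_impute(genos, subject, marker, k=3):
    """Impute genos[subject][marker] by majority vote of the k nearest subjects.

    genos: list of rows, one per subject; each entry is 0, 1, 2 (copies of
    the minor allele) or None when the genotype is missing.
    Distance is the mean absolute genotype difference over the other
    markers that both subjects have observed.
    """
    target = genos[subject]
    scored = []
    for i, row in enumerate(genos):
        # A neighbor must be a different subject with the marker observed.
        if i == subject or row[marker] is None:
            continue
        shared = [
            (a, b)
            for j, (a, b) in enumerate(zip(target, row))
            if j != marker and a is not None and b is not None
        ]
        if not shared:
            continue
        dist = sum(abs(a - b) for a, b in shared) / len(shared)
        scored.append((dist, i))
    if not scored:
        return None  # no usable neighbor
    neighbors = sorted(scored)[:k]
    votes = Counter(genos[i][marker] for _, i in neighbors)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    genos = [
        [0, 0, 1, None],  # subject 0: marker 3 is missing
        [0, 0, 1, 2],
        [0, 0, 1, 2],
        [0, 1, 1, 2],
        [2, 2, 0, 0],
        [2, 2, 1, 0],
    ]
    print(knn_impute(genos, subject=0, marker=3, k=3))
```

A weighted variant would give closer neighbors a larger vote; that refinement is omitted here to keep the sketch short.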
Affiliation(s)
- Zhaoxia Yu
- Department of Statistics, University of California, Irvine, CA 92697, USA.
8
Mensah FK, Gilthorpe MS, Davies CF, Keen LJ, Adamson PJ, Roman E, Morgan GJ, Bidwell JL, Law GR. Haplotype uncertainty in association studies. Genet Epidemiol 2007; 31:348-57. [PMID: 17323369 DOI: 10.1002/gepi.20215] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022]
Abstract
Inferring haplotypes from genotype data is commonly undertaken in population genetic association studies. Within such studies the importance of accounting for uncertainty in the inference of haplotypes is well recognised. We investigate the effectiveness of correcting for uncertainty using simple methods based on the output provided by the PHASE haplotype inference methodology. In case-control analyses investigating non-Hodgkin lymphoma and haplotypes associated with immune regulation we find little effect of making adjustment for uncertainty in inferred haplotypes. Using simulation we introduce a higher degree of haplotype uncertainty than was present in our study data. The simulation represents two genetic loci, physically close on a chromosome, forming haplotypes. Considering a range of allele frequencies, degrees of linkage between the loci, and frequency of missing genotype data, we detail the characteristics of genetic regions which may be susceptible to the influence of haplotype uncertainty. Within our evaluation we find that bias is avoided by considering haplotype probabilities or using multiple imputation, provided that for each of these methods haplotypes are inferred separately for case and control populations; furthermore using multiple imputation provides the facility to incorporate haplotype uncertainty in the estimation of confidence intervals. We discuss the implications of our findings within the context of the complexity of haplotype inference for larger marker rich regions as would typically be encountered in genetic analyses.
Affiliation(s)
- F K Mensah
- Department of Health Sciences, Epidemiology and Genetics Unit, University of York, York, United Kingdom
9
Chen X, Wang X, Hossain S, O'Neill FA, Walsh D, Pless L, Chowdari KV, Nimgaonkar VL, Schwab SG, Wildenauer DB, Sullivan PF, van den Oord E, Kendler KS. Haplotypes spanning SPEC2, PDZ-GEF2 and ACSL6 genes are associated with schizophrenia. Hum Mol Genet 2006; 15:3329-42. [PMID: 17030554 DOI: 10.1093/hmg/ddl409] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.2] [Indexed: 01/01/2023]
Abstract
Chromosome 5q22-33 is a region where studies have repeatedly found evidence for linkage to schizophrenia. In this report, we took a stepwise approach to systematically map this region in the Irish Study of High Density Schizophrenia Families (ISHDSF, 267 families, 1337 subjects) sample. We typed 289 SNPs in the critical interval of 8 million basepairs and found a 758 kb interval coding for the SPEC2/PDZ-GEF2/ACSL6 genes to be associated with the disease. Using sex and genotype-conditioned transmission disequilibrium test analyses, we found that 19 of the 24 typed markers were associated with the disease and the associations were sex-specific. We replicated these findings with an Irish case-control sample (657 cases and 414 controls), an Irish parent-proband trio sample (187 families, 564 subjects), a German nuclear family sample (211 families, 751 subjects) and a Pittsburgh nuclear family sample (247 families, 729 subjects). In all four samples, we replicated the sex-specific associations at the levels of both individual markers and haplotypes using sex- and genotype-conditioned analyses. Three risk haplotypes were identified in the five samples, and each haplotype was found in at least two samples. Consistent with the discovery of multiple estrogen-response elements in this region, our data showed that the impact of these haplotypes on risk for schizophrenia differed in males and females. From these data, we concluded that haplotypes underlying the SPEC2/PDZ-GEF2/ACSL6 region are associated with schizophrenia. However, due to the extended high LD in this region, we were unable to distinguish whether the association signals came from one or more of these genes.
Affiliation(s)
- Xiangning Chen
- Department of Psychiatry and Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23298, USA.