1. Biswas S, Lin S. A Bayesian Approach for Incorporating Variable Rates of Heterogeneity in Linkage Analysis. J Am Stat Assoc 2012. [DOI: 10.1198/016214506000000609] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/21/2022]
Affiliation(s)
- Swati Biswas
- Swati Biswas is Assistant Professor, Department of Biostatistics, University of North Texas Health Science Center, Fort Worth, TX 76107. Shili Lin is Professor, Department of Statistics, The Ohio State University, Columbus, OH 43210. This work was supported in part by National Science Foundation grants DMS-99-71770 and DMS-03-66800 and National Institutes of Health grant 5R01HG002657-03. The authors sincerely thank Chris Amos for providing the lung cancer dataset of Genetic Epidemiology of Lung Cancer
- Shili Lin
2. Seok SC, Evans M, Vieland VJ. Fast and accurate calculation of a computationally intensive statistic for mapping disease genes. J Comput Biol 2009; 16:659-76. [PMID: 19432537 DOI: 10.1089/cmb.2008.0175] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 01/16/2023] Open
Abstract
Many statistical methods in biology utilize numerical integration in order to deal with moderately high-dimensional parameter spaces without closed form integrals. One such method is the PPL, a class of models for mapping and modeling genes for complex human disorders. While the most common approach to numerical integration in statistics is MCMC, this is not a good option for the PPL for a variety of reasons, leading us to develop an alternative integration method for this application. We utilize an established sub-region adaptive integration method, but adapt it to specific features of our application. These include division of the multi-dimensional integrals into three separate layers, implementing internal constraints on the parameter space, and calibrating the approximation to ensure adequate precision of results for our application. The proposed approach is compared with an empirically driven fixed grid scheme as well as other numerical integration methods. The new method is shown to require far fewer function evaluations compared to the alternatives while matching or exceeding the best of them in terms of accuracy. The savings in evaluations is sufficiently large that previously intractable problems are now feasible in real time.
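The sub-region adaptive idea mentioned in this abstract can be illustrated in one dimension. The sketch below is a textbook adaptive Simpson rule, not the multi-dimensional method of the paper; the integrand `peaked` and all function names are hypothetical, chosen only to show how refinement concentrates function evaluations where the integrand changes fastest.

```python
import math


def adaptive_simpson(f, a, b, tol=1e-8, max_depth=50):
    """Integrate f over [a, b], splitting only the sub-intervals that need it.

    Returns (estimate, number_of_function_evaluations).
    """
    evals = 0

    def g(x):
        # Wrap f so that every evaluation is counted.
        nonlocal evals
        evals += 1
        return f(x)

    def simpson(f_lo, f_mid, f_hi, lo, hi):
        # Simpson's rule on a single interval.
        return (hi - lo) / 6.0 * (f_lo + 4.0 * f_mid + f_hi)

    def refine(lo, hi, f_lo, f_mid, f_hi, whole, tol, depth):
        mid = 0.5 * (lo + hi)
        f_left_mid = g(0.5 * (lo + mid))
        f_right_mid = g(0.5 * (mid + hi))
        left = simpson(f_lo, f_left_mid, f_mid, lo, mid)
        right = simpson(f_mid, f_right_mid, f_hi, mid, hi)
        err = left + right - whole
        # Accept this sub-region if the two-panel and one-panel rules agree.
        if depth <= 0 or abs(err) <= 15.0 * tol:
            return left + right + err / 15.0
        # Otherwise split it and give each half a share of the tolerance.
        return (refine(lo, mid, f_lo, f_left_mid, f_mid, left, tol / 2.0, depth - 1)
                + refine(mid, hi, f_mid, f_right_mid, f_hi, right, tol / 2.0, depth - 1))

    f_a, f_m, f_b = g(a), g(0.5 * (a + b)), g(b)
    whole = simpson(f_a, f_m, f_b, a, b)
    return refine(a, b, f_a, f_m, f_b, whole, tol, max_depth), evals


def peaked(x):
    # Hypothetical integrand with a narrow peak near x = 0.3.
    return math.exp(-50.0 * (x - 0.3) ** 2)


estimate, n_evals = adaptive_simpson(peaked, 0.0, 1.0)
print(estimate, n_evals)
```

A fixed grid would need a uniformly fine spacing to resolve the peak, whereas the adaptive rule spends most of its evaluations near x = 0.3 and few in the flat tail.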
Affiliation(s)
- Sang-Cheol Seok
- Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital, Columbus, Ohio 43205, USA.
3. Biswas S, Lin S. Incorporating covariates in mapping heterogeneous traits: a hierarchical model using empirical Bayes estimation. Genet Epidemiol 2008; 31:684-96. [PMID: 17487892 DOI: 10.1002/gepi.20233] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
Abstract
Complex genetic traits are inherently heterogeneous, i.e., they may be caused by different genes, or non-genetic factors, in different individuals. So, for mapping genes responsible for these diseases using linkage analysis, heterogeneity must be accounted for in the model. Heterogeneity across different families can be modeled using a mixture distribution by letting each family have its own heterogeneity parameter denoting the probability that its disease-causing gene is linked to the marker map under consideration. A substantial gain in power is expected if covariates that can discriminate between the families of linked and unlinked types are incorporated in this modeling framework. To this end, we propose a hierarchical Bayesian model, in which the families are grouped according to various (categorized) levels of covariate(s). The heterogeneity parameters of families within each group are assigned a common prior, whose parameters are further assigned hyper-priors. The hyper-parameters are obtained by utilizing the empirical Bayes estimates. We also address related issues such as evaluating whether the covariate(s) under consideration are informative and grouping of families. We compare the proposed approach with one that does not utilize covariates and show that our approach leads to considerable gains in power to detect linkage and in precision of interval estimates through various simulation scenarios. An application to the asthma datasets of Genetic Analysis Workshop 12 also illustrates this gain in a real data analysis. Additionally, we compare the performances of microsatellite markers and single nucleotide polymorphisms for our approach and find that the latter clearly outperforms the former.
Affiliation(s)
- Swati Biswas
- Department of Biostatistics, School of Public Health, University of North Texas Health Science Center, Fort Worth, TX 76107-2699, USA.
4. Logue MW, Li Y. Computation of the posterior probability of linkage using 'high effect' genetic model priors. Hum Hered 2008; 66:25-34. [PMID: 18223315 DOI: 10.1159/000114163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2007] [Accepted: 08/30/2007] [Indexed: 11/19/2022] Open
Abstract
OBJECTIVES: The posterior probability of linkage, or PPL, directly measures the probability that a disease gene is linked to a marker. By placing a Bayesian prior on the elements of the genetic model, it allows for an unknown genetic model without the inflationary effects of maximization. The standard technique uses essentially uniform priors over the elements of the penetrance vector. However, much of the parameter space corresponds to models that seem unlikely to yield substantial evidence for linkage: for example, models with very high phenocopy rates. METHODS: A new class of priors on the elements of the genetic model is examined both theoretically and in simulations. These priors place 0% probability over models with low sibling relative risk, lambda(s). RESULTS: Focusing the prior probability on high lambda(s) models does tend to increase the mean PPL for linked markers, and to decrease the mean PPL for unlinked markers. However, the power to detect linkage remains virtually unchanged. Moreover, under these priors, the PPL occasionally yields unacceptably high values under no linkage. CONCLUSIONS: It appears important to retain prior probability over apparently 'uninformative' genetic models to accurately characterize the amount of evidence for linkage represented by the data.
Affiliation(s)
- M W Logue
- Genetics Program, Boston University School of Medicine, 715 Albany St. L320D, Boston, MA 02118, USA.
5. Logue MW, Vieland VJ. The incorporation of prior genomic information does not necessarily improve the performance of Bayesian linkage methods: an example involving sex-specific recombination and the two-point PPL. Hum Hered 2006; 60:196-205. [PMID: 16397399 DOI: 10.1159/000090543] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 06/27/2005] [Accepted: 10/12/2005] [Indexed: 11/19/2022] Open
Abstract
OBJECTIVE: We continue statistical development of the posterior probability of linkage (PPL). We present a two-point PPL allowing for unequal male and female recombination fractions, thetaM and thetaF, and consider alternative priors on thetaM, thetaF. METHODS: We compare the sex-averaged PPL (PPLSA), assuming thetaM = thetaF, to the sex-specific PPL (PPLSS) in (thetaM, thetaF), in a series of simulations; we also compute the PPLSS using alternative priors on (thetaM, thetaF). RESULTS: The PPLSS based on a prior that ignores prior genomic information on sex-specific recombination rates performs essentially identically to the PPLSA, even in the presence of large thetaM, thetaF differences. Moreover, adaptively skewing the prior, to incorporate (correct) genomic information on thetaM, thetaF differences, actually worsens performance of the PPLSS. We demonstrate that this has little to do with the PPLSS per se, but is rather due to extremely high levels of variability in the location of the maximum likelihood estimates of (thetaM, thetaF) in realistic data sets. CONCLUSIONS: Incorporating (correct) prior genomic information is not always helpful. We recommend that the PPLSA be used as the standard form of the PPL regardless of the sex-specific recombination rates in the region of the marker in question.
Affiliation(s)
- Mark W Logue
- Program in Public Health Genetics, College of Public Health, University of Iowa, Iowa City, Iowa 52242, USA.
6. Biswas S, Lin S. Evaluations of maximization procedures for estimating linkage parameters under heterogeneity. Genet Epidemiol 2004; 26:206-17. [PMID: 15022207 DOI: 10.1002/gepi.10314] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Abstract
Locus heterogeneity is a major problem plaguing the mapping of disease genes responsible for complex genetic traits via linkage analysis. A common feature of several available methods to account for heterogeneity is that they involve maximizing a multidimensional likelihood to obtain maximum likelihood estimates. The high dimensionality of the likelihood surface may be due to multiple heterogeneity (mixing) parameters, linkage parameters, and/or regression coefficients corresponding to multiple covariates. Here, we focus on this nontrivial computational aspect of incorporating heterogeneity by considering several likelihood maximization procedures, including the expectation maximization (EM) algorithm and the stochastic expectation maximization (SEM) algorithm. The wide applicability of these procedures is demonstrated first through a general formulation of accounting for heterogeneity, and then by applying them to two specific formulations. Furthermore, our simulation studies as well as an application to the Genetic Analysis Workshop 12 asthma datasets show that, among other observations, SEM performs better than EM. As an aside, we illustrate a limitation of the popular admixture approach for incorporating heterogeneity, proved elsewhere. We also show how to obtain standard errors (SEs) for EM and SEM estimates, using methods available in the literature. These SEs can then be combined with the corresponding estimates to provide confidence intervals of the parameters.
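For a concrete picture of the EM step discussed in this abstract, the following is a minimal sketch for the single-parameter admixture model, in which each family is linked with probability `alpha` and the linkage parameter is held fixed. It is a textbook EM update, not the authors' implementation and not the SEM variant; the per-family likelihood ratios in `family_lrs` are made-up numbers, and the function names are hypothetical.

```python
import math


def log_likelihood(alpha, lrs):
    """Admixture log-likelihood, up to a constant that does not depend on alpha.

    Each lr is one family's likelihood ratio L_i(theta) / L_i(1/2).
    """
    return sum(math.log(alpha * lr + (1.0 - alpha)) for lr in lrs)


def em_alpha(lrs, alpha=0.5, tol=1e-10, max_iter=1000):
    """Estimate the proportion of linked families by EM."""
    for _ in range(max_iter):
        # E-step: posterior probability that each family is of the linked type.
        weights = [alpha * lr / (alpha * lr + (1.0 - alpha)) for lr in lrs]
        # M-step: the new mixing proportion is the average posterior weight.
        new_alpha = sum(weights) / len(weights)
        if abs(new_alpha - alpha) < tol:
            return new_alpha
        alpha = new_alpha
    return alpha


# Hypothetical likelihood ratios for five families (illustration only).
family_lrs = [5.0, 0.2, 8.0, 0.5, 3.0]
alpha_hat = em_alpha(family_lrs)
print(alpha_hat, log_likelihood(alpha_hat, family_lrs))
```

Each EM iteration cannot decrease the log-likelihood, which makes the update easy to check but can also make it slow on flat likelihood surfaces.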
Affiliation(s)
- Swati Biswas
- Department of Statistics, Ohio State University, Columbus, Ohio 43210, USA
7. Yang X, Wang K, Huang J, Vieland VJ. Genome-wide linkage analysis of blood pressure under locus heterogeneity. BMC Genet 2003; 4 Suppl 1:S78. [PMID: 14975146 PMCID: PMC1866517 DOI: 10.1186/1471-2156-4-s1-s78] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022] Open
Abstract
We describe a method for mapping quantitative trait loci that allows for locus heterogeneity. A genome-wide linkage analysis of blood pressure was performed using sib-pair data from the Framingham Heart Study. Evidence of linkage was found on four markers (GATA89G08, GATA23D06, GATA14E09, and 049xd2) at a significance level of 0.01. Two of them (GATA14E09 and 049xd2) seem to overlap with linkage signals reported previously, while the other two are not linked to any known signals.
Affiliation(s)
- Xinqun Yang
- Department of Biostatistics, Division of Statistical Genetics, The University of Iowa, Iowa City, USA
- Kai Wang
- Department of Biostatistics, Division of Statistical Genetics, The University of Iowa, Iowa City, USA
- Jian Huang
- Department of Biostatistics, Division of Statistical Genetics, The University of Iowa, Iowa City, USA
- Department of Statistics and Actuarial Science, The University of Iowa, Iowa City, USA
- Veronica J Vieland
- Department of Biostatistics, Division of Statistical Genetics, The University of Iowa, Iowa City, USA
- Department of Psychiatry, The University of Iowa, Iowa City, USA
8. Logue MW, Goedken RJ, Vieland VJ. A model-integrated multipoint Bayesian analysis of hypertension in the Framingham Heart Study data finds little evidence of linkage. BMC Genet 2003; 4 Suppl 1:S75. [PMID: 14975143 PMCID: PMC1866514 DOI: 10.1186/1471-2156-4-s1-s75] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/11/2022] Open
Abstract
This Genetic Analysis Workshop 13 contribution presents a linkage analysis of hypertension in the Framingham data based on the posterior probability of linkage, or PPL. We dichotomized the phenotype, coding individuals who had been treated for hypertension at any time, as well as those with repeated high blood pressure measurements, as affected. Here we use a new variation on the multipoint PPL that incorporates integration over the genetic model. PPLs were computed for chromosomes 1 through 5, 11, 14, and 17 and remained below the 2% assumed prior probability of linkage for 73% of the locations examined. The maximum PPL of 4.5% was obtained on chromosome 1 at 178 cM. Although this is more than twice the assumed prior probability of linkage, it is well below a level at which we would recommend committing substantial additional resources to molecular follow-up. While the PPL analysis of this data remains inconclusive, Bayesian methodology gives us a clear mechanism for using the information gained here in further studies.
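As a small worked illustration of the figures quoted in this abstract (not a calculation taken from the paper), dividing the posterior odds implied by a PPL by the prior odds gives the Bayes factor that the data contributed. The function name `implied_bayes_factor` is hypothetical.

```python
def implied_bayes_factor(ppl: float, prior: float) -> float:
    """Bayes factor implied by a posterior probability and its prior probability."""
    if not (0.0 < ppl < 1.0 and 0.0 < prior < 1.0):
        raise ValueError("ppl and prior must lie strictly between 0 and 1")
    posterior_odds = ppl / (1.0 - ppl)
    prior_odds = prior / (1.0 - prior)
    return posterior_odds / prior_odds


# Values quoted in the abstract: maximum PPL of 4.5%, prior probability of 2%.
bf = implied_bayes_factor(ppl=0.045, prior=0.02)
print(round(bf, 2))
```

On the odds scale the evidence at the peak location is modest, which is in line with the abstract's caution about committing resources to follow-up.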
MESH Headings
- Bayes Theorem
- Chromosome Mapping/statistics & numerical data
- Chromosomes, Human, Pair 1/genetics
- Chromosomes, Human, Pair 11/genetics
- Chromosomes, Human, Pair 14/genetics
- Chromosomes, Human, Pair 17/genetics
- Chromosomes, Human, Pair 2/genetics
- Chromosomes, Human, Pair 3/genetics
- Chromosomes, Human, Pair 4/genetics
- Chromosomes, Human, Pair 5/genetics
- Epidemiologic Studies
- Genetic Linkage/genetics
- Humans
- Hypertension/genetics
- Models, Genetic
- Phenotype
Affiliation(s)
- Mark W Logue
- Division of Statistical Genetics, Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, Iowa, USA
- Center for Statistical Genetics Research, College of Public Health and Roy J. & Lucille A. Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA
- Department of Psychiatry, Roy J. & Lucille A. Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA
- Rhinda J Goedken
- Center for Statistical Genetics Research, College of Public Health and Roy J. & Lucille A. Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA
- Veronica J Vieland
- Division of Statistical Genetics, Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, Iowa, USA
- Center for Statistical Genetics Research, College of Public Health and Roy J. & Lucille A. Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA
- Department of Psychiatry, Roy J. & Lucille A. Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA
9.
Abstract
Many family-based tests of linkage disequilibrium are not valid when related nuclear families from larger pedigrees are used, or when independent nuclear families with multiple cases are used. The Pedigree Disequilibrium Test (PDT) proposed by Martin et al. [Am J Hum Genet 67:146-54, 2000] avoids these problems. This paper sketches an extension of the PDT that can account for measured covariates. Where the PDT is based on allele-counting methods, this extension is based on conditional logistic regression. Versions of these statistics were used to test for association between disease and two known functional single nucleotide polymorphisms (SNPs) on gene 1 and gene 6 and one inert SNP on gene 7 in the first 25 replicates of the simulated population-isolate data. The new method was also used to test for linkage disequilibrium after correcting for the effect of the environmental factor E1. The PDT and the conditional logistic extension had similar power to detect the functional SNPs (100% for gene 1, approximately 50% for gene 6) and appropriate type I error rates for the inert SNP. Correcting for E1 slightly increased power to detect the association between gene 6 and disease.
Affiliation(s)
- P Kraft
- Department of Preventive Medicine, University of Southern California, 1540 Alcazar, CHP 218 MC 9010, Los Angeles, CA 90089-9010, USA