1
Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CWK, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. Am J Hum Genet 2023; 110:2077-2091. [PMID: 38065072 PMCID: PMC10716520 DOI: 10.1016/j.ajhg.2023.10.017] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/08/2023] [Revised: 10/26/2023] [Accepted: 10/27/2023] [Indexed: 12/18/2023]
Abstract
Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide association studies (GWASs) are a powerful way to find genetic loci associated with phenotypes. GWASs are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix (local eGRM) given the ARG. Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to analyze two chromosomes containing known body size loci in a sample of Native Hawaiians. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.
Affiliation(s)
- Vivian Link
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Joshua G Schraiber
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Caoqi Fan
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Bryan Dinh
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Nicholas Mancuso
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Charleston W K Chiang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Michael D Edge
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
2
Evolution of Clustering Quantified by a Stochastic Method—Case Studies on Natural and Human Social Structures. SUSTAINABILITY 2020. [DOI: 10.3390/su12197972] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 01/13/2023]
Abstract
Clustering structures appearing from small to large scales are ubiquitous in the physical world. Interestingly, clustering structures are omnipresent in human history too, ranging from the mere organization of life in societies (e.g., urbanization) to the development of large-scale infrastructure and policies for meeting organizational needs. Indeed, in its struggle for survival and progress, mankind has perpetually sought the benefits of unions. At the same time, it is acknowledged that as the scale of the projects grows, the cost of the delivered products is reduced while their quantities are maximized. Thus, large-scale infrastructures and policies are considered advantageous and are constantly being pursued at ever greater scales. This work develops a general method to quantify the temporal evolution of clustering, using a stochastic computational tool called 2D-C, which is applicable to the study of both natural and human social spatial structures. As case studies, the evolution of the structure of the universe, of ecosystems, and of human clustering structures such as urbanization is investigated using novel sources of spatial information. Results suggest the clear existence of periods of both clustering and declustering in the natural world and in human social structures; yet clustering is the general trend. In view of the ongoing COVID-19 pandemic, societal challenges arising from large-scale clustering structures are discussed.
3
Chiba-Falek O, Lutz MW. Towards precision medicine in Alzheimer's disease: deciphering genetic data to establish informative biomarkers. EXPERT REVIEW OF PRECISION MEDICINE AND DRUG DEVELOPMENT 2017; 2:47-55. [PMID: 28944295 DOI: 10.1080/23808993.2017.1286227] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 12/13/2022]
Abstract
INTRODUCTION: Developing biomarker tools for identification of individuals at high risk for late-onset Alzheimer's disease (LOAD) is important for prognosis and early treatment. This review focuses on genetic factors and their potential role for precision medicine in LOAD. AREAS COVERED: APOE ε4 is the strongest genetic risk factor for non-Mendelian LOAD, and the APOE linkage disequilibrium (LD) region has produced the most significant association signal in multi-center genome-wide association studies (GWAS). Consideration of extended haplotypes in the APOE-LD region, and specifically of non-coding variants in putative enhancer elements such as the TOMM40-polyT, in addition to the coding variants that comprise the APOE genotypes, may be useful for predicting subjects at high risk of developing LOAD and estimating age of onset of early disease-stage symptoms. A genetic biomarker based on APOE-TOMM40-polyT haplotypes and age is currently applied in a clinical trial for prevention/delay of LOAD onset. Additionally, we discuss LOAD-GWAS discoveries and the development of new genetic risk scores based on LOAD-GWAS findings other than the APOE-LD region. EXPERT COMMENTARY: Deciphering the precise causal genetic variants within LOAD-GWAS regions will advance the development of genetic biomarkers to complement and refine the APOE-LD region based prediction model. Collectively, the genetic biomarkers will be translational for early diagnosis and enrichment of clinical trials with subjects at high risk.
Affiliation(s)
- Ornit Chiba-Falek
- Department of Neurology, Duke University Medical Center, Durham, NC 27710, USA; Center for Genomic and Computational Biology, Duke University Medical Center, Durham, NC 27710, USA
- Michael W Lutz
- Department of Neurology, Duke University Medical Center, Durham, NC 27710, USA
4
N’Diaye A, Haile JK, Cory AT, Clarke FR, Clarke JM, Knox RE, Pozniak CJ. Single Marker and Haplotype-Based Association Analysis of Semolina and Pasta Colour in Elite Durum Wheat Breeding Lines Using a High-Density Consensus Map. PLoS One 2017; 12:e0170941. [PMID: 28135299 PMCID: PMC5279799 DOI: 10.1371/journal.pone.0170941] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 11/03/2016] [Accepted: 01/12/2017] [Indexed: 12/30/2022]
Abstract
Association mapping is usually performed by testing the correlation between a single marker and phenotypes. However, because patterns of variation within genomes are inherited as blocks, clustering markers into haplotypes for genome-wide scans could be a worthwhile approach to improve statistical power to detect associations. The availability of high-density molecular data allows the possibility to assess the potential of both approaches to identify marker-trait associations in durum wheat. In the present study, we used single marker- and haplotype-based approaches to identify loci associated with semolina and pasta colour in durum wheat, the main objective being to evaluate the potential benefits of haplotype-based analysis for identifying quantitative trait loci. One hundred sixty-nine durum lines were genotyped using the Illumina 90K Infinium iSelect assay, and 12,234 polymorphic single nucleotide polymorphism (SNP) markers were generated and used to assess the population structure and the linkage disequilibrium (LD) patterns. A total of 8,581 SNPs previously localized to a high-density consensus map were clustered into 406 haplotype blocks based on the average LD distance of 5.3 cM. Combining multiple SNPs into haplotype blocks increased the average polymorphism information content (PIC) from 0.27 per SNP to 0.50 per haplotype. The haplotype-based analysis identified 12 loci associated with grain pigment colour traits, including the five loci identified by the single marker-based analysis. Furthermore, the haplotype-based analysis resulted in an increase of the phenotypic variance explained (50.4% on average) and the allelic effect (33.7% on average) when compared to single marker analysis. The presence of multiple allelic combinations within each haplotype locus offers potential for screening the most favorable haplotype series and may facilitate marker-assisted selection of grain pigment colour in durum wheat. 
These results suggest a benefit of haplotype-based analysis over single marker analysis to detect loci associated with colour traits in durum wheat.
Affiliation(s)
- Amidou N’Diaye
- Department of Plant Sciences and Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Jemanesh K. Haile
- Department of Plant Sciences and Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Aron T. Cory
- Department of Plant Sciences and Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Fran R. Clarke
- Semiarid Prairie Agricultural Research Centre, Agriculture and Agri-Food Canada, Swift Current, Saskatchewan, Canada
- John M. Clarke
- Department of Plant Sciences and Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Ron E. Knox
- Semiarid Prairie Agricultural Research Centre, Agriculture and Agri-Food Canada, Swift Current, Saskatchewan, Canada
- Curtis J. Pozniak
- Department of Plant Sciences and Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
5
6
Association of specific PTEN/10q haplotypes with endometrial cancer phenotypes in African-American and European American women. Gynecol Oncol 2015; 138:434-40. [PMID: 26026735 DOI: 10.1016/j.ygyno.2015.05.024] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 05/01/2015] [Accepted: 05/22/2015] [Indexed: 12/12/2022]
Abstract
OBJECTIVE: Endometrial carcinoma (EC), the most common gynecologic malignancy in the United States, affects European American (EA) women more frequently than African-American (AA) women. Yet, AA women are more likely to die from EC. Proposed etiologies for this racial disparity, such as socioeconomic status, aggressive, non-endometrioid tumor histology, and comorbid conditions, do not account for the entire disparity experienced by AA women, suggesting an unexplored genetic component. Germline mutations in PTEN cause Cowden syndrome (CS), which increases lifetime risk of endometrial cancer. In addition, somatic PTEN silencing is one of the most common initiating events in sporadic EC. Therefore, we hypothesized that specific PTEN haplotypes in the AA population may directly predispose AA women to unfavorable tumor characteristics when diagnosed with EC. METHODS: We conducted a case-control association study of germline variations in and around the PTEN/10q region between 53 EA and 51 AA EC cases and ethnic controls. RESULTS: Eighteen tag SNPs with minor allele frequency ≥0.1 were genotyped and used to reconstruct haplotypes. Forty-eight ancestry informative markers were genotyped to control for population stratification. Two haplotypes were overrepresented in AA, and there was a trend towards tumors with higher stage and grade in patients with these haplotypes. One haplotype was overrepresented in the EA population with a trend towards more endometrioid tumors. CONCLUSIONS: We show that specific PTEN/10q haplotypes are significantly different between EA and AA individuals (p≤0.02), and specific haplotypes may increase the risk of unfavorable tumor phenotypes in AA women diagnosed with EC.
7
Gupta PK, Kulwal PL, Jaiswal V. Association mapping in crop plants: opportunities and challenges. ADVANCES IN GENETICS 2014; 85:109-47. [PMID: 24880734 DOI: 10.1016/b978-0-12-800271-1.00002-0] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.9] [Indexed: 01/22/2023]
Abstract
The research area of association mapping (AM) is currently receiving major attention for genetic studies of quantitative traits in all major crops. However, the level of success and utility of AM achieved for crop improvement is not comparable to that in the area of human health care for diagnosis of complex human diseases. These AM studies in plants, as in humans, became possible due to the availability of DNA-based molecular markers and a variety of sophisticated statistical tools that are evolving on a regular basis. In this chapter, we first briefly review the significance of a variety of populations that are used in AM studies, then briefly describe the molecular markers and high-throughput genotyping strategies, and finally describe the approaches used for AM studies. The major part of the chapter is, however, devoted to analysis of reasons why the results of AM have been underutilized in plant breeding. We also examine the opportunities available and challenges faced while using AM for crop improvement programs. This includes a detailed discussion of the issues that have plagued AM studies, and the solutions that have become available to deal with these issues, so that in future, the results of AM studies may prove increasingly fruitful for crop improvement programs.
Affiliation(s)
- Pushpendra K Gupta
- Department of Genetics and Plant Breeding, Ch. Charan Singh University, Meerut, UP, India
- Pawan L Kulwal
- State Level Biotechnology Centre, Mahatma Phule Agricultural University, Rahuri, MS, India
- Vandana Jaiswal
- Department of Genetics and Plant Breeding, Ch. Charan Singh University, Meerut, UP, India
8
Zhao LP, Huang X. Recursive organizer (ROR): an analytic framework for sequence-based association analysis. Hum Genet 2013; 132:745-59. [PMID: 23494241 DOI: 10.1007/s00439-013-1285-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/10/2012] [Accepted: 03/03/2013] [Indexed: 12/13/2022]
Abstract
The advent of next-generation sequencing technologies affords the ability to sequence thousands of subjects cost-effectively, and is revolutionizing the landscape of genetic research. With the evolving genotyping/sequencing technologies, it is not unrealistic to expect that we will obtain a pair of fully phased diploid genome sequences from each subject in the near future. Here, in light of this potential, we propose an analytic framework called recursive organizer (ROR), which recursively groups sequence variants, based upon sequence similarities and their empirical disease associations, into fewer and potentially more interpretable super sequence variants (SSV). As an illustration, we applied ROR to assess an association between HLA-DRB1 and type 1 diabetes (T1D), discovering SSVs of HLA-DRB1 with sequence data from the Wellcome Trust Case Control Consortium. Specifically, ROR reduces 36 observed unique HLA-DRB1 sequences into 8 SSVs that empirically associate with T1D, a fourfold reduction of sequence complexity. Using HLA-DRB1 data from Type 1 Diabetes Genetics Consortium as cases and data from Fred Hutchinson Cancer Research Center as controls, we are able to validate associations of these SSVs with T1D. Further, SSVs consist of nine nucleotides, and each associates with its corresponding amino acids. Detailed examination of these selected amino acids reveals their potential functional roles in protein structures and possible implications for the mechanism of T1D.
Affiliation(s)
- Lue Ping Zhao
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Mailstop M2-B500, P.O. Box 19024, Seattle, WA 98109-1024, USA.
9
Powell JE, Kranis A, Floyd J, Dekkers JCM, Knott S, Haley CS. Optimal use of regression models in genome-wide association studies. Anim Genet 2011; 43:133-43. [PMID: 22404349 DOI: 10.1111/j.1365-2052.2011.02234.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
Abstract
The performance of linear regression models in genome-wide association studies is influenced by how marker information is parameterized in the model. Considering the impact of parameterization is especially important when using information from multiple markers to test for association. Properties of the population, such as linkage disequilibrium (LD) and allele frequencies, will also affect the ability of a model to provide statistical support for an underlying quantitative trait locus (QTL). Thus, for a given location in the genome, the relationship between population properties and model parameterization is expected to influence the performance of the model in providing evidence for the position of a QTL. As LD and allele frequencies vary throughout the genome and between populations, understanding the relationship between these properties and model parameterization is of considerable importance in order to make optimal use of available genomic data. Here, we evaluate the performance of regression-based association models using genotype and haplotype information across the full spectrum of allele frequency and LD scenarios. Genetic marker data from 200 broiler chickens were used to simulate genomic conditions by selecting individual markers to act as surrogate QTL (sQTL) and then investigating the ability of surrounding markers to estimate sQTL genotypes and provide statistical support for their location. The LD and allele frequencies of markers and sQTL are shown to have a strong effect on the performance of models relative to one another. Our results provide an indication of the best choice of model parameterization given certain scenarios of marker and QTL LD and allele frequencies. We demonstrate a clear advantage of haplotype-based models, which account for phase uncertainty over other models tested, particularly for QTL with low minor allele frequencies. 
We show that the greatest advantage of haplotype models over single-marker models occurs when LD between markers and the causal locus is low. Under these situations, haplotype models have a greater accuracy of predicting the location of the QTL than other models tested.
Affiliation(s)
- J E Powell
- Department of Genetics and Genomics, The Roslin Institute, University of Edinburgh, Roslin, UK.
10
Mourad R, Sinoquet C, Leray P. Probabilistic graphical models for genetic association studies. Brief Bioinform 2011; 13:20-33. [PMID: 21450805 DOI: 10.1093/bib/bbr015] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022]
Abstract
Probabilistic graphical models have been widely recognized as a powerful formalism in the bioinformatics field, especially in gene expression studies and linkage analysis. Although less well known in association genetics, many successful methods have recently emerged to dissect the genetic architecture of complex diseases. In this review article, we cover the applications of these models to the population association studies' context, such as linkage disequilibrium modeling, fine mapping and candidate gene studies, and genome-scale association studies. Significant breakthroughs of the corresponding methods are highlighted, but emphasis is also given to their current limitations, in particular, to the issue of scalability. Finally, we give promising directions for future research in this field.
Affiliation(s)
- Raphaël Mourad
- Ecole Polytechnique de l'Université de Nantes, rue Christian Pauc, BP 50609, 44306 Nantes Cedex 3, France.
11
Tachmazidou I, Johnson MR, De Iorio M. Bayesian variable selection for survival regression in genetics. Genet Epidemiol 2011; 34:689-701. [PMID: 20976796 DOI: 10.1002/gepi.20530] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/06/2022]
Abstract
Variable selection in regression with very big numbers of variables is challenging both in terms of model specification and computation. We focus on genetic studies in the field of survival, and we present a Bayesian-inspired penalized maximum likelihood approach appropriate for high-dimensional problems. In particular, we employ a simple, efficient algorithm that seeks maximum a posteriori (MAP) estimates of regression coefficients. The latter are assigned a Laplace prior with a sharp mode at zero, and non-zero posterior mode estimates correspond to significant single nucleotide polymorphisms (SNPs). Using the Laplace prior reflects a prior belief that only a small proportion of the SNPs significantly influence the response. The method is fast and can handle datasets arising from imputation or resequencing. We demonstrate the localization performance, power and false-positive rates of our method in large simulation studies of dense-SNP datasets and sequence data, and we compare the performance of our method to the univariate Cox regression and to a recently proposed stochastic search approach. In general, we find that our approach improves localization and power slightly, while the biggest advantage is in false-positive counts and computing times. We also apply our method to a real prospective study, and we observe potential association between candidate ABC transporter genes and epilepsy treatment outcomes.
Affiliation(s)
- Ioanna Tachmazidou
- Medical Research Council, Biostatistics Unit, Cambridge, United Kingdom.
12
Bansal V, Libiger O, Torkamani A, Schork NJ. Statistical analysis strategies for association studies involving rare variants. Nat Rev Genet 2010; 11:773-85. [PMID: 20940738 PMCID: PMC3743540 DOI: 10.1038/nrg2867] [Citation(s) in RCA: 342] [Impact Index Per Article: 22.8] [Indexed: 12/14/2022]
Abstract
The limitations of genome-wide association (GWA) studies that focus on the phenotypic influence of common genetic variants have motivated human geneticists to consider the contribution of rare variants to phenotypic expression. The increasing availability of high-throughput sequencing technologies has enabled studies of rare variants but these methods will not be sufficient for their success as appropriate analytical methods are also needed. We consider data analysis approaches to testing associations between a phenotype and collections of rare variants in a defined genomic region or set of regions. Ultimately, although a wide variety of analytical approaches exist, more work is needed to refine them and determine their properties and power in different contexts.
Affiliation(s)
- Vikas Bansal
- The Scripps Translational Science Institute, 3344 North Torrey Pines Court, Suite 300, La Jolla, California 92037, USA
13
Abstract
GWAS have emerged as popular tools for identifying genetic variants that are associated with disease risk. Standard analysis of a case-control GWAS involves assessing the association between each individual genotyped SNP and disease risk. However, this approach suffers from limited reproducibility and difficulties in detecting multi-SNP and epistatic effects. As an alternative analytical strategy, we propose grouping SNPs together into SNP sets on the basis of proximity to genomic features such as genes or haplotype blocks, then testing the joint effect of each SNP set. Testing of each SNP set proceeds via the logistic kernel-machine-based test, which is based on a statistical framework that allows for flexible modeling of epistatic and nonlinear SNP effects. This flexibility and the ability to naturally adjust for covariate effects are important features of our test that make it appealing in comparison to individual SNP tests and existing multimarker tests. Using simulated data based on the International HapMap Project, we show that SNP-set testing can have improved power over standard individual-SNP analysis under a wide range of settings. In particular, we find that our approach has higher power than individual-SNP analysis when the median correlation between the disease-susceptibility variant and the genotyped SNPs is moderate to high. When the correlation is low, both individual-SNP analysis and the SNP-set analysis tend to have low power. We apply SNP-set analysis to analyze the Cancer Genetic Markers of Susceptibility (CGEMS) breast cancer GWAS discovery-phase data.
14
Grossman I, Lutz MW, Crenshaw DG, Saunders AM, Burns DK, Roses AD. Alzheimer's disease: diagnostics, prognostics and the road to prevention. EPMA J 2010; 1:293-303. [PMID: 21124753 PMCID: PMC2987528 DOI: 10.1007/s13167-010-0024-3] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Received: 03/04/2010] [Accepted: 05/19/2010] [Indexed: 02/03/2023]
Abstract
Alzheimer's disease (AD) presents one of the leading healthcare challenges of the 21st century, with a projected worldwide prevalence of >107 million cases by 2025. While biomarkers have been identified, which may correlate with disease progression or subtype for the purpose of disease monitoring or differential diagnosis, a biomarker for reliable prediction of late onset disease risk has not been available until now. This deficiency in reliable predictive biomarkers, coupled with the devastating nature of the disease, places AD at a high priority for focus by predictive, preventive and personalized medicine. Recent data, discovered using phylogenetic analysis, suggest that a variable length poly-T sequence polymorphism in the TOMM40 gene, adjacent to the APOE gene, is predictive of risk of AD age-of-onset when coupled with a subject's current age. This finding offers hope for reliable assignment of disease risk within a 5-7 year window, and is expected to guide enrichment of clinical trials in order to speed development of preventative medicines.
Affiliation(s)
- Michael W. Lutz
- Duke University, Box 90344, Durham, NC 27708-0120 USA
- Deane Drug Discovery Institute, Durham, NC USA
- Donna G. Crenshaw
- Duke University, Box 90344, Durham, NC 27708-0120 USA
- Deane Drug Discovery Institute, Durham, NC USA
- Ann M. Saunders
- Duke University, Box 90344, Durham, NC 27708-0120 USA
- Deane Drug Discovery Institute, Durham, NC USA
- Daniel K. Burns
- Cabernet Pharmaceuticals, Durham, NC USA
- Duke University, Box 90344, Durham, NC 27708-0120 USA
- Deane Drug Discovery Institute, Durham, NC USA
- Allen D. Roses
- Cabernet Pharmaceuticals, Durham, NC USA
- Duke University, Box 90344, Durham, NC 27708-0120 USA
- Deane Drug Discovery Institute, Durham, NC USA
15
The diverse applications of cladistic analysis of molecular evolution, with special reference to nested clade analysis. Int J Mol Sci 2010; 11:124-39. [PMID: 20162005 PMCID: PMC2820993 DOI: 10.3390/ijms11010124] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 11/25/2009] [Revised: 01/06/2010] [Accepted: 01/06/2010] [Indexed: 11/17/2022]
Abstract
The genetic variation found in small regions of the genomes of many species can be arranged into haplotype trees that reflect the evolutionary genealogy of the DNA lineages found in that region and the accumulation of mutations on those lineages. This review demonstrates some of the many ways in which clades (branches) of haplotype trees have been applied in recent years, including the study of genotype/phenotype associations at candidate loci and in genome-wide association studies, the phylogeographic history of species, human evolution, the conservation of endangered species, and the identification of species.
16
Abstract
We describe a fast hierarchical Bayesian method for mapping quantitative trait loci by haplotype-based association, applicable when haplotypes are not observed directly but are inferred from multiple marker genotypes. The method avoids the use of a Monte Carlo Markov chain by employing priors for which the likelihood factorizes completely. It is parameterized by a single hyperparameter, the fraction of variance explained by the quantitative trait locus, compared to the frequentist fixed-effects model, which requires a parameter for the phenotypic effect of each combination of haplotypes; nevertheless it still provides estimates of haplotype effects. We use simulation to show that the method matches the power of the frequentist regression model and, when the haplotypes are inferred, exceeds it for small QTL effect sizes. The Bayesian estimates of the haplotype effects are more accurate than the frequentist estimates, for both known and inferred haplotypes, which indicates that this advantage is independent of the effect of uncertainty in haplotype inference and will hold in comparison with frequentist methods in general. We apply the method to data from a panel of recombinant inbred lines of Arabidopsis thaliana, descended from 19 inbred founders.
17
A TOMM40 variable-length polymorphism predicts the age of late-onset Alzheimer's disease. THE PHARMACOGENOMICS JOURNAL 2009; 10:375-84. [PMID: 20029386 PMCID: PMC2946560 DOI: 10.1038/tpj.2009.69] [Citation(s) in RCA: 272] [Impact Index Per Article: 17.0] [Indexed: 02/07/2023]
Abstract
The ɛ4 allele of the apolipoprotein E (APOE) gene is currently the strongest and most highly replicated genetic factor for risk and age of onset of late-onset Alzheimer's disease (LOAD). Using phylogenetic analysis, we have identified a polymorphic poly-T variant, rs10524523, in the translocase of outer mitochondrial membrane 40 homolog (TOMM40) gene that provides greatly increased precision in the estimation of age of LOAD onset for APOE ɛ3 carriers. In two independent clinical cohorts, longer lengths of rs10524523 are associated with a higher risk for LOAD. For APOE ɛ3/4 patients who developed LOAD after 60 years of age, individuals with long poly-T repeats linked to APOE ɛ3 develop LOAD on an average of 7 years earlier than individuals with shorter poly-T repeats linked to APOE ɛ3 (70.5 ± 1.2 years versus 77.6 ± 2.1 years, P=0.02, n=34). Independent mutation events at rs10524523 that occurred during Caucasian evolution have given rise to multiple categories of poly-T length variants at this locus. On replication, these results will have clinical utility for predictive risk estimates for LOAD and for enabling clinical disease prevention studies. In addition, these results show the effective use of a phylogenetic approach for analysis of haplotypes of polymorphisms, including structural polymorphisms, which contribute to complex diseases.
|
18
|
Su Z, Cardin N, The Wellcome Trust Case Control Consortium, Donnelly P, Marchini J. A Bayesian Method for Detecting and Characterizing Allelic Heterogeneity and Boosting Signals in Genome-Wide Association Studies. Stat Sci 2009. [DOI: 10.1214/09-sts311] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/19/2022]
|
19
|
Chapman J, Whittaker J. Analysis of multiple SNPs in a candidate gene or region. Genet Epidemiol 2008; 32:560-6. [PMID: 18428428 DOI: 10.1002/gepi.20330] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.6] [Indexed: 11/06/2022]
Abstract
We consider the analysis of multiple single nucleotide polymorphisms (SNPs) within a gene or region. The simplest analysis of such data is based on a series of single SNP hypothesis tests, followed by correction for multiple testing, but it is intuitively plausible that a joint analysis of the SNPs will have higher power, particularly when the causal locus may not have been observed. However, standard tests, such as a likelihood ratio test based on an unrestricted alternative hypothesis, tend to have large numbers of degrees of freedom and hence low power. This has motivated a number of alternative test statistics. Here we compare several of the competing methods, including the multivariate score test (Hotelling's test) of Chapman et al. ([2003] Hum. Hered. 56:18-31), Fisher's method for combining P-values, the minimum P-value approach, a Fourier-transform-based approach recently suggested by Wang and Elston ([2007] Am. J. Human Genet. 80:353-360) and a Bayesian score statistic proposed for microarray data by Goeman et al. ([2005] J. R. Stat. Soc. B 68:477-493). Some relationships between these methods are pointed out, and simulation results given to show that the minimum P-value and the Goeman et al. ([2005] J. R. Stat. Soc. B 68:477-493) approaches work well over a range of scenarios. The Wang and Elston approach often performs poorly; we explain why, and show how its performance can be substantially improved.
Affiliation(s)
- Juliet Chapman
- London School of Hygiene and Tropical Medicine, London, United Kingdom.
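Two of the combination rules compared in the Chapman and Whittaker abstract above, Fisher's method and the minimum P-value approach, can be illustrated with a minimal sketch. It assumes the per-SNP tests are independent, which SNPs in one region generally are not, so it shows only the arithmetic and not the calibrated procedures evaluated in the paper. The function names and the example P-values are hypothetical.

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p_i) is chi-square with 2k df
    when the k tests are independent (an assumption of this sketch)."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Closed-form chi-square survival function for even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total

def min_p_sidak(p_values):
    """Minimum P-value with a Sidak correction for k independent tests
    (an assumption; correlated SNPs would need permutation instead)."""
    k = len(p_values)
    return 1.0 - (1.0 - min(p_values)) ** k

# Made-up per-SNP P-values for one region, for illustration only.
snp_p_values = [0.04, 0.20, 0.03, 0.50]
print(fisher_combined_p(snp_p_values), min_p_sidak(snp_p_values))
```

Both functions return a single region-level P-value, which is the quantity the joint analyses in the abstract aim to produce in place of a series of single-SNP tests.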
|
20
|
Ding Z, Mailund T, Song YS. Efficient whole-genome association mapping using local phylogenies for unphased genotype data. Bioinformatics 2008; 24:2215-21. [PMID: 18667442 PMCID: PMC2553438 DOI: 10.1093/bioinformatics/btn406] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 05/01/2008] [Revised: 07/25/2008] [Accepted: 07/29/2008] [Indexed: 01/09/2023] Open
Abstract
MOTIVATION: Recent advances in genotyping technology have made data acquisition for whole-genome association studies cost-effective, and a current active area of research is developing efficient methods to analyze such large-scale datasets. Most sophisticated association mapping methods that are currently available take phased haplotype data as input. However, phase information is not readily available from sequencing methods and inferring the phase via computational approaches is time-consuming, taking days to phase a single chromosome. RESULTS: In this article, we devise an efficient method for scanning unphased whole-genome data for association. Our approach combines a recently found linear-time algorithm for phasing genotypes on trees with a recently proposed tree-based method for association mapping. From unphased genotype data, our algorithm builds local phylogenies along the genome, and scores each tree according to the clustering of cases and controls. We assess the performance of our new method on both simulated and real biological datasets. AVAILABILITY: The software described in this article is available at http://www.daimi.au.dk/~mailund/Blossoc and distributed under the GNU General Public License.
Affiliation(s)
- Zhihong Ding
- Department of Computer Science, University of California, Davis, USA
|
21
|
Chadeau-Hyam M, Hoggart CJ, O'Reilly PF, Whittaker JC, De Iorio M, Balding DJ. Fregene: simulation of realistic sequence-level data in populations and ascertained samples. BMC Bioinformatics 2008; 9:364. [PMID: 18778480 PMCID: PMC2542380 DOI: 10.1186/1471-2105-9-364] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Received: 04/18/2008] [Accepted: 09/08/2008] [Indexed: 01/28/2023] Open
Abstract
Background: FREGENE simulates sequence-level data over large genomic regions in large populations. Because, unlike coalescent simulators, it works forwards through time, it allows complex scenarios of selection, demography, and recombination to be modelled simultaneously. Detailed tracking of sites under selection is implemented in FREGENE and provides the opportunity to test theoretical predictions and gain new insights into mechanisms of selection. We describe here the main functionalities of both FREGENE and SAMPLE, a companion program that can replicate association study datasets. Results: We report detailed analyses of six large simulated datasets that we have made publicly available. Three demographic scenarios are modelled: one panmictic, one substructured with migration, and one complex scenario that mimics the principal features of genetic variation in major worldwide human populations. For each scenario there is one neutral simulation, and one with a complex pattern of selection. Conclusion: FREGENE and the simulated datasets will be valuable for assessing the validity of models for selection, demography and population genetic parameters, as well as the efficacy of association studies. Its principal advantages are modelling flexibility and computational efficiency. It is open source and object-oriented. As such, it can be customised and the range of models extended.
Affiliation(s)
- Marc Chadeau-Hyam
- Department of Epidemiology and Public Health, Imperial College, St Mary's Campus, Norfolk Place, London, W2 1PG, UK.
|
22
|
Tachmazidou I, Andrew T, Verzilli CJ, Johnson MR, De Iorio M. Bayesian survival analysis in genetic association studies. Bioinformatics 2008; 24:2030-6. [PMID: 18617538 PMCID: PMC2530885 DOI: 10.1093/bioinformatics/btn351] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 01/02/2023]
Abstract
Motivation: Large-scale genetic association studies are carried out with the hope of discovering single nucleotide polymorphisms involved in the etiology of complex diseases. There are several existing methods in the literature for performing this kind of analysis for case-control studies, but less work has been done for prospective cohort studies. We present a Bayesian method for linking markers to censored survival outcome by clustering haplotypes using gene trees. Coalescent-based approaches are promising for LD mapping, as the coalescent offers a good approximation to the evolutionary history of mutations. Results: We compare the performance of the proposed method in simulation studies to the univariate Cox regression and to dimension reduction methods, and we observe that it performs similarly in localizing the causal site, while offering a clear advantage in terms of false positive associations. Moreover, it offers computational advantages. Applying our method to a real prospective study, we observe potential association between candidate ABC transporter genes and epilepsy treatment outcomes. Availability: R codes are available upon request. Contact: ioanna.tachmazidou@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ioanna Tachmazidou
- Department of Epidemiology and Public Health, Imperial College, London, UK.
|
23
|
Su SY, Balding DJ, Coin LJM. Disease association tests by inferring ancestral haplotypes using a hidden Markov model. Bioinformatics 2008; 24:972-8. [PMID: 18296746 DOI: 10.1093/bioinformatics/btn071] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 01/18/2023]
Abstract
MOTIVATION: Most genome-wide association studies rely on single nucleotide polymorphism (SNP) analyses to identify causal loci. The increased stringency required for genome-wide analyses (with per-SNP significance threshold typically approximately 10⁻⁷) means that many real signals will be missed. Thus it is still highly relevant to develop methods with improved power at low type I error. Haplotype-based methods provide a promising approach; however, they suffer from statistical problems such as abundance of rare haplotypes and ambiguity in defining haplotype block boundaries. RESULTS: We have developed an ancestral haplotype clustering (AncesHC) association method which addresses many of these problems. It can be applied to biallelic or multiallelic markers typed in haploid, diploid or multiploid organisms, and also handles missing genotypes. Our model is free from the assumption of a rigid block structure but recognizes a block-like structure if it exists in the data. We employ a Hidden Markov Model (HMM) to cluster the haplotypes into groups of predicted common ancestral origin. We then test each cluster for association with disease by comparing the numbers of cases and controls with 0, 1 and 2 chromosomes in the cluster. We demonstrate the power of this approach by simulation of case-control status under a range of disease models for 1500 outcrossed mice originating from eight inbred lines. Our results suggest that AncesHC has substantially more power than single-SNP analyses to detect disease association, and is also more powerful than the cladistic haplotype clustering method CLADHC. AVAILABILITY: The software can be downloaded from http://www.imperial.ac.uk/medicine/people/l.coin.
Affiliation(s)
- Shu-Yi Su
- Department of Epidemiology and Public Health, Imperial College, London W2 1PG, UK
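The per-cluster comparison described in the AncesHC abstract above (cases versus controls carrying 0, 1 or 2 chromosomes in a cluster) can be illustrated with a minimal sketch. The abstract does not say which test statistic is used, so the Pearson chi-square test on a 2x3 table below is an assumption chosen for illustration, and the function name and counts are hypothetical.

```python
import math

def cluster_association_test(case_counts, control_counts):
    """Pearson chi-square test on a 2x3 table of individuals carrying
    0, 1 or 2 chromosomes in a haplotype cluster.

    Assumes every column total is positive, so the test has
    (2 - 1) * (3 - 1) = 2 degrees of freedom, for which the chi-square
    survival function is exactly exp(-stat / 2).
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.exp(-stat / 2.0)

# Made-up counts: cases and controls with 0, 1, 2 chromosomes in a cluster.
stat, p_value = cluster_association_test([30, 10, 10], [10, 10, 30])
print(stat, p_value)
```

In a genome-wide scan a test of this kind would be repeated for each cluster, so the resulting P-values would still need a multiple-testing correction.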
|