1
|
Tasho RP, Shin WT, Cho JY. Acclimatization of Pisum sativum L., grown in soil contaminated with veterinary antibiotics, an attribute of dose hormetic response of root metabolites. THE SCIENCE OF THE TOTAL ENVIRONMENT 2018; 635:364-374. [PMID: 29674261 DOI: 10.1016/j.scitotenv.2018.04.101] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 01/30/2018] [Revised: 04/02/2018] [Accepted: 04/06/2018] [Indexed: 05/11/2023]
Abstract
Plant-veterinary antibiotic interactions have been widely studied; however, to the best of our knowledge, acclimatization studies with regard to changes in plant root metabolites have not been reported so far. The purpose of this study was to examine the changes in the metabolome of pea roots under antibiotic stress and their role in acclimatization. Pisum sativum L. was grown in soil contaminated with three commonly used veterinary antibiotics - kanamycin (KA), sulfamethazine (SA), and tetracycline (TC). In response to antibiotic stress, plants accumulated different types of low molecular weight compounds that provided protection from stress by contributing to ROS detoxification, protection of membrane integrity, efficient signaling, cell wall function, and cellular osmotic adjustment (glucose, galactose, myo-inositol, stigmasterol, octadecadienoic acid, l-proline). The concentration of amino acid, sugar, and triglyceride metabolites in KA and TC samples showed a dose-dependent biphasic (hormesis) fluctuation. This was mirrored in the metabolite abundance as well as the physiological attributes (mycorrhizal colonization, GST function, nutrient assimilation), which helped in the acclimatization without the loss of normal plant function. SA, on the other hand, had progressive toxic effects with increasing concentration. PCA revealed the differences to be due to SA treatments and in sterol and terpenoid metabolites.
Affiliation(s)
- R P Tasho
- Department of Agriculture Chemistry, Chonbuk National University, Jeollabuk-do 561-756, South Korea.
- W T Shin
- Department of Agriculture Chemistry, Chonbuk National University, Jeollabuk-do 561-756, South Korea.
- J Y Cho
- Department of Agriculture Chemistry, Chonbuk National University, Jeollabuk-do 561-756, South Korea.
|
2
|
Burkett KM, McNeney B, Graham J. Markov chain Monte Carlo sampling of gene genealogies conditional on unphased SNP genotype data. Stat Appl Genet Mol Biol 2013; 12:559-81. [PMID: 23962961 DOI: 10.1515/sagmb-2012-0011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
The gene genealogy is a tree describing the ancestral relationships among genes sampled from unrelated individuals. Knowledge of the tree is useful for inference of population-genetic parameters and has potential application in gene-mapping. Markov chain Monte Carlo approaches that sample genealogies conditional on observed genetic data typically assume that haplotype data are observed even though commonly-used genotyping technologies provide only unphased genotype data. We have extended our haplotype-based genealogy sampler, sampletrees, to handle unphased genotype data. We use the sampled haplotype configurations as a diagnostic for adequate sampling of the tree space based on the reasoning that if haplotype sampling is restricted, sampling from the tree space will also be restricted. We compare the distributions of sampled haplotypes across multiple runs of sampletrees, and to those estimated by the phase inference program, PHASE. Performance was excellent for the majority of individuals as shown by the consistency of results across multiple runs. However, for some individuals in some datasets, sampletrees had problems sampling haplotype configurations; longer run lengths would be required for these datasets. For many datasets though, we expect that sampletrees will be useful for sampling from the posterior distribution of gene genealogies given unphased genotype data.
|
3
|
Edwards D. Modelling and visualizing fine-scale linkage disequilibrium structure. BMC Bioinformatics 2013; 14:179. [PMID: 23742095 PMCID: PMC3683336 DOI: 10.1186/1471-2105-14-179] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 03/07/2013] [Accepted: 05/29/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Detailed study of genetic variation at the population level in humans and other species is now possible due to the availability of large sets of single nucleotide polymorphism data. Alleles at two or more loci are said to be in linkage disequilibrium (LD) when they are correlated or statistically dependent. Current efforts to understand the genetic basis of complex phenotypes are based on the existence of such associations, making study of the extent and distribution of linkage disequilibrium central to this endeavour. The objective of this paper is to develop methods to study fine-scale patterns of allelic association using probabilistic graphical models. RESULTS An efficient, linear-time forward-backward algorithm is developed to estimate chromosome-wide LD models by optimizing a penalized likelihood criterion, and a convenient way to display these models is described. To illustrate the methods they are applied to data obtained by genotyping 8341 pigs. It is found that roughly 20% of the porcine genome exhibits complex LD patterns, forming islands of relatively high genetic diversity. CONCLUSIONS The proposed algorithm is efficient and makes it feasible to estimate and visualize chromosome-wide LD models on a routine basis.
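The definition of linkage disequilibrium given in this abstract (alleles at two loci being correlated or statistically dependent) can be made concrete with the standard pairwise measures D, D' and r². The sketch below is only an illustration of those textbook quantities for two biallelic loci; it is not the graphical-model method the paper develops, and the function name `ld_stats` and its count arguments are hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Pairwise LD measures (D, D', r^2) from counts of the four haplotypes
    AB, Ab, aB, ab at two biallelic loci. Illustrative helper only."""
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / n              # frequency of haplotype AB
    p_A = (n_AB + n_Ab) / n      # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n      # frequency of allele B at locus 2

    # D is the deviation of the haplotype frequency from independence.
    D = p_AB - p_A * p_B

    # D' rescales D by its largest attainable magnitude given the allele
    # frequencies; the bound depends on the sign of D.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = D / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the two allele indicators.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2
```

With only haplotypes AB and ab present in equal numbers, the two loci are perfectly correlated; with all four haplotypes equally frequent, they are independent and D is zero.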
Affiliation(s)
- David Edwards
- Department of Molecular Biology and Genetics, Centre for Quantitative Genetics and Genomics, Blichers Allé 20, Tjele 8830, Denmark.
|
4
|
Fine-scale mapping of disease susceptibility locus with Bayesian partition model. Genes Genomics 2012. [DOI: 10.1007/s13258-011-0220-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
|
5
|
Zhou X, Peng B, Li YF, Chen Y, Tang H, Wang X. To Release or Not to Release: Evaluating Information Leaks in Aggregate Human-Genome Data. COMPUTER SECURITY – ESORICS 2011 2011. [DOI: 10.1007/978-3-642-23822-2_33] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 12/12/2022]
|
6
|
Wang S, Chanock S, Tang D, Li Z, Edwards S, Jedrychowski W, Perera FP. Effect of gene-environment Interactions on mental development in African American, Dominican, and Caucasian mothers and newborns. Ann Hum Genet 2010; 74:46-56. [PMID: 19860743 PMCID: PMC2804781 DOI: 10.1111/j.1469-1809.2009.00550.x] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
The health impact of environmental toxins has gained increasing recognition over the years. Polycyclic aromatic hydrocarbons (PAHs) and environmental tobacco smoke (ETS) are known to affect nervous system development in children, but no studies have investigated how polymorphisms in PAH metabolic genes affect child cognitive development following PAH exposure during pregnancy. In two parallel prospective cohort studies of non-smoking African American and Dominican mothers and children in New York City and of Caucasian mothers and children in Krakow, Poland, we explored the effect of gene-PAH interaction on child mental development index (MDI). Genes known to play important roles in the metabolic activation or detoxification of PAHs were selected. Genetic variations in these genes could influence susceptibility to adverse effects of PAHs in polluted air. We explored the effects of interactions between prenatal PAH exposure and 21 polymorphisms or haplotypes in these genes on MDI at 12, 24, and 36 months among 547 newborns and 806 mothers from three different ethnic groups. Significant interaction effects between haplotypes and PAHs were observed in mothers and their newborns in all three ethnic groups after Bonferroni correction. The strongest and most consistent effect observed was between PAH and haplotype ACCGGC of the CYP1B1 gene.
Affiliation(s)
- Shuang Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY
- Deliang Tang
- Columbia Center for Children’s Environmental Health, Columbia University, New York, NY
- Zhigang Li
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY
- Susan Edwards
- Columbia Center for Children’s Environmental Health, Columbia University, New York, NY
- Wieslaw Jedrychowski
- Columbia Center for Children’s Environmental Health, Columbia University, New York, NY
- Department of Epidemiology and Preventive Medicine, College of Medicine, Jagiellonian University, Krakow, Poland
- Frederica P. Perera
- Columbia Center for Children’s Environmental Health, Columbia University, New York, NY
|
7
|
Abstract
When a novel genetic trait arises in a population, it introduces a signal in the haplotype distribution of that population. Through recombination that signal's history becomes differentiated from the DNA distant to it, but remains similar to the DNA close by. Fine-scale mapping techniques rely on this differentiation to pinpoint trait loci. In this study, we analyzed the differentiation itself to better understand how much information is available to these techniques. Simulated alleles on known recombinant coalescent trees show the upper limit for fine-scale mapping. Varying characteristics of the population being studied increase or decrease this limit. The initial uncertainty in map position has the most direct influence on the final precision of the estimate, with wider initial areas resulting in wider final estimates, though the increase is sigmoidal rather than linear. The Theta of the trait (4Nmu) is also important, with lower values for Theta resulting in greater precision of trait placement up to a point--the increase is sigmoidal as Theta decreases. Collecting data from more individuals can increase precision, though only logarithmically with the total number of individuals, so that each added individual contributes less to the final precision. However, a case/control analysis has the potential to greatly increase the effective number of individuals, as the bulk of the information lies in the differential between affected and unaffected genotypes. If haplotypes are unknown due to incomplete penetrance, much information is lost, with more information lost the less indicative phenotype is of the underlying genotype.
Affiliation(s)
- Lucian P Smith
- Department of Genome Sciences, University of Washington, Seattle, WA 98195-5065, USA.
|
8
|
Igo RP, Li J, Goddard KA. Association mapping by generalized linear regression with density-based haplotype clustering. Genet Epidemiol 2009; 33:16-26. [PMID: 18561202 PMCID: PMC2952426 DOI: 10.1002/gepi.20352] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 01/04/2023]
Abstract
Haplotypes of closely linked single-nucleotide polymorphisms (SNPs) potentially offer greater power than individual SNPs to detect association between genetic variants and disease. We present a novel approach for association mapping in which density-based clustering of haplotypes reduces the dimensionality of the general linear model (GLM)-based score test of association implemented in the HaploStats software (Schaid et al. [2002] Am. J. Hum. Genet. 70:425-434). A flexible haplotype similarity score, a generalization of previously used measures, forms the basis for grouping haplotypes of probable recent common ancestry. All haplotypes within a cluster are assigned the same regression coefficient within the GLM, and evidence for association is assessed with a score statistic. The approach is applicable to both binary and continuous trait data, and does not require prior phase information. Results of simulation studies demonstrated that clustering enhanced the power of the score test to detect association, under a variety of conditions, while preserving valid Type-I error. Improvement in performance was most dramatic in the presence of extreme haplotype diversity, while a slight improvement was observed even at low diversity. Our method also offers, for binary traits, a slight advantage in power over a similar approach based on an evolutionary model (Tzeng et al. [2006] Am. J. Hum. Genet. 78:231-242).
Affiliation(s)
- Robert P. Igo
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio
- Jing Li
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, Ohio
- Katrina A.B. Goddard
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio
|
9
|
Lee SH, van der Werf JH. Simultaneous fine mapping of closely linked epistatic quantitative trait loci using combined linkage disequilibrium and linkage with a general pedigree. Genet Sel Evol 2008. [DOI: 10.1051/gse:2008002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/14/2022] Open
|
10
|
Wang S, Chanock S, Tang D, Li Z, Jedrychowski W, Perera FP. Assessment of interactions between PAH exposure and genetic polymorphisms on PAH-DNA adducts in African American, Dominican, and Caucasian mothers and newborns. Cancer Epidemiol Biomarkers Prev 2008; 17:405-13. [PMID: 18268125 PMCID: PMC3171162 DOI: 10.1158/1055-9965.epi-07-0695] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.8] [Indexed: 01/25/2023] Open
Abstract
Polycyclic aromatic hydrocarbons (PAH) are widespread pollutants commonly found in air, food, and drinking water. Benzo[a]pyrene is a well-studied representative PAH found in air from fossil fuel combustion and a transplacental carcinogen experimentally. PAHs bind covalently to DNA to form DNA adducts, an indicator of DNA damage, and an informative biomarker of potential cancer risk. Associations between PAH-DNA adduct levels and both cancer risk and developmental deficits have been seen in previous experimental and epidemiologic studies. Several genes have been shown to play an important role in the metabolic activation or detoxification of PAHs, including the cytochrome P450 genes CYP1A1 and CYP1B1 and the glutathione S-transferase (GST) genes GSTM1, and GSTT2. Genetic variation in these genes could influence susceptibility to adverse effects of PAHs in polluted air. Here, we have explored interactions between prenatal PAH exposure and 17 polymorphisms in these genes (rs2198843, rs1456432, rs4646903, rs4646421, rs2606345, rs7495708, rs2472299, rs162549, rs1056837, rs1056836, rs162560, rs10012, rs2617266, rs2719, rs1622002, rs140194, and gene deletion GSTM1-02) and haplotypes on PAH-DNA adducts in cord blood of 547 newborns and in maternal blood of 806 mothers from three different self-described ethnic groups: African Americans, Dominicans, and Caucasians. PAHs were measured by personal air monitoring of mothers during pregnancy. Significant interactions (p < 0.05) were observed between certain genetic polymorphisms and CYP1A1 haplotype and PAHs in mothers and their newborns in the three ethnic groups. However, with our limited sample size, the current findings are suggestive only, warranting further study.
Affiliation(s)
- Shuang Wang
- Department of Biostatistics, Columbia University, New York, New York
- Deliang Tang
- Columbia Center for Children’s Environmental Health, Mailman School of Public Health, Columbia University, New York, New York
- Zhigang Li
- Department of Biostatistics, Columbia University, New York, New York
- Wieslaw Jedrychowski
- Columbia Center for Children’s Environmental Health, Mailman School of Public Health, Columbia University, New York, New York
- Frederica P. Perera
- Columbia Center for Children’s Environmental Health, Mailman School of Public Health, Columbia University, New York, New York
|
11
|
Kulle B, Frigessi A, Edvardsen H, Kristensen V, Wojnowski L. Accounting for haplotype phase uncertainty in linkage disequilibrium estimation. Genet Epidemiol 2008; 32:168-78. [DOI: 10.1002/gepi.20273] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/06/2022]
|
12
|
Falchi M. Analysis of quantitative trait loci. Methods Mol Biol 2008; 453:297-326. [PMID: 18712311 DOI: 10.1007/978-1-60327-429-6_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
Diseases with complex inheritance are characterized by multiple genetic and environmental factors that often interact to produce clinical symptoms. In addition, etiological heterogeneity (different risk factors causing similar phenotypes) obscures the inheritance pattern among affected relatives and hampers the feasibility of gene-mapping studies. For such diseases, the careful selection of quantitative phenotypes that may represent intermediary risk factors for disease development (intermediate phenotypes) is etiologically more homogeneous than the disease per se. Over the last 15 years quantitative trait locus mapping has become a popular method for understanding the genetic basis for intermediate phenotypes. This chapter provides an introduction to classical and recent strategies for mapping quantitative trait loci in humans.
Affiliation(s)
- Mario Falchi
- Twin Research and Genetic Epidemiology Unit, King's College London School of Medicine, London, United Kingdom
|
13
|
Gu CC, Yu K, Rao DC. Characterization of LD structures and the utility of HapMap in genetic association studies. ADVANCES IN GENETICS 2008; 60:407-35. [PMID: 18358328 DOI: 10.1016/s0065-2660(07)00415-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
Abstract
Observed distribution of and variation in linkage disequilibrium (LD) with respect to the evolutionary history and disease transmission in a population is the driving force behind the current wave of genome-wide association (GWA) studies of complex human diseases. An extensive literature covers topics from haplotype analysis that utilizes local LD structures in candidate genes and regions to genome-wide organization of LD blocks (neighborhood) that led to the development of International HapMap Project and panels of "tagSNPs" used by current GWA studies. In this chapter, we examine the scenarios where each of the major types of analysis methods may be applicable and where the current popular genotyping platforms for GWA might fall short. We discuss current association analysis methods by emphasizing their reliance on the local LD structures or the global organization of the LD structures, and highlight the need to consider individual marker information content in large-scale association mapping.
Affiliation(s)
- C Charles Gu
- Division of Biostatistics and Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
|
14
|
Coalescent methods for fine-scale disease-gene mapping. Methods Mol Biol 2007. [PMID: 17984542 DOI: 10.1007/978-1-59745-389-9_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Fine-scale mapping methods have been developed to localize functional polymorphisms within large candidate regions identified from previous linkage and/or association studies. Population-based association fine-mapping methods utilize linkage disequilibrium of alleles at high-density marker single-nucleotide polymorphisms with the functional polymorphism, generated as the result of shared ancestry of individuals within the population. Here, we review fine-mapping methods that model the shared ancestry of sampled chromosomes explicitly, using the coalescent process, resulting in greater accuracy and precision to localize functional polymorphisms than approaches that treat individuals as unrelated.
|
15
|
Andrés AM, Clark AG, Shimmin L, Boerwinkle E, Sing CF, Hixson JE. Understanding the accuracy of statistical haplotype inference with sequence data of known phase. Genet Epidemiol 2007; 31:659-71. [PMID: 17922479 PMCID: PMC2291540 DOI: 10.1002/gepi.20185] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.5] [Indexed: 01/23/2023]
Abstract
Statistical methods for haplotype inference from multi-site genotypes of unrelated individuals have important application in association studies and population genetics. Understanding the factors that affect the accuracy of this inference is important, but their assessment has been restricted by the limited availability of biological data with known phase. We created hybrid cell lines monosomic for human chromosome 19 and produced single-chromosome complete sequences of a 48 kb genomic region in 39 individuals of African American (AA) and European American (EA) origin. We employ these phase-known genotypes and coalescent simulations to assess the accuracy of statistical haplotype reconstruction by several algorithms. Accuracy of phase inference was considerably low in our biological data even for regions as short as 25-50 kb, suggesting that caution is needed when analyzing reconstructed haplotypes. Moreover, the reliability of estimated confidence in phase inference is not high enough to allow for a reliable incorporation of site-specific uncertainty information in subsequent analyses. We show that, in samples of certain mixed ancestry (AA and EA populations), the most accurate haplotypes are probably obtained when increasing sample size by considering the largest, pooled sample, despite the hypothetical problems associated with pooling across those heterogeneous samples. Strategies to improve confidence in reconstructed haplotypes, and realistic alternatives to the analysis of inferred haplotypes, are discussed.
Affiliation(s)
- Aida M Andrés
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA.
|
16
|
Chiu YF, Liang KY, Chuang LM, Beaty TH. Incorporation of covariates into multipoint linkage disequilibrium mapping in case-control studies. Genet Epidemiol 2007; 32:143-51. [PMID: 17968989 DOI: 10.1002/gepi.20271] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/05/2022]
Abstract
Case-control designs are commonly adopted in genetic epidemiological studies because they are cost effective and offer powerful tests for genetic and environmental risk factors, as well as their interactions. Previously, we proposed an association mapping approach to estimate the position of an unobserved disease locus as well as measuring its genetic effect on risk. The method provides a confidence interval for the estimated map position to help narrow the chromosomal region potentially harboring a disease locus. However, concerns often arise about case-control designs including possible false positives or bias due to confounders, heterogeneity or interactions among genes and between genes and environments. In the present work, we extended the multipoint linkage disequilibrium mapping approach for case-control studies to incorporate information about factors influencing the effect of causal genes to improve precision and efficiency of the estimated location. The efficiency, bias and coverage probability of this extended approach for locating a disease locus using case-control data with and without additional information on a covariate were compared through simulation. An example of a case-control study for type 2 diabetes was used to illustrate this extended method. In this study, a strong association between diabetes and a candidate gene, SLC2A10, was detected among nonobese subjects, whereas no evidence of association was found for either obese subjects or the whole sample when obesity was ignored. Simulation studies and these diabetes data both demonstrate how the efficiency of the estimated location of a disease gene can be improved substantially by incorporating information on covariates.
Affiliation(s)
- Yen-Feng Chiu
- Division of Biostatistics and Bioinformatics, National Health Research Institutes, Zhunan, Taiwan
|
17
|
Milet J, Dehais V, Bourgain C, Jouanolle AM, Mosser A, Perrin M, Morcet J, Brissot P, David V, Deugnier Y, Mosser J. Common variants in the BMP2, BMP4, and HJV genes of the hepcidin regulation pathway modulate HFE hemochromatosis penetrance. Am J Hum Genet 2007; 81:799-807. [PMID: 17847004 PMCID: PMC2227929 DOI: 10.1086/520001] [Citation(s) in RCA: 96] [Impact Index Per Article: 5.3] [Received: 03/16/2007] [Accepted: 05/07/2007] [Indexed: 12/22/2022] Open
Abstract
Most cases of genetic hemochromatosis (GH) are associated with the HFE C282Y/C282Y (p.Cys282Tyr/p.Cys282Tyr) genotype in white populations. The symptoms expressed by C282Y homozygotes are extremely variable. Only a few suffer from an overt disease. Several studies have suggested that, in addition to environmental factors, a genetic component could explain a substantial part of this phenotypic variation, although very few genetic factors have been identified so far. In the present study, we tested the association between common variants in candidate genes and hemochromatosis penetrance, in a large sample of C282Y homozygotes, using pretherapeutic serum ferritin level as marker of hemochromatosis penetrance. We focused on two biologically relevant gene categories: genes involved in non-HFE GH (TFR2, HAMP, and SLC40A1) and genes involved in the regulation of hepcidin expression, including genes from the bone morphogenetic protein (BMP) regulatory pathway (BMP2, BMP4, HJV, SMAD1, SMAD4, and SMAD5) and the IL6 gene from the inflammation-mediated regulation pathway. A significant association was detected between serum ferritin level and rs235756, a common single-nucleotide polymorphism (SNP) in the BMP2 genic region (P = 4.42 × 10⁻⁵). Mean ferritin level, adjusted for age and sex, is 655 ng/ml among TT genotypes, 516 ng/ml in TC genotypes, and 349 ng/ml in CC genotypes. Our results further suggest an interactive effect on serum ferritin level of rs235756 in BMP2 and a SNP in HJV, with a small additive effect of a SNP in BMP4. This first reported association between common variants in the BMP pathway and iron burden suggests that full expression of HFE hemochromatosis is linked to abnormal liver expression of hepcidin, not only through impairment in the HFE function but also through functional modulation in the BMP pathway. Our results also highlight the BMP regulation pathway as a good candidate for identification of new modifier genes.
|
18
|
Wu Y, Gusfield D. Efficient computation of minimum recombination with genotypes (not haplotypes). J Bioinform Comput Biol 2007; 5:181-200. [PMID: 17589959 DOI: 10.1142/s0219720007002631] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 09/01/2006] [Revised: 12/01/2006] [Accepted: 12/06/2006] [Indexed: 01/06/2023]
Abstract
A current major focus in genomics is the large-scale collection of genotype data in populations in order to detect variations in the population. The variation data are sought in order to address fundamental and applied questions in genetics that concern the haplotypes in the population. Since almost all the collected data is in the form of genotypes, but the downstream genetics questions concern haplotypes, the standard approach to this issue has been to try to first infer haplotypes from the genotypes, and then answer the downstream questions using the inferred haplotypes. That two-stage approach has potential deficiencies, giving rise to the general question of how well one can answer the downstream questions using genotype data without first inferring haplotypes, and giving rise to the goal of computing the range of downstream answers that would be obtained over the range of possible inferred haplotype solutions. This paper provides some tools for the study of those issues, and some partial answers. We present algorithms to solve downstream questions concerning the minimum amount of recombination needed to derive given genotypic data, without first fixing a choice of haplotypes. We apply these algorithms to the goal of finding recombination hotspots, obtaining as good results as a published method that first infers haplotypes; and to the case of estimating the minimum amount of recombination needed to derive the true haplotypes underlying the genotypic data, obtaining weaker results compared to first inferring haplotypes using the program PHASE.
Affiliation(s)
- Yufeng Wu
- Department of Computer Science, University of California, Davis, CA 95616, USA.
|
19
|
Hanein S, Perrault I, Gerber S, Delphin N, Benezra D, Shalev S, Carmi R, Feingold J, Dufier JL, Munnich A, Kaplan J, Rozet JM, Jeanpierre M. Population history and infrequent mutations: how old is a rare mutation? GUCY2D as a worked example. Eur J Hum Genet 2007; 16:115-23. [PMID: 17684531 DOI: 10.1038/sj.ejhg.5201905] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/09/2022] Open
Abstract
The mosaic pattern of haplotypes observed around a single mutation results from one or several founder events. The difficulties involved in calculating the age of the variant are greatly reduced by assuming a single event, but this simplification may bias analysis of the genealogy of the mutation. However, if it is assumed that more than one founder event occurred, the number of genealogies is very large and the likelihood of every possible tree could not be realistically calculated. A multipoint approach is required, given the number of independent variables needed to describe a complex bifurcating genealogy. Starting from the observation that a limited number of parameters is needed for calculation of the simplest models of bifurcating genealogies, we show that the probability density of a two-ancestor model genealogy can be simply described as an algebraic function in a closed form, two coalescence times being calculated simultaneously without compromising accuracy. Implementation in a Bayesian framework is facilitated by the simplicity of the function, which describes the reciprocal relationship between the region of complete linkage disequilibrium and the branch length of the tree. We illustrate the use of haplotype information about allele-sharing decay around a mutation as a genetic clock, using data for two GUCY2D mutations in Mediterranean populations.
Affiliation(s)
- Sylvain Hanein
- Unité de Recherches sur les Handicaps Génétiques de l'Enfant. Hôpital Necker - Enfants Malades, Paris, France
|
20
|
Abstract
To identify the genetic etiology of a disease of interest, disease-related characteristics (phenotypes) are often tested for association with genetic variants (genotypes). Although genetic association studies of single genetic variants have been widely performed, there has been increasing interest in studies of multiple adjacent genetic variants on one chromosome, known as a haplotype. In this review, we will provide background about the origin of haplotypes and why they can be useful in genetic studies; we will discuss approaches to determining haplotypes and performing haplotype-based genetic association studies; and we will compare single variant and haplotype-based approaches.
Affiliation(s)
- Edwin K Silverman
- Channing Laboratory and Pulmonary and Critical Care Division, Brigham and Women's Hospital, Boston, Massachusetts, USA.
|
21
|
|
22
|
Zhao HH, Fernando RL, Dekkers JCM. Power and precision of alternate methods for linkage disequilibrium mapping of quantitative trait loci. Genetics 2007; 175:1975-86. [PMID: 17277369 PMCID: PMC1855130 DOI: 10.1534/genetics.106.066480] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.1] [Indexed: 11/18/2022]
Abstract
Linkage disequilibrium (LD) analysis in outbred populations uses historical recombinations to detect and fine map quantitative trait loci (QTL). Our objective was to evaluate the effect of various factors on power and precision of QTL detection and to compare LD mapping methods on the basis of regression and identity by descent (IBD) in populations of limited effective population size (N(e)). An 11-cM region with 6-38 segregating single-nucleotide polymorphisms (SNPs) and a central QTL was simulated. After 100 generations of random mating with N(e) of 50, 100, or 200, SNP genotypes and phenotypes were generated on 200, 500, or 1000 individuals with the QTL explaining 2 or 5% of phenotypic variance. To detect and map the QTL, phenotypes were regressed on genotypes or (assumed known) haplotypes, in comparison with the IBD method. Power and precision to detect QTL increased with sample size, marker density, and QTL effect. Power decreased with N(e), but precision was affected little by N(e). Single-marker regression had similar or greater power and precision than other regression models, and was comparable to the IBD method. Thus, for rapid initial screening of samples of adequate size in populations in which drift is the primary force that has created LD, QTL can be detected and mapped by regression on SNP genotypes without recovering haplotypes.
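As a minimal sketch of the single-marker regression mentioned in this abstract, the Python snippet below regresses phenotypes on SNP allele counts (0, 1, or 2) by ordinary least squares and returns the F statistic for the slope. The function name `single_marker_regression`, the additive 0/1/2 coding, and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
def single_marker_regression(genotypes, phenotypes):
    """Ordinary least squares of phenotype on allele count at one SNP.

    Returns (intercept, slope, f_statistic); the F statistic has
    1 and n - 2 degrees of freedom.
    """
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("marker is monomorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(genotypes, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    # Partition the total sum of squares into regression and residual parts.
    ss_tot = sum((y - mean_y) ** 2 for y in phenotypes)
    ss_reg = slope * sxy
    ss_res = ss_tot - ss_reg
    f_stat = float("inf") if ss_res <= 0 else ss_reg / (ss_res / (n - 2))
    return intercept, slope, f_stat


# Toy example (hypothetical data): six individuals, one SNP.
genotypes = [0, 0, 1, 1, 2, 2]
phenotypes = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
intercept, slope, f_stat = single_marker_regression(genotypes, phenotypes)
print(intercept, slope, f_stat)
```

In a scan, the same function would be applied to each SNP in turn and the marker with the largest F statistic taken as the position estimate; that scanning rule is likewise an illustrative assumption.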
Affiliation(s)
- H H Zhao
- Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, Iowa 50011, USA
|
23
|
Forabosco P, Falchi M, Devoto M. Statistical tools for linkage analysis and genetic association studies. Expert Rev Mol Diagn 2007; 5:781-96. [PMID: 16149880 DOI: 10.1586/14737159.5.5.781] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 12/20/2022]
Abstract
Genetic mapping by linkage analysis has been an invaluable tool in the positional strategy to identify the molecular basis of many rare Mendelian disorders. With the attention of the scientific and medical community shifting towards the analysis of more common, complex traits, it has become necessary to develop new approaches that take into account the complexity of the genetic basis of these disorders and their possible interaction with other, nongenetic factors. Linkage disequilibrium studies are now becoming increasingly popular thanks to the advent of genotyping platforms that allow genome-wide searching for association between hundreds of thousands of random polymorphisms and disease phenotypes in large samples of unrelated individuals. Moreover, the definition of the disease phenotype itself is being reconsidered to include quantitative traits that may better define the underlying biologic mechanisms for many pathologic conditions. This article will review classic and new approaches to genetic mapping by linkage and association analysis and discuss the directions this field is likely to take in the near future.
Affiliation(s)
- Paola Forabosco
- Istituto di Genetica delle Popolazioni - CNR, Alghero, Italy.
|
24
|
Bickeböller H, Goddard KA, Igo RP, Kraft P, Lozano JP, Pankratz N. Issues in association mapping with high-density SNP data and diverse family structures. Genet Epidemiol 2007; 31 Suppl 1:S22-33. [DOI: 10.1002/gepi.20277] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022]
|
25
|
Eronen L, Geerts F, Toivonen H. HaploRec: efficient and accurate large-scale reconstruction of haplotypes. BMC Bioinformatics 2006; 7:542. [PMID: 17187677 PMCID: PMC1766938 DOI: 10.1186/1471-2105-7-542] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.2] [Received: 06/29/2006] [Accepted: 12/22/2006] [Indexed: 12/04/2022]
Abstract
Background: Haplotypes extracted from human DNA can be used for gene mapping and other analysis of genetic patterns within and across populations. A fundamental problem is, however, that current practical laboratory methods do not give haplotype information. Estimation of phased haplotypes of unrelated individuals given their unphased genotypes is known as the haplotype reconstruction or phasing problem. Results: We define three novel statistical models and give an efficient algorithm for haplotype reconstruction, jointly called HaploRec. HaploRec is based on exploiting local regularities conserved in haplotypes: it reconstructs haplotypes so that they have maximal local coherence. This approach, which does not assume statistical dependence for remotely located markers, has two useful properties: it is well-suited for sparse marker maps, such as those used in gene mapping, and it can actually take advantage of long maps. Conclusion: Our experimental results with simulated and real data show that HaploRec is a powerful method for the large scale haplotyping needed in association studies. With sample sizes large enough for gene mapping it appeared to be the best compared to all other tested methods (Phase, fastPhase, PL-EM, Snphap, Gerbil; simulated data); with small samples it was competitive with the best available methods (real data). HaploRec is several orders of magnitude faster than Phase and comparable to the other methods; the running times are roughly linear in the number of subjects and the number of markers. HaploRec is publicly available at .
Affiliation(s)
- Lauri Eronen
- HIIT-BRU, Department of Computer Science, University of Helsinki, Finland
- Floris Geerts
- Laboratory for Foundations of Computer Science, University of Edinburgh, UK
- Hannu Toivonen
- HIIT-BRU, Department of Computer Science, University of Helsinki, Finland
- Department of Computer Science, University of Freiburg, Germany
|
26
|
Minichiello MJ, Durbin R. Mapping trait loci by use of inferred ancestral recombination graphs. Am J Hum Genet 2006; 79:910-22. [PMID: 17033967 PMCID: PMC1698562 DOI: 10.1086/508901] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.4] [Received: 03/31/2006] [Accepted: 09/01/2006] [Indexed: 12/26/2022]
Abstract
Large-scale association studies are being undertaken with the hope of uncovering the genetic determinants of complex disease. We describe a computationally efficient method for inferring genealogies from population genotype data and show how these genealogies can be used to fine map disease loci and interpret association signals. These genealogies take the form of the ancestral recombination graph (ARG). The ARG defines a genealogical tree for each locus, and, as one moves along the chromosome, the topologies of consecutive trees shift according to the impact of historical recombination events. There are two stages to our analysis. First, we infer plausible ARGs, using a heuristic algorithm, which can handle unphased and missing data and is fast enough to be applied to large-scale studies. Second, we test the genealogical tree at each locus for a clustering of the disease cases beneath a branch, suggesting that a causative mutation occurred on that branch. Since the true ARG is unknown, we average this analysis over an ensemble of inferred ARGs. We have characterized the performance of our method across a wide range of simulated disease models. Compared with simpler tests, our method gives increased accuracy in positioning untyped causative loci and can also be used to estimate the frequencies of untyped causative alleles. We have applied our method to Ueda et al.'s association study of CTLA4 and Graves disease, showing how it can be used to dissect the association signal, giving potentially interesting results of allelic heterogeneity and interaction. Similar approaches analyzing an ensemble of ARGs inferred using our method may be applicable to many other problems of inference from population genotype data.
Affiliation(s)
- Mark J Minichiello
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, United Kingdom
|
27
|
Mailund T, Besenbacher S, Schierup MH. Whole genome association mapping by incompatibilities and local perfect phylogenies. BMC Bioinformatics 2006; 7:454. [PMID: 17042942 PMCID: PMC1624851 DOI: 10.1186/1471-2105-7-454] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Received: 07/01/2006] [Accepted: 10/16/2006] [Indexed: 11/21/2022]
Abstract
Background: With current technology, vast amounts of data can be cheaply and efficiently produced in association studies, and to prevent data analysis from becoming the bottleneck of studies, fast and efficient analysis methods that scale to such data set sizes must be developed. Results: We present a fast method for accurate localisation of disease-causing variants in high-density case-control association mapping experiments with large numbers of cases and controls. The method searches for significant clustering of case chromosomes in the "perfect" phylogenetic tree defined by the largest region around each marker that is compatible with a single phylogenetic tree. This perfect phylogenetic tree is treated as a decision tree for determining disease status, and scored by its accuracy as a decision tree. The rationale for this is that the perfect phylogeny near a disease-affecting mutation should provide more information about the affected/unaffected classification than random trees. If regions of compatibility contain few markers, due to e.g. large marker spacing, the algorithm can allow the inclusion of incompatibility markers in order to enlarge the regions prior to estimating their phylogeny. Haplotype data and phased genotype data can be analysed. The power and efficiency of the method are investigated on 1) simulated genotype data under different models of disease determination, 2) artificial data sets created from the HapMap resource, and 3) data sets used for testing of other methods in order to compare with these. Our method has the same accuracy as single marker association (SMA) in the simplest case of a single disease-causing mutation and a constant recombination rate. However, when it comes to more complex scenarios of mutation heterogeneity and more complex haplotype structure such as found in the HapMap data, our method outperforms SMA as well as other fast data mining approaches such as HapMiner and Haplotype Pattern Mining (HPM), despite being significantly faster. For unphased genotype data, an initial step of estimating the phase only slightly decreases the power of the method. The method was also found to accurately localise the susceptibility variant in an empirical data set where it is already known (the ΔF508 mutation for cystic fibrosis) and to find significant signals for association between the CYP2D6 gene and poor drug metabolism, although for this dataset the highest association score is about 60 kb from the CYP2D6 gene. Conclusion: Our method has been implemented in the Blossoc (BLOck aSSOCiation) software. Using Blossoc, genome-wide chip-based surveys of 3 million SNPs in 1000 cases and 1000 controls can be analysed in less than two CPU hours.
Affiliation(s)
- Thomas Mailund
- Department of Statistics, University of Oxford, UK
- Bioinformatics Research Center, University of Aarhus, Denmark
|
28
|
Johnson T. Bayesian method for gene detection and mapping, using a case and control design and DNA pooling. Biostatistics 2006; 8:546-65. [PMID: 16984977 DOI: 10.1093/biostatistics/kxl028] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]
Abstract
Association mapping studies aim to determine the genetic basis of a trait. A common experimental design uses a sample of unrelated individuals classified into 2 groups, for example cases and controls. If the trait has a complex genetic basis, consisting of many quantitative trait loci (QTLs), each group needs to be large. Each group must be genotyped at marker loci covering the region of interest; for dense coverage of a large candidate region, or a whole-genome scan, the number of markers will be very large. The total amount of genotyping required for such a study is formidable. DNA pooling, a technique that economizes on laboratory effort, could reduce the amount of genotyping required, but the data generated are less informative and require novel methods for efficient analysis. In this paper, a Bayesian statistical analysis of the classic model of McPeek and Strahs is proposed. In contrast to previous work on this model, I assume that data are collected using DNA pooling, so individual genotypes are not directly observed, and also account for experimental errors. A complete analysis can be performed using analytical integration, a propagation algorithm for a hidden Markov model, and quadrature. The method developed here is both statistically and computationally efficient. It allows simultaneous detection and mapping of a QTL, in a large-scale association mapping study, using data from pooled DNA. The method is shown to perform well on data sets simulated under a realistic coalescent-with-recombination model, and is shown to outperform classical single-point methods. The method is illustrated on data consisting of 27 markers in an 880-kb region around the CYP2D6 gene.
Affiliation(s)
- Toby Johnson
- School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3JT, UK.
|
29
|
Lee SH, Van der Werf JHJ. Using dominance relationship coefficients based on linkage disequilibrium and linkage with a general complex pedigree to increase mapping resolution. Genetics 2006; 174:1009-16. [PMID: 16951069 PMCID: PMC1602085 DOI: 10.1534/genetics.106.060806] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
Dominance (intralocus allelic interactions) often plays an important role in quantitative trait variation. However, few studies about dominance in QTL mapping have been reported in outbred animal or human populations. This is because common dominance effects can be predicted mainly for many full sibs, which do not often occur in outbred or natural populations with a general pedigree. Moreover, incomplete genotypes for such a pedigree make it infeasible to estimate dominance relationship coefficients between individuals. In this study, identity-by-descent (IBD) coefficients are estimated on the basis of population-wide linkage disequilibrium (LD), which makes it possible to track dominance relationships between unrelated founders. Therefore, it is possible to use dominance effects in QTL mapping without full sibs. Incomplete genotypes with a complex pedigree and many markers can be efficiently dealt with by a Markov chain Monte Carlo method for estimating IBD and dominance relationship matrices (D(RM)). It is shown by simulation that the use of D(RM) increases the likelihood ratio at the true QTL position and the mapping accuracy and power with complete dominance, overdominance, and recessive inheritance modes when using 200 genotyped and phenotyped individuals.
Affiliation(s)
- S H Lee
- School of Rural Science and Agriculture and Institute of Genetics and Bioinformatics, University of New England, Armidale, NSW 2351, Australia.
|
30
|
Lee SH, Van der Werf JHJ. Simultaneous fine mapping of multiple closely linked quantitative trait loci using combined linkage disequilibrium and linkage with a general pedigree. Genetics 2006; 173:2329-37. [PMID: 16751664 PMCID: PMC1569695 DOI: 10.1534/genetics.106.057653] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 02/26/2006] [Accepted: 05/26/2006] [Indexed: 11/18/2022]
Abstract
Within a small region (e.g., <10 cM), there can be multiple quantitative trait loci (QTL) underlying phenotypes of a trait. Simultaneous fine mapping of closely linked QTL needs an efficient tool to remove confounded shade effects among QTL within such a small region. We propose a variance component method using combined linkage disequilibrium (LD) and linkage information and a reversible jump Markov chain Monte Carlo (MCMC) sampling for model selection. QTL identity-by-descent (IBD) coefficients between individuals are estimated by a hybrid MCMC combining the random walk and the meiosis Gibbs sampler. These coefficients are used in a mixed linear model and an empirical Bayesian procedure combines residual maximum likelihood (REML) to estimate QTL effects and a reversible jump MCMC that samples the number of QTL and the posterior QTL intensities across the tested region. Note that two MCMC processes are used, i.e., an (internal) MCMC for IBD estimation and an (external) MCMC for model selection. In a simulation study, the use of the multiple-QTL model clearly removes the shade effects between three closely linked QTL located at 1.125, 3.875, and 7.875 cM across the region of 10 cM, using 40 markers at 0.25-cM intervals. It is shown that the use of combined LD and linkage information gives much more useful information compared to using linkage information alone for both single- and multiple-QTL analyses. When using a lower marker density (11 markers at 1-cM intervals), the signal of the second QTL can disappear. Extreme values of past effective size (resulting in extreme levels of LD) decrease the mapping accuracy.
Affiliation(s)
- S H Lee
- School of Rural Science and Agriculture and The Institute of Genetics and Bioinformatics, University of New England, Armidale, NSW 2351, Australia.
|
31
|
Bataillon T, Mailund T, Thorlacius S, Steingrimsson E, Rafnar T, Halldorsson MM, Calian V, Schierup MH. The effective size of the Icelandic population and the prospects for LD mapping: inference from unphased microsatellite markers. Eur J Hum Genet 2006; 14:1044-53. [PMID: 16736029 DOI: 10.1038/sj.ejhg.5201669] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Abstract
Characterizing the extent of linkage disequilibrium (LD) in the genome is a pre-requisite for association mapping studies. Patterns of LD also contain information about the past demography of populations. In this study, we focus on the Icelandic population where LD was investigated in 12 regions of approximately 15 cM using regularly spaced microsatellite loci displaying high heterozygosity. A total of 1753 individuals were genotyped for 179 markers. LD was estimated using a composite disequilibrium measure based on unphased data. LD decreases with distance in all 12 regions and more LD than expected by chance can be detected over approximately 4 cM in our sample. Differences in the patterns of decrease of LD with distance among genomic regions were mostly due to two regions exhibiting, respectively, higher and lower proportions of pairs in LD than average within the first 4 cM. We pooled data from all regions except these two, and summarized patterns of LD by computing the proportion of pairs of loci exhibiting significant LD (at the 5% level) as a function of distance. We compared observed patterns of LD with simulated data sets obtained under scenarios with varying demography and intensity of recombination. We show that unphased data allow inferences to be made on scaled recombination rates from patterns of LD. Patterns of LD in Iceland suggest a genome-wide scaled recombination rate of rho* = 200 (130-330) per cM (or an effective size of roughly 5000), in the low range of estimates recently reported in three populations from the HapMap project.
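The step from a scaled recombination rate to an effective size can be sketched as follows. The sketch assumes the standard definition rho = 4 * Ne * r and a recombination fraction of r = 0.01 per cM; the abstract does not state the relation the authors used, so both are assumptions, as is the function name `effective_size_from_rho`.

```python
def effective_size_from_rho(rho_per_cm, r_per_cm=0.01):
    """Solve rho = 4 * Ne * r for Ne (assumed definition, diploid population)."""
    if r_per_cm <= 0:
        raise ValueError("recombination fraction must be positive")
    return rho_per_cm / (4.0 * r_per_cm)


# Point estimate and interval bounds quoted in the abstract (per cM).
for rho in (130, 200, 330):
    print(rho, round(effective_size_from_rho(rho)))
```

Under these assumptions, rho* = 200 per cM corresponds to an effective size of 5000, which agrees with the "roughly 5000" quoted in the abstract.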
Affiliation(s)
- Thomas Bataillon
- Bioinformatics Research Center, University of Aarhus, Høegh-Guldbergs Gade 10, DK-8000 Aarhus C, Denmark.
|
32
|
Abstract
We present a range of modelling components designed to facilitate Bayesian analysis of genetic-association-study data. A key feature of our approach is the ability to combine different submodels together, almost arbitrarily, for dealing with the complexities of real data. In particular, we propose various techniques for selecting the "best" subset of genetic predictors for a specific phenotype (or set of phenotypes). At the same time, we may control for complex, non-linear relationships between phenotypes and additional (non-genetic) covariates as well as accounting for any residual correlation that exists among multiple phenotypes. Both of these additional modelling components are shown to potentially aid in detecting the underlying genetic signal. We may also account for uncertainty regarding missing genotype data. Indeed, at the heart of our approach is a novel method for reconstructing unobserved haplotypes and/or inferring the values of missing genotypes. This can be deployed independently or, alternatively, it can be fully integrated into arbitrary genotype- or haplotype-based association models such that the missing data and the association model are "estimated" simultaneously. The impact of such simultaneous analysis on inferences drawn from the association model is shown to be potentially significant. Our modelling components are packaged as an "add-on" interface to the widely used WinBUGS software, which allows Markov chain Monte Carlo analysis of a wide range of statistical models. We illustrate their use with a series of increasingly complex analyses conducted on simulated data based on a real pharmacogenetic example.
Affiliation(s)
- David J Lunn
- Department of Epidemiology and Public Health, Imperial College London, St. Mary's Campus, London, UK.
|
33
|
Boitard S, Abdallah J, de Rochambeau H, Cierco-Ayrolles C, Mangin B. Linkage disequilibrium interval mapping of quantitative trait loci. BMC Genomics 2006; 7:54. [PMID: 16542433 PMCID: PMC1559614 DOI: 10.1186/1471-2164-7-54] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 11/04/2005] [Accepted: 03/16/2006] [Indexed: 11/10/2022]
Abstract
Background: For many years gene mapping studies have been performed through linkage analyses based on pedigree data. Recently, linkage disequilibrium methods based on unrelated individuals have been advocated as powerful tools to refine estimates of gene location. Many strategies have been proposed to deal with simply inherited disease traits. However, locating quantitative trait loci is statistically more challenging and considerable research is needed to provide robust and computationally efficient methods. Results: Under a three-locus Wright-Fisher model, we derived approximate expressions for the expected haplotype frequencies in a population. We considered haplotypes comprising one trait locus and two flanking markers. Using these theoretical expressions, we built a likelihood-maximization method, called HAPim, for estimating the location of a quantitative trait locus. For each postulated position, the method only requires information from the two flanking markers. Over a wide range of simulation scenarios it was found to be more accurate than a two-marker composite likelihood method. It also performed as well as identity by descent methods, whilst being valuable in a wider range of populations. Conclusion: Our method makes efficient use of marker information, and can be valuable for fine mapping purposes. Its performance is increased if multiallelic markers are available. Several improvements can be developed to account for more complex evolution scenarios or provide robust confidence intervals for the location estimates.
Affiliation(s)
- Simon Boitard
- Unité de Biométrie et Intelligence Artificielle, Institut National de la Recherche Agronomique, BP 52627, 31326 Castanet-Tolosan Cedex, France
- Laboratoire de Statistiques et Probabilités, Université Paul Sabatier, 118 route de Narbonne, 31400 Toulouse, France
- Jihad Abdallah
- Laboratoire de Génétique Cellulaire, Institut National de la Recherche Agronomique, BP 52627, 31326 Castanet-Tolosan Cedex, France
- Station d'Amélioration Génétique des Animaux, Institut National de la Recherche Agronomique, BP 52627, 31326 Castanet-Tolosan Cedex, France
- Hubert de Rochambeau
- Station d'Amélioration Génétique des Animaux, Institut National de la Recherche Agronomique, BP 52627, 31326 Castanet-Tolosan Cedex, France
- Christine Cierco-Ayrolles
- Unité de Biométrie et Intelligence Artificielle, Institut National de la Recherche Agronomique, BP 52627, 31326 Castanet-Tolosan Cedex, France
- Laboratoire de Statistiques et Probabilités, Université Paul Sabatier, 118 route de Narbonne, 31400 Toulouse, France
- Brigitte Mangin
- Unité de Biométrie et Intelligence Artificielle, Institut National de la Recherche Agronomique, BP 52627, 31326 Castanet-Tolosan Cedex, France
|
34
|
Marchini J, Cutler D, Patterson N, Stephens M, Eskin E, Halperin E, Lin S, Qin ZS, Munro HM, Abecasis GR, Donnelly P. A comparison of phasing algorithms for trios and unrelated individuals. Am J Hum Genet 2006; 78:437-50. [PMID: 16465620 PMCID: PMC1380287 DOI: 10.1086/500808] [Citation(s) in RCA: 222] [Impact Index Per Article: 11.7] [Received: 09/02/2005] [Accepted: 12/29/2005] [Indexed: 11/03/2022]
Abstract
Knowledge of haplotype phase is valuable for many analysis methods in the study of disease, population, and evolutionary genetics. Considerable research effort has been devoted to the development of statistical and computational methods that infer haplotype phase from genotype data. Although a substantial number of such methods have been developed, they have focused principally on inference from unrelated individuals, and comparisons between methods have been rather limited. Here, we describe the extension of five leading algorithms for phase inference for handling father-mother-child trios. We performed a comprehensive assessment of the methods applied to both trios and to unrelated individuals, with a focus on genomic-scale problems, using both simulated data and data from the HapMap project. The most accurate algorithm was PHASE (v2.1). For this method, the percentages of genotypes whose phase was incorrectly inferred were 0.12%, 0.05%, and 0.16% for trios from simulated data, HapMap Centre d'Etude du Polymorphisme Humain (CEPH) trios, and HapMap Yoruban trios, respectively, and 5.2% and 5.9% for unrelated individuals in simulated data and the HapMap CEPH data, respectively. The other methods considered in this work had comparable but slightly worse error rates. The error rates for trios are similar to the levels of genotyping error and missing data expected. We thus conclude that all the methods considered will provide highly accurate estimates of haplotypes when applied to trio data sets. Running times differ substantially between methods. Although it is one of the slowest methods, PHASE (v2.1) was used to infer haplotypes for the 1 million-SNP HapMap data set. Finally, we evaluated methods of estimating the value of r² between a pair of SNPs and concluded that all methods estimated r² well when the estimated value was ≥0.8.
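For reference, the r² statistic mentioned at the end of this abstract can be computed from phased two-locus haplotypes as in the sketch below. It uses the standard definition r² = D² / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA·pB, and it assumes that phase is already known; the phasing methods compared in the paper would instead supply estimated haplotypes. The function name `r_squared` and the 0/1 allele coding are illustrative choices.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic SNPs from phased two-locus haplotypes.

    `haplotypes` is a sequence of (allele_at_snp1, allele_at_snp2)
    pairs, with alleles coded 0 or 1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    # Frequencies of allele 1 at each SNP and of the 1-1 haplotype.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


# Two extreme toy cases: complete LD and no LD.
print(r_squared([(0, 0), (1, 1), (0, 0), (1, 1)]))
print(r_squared([(0, 0), (0, 1), (1, 0), (1, 1)]))
```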
Affiliation(s)
- Jonathan Marchini
- Department of Statistics, University of Oxford, Oxford OX1 3TG, United Kingdom.
|
35
|
Waldron ERB, Whittaker JC, Balding DJ. Fine mapping of disease genes via haplotype clustering. Genet Epidemiol 2006; 30:170-9. [PMID: 16385468 DOI: 10.1002/gepi.20134] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022]
Abstract
We propose an algorithm for analysing SNP-based population association studies, which is a development of that introduced by Molitor et al. [2003: Am J Hum Genet 73:1368-1384]. It uses clustering of haplotypes to overcome the major limitations of many current haplotype-based approaches. We define a between-haplotype score that is simple, yet appears to capture much of the information about evolutionary relatedness of the haplotypes in the vicinity of an (unobserved) putative causal locus. Haplotype clusters can then be defined via a putative ancestral haplotype and a cut-off distance. The number of an individual's two haplotypes that lie within the cluster predicts the individual's genotype at the causal locus. This predicted genotype can then be investigated for association with the phenotype of interest. We implement our approach within a Markov-chain Monte Carlo algorithm that, in effect, searches over locations and ancestral haplotypes to identify large, case-rich clusters. The algorithm successfully fine-maps a causal mutation in a test analysis using real data, and achieves almost 98% accuracy in predicting the genotype at the causal locus. A simulation study indicates that the new algorithm is substantially superior to alternative approaches, and it also allows us to identify situations in which multi-point approaches can substantially improve over single-SNP analyses. Our algorithm runs quickly and there is scope for extension to a wide range of disease models and genomic scales.
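The cluster-membership idea in this abstract can be illustrated with a small sketch. The abstract does not define the between-haplotype score, so the code substitutes a hypothetical one: the number of contiguous markers around the putative locus at which a haplotype matches the ancestral haplotype. Cluster membership is then expressed as a minimum shared length, a similarity threshold standing in for the cut-off distance. All names (`shared_length`, `predicted_genotype`, `min_shared`) and the toy haplotypes are assumptions, and the MCMC search over locations and ancestral haplotypes is not shown.

```python
def shared_length(hap, ancestral, locus_index):
    """Hypothetical similarity score (not the paper's definition).

    Counts contiguous markers matching the ancestral haplotype,
    extending left and right from a putative locus that lies between
    markers locus_index - 1 and locus_index.
    """
    if len(hap) != len(ancestral):
        raise ValueError("haplotypes must cover the same markers")
    left = 0
    i = locus_index - 1
    while i >= 0 and hap[i] == ancestral[i]:
        left += 1
        i -= 1
    right = 0
    j = locus_index
    while j < len(hap) and hap[j] == ancestral[j]:
        right += 1
        j += 1
    return left + right


def predicted_genotype(hap_pair, ancestral, locus_index, min_shared):
    """Number (0, 1, or 2) of an individual's haplotypes inside the cluster."""
    return sum(
        1 for hap in hap_pair
        if shared_length(hap, ancestral, locus_index) >= min_shared
    )


# Toy example (hypothetical haplotypes over six markers).
ancestral = "ACGTAC"
individual = ("ACGTAC", "ACTAAC")
print(predicted_genotype(individual, ancestral, locus_index=3, min_shared=4))
```

The predicted count would then be tested for association with the phenotype, as the abstract describes.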
Affiliation(s)
- E R B Waldron
- Department of Epidemiology and Public Health, Imperial College London, St. Mary's Campus, Norfolk Place, London W2 1PG, United Kingdom.
|
36
|
Fan R, Jung J, Jin L. High-resolution association mapping of quantitative trait loci: a population-based approach. Genetics 2006; 172:663-86. [PMID: 16172503 PMCID: PMC1456191 DOI: 10.1534/genetics.105.046417] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 06/02/2005] [Accepted: 09/19/2005] [Indexed: 01/19/2023] Open
Abstract
In this article, population-based regression models are proposed for high-resolution linkage disequilibrium mapping of quantitative trait loci (QTL). Two regression models, the "genotype effect model" and the "additive effect model," are proposed to model the association between the markers and the trait locus. The marker can be either diallelic or multiallelic. If only one marker is used, the method is similar to a classical setting by Nielsen and Weir, and the additive effect model is equivalent to the haplotype trend regression (HTR) method by Zaykin et al. If two/multiple marker data with phase ambiguity are used in the analysis, the proposed models can be used to analyze the data directly. By analytical formulas, we show that the genotype effect model can be used to model the additive and dominance effects simultaneously; the additive effect model takes care of the additive effect only. On the basis of the two models, F-test statistics are proposed to test association between the QTL and markers. By a simulation study, we show that the two models have reasonable type I error rates for a data set of moderate sample size. The noncentrality parameter approximations of F-test statistics are derived to make power calculation and comparison. By a simulation study, it is found that the noncentrality parameter approximations of F-test statistics work very well. Using the noncentrality parameter approximations, we compare the power of the two models with that of the HTR. In addition, a simulation study is performed to make a comparison on the basis of the haplotype frequencies of 10 SNPs of angiotensin-1 converting enzyme (ACE) genes.
Affiliation(s)
- Ruzong Fan
- Department of Statistics, Texas A&M University, College Station, Texas 77843, USA.
|
37
|
Payseur BA, Clark AG, Hixson J, Boerwinkle E, Sing CF. Contrasting multi-site genotypic distributions among discordant quantitative phenotypes: the APOA1/C3/A4/A5 gene cluster and cardiovascular disease risk factors. Genet Epidemiol 2006; 30:508-18. [PMID: 16800005 DOI: 10.1002/gepi.20163] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/06/2022]
Abstract
Most tests of association between DNA sequence variation and quantitative phenotypes in samples of randomly chosen individuals rely on specification of genotypic strata followed by comparison of phenotypes across these strata. This strategy often succeeds when phenotypic differences are caused by one or two single nucleotide polymorphisms (SNPs) among the surveyed markers. However, when multiple-SNP haplotypes account for observed phenotypic variation, identification of the best partitioning requires examination of an inordinate number of SNP combinations. An alternative approach is to rank individuals by their phenotypic measures and ask whether attributes of the genotypic variation show a non-random distribution along this phenotypic ranking. One simple version of this strategy selects the top and bottom tails of the distribution, and then tests whether genotypes from these two samples are drawn from a single population. This framework does not require the recovery of phased haplotypes and allows contrasts between large numbers of sites at once. We use a method based on this approach to identify associations between plasma triglyceride level, a risk factor for cardiovascular disease, and multi-site genotypes located in the APOA1/C3/A4/A5 cluster of apolipoprotein genes in unrelated individuals (1,071 African-American females, 780 African-American males, 1,036 European-American females, and 930 European-American males) sampled from four US cities as part of the Coronary Artery Risk Development in Young Adults (CARDIA) study. Method performance is investigated using simulations that model genealogical variation and different genetic architectures. Results indicate that this multi-site test can identify genotype-phenotype associations with reasonable power, including those generated by some simple epistatic models.
Affiliation(s)
- Bret A Payseur
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA.
|
38
|
Pinnaduwage D, Briollais L. Comparison of genotype- and haplotype-based approaches for fine-mapping of alcohol dependence using COGA data. BMC Genet 2005; 6 Suppl 1:S65. [PMID: 16451678 PMCID: PMC1866717 DOI: 10.1186/1471-2156-6-s1-s65] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022] Open
Abstract
It is generally assumed that the detection of disease susceptibility genes via fine-mapping association study is facilitated by consideration of marker haplotypes. In this study, we compared the performance of genotype-based and haplotype-based association studies using the Collaborative Study of Genetics of Alcoholism dataset, on several chromosomal regions showing evidence for linkage with ALDX1. After correction for multiple testing, the most significant results were observed with the genotype-based analyses on two regions of chromosomes 2 and 7. Interestingly, the analyses results from this dataset showed that there was no advantage of the haplotype-based analyses over genotype-based (single-locus) analyses. However, caution should be taken when generalizing these results to other chromosomal regions or to other populations.
Affiliation(s)
- Dushanthi Pinnaduwage
- Division of Epidemiology and Biostatistics, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario, M5G 1X5, Canada
- Litwin Centre for Cancer Genetics, and Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario, M5G 1X8, Canada
- Laurent Briollais
- Division of Epidemiology and Biostatistics, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario, M5G 1X5, Canada
- Department of Public Health Sciences, University of Toronto, Toronto, Ontario, Canada
|
39
|
Lee SH, Van der Werf JHJ, Tier B. Combining the meiosis Gibbs sampler with the random walk approach for linkage and association studies with a general complex pedigree and multimarker loci. Genetics 2005; 171:2063-72. [PMID: 15965262 PMCID: PMC1456126 DOI: 10.1534/genetics.104.037028] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 09/29/2004] [Accepted: 06/01/2005] [Indexed: 11/18/2022] Open
Abstract
A linkage analysis for finding inheritance states and haplotype configurations is an essential process for linkage and association mapping. The linkage analysis is routinely based upon observed pedigree information and marker genotypes for individuals in the pedigree. It is not feasible for exact methods to use all such information for a large complex pedigree especially when there are many missing genotypic data. Proposed Markov chain Monte Carlo approaches such as a single-site Gibbs sampler or the meiosis Gibbs sampler are able to handle a complex pedigree with sparse genotypic data; however, they often have reducibility problems, causing biased estimates. We present a combined method, applying the random walk approach to the reducible sites in the meiosis sampler. Therefore, one can efficiently obtain reliable estimates such as identity-by-descent coefficients between individuals based on inheritance states or haplotype configurations, and a wider range of data can be used for mapping of quantitative trait loci within a reasonable time.
Affiliation(s)
- S H Lee
- School of Rural Science and Agriculture and Institute of Genetics and Bioinformatics, University of New England, Armidale, New South Wales 2351, Australia.
|
40
|
Hintsanen P, Sevon P, Onkamo P, Eronen L, Toivonen H. An empirical comparison of case-control and trio based study designs in high throughput association mapping. J Med Genet 2005; 43:617-24. [PMID: 16258007 PMCID: PMC2564560 DOI: 10.1136/jmg.2005.036020] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/03/2022]
Abstract
Motivated by high throughput genotyping technology, our aim in this study was to experimentally compare the power and accuracy of case-control and family trio based approaches for haplotype based, large scale, association gene mapping. We compared trio based and case-control study designs in different disease models, and partitioned the performance differences into separate components: those from the sample ascertainment, the effective sample size, and the haplotyping approaches. For systematic and controlled tests, we simulated a rapidly expanding and relatively young isolated population. The experiments were also replicated with real asthma data. We used computationally efficient methods that scale up to large amounts of both markers and individuals. Mapping is based on a haplotype association test for haplotypes of 1-10 markers. For population based haplotype reconstruction, we use HaploRec, and compare it to both a simple trio based inference and true haplotypes. Firstly and surprisingly, statistically inferred population based haplotypes can be equally powerful as true haplotypes. Secondly, as expected, the effective sample size has a clear effect on both gene detection power and mapping accuracy. Thirdly, the sample ascertainment method does not have much effect on mapping accuracy. Finally, an interesting side result is that the simple haplotype association test clearly outperformed exhaustive allelic transmission disequilibrium tests. The results suggest that the case-control design is a powerful alternative to the more laborious family based ascertainment approach, especially for large datasets, and wherever population stratification can be controlled.
Affiliation(s)
- P Hintsanen
- Helsinki Institute for Information Technology, Basic Research Unit, Department of Computer Science, University of Helsinki, Finland
|
41
|
Clark TG, De Iorio M, Griffiths RC, Farrall M. Finding associations in dense genetic maps: a genetic algorithm approach. Hum Hered 2005; 60:97-108. [PMID: 16220001 DOI: 10.1159/000088845] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 05/06/2005] [Accepted: 07/26/2005] [Indexed: 11/19/2022] Open
Abstract
Large-scale association studies hold promise for discovering the genetic basis of common human disease. These studies will consist of a large number of individuals, as well as a large number of genetic markers, such as single nucleotide polymorphisms (SNPs). The potential size of the data and the resulting model space require the development of efficient methodology to unravel associations between phenotypes and SNPs in dense genetic maps. Our approach uses a genetic algorithm (GA) to construct logic trees consisting of Boolean expressions involving strings or blocks of SNPs. These blocks or nodes of the logic trees consist of SNPs in high linkage disequilibrium (LD), that is, SNPs that are highly correlated with each other due to evolutionary processes. At each generation of our GA, a population of logic tree models is modified using selection, cross-over and mutation moves. Logic trees are selected for the next generation using a fitness function based on the marginal likelihood in a Bayesian regression framework. Mutation and cross-over moves use LD measures to propose changes to the trees, and facilitate the movement through the model space. We demonstrate our method and the flexibility of logic tree structure with variable nodal lengths on simulated data from a coalescent model, as well as data from a candidate gene study of quantitative genetic variation.
Affiliation(s)
- Taane G Clark
- Department of Epidemiology and Public Health, Imperial College, St. Mary's Campus, Norfolk Place, London W2 1PG, UK.
|
42
|
Abstract
Much effort and expense are being spent internationally to detect genetic polymorphisms contributing to susceptibility to complex human disease. Concomitantly, the technology for detecting and genotyping single nucleotide polymorphisms (SNPs) has undergone rapid development, yielding extensive catalogues of these polymorphisms across the genome. Population-based maps of the correlations amongst SNPs (linkage disequilibrium) are now being developed to accelerate the discovery of genes for complex human diseases. These genomic advances coincide with an increasing recognition of the importance of very large sample sizes for studying genetic effects. Together, these new genetic and epidemiological data hold renewed promise for the identification of susceptibility genes for complex traits. We review the state of knowledge about the structure of the human genome as related to SNPs and linkage disequilibrium, discuss the potential applications of this knowledge to mapping complex disease genes, and consider the issues facing whole genome association scanning using SNPs.
Affiliation(s)
- Lyle J Palmer
- Western Australian Institute for Medical Research and University of Western Australia Centre for Medical Research, University of Western Australia.
|
43
|
Morris AP. Direct analysis of unphased SNP genotype data in population-based association studies via Bayesian partition modelling of haplotypes. Genet Epidemiol 2005; 29:91-107. [PMID: 15940704 DOI: 10.1002/gepi.20080] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.6] [Indexed: 11/07/2022]
Abstract
We describe a novel method for assessing the strength of disease association with single nucleotide polymorphisms (SNPs) in a candidate gene or small candidate region, and for estimating the corresponding haplotype relative risks of disease, using unphased genotype data directly. We begin by estimating the relative frequencies of haplotypes consistent with observed SNP genotypes. Under the Bayesian partition model, we specify cluster centres from this set of consistent SNP haplotypes. The remaining haplotypes are then assigned to the cluster with the "nearest" centre, where distance is defined in terms of SNP allele matches. Within a logistic regression modelling framework, each haplotype within a cluster is assigned the same disease risk, reducing the number of parameters required. Uncertainty in phase assignment is addressed by considering all possible haplotype configurations consistent with each unphased genotype, weighted in the logistic regression likelihood by their probabilities, calculated according to the estimated relative haplotype frequencies. We develop a Markov chain Monte Carlo algorithm to sample over the space of haplotype clusters and corresponding disease risks, allowing for covariates that might include environmental risk factors or polygenic effects. Application of the algorithm to SNP genotype data in an 890-kb region flanking the CYP2D6 gene illustrates that we can identify clusters of haplotypes with similar risk of poor drug metaboliser (PDM) phenotype, and can distinguish PDM cases carrying different high-risk variants. Further, the results of a detailed simulation study suggest that we can identify positive evidence of association for moderate relative disease risks with a sample of 1,000 cases and 1,000 controls.
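The cluster-assignment step in this abstract, where each haplotype joins the cluster with the "nearest" centre and nearness is measured by SNP allele matches, can be sketched as follows. This is an illustrative sketch only: the function names and example haplotypes are hypothetical, ties are broken by centre order (an assumption, since the abstract does not say how ties are handled), and the logistic regression and MCMC sampling are not shown.

```python
def allele_matches(h1, h2):
    """Number of SNP positions at which two haplotypes carry the same allele."""
    return sum(a == b for a, b in zip(h1, h2))


def assign_to_clusters(haplotypes, centres):
    """Assign each haplotype to the centre sharing the most alleles with it.

    Assumption: ties go to the first centre listed, because max() returns
    the first maximal element.
    """
    clusters = {centre: [] for centre in centres}
    for hap in haplotypes:
        nearest = max(centres, key=lambda c: allele_matches(hap, c))
        clusters[nearest].append(hap)
    return clusters


# Hypothetical 4-SNP haplotypes and two cluster centres.
centres = ["0000", "1111"]
haplotypes = ["0001", "1110", "0100", "1011"]

clusters = assign_to_clusters(haplotypes, centres)
print(clusters)
```

Because every haplotype in a cluster shares one disease risk, the number of risk parameters equals the number of clusters rather than the number of distinct haplotypes.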
Affiliation(s)
- Andrew P Morris
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
|
44
|
Schouten MT, Williams CKI, Haley CS. The impact of using related individuals for haplotype reconstruction in population studies. Genetics 2005; 171:1321-30. [PMID: 15944347 PMCID: PMC1456835 DOI: 10.1534/genetics.105.042762] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022] Open
Abstract
Recent studies have highlighted the dangers of using haplotypes reconstructed directly from population data for a fine-scale mapping analysis. Family data may help resolve ambiguity, yet can be costly to obtain. This study is concerned with the following question: How much family data (if any) should be used to facilitate haplotype reconstruction in a population study? We conduct a simulation study to evaluate how changes in family information can affect the accuracy of haplotype frequency estimates and phase reconstruction. To reconstruct haplotypes, we introduce an EM-based algorithm that can efficiently accommodate unrelated individuals, parent-child trios, and arbitrarily large half-sib pedigrees. Simulations are conducted for a diverse set of haplotype frequency distributions, all of which have been previously published in empirical studies. A wide variety of important results regarding the effectiveness of using pedigree data in a population study are presented in a coherent, unified framework. Insight into the different properties of the haplotype frequency distribution that can influence experimental design is provided. We show that a preliminary estimate of the haplotype frequency distribution can be valuable in large population studies with fixed resources.
Affiliation(s)
- Michael T Schouten
- School of Informatics, University of Edinburgh, Edinburgh EH1 2QL, United Kingdom.
|
45
|
Abstract
Haplotypes have played a major role in the study of highly-penetrant single-gene disorders, and recent evidence that the human genome has hot-spots and cold-spots for recombination has suggested that haplotype-based methods may play a key role in the study of common complex traits. This report reviews the motivation of using haplotypes for the study of the genetic basis of human traits, ranging from biologic function, to statistical power advantages of haplotypes, to linkage disequilibrium fine-mapping. Recent developments of regression models for haplotype analyses are reviewed, offering a synthesis of current methods, as well as their limitations and areas that require further research. Regression models provide significant advantages: non-genetic covariates can be controlled for, the effects of the haplotypes can be modeled, step-wise selection can be used to screen for a subset of markers that explain most of the association, haplotype x environment interactions can be evaluated, and regression diagnostics are well developed. Despite these strengths, the current regression methods tend to lack the sophisticated population genetic perspectives offered by coalescent and other similar approaches. Future work that links regression methods with population genetic models may prove beneficial.
Affiliation(s)
- Daniel J Schaid
- Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, Minnesota 55905, USA.
|
46
|
Abstract
The International Haplotype Mapping Project (HapMap) aims to characterize the distribution and extent of linkage disequilibrium (LD) throughout the human genome, thereby facilitating genome-wide association analysis and the search for the genetic determinants of complex diseases. Implicit in the rationale behind the project is the expectation that hidden (unobserved) disease-causing variants will be in significant LD with surrounding typed markers and will thus be amenable to detection using association-based mapping approaches. In order to investigate the validity of this assumption, we examined more than 5,000 SNPs across a 10-MB region of chromosome 20 in a sample of 96 unrelated African-American and 96 unrelated Caucasian individuals. We treated observed loci as surrogates for hidden SNPs by pretending that individuals' genotypes were unknown. We then attempted to predict these genotypes at the surrogate hidden SNP by using information about LD in the region and genotypes at surrounding observed loci. Our method is based on finding the most likely genotype for each individual, given all possible haplotype pairs consistent with observed genotypes for that individual at surrounding loci, and given the frequencies of those haplotypes in an independent sample. Our method performs extremely well in predicting genotypes in areas of high LD. Furthermore, in areas of low LD, our method results in substantial gains in predictive accuracy as compared to pair-wise strategies. These results suggest that pair-wise tests of disease-marker association may be inferior to multipoint methods, which take advantage of the information contained within multi-locus haplotypes.
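The prediction rule described in this abstract (choose the most likely genotype at the hidden SNP, given all haplotype pairs consistent with the observed flanking genotypes and the haplotype frequencies from an independent sample) can be sketched roughly as below. The sketch assumes random pairing of haplotypes, 0/1 allele coding, and genotypes recorded as allele counts; the function name, the frequencies, and the example genotype are hypothetical and not taken from the study.

```python
from itertools import product


def most_likely_hidden_genotype(observed, hap_freqs, hidden_idx):
    """Predict the genotype (0, 1 or 2) at a hidden SNP.

    observed   : allele counts at the flanking SNPs, in map order
    hap_freqs  : {haplotype string: frequency} from a reference sample
    hidden_idx : position of the hidden SNP within each haplotype string
    """
    weights = {0: 0.0, 1: 0.0, 2: 0.0}
    # Enumerate ordered haplotype pairs; each pair is weighted by the product
    # of its haplotype frequencies (assumes random mating).
    for (h1, f1), (h2, f2) in product(hap_freqs.items(), repeat=2):
        flanking = [
            int(a) + int(b)
            for i, (a, b) in enumerate(zip(h1, h2))
            if i != hidden_idx
        ]
        if flanking == list(observed):
            weights[int(h1[hidden_idx]) + int(h2[hidden_idx])] += f1 * f2
    total = sum(weights.values())
    if total == 0:
        return None, weights  # no reference haplotype pair fits the data
    posterior = {g: w / total for g, w in weights.items()}
    return max(posterior, key=posterior.get), posterior


# Hypothetical reference frequencies for 3-SNP haplotypes; SNP 1 is hidden.
hap_freqs = {"000": 0.5, "111": 0.3, "101": 0.1, "010": 0.1}
best, posterior = most_likely_hidden_genotype((1, 1), hap_freqs, hidden_idx=1)
print(best, posterior)
```

A pair-wise strategy would instead predict from a single flanking marker; the multipoint version above uses all flanking loci jointly, which is the contrast the abstract draws.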
Affiliation(s)
- David M Evans
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
|
47
|
Lee SH, Van der Werf JHJ. The role of pedigree information in combined linkage disequilibrium and linkage mapping of quantitative trait loci in a general complex pedigree. Genetics 2005; 169:455-66. [PMID: 15677753 PMCID: PMC1448885 DOI: 10.1534/genetics.104.033233] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 07/02/2004] [Accepted: 09/20/2004] [Indexed: 11/18/2022] Open
Abstract
Combined linkage disequilibrium and linkage (LDL) mapping can exploit historical as well as recent and observed recombinations in a recorded pedigree. We investigated the role of pedigree information in LDL mapping and the performance of LDL mapping in general complex pedigrees. We compared using complete and incomplete genotypic data, spanning 5 or 10 generations of known pedigree, and we used bi- or multiallelic markers that were positioned at 1- or 5-cM intervals. Analyses carried out with or without pedigree information were compared. Results were compared with linkage mapping in some of the data sets. Linkage mapping or LDL mapping with sparse marker spacing (approximately 5 cM) gave a poorer mapping resolution without considering pedigree information compared to that with considering pedigree information. The difference was bigger in a pedigree of more generations. However, LDL mapping with closely linked markers (approximately 1 cM) gave a much higher mapping resolution regardless of using pedigree information. This study shows that when marker spacing is dense and there is considerable linkage disequilibrium generated from historical recombinations between flanking markers and QTL, the loss of power due to ignoring pedigree information is negligible and mapping resolution is very high.
Affiliation(s)
- S H Lee
- School of Rural Science and Agriculture, University of New England, New South Wales 2351, Australia.
|
48
|
Maniatis N, Morton NE, Gibson J, Xu CF, Hosking LK, Collins A. The optimal measure of linkage disequilibrium reduces error in association mapping of affection status. Hum Mol Genet 2004; 14:145-53. [PMID: 15548543 DOI: 10.1093/hmg/ddi019] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022] Open
Abstract
We have developed a simple yet powerful approach for disease gene association mapping by linkage disequilibrium (LD). This method is unique because it applies a model with evolutionary theory that incorporates a parameter for the location of the causal polymorphism. The method exploits LD maps, which assign a location in LD units (LDU) for each marker. This approach is based on single marker tests within a composite likelihood framework, which avoids the heavy Bonferroni correction through multiple testing. As a proof of principle, we tested an 890 kb region flanking the CYP2D6 gene associated with poor drug-metabolizing activity in order to refine the localization of a causal mutation. Previous LD mapping studies using single markers and haplotypes have identified a 390 kb significant region associated with the poor drug-metabolizing phenotype on chromosome 22. None of the 27 single nucleotide polymorphisms was within the gene. Using a metric LDU map, the commonest functional polymorphism within the gene was located at 14.9 kb from its true location, surrounded within a 95% confidence interval of 172 kb. The kb map had a relative efficiency of 33% compared with the LDU map. Our findings indicate that the support interval and location error are smaller than any published results. Despite the low resolution and the strong LD in the region, our results provide evidence of the substantial utility of LDU maps for disease gene association mapping. These tests are robust to large numbers of markers and are applicable to haplotypes, diplotypes, whole-genome association or candidate region studies.
Affiliation(s)
- N Maniatis
- Human Genetics Division, University of Southampton, Southampton General Hospital, Southampton, UK.
|
49
|
Mitra N, Ye TZ, Smith A, Chuai S, Kirchhoff T, Peterlongo P, Nafa K, Phillips MS, Offit K, Ellis NA. Localization of Cancer Susceptibility Genes by Genome-wide Single-Nucleotide Polymorphism Linkage-Disequilibrium Mapping. Cancer Res 2004; 64:8116-25. [PMID: 15520224 DOI: 10.1158/0008-5472.can-04-1411] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022]
Abstract
With the large numbers of single nucleotide polymorphisms (SNPs) available and new technologies that permit high throughput genotyping, we have investigated the possibility of the localization of disease genes with genome-wide panels of SNP markers and taking advantage of the linkage-disequilibrium (LD) between the disease gene and closely linked markers. For this purpose, we selected cases from the Ashkenazi Jewish population, in which the mutant alleles are expected to be identical by descent from a common founder and the regions of LD encompassing these mutant alleles are large. As a validation of this approach for localization, we performed two trials: one in autosomal recessive Bloom syndrome, in which a unique mutation of the BLM gene is present at elevated frequencies in cases, and the other in autosomal dominant hereditary nonpolyposis colorectal cancer (HNPCC), in which a unique mutation of MSH2 is present at elevated frequencies. In the Bloom syndrome trial, we genotyped 3,258 SNPs in 10 Jewish Bloom syndrome cases and 31 non-Bloom syndrome Jewish persons as a comparison group. In the HNPCC trial, we genotyped 8,549 SNPs in 13 Jewish HNPCC cases whose colon cancers exhibited microsatellite instability and in 63 healthy Jews as a comparison group. To identify significant associations, we performed (a) Fisher's exact test comparing genotypes at each locus in cases versus controls and (b) a haplotype analysis by estimating the frequency of haplotypes with the expectation-maximization algorithm and comparing haplotype frequencies in cases versus controls by logistic regression and a maximum likelihood ratio method. In the Bloom syndrome trial, by Fisher's exact test, statistically significant association was detected at a single locus, TSC0754862, which is a locus 1.7 million bp from BLM. Two-locus, three-locus, and four-locus haplotypes that included TSC0754862 and flanked BLM were also statistically more frequent in cases versus controls. In the HNPCC trial, although a significant P value was not obtained by the single SNP genotype analysis, significant associations were detected for several multilocus haplotypes in an 11-million-bp region that contained the MSH2 gene. This work demonstrates the power of the LD mapping approach in an isolated population and its general applicability to the identification of novel cancer-causing genes.
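The single-locus step (a) in this abstract relies on Fisher's exact test. A minimal pure-Python sketch of the two-sided test for a 2x2 table is given below. It is a simplification: the study compared genotypes at each locus, which would usually give a larger table, whereas this sketch collapses the data to carrier versus non-carrier. The function name and the counts are hypothetical and are not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]].

    Rows: cases, controls. Columns: carriers, non-carriers.
    The P value sums the hypergeometric probabilities of all tables with the
    same margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that the first row contains x carriers, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # The small tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts: 9 of 10 cases and 3 of 31 controls carry the allele.
p_value = fisher_exact_two_sided(9, 1, 3, 28)
print(f"P = {p_value:.3g}")
```

In a genome-wide scan this test would be repeated at every SNP, so the resulting P values need a multiple-testing correction before any locus is called significant.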
Affiliation(s)
- Nandita Mitra
- Department of Epidemiology and Biostatistics, and Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, New York 10021, USA
|
50
|
Lin Z, Altman RB. Finding haplotype tagging SNPs by use of principal components analysis. Am J Hum Genet 2004; 75:850-61. [PMID: 15389393 PMCID: PMC1182114 DOI: 10.1086/425587] [Citation(s) in RCA: 73] [Impact Index Per Article: 3.5] [Received: 05/13/2004] [Accepted: 08/31/2004] [Indexed: 11/03/2022] Open
Abstract
The immense volume and rapid growth of human genomic data, especially single nucleotide polymorphisms (SNPs), present special challenges for both biomedical researchers and automatic algorithms. One such challenge is to select an optimal subset of SNPs, commonly referred as "haplotype tagging SNPs" (htSNPs), to capture most of the haplotype diversity of each haplotype block or gene-specific region. This information-reduction process facilitates cost-effective genotyping and, subsequently, genotype-phenotype association studies. It also has implications for assessing the risk of identifying research subjects on the basis of SNP information deposited in public domain databases. We have investigated methods for selecting htSNPs by use of principal components analysis (PCA). These methods first identify eigenSNPs and then map them to actual SNPs. We evaluated two mapping strategies, greedy discard and varimax rotation, by assessing the ability of the selected htSNPs to reconstruct genotypes of non-htSNPs. We also compared these methods with two other htSNP finders, one of which is PCA based. We applied these methods to three experimental data sets and found that the PCA-based methods tend to select the smallest set of htSNPs to achieve a 90% reconstruction precision.
Affiliation(s)
- Zhen Lin
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94305-5120, USA
|