1
Huang K, Dunn DW, Li W, Wang D, Li B. Linkage disequilibrium under polysomic inheritance. Heredity (Edinb) 2022; 128:11-20. [PMID: 34983965 PMCID: PMC8733019 DOI: 10.1038/s41437-021-00482-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/23/2020] [Revised: 10/13/2021] [Accepted: 10/18/2021] [Indexed: 01/03/2023] Open
Abstract
Linkage disequilibrium (LD) is the non-random association of alleles at different loci. Squared LD coefficients r² (for phased genotypes) and [Formula: see text] (for unphased genotypes) will converge to constants that are determined by the sample size, the recombination frequency, the effective population size and the mating system. LD can therefore be used for gene mapping and the estimation of effective population size. However, current methods work only with diploids. To resolve this problem, we here extend the linkage disequilibrium measures to include polysomic inheritance. We derive the values of r² and [Formula: see text] at equilibrium state for various mating systems and different ploidy levels. For unlinked loci, [Formula: see text] for monoecious and dioecious (with random pairing) mating systems or [Formula: see text] for dioecious mating systems (with lifetime pairing), where f is the number of females in a half-sib family and η is a constant related to the ploidy level. We simulate the application of estimating Ne using unphased genotypes. We find that estimating Ne in polyploids requires similar sample sizes and numbers of loci as in diploids, with the main source of bias due to using 0.5 as the recombination frequency.
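As a point of reference for the squared LD coefficient mentioned in this abstract, the following is a minimal sketch of the standard two-locus, biallelic definition for phased data: D = p_AB - p_A p_B and r² = D² / (p_A (1 - p_A) p_B (1 - p_B)). It does not reproduce the polysomic extension or the equilibrium values derived in the paper. The function names and the 0/1 haplotype encoding are assumptions made only for illustration.

```python
def r_squared(p_ab, p_a, p_b):
    """Squared LD coefficient r^2 for two biallelic loci (phased data).

    p_ab: frequency of haplotypes carrying allele A at locus 1 and B at locus 2
    p_a, p_b: marginal frequencies of allele A and allele B
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # classical disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


def r_squared_from_haplotypes(haplotypes):
    """Compute r^2 from phased haplotypes.

    haplotypes: list of (a, b) pairs, where a and b are 1 if the haplotype
    carries the reference allele at locus 1 and locus 2 respectively, else 0
    (a hypothetical encoding chosen for this sketch).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    return r_squared(p_ab, p_a, p_b)


# Complete association between the two loci versus no association.
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(r_squared_from_haplotypes(linked))
print(r_squared_from_haplotypes(unlinked))
```

With unphased genotypes the haplotype frequency p_AB is not observed directly, which is why the abstract distinguishes a separate coefficient for that case.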
Affiliation(s)
- Kang Huang
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi'an, 710069, China
- Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, V6T1Z4, Canada
- Derek W Dunn
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi'an, 710069, China
- Wenkai Li
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi'an, 710069, China
- Dan Wang
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi'an, 710069, China
- Baoguo Li
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi'an, 710069, China.
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China.
2
Ahsan T, Sajib AA. Drug-response related genetic architecture of Bangladeshi population. Meta Gene 2019. [DOI: 10.1016/j.mgene.2019.100585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022] Open
3
He ZX, Chen XW, Zhou ZW, Zhou SF. Impact of physiological, pathological and environmental factors on the expression and activity of human cytochrome P450 2D6 and implications in precision medicine. Drug Metab Rev 2015; 47:470-519. [PMID: 26574146 DOI: 10.3109/03602532.2015.1101131] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Indexed: 12/12/2022]
Abstract
Although it accounts for only 1.3-4.3% of total hepatic CYP content, human CYP2D6 can metabolize more than 160 drugs. It is a highly polymorphic enzyme and subject to marked inhibition by a number of drugs, causing large interindividual variability in drug clearance and drug response, as well as drug-drug interactions. The expression and activity of CYP2D6 are regulated by a number of physiological, pathological and environmental factors at transcriptional, post-transcriptional, translational and epigenetic levels. DNA hypermethylation and histone modifications can repress the expression of CYP2D6. Hepatocyte nuclear factor-4α binds to a directly repeated element in the promoter of CYP2D6 and thus regulates the expression of CYP2D6. Small heterodimer partner represses hepatocyte nuclear factor-4α-mediated transactivation of CYP2D6. GW4064, a farnesoid X receptor agonist, decreases hepatic CYP2D6 expression and activity while increasing small heterodimer partner expression and its recruitment to the CYP2D6 promoter. The genotypes are key determinants of interindividual variability in CYP2D6 expression and activity. Recent genome-wide association studies have identified a large number of genes that can regulate CYP2D6. Pregnancy induces CYP2D6 via unknown mechanisms. Renal or liver diseases, smoking and alcohol use have only minor to moderate effects on CYP2D6 activity. Unlike CYP1 and 3 and other CYP2 members, CYP2D6 is resistant to typical inducers such as rifampin, phenobarbital and dexamethasone. Post-translational modifications such as phosphorylation of CYP2D6 Ser135 have been observed, but the functional impact is unknown. Further functional and validation studies are needed to clarify the role of nuclear receptors, epigenetic factors and other factors in the regulation of CYP2D6.
Affiliation(s)
- Zhi-Xu He
- Guizhou Provincial Key Laboratory for Regenerative Medicine, Stem Cell and Tissue Engineering Research Center & Sino-US Joint Laboratory for Medical Sciences, Guizhou Medical University, Guiyang, Guizhou, China
- Xiao-Wu Chen
- Department of General Surgery, The First People's Hospital of Shunde, Southern Medical University, Shunde, Foshan, Guangdong, China
- Zhi-Wei Zhou
- Department of Pharmaceutical Science, College of Pharmacy, University of South Florida, Tampa, FL, USA
- Shu-Feng Zhou
- Guizhou Provincial Key Laboratory for Regenerative Medicine, Stem Cell and Tissue Engineering Research Center & Sino-US Joint Laboratory for Medical Sciences, Guizhou Medical University, Guiyang, Guizhou, China
- Department of Pharmaceutical Science, College of Pharmacy, University of South Florida, Tampa, FL, USA
4
Fine-scale mapping of disease susceptibility locus with Bayesian partition model. Genes Genomics 2012. [DOI: 10.1007/s13258-011-0220-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
5
Factors affecting the effective number of tests in genetic association studies: a comparative study of three PCA-based methods. J Hum Genet 2011; 56:428-35. [PMID: 21451529 DOI: 10.1038/jhg.2011.34] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/08/2022]
Abstract
The number of markers tested in genetic association studies (GAS) has become very large, and one major challenge is deriving the multiple-testing threshold. Several approaches that calculate an effective number of tests (M_eff) in GAS have been developed and shown to be promising. As yet, there have been no comparisons of their robustness to influencing factors. We evaluated the performance of three principal component analysis (PCA)-based M_eff estimation formulas (M_eff-C in Cheverud (2001), M_eff-L in Li and Ji (2005), and M_eff-G in Galwey (2009)). Four influencing factors were considered: LD measurements, marker density, population samples and the total number of tested markers. We validated the formulas against Bonferroni's method and the permutation test with 10 000 random shuffles, based on three real data sets. For each factor, M_eff-C yielded a conservative threshold except with the D' coefficient, and M_eff-G was too liberal compared with the permutation test. Our results indicated that M_eff-L based on the r² coefficient closely approximates the permutation threshold. For a large number of markers, we recommend using M_eff-L with the r² coefficient, with either fixed-length or fixed-number separation, to obtain an accurate estimate of the multiple-testing threshold and to save computational time.
6
Genetic polymorphism, linkage disequilibrium, haplotype structure and novel allele analysis of CYP2C19 and CYP2D6 in Han Chinese. The Pharmacogenomics Journal 2009; 9:380-94. [DOI: 10.1038/tpj.2009.31] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Indexed: 11/08/2022]
7
Gorroochurn P. Perils in the Use of Linkage Disequilibrium for Fine Gene Mapping: Simple Insights from Population Genetics. Cancer Epidemiol Biomarkers Prev 2008; 17:3292-7. [DOI: 10.1158/1055-9965.epi-08-0717] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022] Open
8
Tsai MY, Hsiao CK, Wen SH. A Bayesian spatial multimarker genetic random-effect model for fine-scale mapping. Ann Hum Genet 2008; 72:658-69. [PMID: 18573105 DOI: 10.1111/j.1469-1809.2008.00459.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
Abstract
Multiple markers in linkage disequilibrium (LD) are usually used to localize the disease gene location. These markers may contribute to the disease etiology simultaneously. In contrast to the single-locus tests, we propose a genetic random effects model that accounts for the dependence between loci via their spatial structures. In this model, the locus-specific random effects measure not only the genetic disease risk, but also the correlations between markers. In other words, the model incorporates this relation in both mean and covariance structures, and the variance components play important roles. We consider two different settings for the spatial relations. The first is our proposal, relative distance function (RDF), which is intuitive in the sense that markers nearby are likely to correlate with each other. The second setting is a common exponential decay function (EDF). Under each setting, the inference of the genetic parameters is fully Bayesian with Markov chain Monte Carlo (MCMC) sampling. We demonstrate the validity and the utility of the proposed approach with two real datasets and simulation studies. The analyses show that the proposed model with either one of two spatial correlations performs better as compared with the single locus analysis. In addition, under the RDF model, a more precise estimate for the disease locus can be obtained even when the candidate markers are fairly dense. In all simulations, the inference under the true model provides unbiased estimates of the genetic parameters, and the model with the spatial correlation structure does lead to greater confidence interval coverage probabilities.
Affiliation(s)
- M-Y Tsai
- Institute of Statistics and Information Science, College of Science, National Changhua University of Education
9
Hosking FJ, Sterne JAC, Smith GD, Green PJ. Inference from genome-wide association studies using a novel Markov model. Genet Epidemiol 2008; 32:497-504. [PMID: 18383184 DOI: 10.1002/gepi.20322] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 12/19/2022]
Abstract
In this paper we propose a Bayesian modeling approach to the analysis of genome-wide association studies based on single nucleotide polymorphism (SNP) data. Our latent seed model combines various aspects of k-means clustering, hidden Markov models (HMMs) and logistic regression into a fully Bayesian model. It is fitted using the Markov chain Monte Carlo stochastic simulation method, with Metropolis-Hastings update steps. The approach is flexible, both in allowing different types of genetic models, and because it can be easily extended while remaining computationally feasible due to the use of fast algorithms for HMMs. It allows for inference primarily on the location of the causal locus and also on other parameters of interest. The latent seed model is used here to analyze three data sets, using both synthetic and real disease phenotypes with real SNP data, and shows promising results. Our method is able to correctly identify the causal locus in examples where single SNP analysis is both successful and unsuccessful at identifying the causal SNP.
Affiliation(s)
- Fay J Hosking
- Department of Mathematics, University of Bristol, Bristol, UK.
10
Tachmazidou I, Verzilli CJ, De Iorio M. Genetic association mapping via evolution-based clustering of haplotypes. PLoS Genet 2008; 3:e111. [PMID: 17616979 PMCID: PMC1913101 DOI: 10.1371/journal.pgen.0030111] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 11/21/2006] [Accepted: 05/21/2007] [Indexed: 11/19/2022] Open
Abstract
Multilocus analysis of single nucleotide polymorphism haplotypes is a promising approach to dissecting the genetic basis of complex diseases. We propose a coalescent-based model for association mapping that potentially increases the power to detect disease-susceptibility variants in genetic association studies. The approach uses Bayesian partition modelling to cluster haplotypes with similar disease risks by exploiting evolutionary information. We focus on candidate gene regions with densely spaced markers and model chromosomal segments in high linkage disequilibrium therein assuming a perfect phylogeny. To make this assumption more realistic, we split the chromosomal region of interest into sub-regions or windows of high linkage disequilibrium. The haplotype space is then partitioned into disjoint clusters, within which the phenotype-haplotype association is assumed to be the same. For example, in case-control studies, we expect chromosomal segments bearing the causal variant on a common ancestral background to be more frequent among cases than controls, giving rise to two separate haplotype clusters. The novelty of our approach arises from the fact that the distance used for clustering haplotypes has an evolutionary interpretation, as haplotypes are clustered according to the time to their most recent common ancestor. Our approach is fully Bayesian and we develop a Markov Chain Monte Carlo algorithm to sample efficiently over the space of possible partitions. We compare the proposed approach to both single-marker analyses and recently proposed multi-marker methods and show that the Bayesian partition modelling performs similarly in localizing the causal allele while yielding lower false-positive rates. Also, the method is computationally quicker than other multi-marker approaches. 
We present an application to real genotype data from the CYP2D6 gene region, which has a confirmed role in drug metabolism, where we succeed in mapping the location of the susceptibility variant within a small error.
Affiliation(s)
- Ioanna Tachmazidou
- Department of Epidemiology and Public Health, Imperial College London, United Kingdom.
11
Abstract
Multi-locus association analyses, including haplotype-based analyses, can sometimes provide greater power than single-locus analyses for detecting disease susceptibility loci. This potential gain, however, can be compromised by the large number of degrees of freedom caused by irrelevant markers. Exhaustive search for the optimal set of markers might be possible for a small number of markers, yet it is computationally inefficient. In this paper, we present a sequential haplotype scan method to search for combinations of adjacent markers that are jointly associated with disease status. When evaluating each marker, we add markers close to it in a sequential manner: a marker is added if its contribution to the haplotype association with disease is warranted, conditional on current haplotypes. This conditional evaluation is based on the well-known Mantel-Haenszel statistic. We propose two permutation-based methods to evaluate the growing haplotypes: a haplotype method for the combined markers, and a summary method that sums conditional statistics. We compared our proposed methods, the single-locus method, and a sliding window method using simulated data. We also applied our sequential haplotype scan algorithm to experimental data for CYP2D6. The results indicate that the sequential scan procedure can identify a set of adjacent markers whose haplotypes might have strong genetic effects or be in linkage disequilibrium with disease-predisposing variants. As a result, our methods can achieve greater power than the single-locus method, yet are much more computationally efficient than sliding window methods.
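The abstract above names the Mantel-Haenszel statistic as the basis for its conditional evaluation but does not spell out the computation. The sketch below shows only the textbook Mantel-Haenszel chi-square for a set of stratified 2x2 tables, without continuity correction. Treating each stratum as a group of subjects sharing the current haplotype is an assumption made here for illustration, and the function name and table layout are hypothetical; the paper's actual conditioning and permutation procedures are not reproduced.

```python
def mantel_haenszel_chi2(strata):
    """Textbook Mantel-Haenszel chi-square (1 df, no continuity correction).

    strata: iterable of 2x2 tables given as (a, b, c, d), where
        a = cases carrying the candidate allele
        b = cases not carrying it
        c = controls carrying it
        d = controls not carrying it
    Each table is one stratum (assumed here: one current-haplotype group).
    """
    diff = 0.0  # sum over strata of observed minus expected count in cell a
    var = 0.0   # sum over strata of the hypergeometric variance of cell a
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than two subjects carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var == 0.0:
        raise ValueError("no informative strata")
    return diff * diff / var


# One stratum with complete association, then the same stratum observed twice.
print(mantel_haenszel_chi2([(10, 0, 0, 10)]))
print(mantel_haenszel_chi2([(10, 0, 0, 10), (10, 0, 0, 10)]))
```

Under the null hypothesis of no association within strata, this statistic is usually compared against a chi-square distribution with one degree of freedom; the abstract instead describes permutation-based evaluation.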
Affiliation(s)
- Zhaoxia Yu
- Division of Biostatistics, Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, Minnesota 55905, USA
12
Maniatis N, Collins A, Morton NE. Effects of single SNPs, haplotypes, and whole-genome LD maps on accuracy of association mapping. Genet Epidemiol 2007; 31:179-88. [PMID: 17285621 DOI: 10.1002/gepi.20199] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]
Abstract
We describe an association mapping approach that utilizes linkage disequilibrium (LD) maps in LD units (LDU). This method uses composite likelihood to combine information from all single marker tests, and applies a model with a parameter for the location of the causal polymorphism. Previous analyses of the poor drug metabolizer phenotype provided evidence of the substantial utility of LDU maps for disease gene association mapping. Using LDU locations for the 27 single nucleotide polymorphisms (SNPs) flanking the CYP2D6 gene on chromosome 22, the most common functional polymorphism within the gene was located at 15 kb from its true location. Here, we examine the performance of this mapping approach by exploiting the high-density LDU map constructed from the HapMap data. Expressing the locations of the 27 SNPs in LDU from the HapMap LDU map, analysis yielded an estimated location that is only 0.3 kb away from the CYP2D6 gene. This supports the use of the high marker density HapMap-derived LDU map for association mapping even though it is derived from a much smaller number of individuals compared to the CYP2D6 sample. We also examine the performance of 2-SNP haplotypes. Using the same modelling procedures and composite likelihood as for single SNPs, the haplotype data provided much poorer localization compared to single SNP analysis. Haplotypes generate more autocorrelation through multiple inclusions of the same SNPs, which could inflate significance in association studies. The results of the present study demonstrate the great potential of the genome HapMap LDU maps for high-resolution mapping of complex phenotypes.
Affiliation(s)
- Nikolas Maniatis
- Human Genetics Division, University of Southampton, Southampton General Hospital, Southampton, UK.
13
De Iorio M, Verzilli CJ. A spatial probit model for fine-scale mapping of disease genes. Genet Epidemiol 2007; 31:252-60. [PMID: 17266116 DOI: 10.1002/gepi.20206] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Abstract
We present a novel statistical method for linkage disequilibrium (LD) mapping of disease susceptibility loci in case-control studies. Such studies exploit the statistical correlation or LD that exists between variants physically close along the genome to identify those that correlate with disease status and might thus be close to a causative mutation, generally assumed unobserved. LD structure, however, varies markedly over short distances because of variation in local recombination rates, mutation and genetic drift, among other factors. We propose a Bayesian multivariate probit model that flexibly accounts for the local spatial correlation between markers. In a case-control setting, we use a retrospective model that properly reflects the sampling scheme and identify regions where single- or multi-locus marker frequencies differ across cases and controls. We formally quantify these differences using information-theoretic distance measures, while the fully Bayesian approach naturally accommodates unphased or missing genotype data. We demonstrate our approach on simulated data and on real data from the CYP2D6 region, which has a confirmed role in drug metabolism.
Affiliation(s)
- Maria De Iorio
- Department of Epidemiology and Public Health, Imperial College London, London, UK.
14
Morton N, Maniatis N, Zhang W, Ennis S, Collins A. Genome scanning by composite likelihood. Am J Hum Genet 2007; 80:19-28. [PMID: 17160891 PMCID: PMC1785319 DOI: 10.1086/510401] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 08/18/2006] [Accepted: 10/24/2006] [Indexed: 01/22/2023] Open
Abstract
Ambitious programs have recently been advocated or launched to create genomewide databases for meta-analysis of association between DNA markers and phenotypes of medical and/or social concern. A necessary but not sufficient condition for success in association mapping is that the data give accurate estimates of both genomic location and its standard error, which are provided for multifactorial phenotypes by composite likelihood. That class includes the Malecot model, which we here apply with an illustrative example. This preliminary analysis leads to five inferences: permutation of cases and controls provides a test of association free of autocorrelation; two hypotheses give similar estimates, but one is consistently more accurate; estimation of the false-discovery rate is extended to causal genes in a small proportion of regions; the minimal data for successful meta-analysis are inferred; and power is robust for all genomic factors except minor-allele frequency. An extension to meta-analysis is proposed. Other approaches to genome scanning and meta-analysis should, if possible, be similarly extended so that their operating characteristics can be compared.
Affiliation(s)
- Newton Morton
- Human Genetics Division, University of Southampton, Southampton General Hospital, Southampton, SO16 6YD, UK.
15
Abstract
Over the last few years, association mapping of disease genes has developed into one of the most dynamic research areas of human genetics. It focuses on identifying functional polymorphisms that predispose to complex diseases. Population-based approaches are concerned with exploiting linkage disequilibrium (LD) between single-nucleotide polymorphisms (SNPs) and disease-predisposing loci. The utility of SNPs in association mapping is now well established, and interest in this field has been heightened by the discovery of millions of SNPs across the genome. This chapter reviews an association-mapping method that utilizes metric LD maps in LD units and employs a composite likelihood approach to combine information from all single SNP tests. It applies a model that incorporates a parameter for the location of the causal polymorphism. A proof-of-principle application of this method to a small region is given, and its potential application to large-scale datasets is discussed.
16
Mailund T, Besenbacher S, Schierup MH. Whole genome association mapping by incompatibilities and local perfect phylogenies. BMC Bioinformatics 2006; 7:454. [PMID: 17042942 PMCID: PMC1624851 DOI: 10.1186/1471-2105-7-454] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Received: 07/01/2006] [Accepted: 10/16/2006] [Indexed: 11/21/2022] Open
Abstract
Background: With current technology, vast amounts of data can be cheaply and efficiently produced in association studies. To prevent data analysis from becoming the bottleneck of such studies, fast and efficient analysis methods that scale to these data set sizes must be developed. Results: We present a fast method for accurate localisation of disease-causing variants in high-density case-control association mapping experiments with large numbers of cases and controls. The method searches for significant clustering of case chromosomes in the "perfect" phylogenetic tree defined by the largest region around each marker that is compatible with a single phylogenetic tree. This perfect phylogenetic tree is treated as a decision tree for determining disease status, and scored by its accuracy as a decision tree. The rationale for this is that the perfect phylogeny near a disease-affecting mutation should provide more information about the affected/unaffected classification than random trees. If regions of compatibility contain few markers, due to e.g. large marker spacing, the algorithm can allow the inclusion of incompatibility markers in order to enlarge the regions prior to estimating their phylogeny. Haplotype data and phased genotype data can be analysed. The power and efficiency of the method are investigated on 1) simulated genotype data under different models of disease determination, 2) artificial data sets created from the HapMap resource, and 3) data sets used for testing other methods, in order to compare with these. Our method has the same accuracy as single marker association (SMA) in the simplest case of a single disease-causing mutation and a constant recombination rate.
However, in more complex scenarios of mutation heterogeneity and more complex haplotype structure, such as that found in the HapMap data, our method outperforms SMA as well as other fast data mining approaches such as HapMiner and Haplotype Pattern Mining (HPM), despite being significantly faster. For unphased genotype data, an initial step of estimating the phase only slightly decreases the power of the method. The method was also found to accurately localise the susceptibility variant in an empirical data set where that variant is already known (the ΔF508 mutation for cystic fibrosis), and to find significant signals for association between the CYP2D6 gene and poor drug metabolism, although for this dataset the highest association score is about 60 kb from the CYP2D6 gene. Conclusion: Our method has been implemented in the Blossoc (BLOck aSSOCiation) software. Using Blossoc, genome-wide chip-based surveys of 3 million SNPs in 1000 cases and 1000 controls can be analysed in less than two CPU hours.
Affiliation(s)
- Thomas Mailund
- Department of Statistics, University of Oxford, UK
- Bioinformatics Research Center, University of Aarhus, Denmark
17
Morris AP. A flexible Bayesian framework for modeling haplotype association with disease, allowing for dominance effects of the underlying causative variants. Am J Hum Genet 2006; 79:679-94. [PMID: 16960804 PMCID: PMC1592560 DOI: 10.1086/508264] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.5] [Received: 03/17/2006] [Accepted: 08/02/2006] [Indexed: 11/03/2022] Open
Abstract
Multilocus analysis of single-nucleotide-polymorphism (SNP) haplotypes may provide evidence of association with disease, even when the individual loci themselves do not. Haplotype-based methods are expected to outperform single-SNP analyses because (i) common genetic variation can be structured into haplotypes within blocks of strong linkage disequilibrium and (ii) the functional properties of a protein are determined by the linear sequence of amino acids corresponding to DNA variation on a haplotype. Here, I propose a flexible Bayesian framework for modeling haplotype association with disease in population-based studies of candidate genes or small candidate regions. I employ a Bayesian partition model to describe the correlation between marker-SNP haplotypes and causal variants at the underlying functional polymorphism(s). Under this model, haplotypes are clustered according to their similarity, in terms of marker-SNP allele matches, which is used as a proxy for recent shared ancestry. Haplotypes within a cluster are then assigned the same probability of carrying a causal variant at the functional polymorphism(s). In this way, I can account for the dominance effect of causal variants, here corresponding to any deviation from a multiplicative contribution to disease risk. The results of a detailed simulation study demonstrate that there is minimal cost associated with modeling these dominance effects, with substantial gains in power over haplotype-based methods that do not incorporate clustering and that assume a multiplicative model of disease risks.
Affiliation(s)
- Andrew P Morris
- Wellcome Trust Centre for Human Genetics, Oxford, OX3 7BN, United Kingdom.
18
Johnson T. Bayesian method for gene detection and mapping, using a case and control design and DNA pooling. Biostatistics 2006; 8:546-65. [PMID: 16984977 DOI: 10.1093/biostatistics/kxl028] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022] Open
Abstract
Association mapping studies aim to determine the genetic basis of a trait. A common experimental design uses a sample of unrelated individuals classified into 2 groups, for example cases and controls. If the trait has a complex genetic basis, consisting of many quantitative trait loci (QTLs), each group needs to be large. Each group must be genotyped at marker loci covering the region of interest; for dense coverage of a large candidate region, or a whole-genome scan, the number of markers will be very large. The total amount of genotyping required for such a study is formidable. DNA pooling, a technique that economizes on laboratory effort, could reduce the amount of genotyping required, but the data generated are less informative and require novel methods for efficient analysis. In this paper, a Bayesian statistical analysis of the classic model of McPeek and Strahs is proposed. In contrast to previous work on this model, I assume that data are collected using DNA pooling, so individual genotypes are not directly observed, and I also account for experimental errors. A complete analysis can be performed using analytical integration, a propagation algorithm for a hidden Markov model, and quadrature. The method developed here is both statistically and computationally efficient. It allows simultaneous detection and mapping of a QTL, in a large-scale association mapping study, using data from pooled DNA. The method is shown to perform well on data sets simulated under a realistic coalescent-with-recombination model, and is shown to outperform classical single-point methods. The method is illustrated on data consisting of 27 markers in an 880-kb region around the CYP2D6 gene.
Affiliation(s)
- Toby Johnson
- School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3JT, UK.
|
19
|
Verzilli CJ, Stallard N, Whittaker JC. Bayesian graphical models for genomewide association studies. Am J Hum Genet 2006; 79:100-12. [PMID: 16773569 PMCID: PMC1474122 DOI: 10.1086/505313] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.4] [Received: 12/09/2005] [Accepted: 04/21/2006] [Indexed: 11/03/2022]
Abstract
As the extent of human genetic variation becomes more fully characterized, the research community is faced with the challenging task of using this information to dissect the heritable components of complex traits. Genomewide association studies offer great promise in this respect, but their analysis poses formidable difficulties. In this article, we describe a computationally efficient approach to mining genotype-phenotype associations that scales to the size of the data sets currently being collected in such studies. We use discrete graphical models as a data-mining tool, searching for single- or multilocus patterns of association around a causative site. The approach is fully Bayesian, allowing us to incorporate prior knowledge on the spatial dependencies around each marker due to linkage disequilibrium, which reduces considerably the number of possible graphical structures. A Markov chain-Monte Carlo scheme is developed that yields samples from the posterior distribution of graphs conditional on the data from which probabilistic statements about the strength of any genotype-phenotype association can be made. Using data simulated under scenarios that vary in marker density, genotype relative risk of a causative allele, and mode of inheritance, we show that the proposed approach has better localization properties and leads to lower false-positive rates than do single-locus analyses. Finally, we present an application of our method to a quasi-synthetic data set in which data from the CYP2D6 region are embedded within simulated data on 100K single-nucleotide polymorphisms. Analysis is quick (<5 min), and we are able to localize the causative site to a very short interval.
Affiliation(s)
- Claudio J Verzilli
- Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, UK.
|
20
|
Zaykin DV, Meng Z, Ehm MG. Contrasting linkage-disequilibrium patterns between cases and controls as a novel association-mapping method. Am J Hum Genet 2006; 78:737-746. [PMID: 16642430 PMCID: PMC1474029 DOI: 10.1086/503710] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.1] [Received: 09/12/2005] [Accepted: 02/22/2006] [Indexed: 01/15/2023]
Abstract
Identification and description of genetic variation underlying disease susceptibility, efficacy, and adverse reactions to drugs remain a difficult problem. One of the important steps in the analysis of variation in a candidate region is the characterization of linkage disequilibrium (LD). In a region of genetic association, the extent of LD varies between the case and the control groups. Separate plots of pairwise standardized measures of LD (e.g., D') for cases and controls are often presented for a candidate region, to graphically convey case-control differences in LD. However, the observed graphic differences lack statistical support. Therefore, we suggest the "LD contrast" test to compare whole matrices of disequilibrium between two samples. A common technique of assessing LD when the haplotype phase is unobserved is the expectation-maximization algorithm, with the likelihood incorporating the assumption of Hardy-Weinberg equilibrium (HWE). This approach presents a potential problem in that, in the region of genetic association, the HWE assumption may not hold when samples are selected on the basis of phenotypes. Here, we present a computationally feasible approach that does not assume HWE, along with graphic displays and a statistical comparison of pairwise matrices of LD between case and control samples. LD-contrast tests provide a useful addition to existing tools for finding and characterizing genetic associations. Although haplotype association tests are expected to provide superior power when susceptibilities are primarily determined by haplotypes, the LD-contrast tests demonstrate substantially higher power under certain haplotype-driven disease models.
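As an illustration of the pairwise LD measures mentioned in this abstract, the following is a minimal sketch of how D, |D'| and r² can be computed for two biallelic loci. It assumes phased haplotype counts are available, which is a simplification: the abstract describes an approach for unphased data that does not assume HWE, and this sketch is not that method or the LD-contrast test. The function name `ld_measures` and its argument names are hypothetical.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r2) for two biallelic loci.

    Assumption: the four arguments are counts of PHASED haplotypes
    (allele A/a at locus 1, allele B/b at locus 2).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")

    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n  # frequency of allele B at locus 2

    var_term = p_A * (1 - p_A) * p_B * (1 - p_B)
    if var_term == 0:
        raise ValueError("at least one locus is monomorphic")

    # Raw disequilibrium coefficient.
    d = p_AB - p_A * p_B

    # Largest |D| attainable given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))

    d_prime = abs(d) / d_max
    r2 = d * d / var_term
    return d, d_prime, r2
```

Applying such a function to every pair of markers, separately in cases and in controls, would give the two LD matrices whose difference the abstract proposes to test.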
Affiliation(s)
- Dmitri V Zaykin
- National Institute of Environmental Health Sciences, National Institutes of Health.
- Zhaoling Meng
- Department of Biostatistics and Programming, Sanofi-Aventis, Bridgewater, NJ
- Margaret G Ehm
- Department of Genetics Research, GlaxoSmithKline, Research Triangle Park, NC
|
21
|
Waldron ERB, Whittaker JC, Balding DJ. Fine mapping of disease genes via haplotype clustering. Genet Epidemiol 2006; 30:170-9. [PMID: 16385468 DOI: 10.1002/gepi.20134] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.3] [Indexed: 11/10/2022]
Abstract
We propose an algorithm for analysing SNP-based population association studies, which is a development of that introduced by Molitor et al. [2003: Am J Hum Genet 73:1368-1384]. It uses clustering of haplotypes to overcome the major limitations of many current haplotype-based approaches. We define a between-haplotype score that is simple, yet appears to capture much of the information about evolutionary relatedness of the haplotypes in the vicinity of an (unobserved) putative causal locus. Haplotype clusters can then be defined via a putative ancestral haplotype and a cut-off distance. The number of an individual's two haplotypes that lie within the cluster predicts the individual's genotype at the causal locus. This predicted genotype can then be investigated for association with the phenotype of interest. We implement our approach within a Markov-chain Monte Carlo algorithm that, in effect, searches over locations and ancestral haplotypes to identify large, case-rich clusters. The algorithm successfully fine-maps a causal mutation in a test analysis using real data, and achieves almost 98% accuracy in predicting the genotype at the causal locus. A simulation study indicates that the new algorithm is substantially superior to alternative approaches, and it also allows us to identify situations in which multi-point approaches can substantially improve over single-SNP analyses. Our algorithm runs quickly and there is scope for extension to a wide range of disease models and genomic scales.
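The cluster-membership step described in this abstract can be sketched roughly as follows. The abstract does not specify the between-haplotype score, so Hamming distance is used here purely as a stand-in assumption; the function names and the string encoding of haplotypes are likewise hypothetical, and the MCMC search over locations and ancestral haplotypes is not shown.

```python
def hamming(h1, h2):
    """Number of SNP positions at which two haplotypes differ.

    Assumption: a stand-in for the paper's between-haplotype score,
    which the abstract does not define.
    """
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same SNPs")
    return sum(a != b for a, b in zip(h1, h2))


def predicted_genotype(hap_pair, ancestral, cutoff):
    """Count how many of an individual's two haplotypes fall in the cluster.

    A haplotype is in the cluster when its distance to the putative
    ancestral haplotype is at most `cutoff`. The result (0, 1 or 2)
    serves as the predicted genotype at the causal locus.
    """
    return sum(hamming(h, ancestral) <= cutoff for h in hap_pair)
```

For example, with ancestral haplotype `"0101"` and a cut-off of 1, an individual carrying `"0101"` and `"0111"` would be assigned a predicted genotype of 2, which could then be tested for association with the phenotype.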
Affiliation(s)
- E R B Waldron
- Department of Epidemiology and Public Health, Imperial College London, St. Mary's Campus, Norfolk Place, London W2 1PG, United Kingdom.
|
22
|
Sabbagh A, Darlu P. Data-Mining Methods as Useful Tools for Predicting Individual Drug Response: Application to CYP2D6 Data. Hum Hered 2006; 62:119-34. [PMID: 17057402 DOI: 10.1159/000096416] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 06/08/2006] [Accepted: 08/22/2006] [Indexed: 11/19/2022]
Abstract
OBJECTIVES: Selecting a maximally informative subset of polymorphisms to predict a clinical outcome, such as drug response, requires appropriate search methods due to the increased dimensionality associated with looking at multiple genotypes. In this study, we investigated the ability of several pattern recognition methods to identify the most informative markers in the CYP2D6 gene for the prediction of CYP2D6 metabolizer status. METHODS: Four data-mining tools were explored: decision trees, random forests, artificial neural networks, and the multifactor dimensionality reduction (MDR) method. Marker selection was performed separately in eight population samples of different ethnic origin to evaluate to what extent the most informative markers differ across ethnic groups. RESULTS: Our results show that the number of polymorphisms required to predict CYP2D6 metabolic phenotype with high accuracy can be dramatically reduced owing to the strong haplotype block structure observed at CYP2D6. MDR and neural networks provided nearly identical results and performed the best. CONCLUSION: Data-mining methods, such as MDR and neural networks, appear to be promising tools to improve the efficiency of genotyping tests in pharmacogenetics, with the ultimate goal of pre-screening patients for individual therapy selection with minimum genotyping effort.
Affiliation(s)
- Audrey Sabbagh
- Unité de Recherche en Génétique Epidémiologique et Structure des Populations Humaines, INSERM U535, Villejuif, France.
|
23
|
Morris AP. Direct analysis of unphased SNP genotype data in population-based association studies via Bayesian partition modelling of haplotypes. Genet Epidemiol 2005; 29:91-107. [PMID: 15940704 DOI: 10.1002/gepi.20080] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.6] [Indexed: 11/07/2022]
Abstract
We describe a novel method for assessing the strength of disease association with single nucleotide polymorphisms (SNPs) in a candidate gene or small candidate region, and for estimating the corresponding haplotype relative risks of disease, using unphased genotype data directly. We begin by estimating the relative frequencies of haplotypes consistent with observed SNP genotypes. Under the Bayesian partition model, we specify cluster centres from this set of consistent SNP haplotypes. The remaining haplotypes are then assigned to the cluster with the "nearest" centre, where distance is defined in terms of SNP allele matches. Within a logistic regression modelling framework, each haplotype within a cluster is assigned the same disease risk, reducing the number of parameters required. Uncertainty in phase assignment is addressed by considering all possible haplotype configurations consistent with each unphased genotype, weighted in the logistic regression likelihood by their probabilities, calculated according to the estimated relative haplotype frequencies. We develop a Markov chain Monte Carlo algorithm to sample over the space of haplotype clusters and corresponding disease risks, allowing for covariates that might include environmental risk factors or polygenic effects. Application of the algorithm to SNP genotype data in an 890-kb region flanking the CYP2D6 gene illustrates that we can identify clusters of haplotypes with similar risk of poor drug metaboliser (PDM) phenotype, and can distinguish PDM cases carrying different high-risk variants. Further, the results of a detailed simulation study suggest that we can identify positive evidence of association for moderate relative disease risks with a sample of 1,000 cases and 1,000 controls.
Affiliation(s)
- Andrew P Morris
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
|
24
|
Maniatis N, Morton NE, Gibson J, Xu CF, Hosking LK, Collins A. The optimal measure of linkage disequilibrium reduces error in association mapping of affection status. Hum Mol Genet 2004; 14:145-53. [PMID: 15548543 DOI: 10.1093/hmg/ddi019] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022]
Abstract
We have developed a simple yet powerful approach for disease gene association mapping by linkage disequilibrium (LD). This method is unique because it applies a model with evolutionary theory that incorporates a parameter for the location of the causal polymorphism. The method exploits LD maps, which assign a location in LD units (LDU) for each marker. This approach is based on single marker tests within a composite likelihood framework, which avoids the heavy Bonferroni correction for multiple testing. As a proof of principle, we tested an 890 kb region flanking the CYP2D6 gene associated with poor drug-metabolizing activity in order to refine the localization of a causal mutation. Previous LD mapping studies using single markers and haplotypes have identified a 390 kb significant region associated with the poor drug-metabolizing phenotype on chromosome 22. None of the 27 single-nucleotide polymorphisms was within the gene. Using a metric LDU map, the commonest functional polymorphism within the gene was located at 14.9 kb from its true location, within a 95% confidence interval of 172 kb. The kb map had a relative efficiency of 33% compared with the LDU map. Our findings indicate that the support interval and location error are smaller than any published results. Despite the low resolution and the strong LD in the region, our results provide evidence of the substantial utility of LDU maps for disease gene association mapping. These tests are robust to large numbers of markers and are applicable to haplotypes, diplotypes, whole-genome association or candidate region studies.
Affiliation(s)
- N Maniatis
- Human Genetics Division, University of Southampton, Southampton General Hospital, Southampton, UK.
|
25
|
Hosking L, Lumsden S, Lewis K, Yeo A, McCarthy L, Bansal A, Riley J, Purvis I, Xu CF. Detection of genotyping errors by Hardy-Weinberg equilibrium testing. Eur J Hum Genet 2004; 12:395-9. [PMID: 14872201 DOI: 10.1038/sj.ejhg.5201164] [Citation(s) in RCA: 278] [Impact Index Per Article: 13.2] [Indexed: 11/08/2022]
Abstract
Genotyping data sets may contain errors that, in some instances, lead to false conclusions. Deviation from Hardy-Weinberg equilibrium (HWE) in random samples may be indicative of problematic assays. This study has analysed 107,000 genotypes generated by TaqMan, RFLP, sequencing or mass spectrometric methods from 443 single-nucleotide polymorphisms (SNPs). These SNPs are distributed both within genes and in intergenic regions. Genotype distributions for 36 out of 313 assays (11.5%) whose minor allele frequencies were >0.05 deviated from HWE (P<0.05). Some of the possible reasons for this deviation were explored: assays for five SNPs proved nonspecific, and genotyping errors were identified in 21 SNPs. For the remaining 10 SNPs, no reasons for deviation from HWE were identified. We demonstrate the successful identification of a proportion of nonspecific assays, and assays harbouring genotyping error. Consequently, our current high-throughput genotyping system incorporates tests for both assay specificity and deviation from HWE, to minimise the genotype error rate and therefore improve data quality.
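The kind of HWE check described in this abstract can be sketched for a single biallelic SNP as follows. This is a minimal illustration that assumes a Pearson chi-square goodness-of-fit test with one degree of freedom; the abstract does not state which test the authors used, and an exact test may be preferable for small counts. The function name `hwe_chi_square` is hypothetical.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Pearson chi-square test for deviation from Hardy-Weinberg equilibrium.

    Takes observed genotype counts at one biallelic SNP and returns
    (chi2, p_value), using 1 degree of freedom.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes supplied")

    # Allele frequencies estimated from the genotype counts.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("SNP is monomorphic; test is undefined")

    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

An assay would be flagged for follow-up when the returned p-value falls below the chosen threshold (the abstract uses P<0.05) and the minor allele frequency is above the cut-off applied to the data set.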
Affiliation(s)
- Louise Hosking
- GlaxoSmithKline Medicines Research Centre, Gunnels Wood Rd, Stevenage, Hertfordshire SG1 2NY, UK.
|
26
|
Fuselli S, Dupanloup I, Frigato E, Cruciani F, Scozzari R, Moral P, Sistonen J, Sajantila A, Barbujani G. Molecular diversity at the CYP2D6 locus in the Mediterranean region. Eur J Hum Genet 2004; 12:916-24. [PMID: 15340360 DOI: 10.1038/sj.ejhg.5201243] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
Despite the importance of cytochrome P450 in the metabolism of many drugs, several aspects of molecular variation at one of the main loci coding for it, CYP2D6, have not yet been analysed. Here we show that it is possible to rapidly and efficiently genotype the main European allelic variants at this locus by a SNaPshot method identifying chromosomal rearrangements and nine single-nucleotide polymorphisms. Haplotypes could be reconstructed from data on 494 chromosomes in six populations of the Mediterranean region. High levels of linkage disequilibrium were found within the chromosome region screened, suggesting that CYP2D6 may be part of a genomic recombination block, and hence that, aside from unequal crossing-over that led to large chromosomal rearrangements, its haplotype diversity essentially originated through the accumulation of mutations. With the sole, albeit statistically non-significant, exception of Syria, haplotype frequencies do not differ among the populations studied, despite the presence among them of three well-known genetic outliers, which could be the result of common selective pressures playing a role in shaping CYP2D6 variation over the area of Europe that we surveyed.
Affiliation(s)
- Silvia Fuselli
- Department of Biology, University of Ferrara, via Borsari 46, 44100 Ferrara, Italy
|
27
|
Morris AP, Whittaker JC, Balding DJ. Little loss of information due to unknown phase for fine-scale linkage-disequilibrium mapping with single-nucleotide-polymorphism genotype data. Am J Hum Genet 2004; 74:945-53. [PMID: 15077198 PMCID: PMC1181987 DOI: 10.1086/420773] [Citation(s) in RCA: 57] [Impact Index Per Article: 2.7] [Received: 01/05/2004] [Accepted: 02/12/2004] [Indexed: 11/03/2022]
Abstract
We present the results of a simulation study that indicate that true haplotypes at multiple, tightly linked loci often provide little extra information for linkage-disequilibrium fine mapping, compared with the information provided by corresponding genotypes, provided that an appropriate statistical analysis method is used. In contrast, a two-stage approach to analyzing genotype data, in which haplotypes are inferred and then analyzed as if they were true haplotypes, can lead to a substantial loss of information. The study uses our COLDMAP software for fine mapping, which implements a Markov chain-Monte Carlo algorithm that is based on the shattered coalescent model of genetic heterogeneity at a disease locus. We applied COLDMAP to 100 replicate data sets simulated under each of 18 disease models. Each data set consists of haplotype pairs (diplotypes) for 20 SNPs typed at equal 50-kb intervals in a 950-kb candidate region that includes a single disease locus located at random. The data sets were analyzed in three formats: (1) as true haplotypes; (2) as haplotypes inferred from genotypes using an expectation-maximization algorithm; and (3) as unphased genotypes. On average, true haplotypes gave a 6% gain in efficiency compared with the unphased genotypes, whereas inferring haplotypes from genotypes led to a 20% loss of efficiency, where efficiency is defined in terms of root mean integrated square error of the location of the disease locus. Furthermore, treating inferred haplotypes as if they were true haplotypes leads to considerable overconfidence in estimates, with nominal 50% credibility intervals achieving, on average, only 19% coverage. We conclude that (1) given appropriate statistical analyses, the costs of directly measuring haplotypes will rarely be justified by a gain in the efficiency of fine mapping and that (2) a two-stage approach of inferring haplotypes followed by a haplotype-based analysis can be very inefficient for fine mapping, compared with an analysis based directly on the genotypes.
Affiliation(s)
- A P Morris
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, United Kingdom.
|
28
|
Morris AP, Whittaker JC, Xu CF, Hosking LK, Balding DJ. Multipoint linkage-disequilibrium mapping narrows location interval and identifies mutation heterogeneity. Proc Natl Acad Sci U S A 2003; 100:13442-6. [PMID: 14597696 PMCID: PMC263833 DOI: 10.1073/pnas.2235031100] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Indexed: 12/29/2022]
Abstract
Single-nucleotide polymorphism (SNP) genotypes were recently examined in an 890-kb region flanking the human gene CYP2D6. Single-marker and haplotype-based analyses identified, with genomewide significance (P < 10^-7), a 403-kb interval displaying strong linkage disequilibrium (LD) with predicted poor-metabolizer phenotype. However, the width of this interval makes localizing causal variants difficult: for example, the interval contains seven known or predicted genes in addition to CYP2D6. We have developed the Bayesian fine-mapping software coldmap, which, applied to these genotype data, yields a 95% location interval covering only 185 kb and establishes genomewide significance for a causal locus within the region. Strikingly, our interval correctly excludes four SNPs, which individually display association with genomewide significance, including the SNP showing strongest LD (P < 10^-34). In addition, coldmap distinguishes homozygous cases for the major CYP2D6 mutation from those bearing minor mutations. We further investigate a selection of SNP subsets and find that previously reported methods lead to a 38% saving in SNPs at the cost of an increase of <20% in the width of the location interval.
Affiliation(s)
- Andrew P Morris
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, United Kingdom
|
29
|
Meng Z, Zaykin DV, Xu CF, Wagner M, Ehm MG. Selection of genetic markers for association analyses, using linkage disequilibrium and haplotypes. Am J Hum Genet 2003; 73:115-30. [PMID: 12796855 PMCID: PMC1180574 DOI: 10.1086/376561] [Citation(s) in RCA: 104] [Impact Index Per Article: 4.7] [Received: 02/14/2003] [Accepted: 04/24/2003] [Indexed: 11/04/2022]
Abstract
The genotyping of closely spaced single-nucleotide polymorphism (SNP) markers frequently yields highly correlated data, owing to extensive linkage disequilibrium (LD) between markers. The extent of LD varies widely across the genome and drives the number of frequent haplotypes observed in small regions. Several studies have illustrated the possibility that LD or haplotype data could be used to select a subset of SNPs that optimize the information retained in a genomic region while reducing the genotyping effort and simplifying the analysis. We propose a method based on the spectral decomposition of the matrices of pairwise LD between markers, and we select markers on the basis of their contributions to the total genetic variation. We also modify Clayton's "haplotype tagging SNP" selection method, which utilizes haplotype information. For both methods, we propose sliding window-based algorithms that allow the methods to be applied to large chromosomal regions. Our procedures require genotype information about a small number of individuals for an initial set of SNPs and selection of an optimum subset of SNPs that could be efficiently genotyped on larger numbers of samples while retaining most of the genetic variation in samples. We identify suitable parameter combinations for the procedures, and we show that a sample size of 50-100 individuals achieves consistent results in studies of simulated data sets in linkage equilibrium and LD. When applied to experimental data sets, both procedures were similarly effective at reducing the genotyping requirement while maintaining the genetic information content throughout the regions. We also show that haplotype-association results that Hosking et al. obtained near CYP2D6 were almost identical before and after marker selection.
Affiliation(s)
- Zhaoling Meng
- Bioinformatics Research Center, Campus Box 7566, North Carolina State University, Raleigh, NC 27695-7566, USA.
|
30
|
Sheffield LJ. The hunt for new genes and polymorphisms that can control the response to drugs. Pharmacogenomics 2002; 3:679-86. [PMID: 12223052 DOI: 10.1517/14622416.3.5.679] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/05/2022]
Abstract
There has been a great increase in understanding of the genetic basis for individual variation in response to drugs. The study of variation in gene structure (polymorphism) can now predict the likely metabolic behavior in an individual for a number of drugs. This review documents the different strategies that can be used to find new genes and polymorphisms within these genes. Candidate genes can be used in case-control studies or in studies where the parents of the person having an adverse effect from the drug are used as controls. New genes are being discovered in the drug development process, and technological development in molecular biology is expected to greatly enhance knowledge of the genes that regulate drug metabolism.
Affiliation(s)
- Leslie J Sheffield
- Genetic Health Services Victoria, Murdoch Childrens Research Institute, University of Melbourne, Melbourne, Australia.
|