1
Lau W, Ali A, Maude H, Andrew T, Swallow DM, Maniatis N. The hazards of genotype imputation when mapping disease susceptibility variants. Genome Biol 2024; 25:7. [PMID: 38172955 PMCID: PMC10763476 DOI: 10.1186/s13059-023-03140-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Accepted: 12/04/2023] [Indexed: 01/05/2024]
Abstract
BACKGROUND The cost-free increase in statistical power of using imputation to infer missing genotypes is undoubtedly appealing, but is it hazard-free? This case study of three type-2 diabetes (T2D) loci demonstrates that it is not; it sheds light on why this is so and raises concerns as to the shortcomings of imputation at disease loci, where haplotypes differ between cases and reference panel. RESULTS T2D-associated variants were previously identified using targeted sequencing. We removed these significantly associated SNPs and used neighbouring SNPs to infer them by imputation. We compared imputed with observed genotypes, examined the altered pattern of T2D-SNP association, and investigated the cause of imputation errors by studying haplotype structure. Most T2D variants were incorrectly imputed with a low density of scaffold SNPs, but the majority failed to impute even at high density, despite obtaining high certainty scores. Missing and discordant imputation errors, which were observed disproportionately for the risk alleles, produced monomorphic genotype calls or false-negative associations. We show that haplotypes carrying risk alleles are considerably more common in the T2D cases than the reference panel, for all loci. CONCLUSIONS Imputation is not a panacea for fine mapping, nor for meta-analysing multiple GWAS based on different arrays and different populations. A total of 80% of the SNPs we have tested are not included in array platforms, explaining why these and other such associated variants may previously have been missed. Regardless of the choice of software and reference haplotypes, imputation drives genotype inference towards the reference panel, introducing errors at disease loci.
Affiliation(s)
- Winston Lau
  - Department of Genetics, Evolution and Environment, UCL Genetics Institute, University College London, London, UK
- Aminah Ali
  - Department of Genetics, Evolution and Environment, UCL Genetics Institute, University College London, London, UK
- Hannah Maude
  - Department of Metabolism, Digestion and Reproduction, Section of Genetics and Genomics, London, UK
- Toby Andrew
  - Department of Metabolism, Digestion and Reproduction, Section of Genetics and Genomics, London, UK
- Dallas M Swallow
  - Department of Genetics, Evolution and Environment, UCL Genetics Institute, University College London, London, UK
- Nikolas Maniatis
  - Department of Genetics, Evolution and Environment, UCL Genetics Institute, University College London, London, UK
2
Wang S, Zhang X, Qiang G, Wang J. DelInsCaller: An Efficient Algorithm for Identifying Delins and Estimating Haplotypes from Long Reads with High Level of Sequencing Errors. Genes (Basel) 2022; 14:4. [PMID: 36672745 PMCID: PMC9858578 DOI: 10.3390/genes14010004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 10/27/2022] [Accepted: 10/28/2022] [Indexed: 12/24/2022]
Abstract
Delins, also known as complex indel, is a combined genomic structural variation formed by deleting and inserting DNA fragments at a common genomic location. Recent studies emphasized the importance of delins in cancer diagnosis and treatment. Although the long reads from PacBio CLR sequencing significantly facilitate delins calling, the existing approaches still encounter computational challenges from the high level of sequencing errors, and often introduce errors in genotyping and phasing delins. In this paper, we propose an efficient algorithmic pipeline, named delInsCaller, to identify delins at haplotype resolution from PacBio CLR sequencing data. delInsCaller uses a fault-tolerant method that calculates a variation density score, which helps to locate the candidate mutational regions under a high level of sequencing errors. It adopts a base association-based contig splicing method, which facilitates contig splicing in the presence of false-positive interference. We conducted a series of experiments on simulated datasets, and the results showed that delInsCaller outperformed several state-of-the-art approaches, e.g., SVseq3, across a wide range of parameter settings, such as read depth and sequencing error rate. delInsCaller often obtained higher f-measures than other approaches; specifically, it was able to maintain advantages at ~15% sequencing errors. delInsCaller was also able to significantly improve the N50 values with almost no loss of haplotype accuracy compared with the existing approach.
Affiliation(s)
- Shenjie Wang
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an 710049, China
- Xuanping Zhang
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an 710049, China
- Geng Qiang
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an 710049, China
- Jiayin Wang
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an 710049, China
3
From Mendel to quantitative genetics in the genome era: the scientific legacy of W. G. Hill. Nat Genet 2022; 54:934-939. [PMID: 35817969 DOI: 10.1038/s41588-022-01103-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Accepted: 05/18/2022] [Indexed: 11/08/2022]
Abstract
The quantitative geneticist W. G. ('Bill') Hill, awardee of the 2018 Darwin Medal of the Royal Society and the 2019 Mendel Medal of the Genetics Society (United Kingdom), died on 17 December 2021 at the age of 81 years. Here, we pay tribute to his multiple key scientific contributions, which span population and evolutionary genetics, animal and plant breeding and human genetics. We discuss his theoretical research on the role of linkage disequilibrium (LD) and mutational variance in the response to selection, the origin of the widely used LD metric r² in genomic association studies, the genetic architecture of complex traits, the quantification of the variation in realized relationships given a pedigree relationship and much more. We demonstrate that basic theoretical research in quantitative and statistical genetics has led to profound insights into the genetics and evolution of complex traits and made predictions that were subsequently empirically validated, often decades later.
4
Kang JTL, Rosenberg NA. Mathematical Properties of Linkage Disequilibrium Statistics Defined by Normalization of the Coefficient D = p_AB - p_A p_B. Hum Hered 2020; 84:127-143. [PMID: 32045910 DOI: 10.1159/000504171] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/16/2019] [Accepted: 10/10/2019] [Indexed: 11/19/2022]
Abstract
BACKGROUND Many statistics for measuring linkage disequilibrium (LD) take the form of a normalization of the LD coefficient D. Different normalizations produce statistics with different ranges, interpretations, and arguments favoring their use. METHODS Here, to compare the mathematical properties of these normalizations, we consider 5 of these normalized statistics, describing their upper bounds, the mean values of their maxima over the set of possible allele frequency pairs, and the size of the allele frequency regions accessible given specified values of the statistics. RESULTS We produce detailed characterizations of these properties for the statistics d and ρ, analogous to computations previously performed for r². We examine the relationships among the statistics, uncovering conditions under which some of them have close connections. CONCLUSION The results contribute insight into LD measurement, particularly the understanding of differences in the features of different LD measures when computed on the same data.
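As an illustration of what "normalization of D" means in the abstract above, the following minimal sketch computes D and two of its best-known normalizations, D' and r², from haplotype and allele frequencies. The function name `ld_stats` and its argument names are hypothetical, and the formulas are the standard textbook definitions rather than anything taken from the paper; the statistics d and ρ discussed in the abstract are not reproduced here.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")

    # Basic LD coefficient: observed haplotype frequency minus its
    # expectation under independence.
    d = p_ab - p_a * p_b

    # Largest |D| attainable for these allele frequencies, given the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    # D' rescales D by its attainable maximum (here it keeps the sign of D).
    d_prime = d / d_max

    # r2 is the squared correlation between the two allele indicators.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


d, d_prime, r2 = ld_stats(p_ab=0.3, p_a=0.5, p_b=0.4)
print(f"D={d:.3f}, D'={d_prime:.3f}, r2={r2:.3f}")
```

The same D yields different values under the two normalizations because D' divides by a bound that depends on the sign of D, whereas r² divides by the product of the allele variances.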
Affiliation(s)
- Jonathan T L Kang
  - Department of Biology, Stanford University, Stanford, California, USA
- Noah A Rosenberg
  - Department of Biology, Stanford University, Stanford, California, USA
5
Awika HO, Marconi TG, Bedre R, Mandadi KK, Avila CA. Minor alleles are associated with white rust (Albugo occidentalis) susceptibility in spinach (Spinacia oleracea). HORTICULTURE RESEARCH 2019; 6:129. [PMID: 31814982 PMCID: PMC6885047 DOI: 10.1038/s41438-019-0214-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2019] [Revised: 10/29/2019] [Accepted: 11/03/2019] [Indexed: 05/05/2023]
Abstract
Minor alleles (MA) have been associated with disease incidence in human studies, enabling the identification of diagnostic risk factors for various diseases. However, allelic mapping has rarely been performed in plant systems. The goal of this study was to determine whether a difference in MA prevalence is a strong enough risk factor to indicate a likely significant difference in disease resistance against white rust (WR; Albugo occidentalis) in spinach (Spinacia oleracea). We used WR disease severity ratings (WR-DSRs) in a diversity panel of 267 spinach accessions to define resistance- and susceptibility-associated groups within the distribution scores and then tested the single-nucleotide polymorphism (SNP) variants to interrogate the MA prevalence in the most susceptible (MS) vs. most resistant (MR) individuals using permutation-based allelic association tests. A total of 448 minor alleles associated with WR severity were identified in the comparison between the 25% MS and the 25% MR accessions, while the MA were generally similar between the two halves of the interquartile range. The minor alleles in the MS group were distributed across all six chromosomes and made up ~71% of the markers that were also strongly associated with WR in a genome-wide association study performed in parallel. These results indicate that susceptibility may be highly determined by the disproportionate overrepresentation of minor alleles, which could be used to select for resistant plants. Furthermore, by focusing on the distribution tails, allelic mapping could be used to identify plant markers associated with quantitative traits on the most informative segments of the phenotypic distribution.
Affiliation(s)
- Henry O. Awika
  - Texas A&M AgriLife Research and Extension Center, Weslaco, TX 78596 USA
- Thiago G. Marconi
  - Texas A&M AgriLife Research and Extension Center, Weslaco, TX 78596 USA
- Renesh Bedre
  - Texas A&M AgriLife Research and Extension Center, Weslaco, TX 78596 USA
- Kranthi K. Mandadi
  - Texas A&M AgriLife Research and Extension Center, Weslaco, TX 78596 USA
  - Department of Plant Pathology and Microbiology, College Station, TX 77843 USA
- Carlos A. Avila
  - Texas A&M AgriLife Research and Extension Center, Weslaco, TX 78596 USA
  - Department of Horticultural Sciences, Texas A&M University, College Station, TX 77843 USA
6
Sved JA, Hill WG. One Hundred Years of Linkage Disequilibrium. Genetics 2018; 209:629-636. [PMID: 29967057 PMCID: PMC6028242 DOI: 10.1534/genetics.118.300642] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 02/08/2018] [Accepted: 04/15/2018] [Indexed: 11/18/2022]
Abstract
One hundred years ago, the first population genetic calculations were made for two loci. They indicated that populations should settle down to a state where the frequency of an allele at one locus is independent of the frequency of an allele at a second locus, even if these loci are linked. Fifty years later it was realized what is obvious in retrospect, that these calculations ignored the effect of chance segregation of linked loci, an effect now widely recognized following the association of closely linked markers (SNPs) with rare genetic diseases. Linkage disequilibrium is now accepted as the norm for closely linked loci, leading to powerful applications in the mapping of disease alleles and quantitative trait loci, in the detection of sites of selection in the human genome, in the application of genomic prediction of quantitative traits in animal and plant breeding, in the estimation of population size, and in the dating of population divergence.
Affiliation(s)
- John A Sved
  - Evolution and Ecology Research Centre, University of New South Wales, Sydney, 2052, Australia
- William G Hill
  - Institute of Evolutionary Biology, University of Edinburgh, EH9 3FL, United Kingdom
7
Pengelly RJ, Gheyas AA, Kuo R, Mossotto E, Seaby EG, Burt DW, Ennis S, Collins A. Commercial chicken breeds exhibit highly divergent patterns of linkage disequilibrium. Heredity (Edinb) 2016; 117:375-382. [PMID: 27381324 DOI: 10.1038/hdy.2016.47] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 01/13/2016] [Revised: 05/10/2016] [Accepted: 05/19/2016] [Indexed: 02/06/2023]
Abstract
The analysis of linkage disequilibrium (LD) underpins the development of effective genotyping technologies, trait mapping and understanding of biological mechanisms such as those driving recombination and the impact of selection. We apply the Malécot-Morton model of LD to create additive LD maps that describe the high-resolution LD landscape of commercial chickens. We investigated LD in chickens (Gallus gallus) at the highest resolution to date for broiler, white egg and brown egg layer commercial lines. There is minimal concordance between breeds of fine-scale LD patterns (correlation coefficient <0.21), and even between discrete broiler lines. Regions of LD breakdown, which may align with recombination hot spots, are enriched near CpG islands and transcription start sites (P < 2.2 × 10⁻¹⁶), consistent with recent evidence described in finches, but concordance in hot spot locations between commercial breeds is only marginally greater than random. As in other birds, functional elements in the chicken genome are associated with recombination but, unlike evidence from other bird species, the LD landscape is not stable in the populations studied. The development of optimal genotyping panels for genome-led selection programmes will depend on careful analysis of the LD structure of each line of interest. Further study is required to fully elucidate the mechanisms underlying highly divergent LD patterns found in commercial chickens.
Affiliation(s)
- R J Pengelly
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Southampton, UK
- A A Gheyas
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- R Kuo
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- E Mossotto
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Southampton, UK
- E G Seaby
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Southampton, UK
- D W Burt
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK
- S Ennis
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Southampton, UK
- A Collins
  - Human Genetics and Genomic Medicine, Faculty of Medicine, University of Southampton, Southampton, UK
8
A Scale-Corrected Comparison of Linkage Disequilibrium Levels between Genic and Non-Genic Regions. PLoS One 2015; 10:e0141216. [PMID: 26517830 PMCID: PMC4627745 DOI: 10.1371/journal.pone.0141216] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 03/16/2015] [Accepted: 10/06/2015] [Indexed: 12/27/2022]
Abstract
The understanding of non-random association between loci, termed linkage disequilibrium (LD), plays a central role in genomic research. Since causal mutations are generally not included in genomic marker data, LD between those and available markers is essential for capturing the effects of causal loci on localizing genes responsible for traits. Thus, the interpretation of association studies requires a detailed knowledge of LD patterns. It is well known that most LD measures depend on minor allele frequencies (MAF) of the considered loci and the magnitude of LD is influenced by the physical distances between loci. In the present study, a procedure to compare the LD structure between genomic regions comprising several markers each is suggested. The approach accounts for different scaling factors, namely the distribution of MAF, the distribution of pair-wise differences in MAF, and the physical extent of compared regions, reflected by the distribution of pair-wise physical distances. In the first step, genomic regions are matched based on similarity in these scaling factors. In the second step, chromosome- and genome-wide significance tests for differences in medians of LD measures in each pair are performed. The proposed framework was applied to test the hypothesis that the average LD is different in genic and non-genic regions. This was tested with a genome-wide approach with data sets for humans (Homo sapiens), a highly selected chicken line (Gallus gallus domesticus) and the model plant Arabidopsis thaliana. In all three data sets we found a significantly higher level of LD in genic regions compared to non-genic regions. About 31% more LD was detected genome-wide in genic compared to non-genic regions in Arabidopsis thaliana, followed by 13.6% in human and 6% chicken. Chromosome-wide comparison discovered significant differences on all 5 chromosomes in Arabidopsis thaliana and on one third of the human and of the chicken chromosomes.
9
Pengelly RJ, Tapper W, Gibson J, Knut M, Tearle R, Collins A, Ennis S. Whole genome sequences are required to fully resolve the linkage disequilibrium structure of human populations. BMC Genomics 2015; 16:666. [PMID: 26335686 PMCID: PMC4558963 DOI: 10.1186/s12864-015-1854-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 03/25/2015] [Accepted: 08/17/2015] [Indexed: 01/22/2023]
Abstract
BACKGROUND An understanding of linkage disequilibrium (LD) structures in the human genome underpins much of medical genetics and provides a basis for disease gene mapping and investigating biological mechanisms such as recombination and selection. Whole genome sequencing (WGS) provides the opportunity to determine LD structures at maximal resolution. RESULTS We compare LD maps constructed from WGS data with LD maps produced from the array-based HapMap dataset, for representative European and African populations. WGS provides up to 5.7-fold greater SNP density than array-based data and achieves much greater resolution of LD structure, allowing for identification of up to 2.8-fold more regions of intense recombination. The absence of ascertainment bias in variant genotyping improves the population representativeness of the WGS maps, and highlights the extent of uncaptured variation using array genotyping methodologies. The complete capture of LD patterns using WGS allows for higher genome-wide association study (GWAS) power compared to array-based GWAS, with WGS also allowing for the analysis of rare variation. The impact of marker ascertainment issues in arrays has been greatest for Sub-Saharan African populations where larger sample sizes and substantially higher marker densities are required to fully resolve the LD structure. CONCLUSIONS WGS provides the best possible resource for LD mapping due to the maximal marker density and lack of ascertainment bias. WGS LD maps provide a rich resource for medical and population genetics studies. The increasing availability of WGS data for large populations will allow for improved research utilising LD, such as GWAS and recombination biology studies.
Affiliation(s)
- Reuben J Pengelly
  - Human Genetics & Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (MP 808), Tremona Road, Southampton, SO16 6YD, UK.
- William Tapper
  - Human Genetics & Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (MP 808), Tremona Road, Southampton, SO16 6YD, UK.
- Jane Gibson
  - Centre for Biological Sciences, Faculty of Natural & Environmental Sciences, University of Southampton, Southampton, UK.
- Marcin Knut
  - Human Genetics & Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (MP 808), Tremona Road, Southampton, SO16 6YD, UK.
- Rick Tearle
  - Complete Genomics, Inc., Mountain View, CA, USA.
- Andrew Collins
  - Human Genetics & Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (MP 808), Tremona Road, Southampton, SO16 6YD, UK.
- Sarah Ennis
  - Human Genetics & Genomic Medicine, Faculty of Medicine, University of Southampton, Duthie Building (MP 808), Tremona Road, Southampton, SO16 6YD, UK.
10
Lin CY, Xing G, Ku HC, Elston RC, Xing C. Enhancing the power to detect low-frequency variants in genome-wide screens. Genetics 2014; 196:1293-302. [PMID: 24496013 PMCID: PMC3982702 DOI: 10.1534/genetics.113.160739] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/17/2013] [Accepted: 01/26/2014] [Indexed: 11/18/2022]
Abstract
In genetic association studies a conventional test statistic is proportional to the correlation coefficient between the trait and the variant, with the result that it lacks power to detect association for low-frequency variants. Considering the link between the conventional association test statistics and the linkage disequilibrium measure r², we propose a test statistic analogous to the standardized linkage disequilibrium D' to increase the power of detecting association for low-frequency variants. By both simulation and real data analysis we show that the proposed D' test is more powerful than the conventional methods for detecting association for low-frequency variants in a genome-wide setting. The optimal coding strategy for the D' test and its asymptotic properties are also investigated. In summary, we advocate using the D' test in a dominant model as a complementary approach to enhancing the power of detecting association for low-frequency variants with moderate to large effect sizes in case-control genome-wide association studies.
Affiliation(s)
- Chang-Yun Lin
  - McDermott Center of Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas 75390
  - Department of Applied Mathematics and Institute of Statistics, National Chung Hsing University, Taichung, Taiwan
- Guan Xing
  - Bristol-Myers Squibb Company, Pennington, New Jersey 08534
- Hung-Chih Ku
  - McDermott Center of Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas 75390
- Robert C. Elston
  - Department of Epidemiology and Biostatistics, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106
- Chao Xing
  - McDermott Center of Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas 75390
11
Corbin LJ, Kranis A, Blott SC, Swinburne JE, Vaudin M, Bishop SC, Woolliams JA. The utility of low-density genotyping for imputation in the Thoroughbred horse. Genet Sel Evol 2014; 46:9. [PMID: 24495673 PMCID: PMC3930001 DOI: 10.1186/1297-9686-46-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 03/15/2013] [Accepted: 12/20/2013] [Indexed: 12/21/2022]
Abstract
BACKGROUND Despite the dramatic reduction in the cost of high-density genotyping that has occurred over the last decade, it remains one of the limiting factors for obtaining the large datasets required for genomic studies of disease in the horse. In this study, we investigated the potential for low-density genotyping and subsequent imputation to address this problem. RESULTS Using the haplotype phasing and imputation program, BEAGLE, it is possible to impute genotypes from low- to high-density (50K) in the Thoroughbred horse with reasonable to high accuracy. Analysis of the sources of variation in imputation accuracy revealed dependence both on the minor allele frequency of the single nucleotide polymorphisms (SNPs) being imputed and on the underlying linkage disequilibrium structure. Whereas equidistant spacing of the SNPs on the low-density panel worked well, optimising SNP selection to increase their minor allele frequency was advantageous, even when the panel was subsequently used in a population of different geographical origin. Replacing base pair position with linkage disequilibrium map distance reduced the variation in imputation accuracy across SNPs. Whereas a 1K SNP panel was generally sufficient to ensure that more than 80% of genotypes were correctly imputed, other studies suggest that a 2K to 3K panel is more efficient to minimize the subsequent loss of accuracy in genomic prediction analyses. The relationship between accuracy and genotyping costs for the different low-density panels suggests that a 2K SNP panel would represent good value for money. CONCLUSIONS Low-density genotyping with a 2K SNP panel followed by imputation provides a compromise between cost and accuracy that could promote more widespread genotyping, and hence the use of genomic information in horses. In addition to offering a low cost alternative to high-density genotyping, imputation provides a means to combine datasets from different genotyping platforms, which is becoming necessary since researchers are starting to use the recently developed equine 70K SNP chip. However, more work is needed to evaluate the impact of between-breed differences on imputation accuracy.
Affiliation(s)
- John A Woolliams
  - Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK.
12
Abstract
Systems involving many variables are important in population and quantitative genetics, for example, in multi-trait prediction of breeding values and in exploration of multi-locus associations. We studied departures of the joint distribution of sets of genetic variables from independence. New measures of association based on notions of statistical distance between distributions are presented. These are more general than correlations, which are pairwise measures, and lack a clear interpretation beyond the bivariate normal distribution. Our measures are based on logarithmic (Kullback-Leibler) and on relative ‘distances’ between distributions. Indexes of association are developed and illustrated for quantitative genetics settings in which the joint distribution of the variables is either multivariate normal or multivariate-t, and we show how the indexes can be used to study linkage disequilibrium in a two-locus system with multiple alleles and present applications to systems of correlated beta distributions. Two multivariate beta and multivariate beta-binomial processes are examined, and new distributions are introduced: the GMS-Sarmanov multivariate beta and its beta-binomial counterpart.
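To make the idea of a logarithmic (Kullback-Leibler) "distance" from independence concrete, here is a minimal sketch for a two-locus system with any number of alleles per locus. It computes the KL divergence between a joint haplotype frequency table and the product of its marginal allele frequencies. The function name `kl_from_independence` is hypothetical, and this plain divergence is only an assumption about the general idea; the abstract does not specify the authors' indexes, so it should not be read as their exact measure.

```python
import math


def kl_from_independence(joint):
    """KL divergence (in nats) between a joint haplotype distribution
    and the product of its marginal allele frequencies.

    joint: list of rows; joint[i][j] is the frequency of the haplotype
    carrying allele i at locus 1 and allele j at locus 2 (entries sum to 1).
    """
    row = [sum(r) for r in joint]          # allele frequencies at locus 1
    col = [sum(c) for c in zip(*joint)]    # allele frequencies at locus 2
    total = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0:  # cells with zero frequency contribute nothing
                total += p * math.log(p / (row[i] * col[j]))
    return total


# Two alleles at locus 1, three alleles at locus 2 (illustrative numbers).
table = [
    [0.30, 0.10, 0.10],
    [0.05, 0.25, 0.20],
]
print(kl_from_independence(table))
```

The value is zero when the haplotype frequencies equal the products of the allele frequencies and grows as the loci become more strongly associated, without requiring a pairwise correlation between two specific alleles.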
Affiliation(s)
- Daniel Gianola
  - Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA.
13
Wray NR. Allele Frequencies and the r² Measure of Linkage Disequilibrium: Impact on Design and Interpretation of Association Studies. Twin Res Hum Genet 2012. [DOI: 10.1375/twin.8.2.87] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Indexed: 11/05/2022]
Abstract
The design and interpretation of genetic association studies depends on the relationship between the genotyped variants and the underlying functional variant, often parameterized as the squared correlation or r² measure of linkage disequilibrium between two loci. While it has long been recognized that placing a constraint on the r² between two loci also places a constraint on the difference in frequencies between the coupled alleles, this constraint has not been quantified. Here, quantification of this severe constraint is presented. For example, for r² ≥ .8, the maximum difference in allele frequency is ± .06, which occurs when one locus has allele frequency .5. For r² ≥ .8 and allele frequency at one locus of .1, the maximum difference in allele frequency at the second locus is only ± .02. The impact on the design and interpretation of association studies is discussed.
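The two numerical examples in this abstract can be checked from the standard definition of r² alone. The sketch below is derived here from that definition, not taken from the paper, and the function names `max_r2` and `frequency_bounds` are hypothetical. It assumes D is positive (the coupled alleles are the ones being compared), so the largest attainable r² for allele frequencies p_a and p_b is reached at D_max = min(p_a(1 - p_b), (1 - p_a)p_b); solving max r² ≥ t for p_b gives the closed-form bounds used in `frequency_bounds`.

```python
def max_r2(p_a, p_b):
    """Largest r2 attainable between two loci whose coupled alleles
    have frequencies p_a and p_b (positive D assumed)."""
    d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    return d_max ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))


def frequency_bounds(p_a, r2_min):
    """Range of p_b for which r2 >= r2_min is attainable, given p_a.

    For p_b <= p_a the maximum r2 is p_b(1 - p_a) / (p_a(1 - p_b));
    for p_b >= p_a it is p_a(1 - p_b) / (p_b(1 - p_a)).
    Setting each equal to r2_min and solving for p_b gives the bounds.
    """
    lower = r2_min * p_a / (1 - p_a + r2_min * p_a)
    upper = p_a / (r2_min + p_a * (1 - r2_min))
    return lower, upper


for p_a in (0.5, 0.1):
    low, high = frequency_bounds(p_a, 0.8)
    print(f"p_a={p_a}: p_b between {low:.3f} and {high:.3f} "
          f"(differences {low - p_a:+.3f} and {high - p_a:+.3f})")
```

Under these assumptions the bounds come out at roughly 0.444 to 0.556 for an allele frequency of .5 and roughly 0.082 to 0.122 for an allele frequency of .1, which round to the ± .06 and ± .02 quoted in the abstract.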
14
Elding H, Lau W, Swallow D, Maniatis N. Dissecting the genetics of complex inheritance: linkage disequilibrium mapping provides insight into Crohn disease. Am J Hum Genet 2011; 89:798-805. [PMID: 22152681 PMCID: PMC3234369 DOI: 10.1016/j.ajhg.2011.11.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 09/19/2011] [Revised: 10/24/2011] [Accepted: 11/08/2011] [Indexed: 12/21/2022]
Abstract
Family studies for Crohn disease (CD) report extensive linkage on chromosome 16q and pinpoint NOD2 as a possible causative locus. However, linkage is also observed in families that do not bear the most frequent NOD2 causative mutations, but no other signals on 16q have been found so far in published genome-wide association studies. Our aim is to identify this missing genetic contribution. We apply a powerful genetic mapping approach to the Wellcome Trust Case-Control Consortium and the National Institute of Diabetes and Digestive and Kidney Diseases genome-wide association data on CD. This method takes into account the underlying structure of linkage disequilibrium (LD) by using genetic distances from LD maps and provides a location for the causal agent. We find genetic heterogeneity within the NOD2 locus and also show an independent and unsuspected involvement of the neighboring gene, CYLD. We find associations with the IRF8 region and the region containing CDH1 and CDH3, as well as substantial phenotypic and genetic heterogeneity for CD itself. The genes are known to be involved in inflammation and immune dysregulation. These findings provide insight into the genetics of CD and suggest promising directions for understanding disease heterogeneity. The application of this method thus paves the way for understanding complex inheritance in general, leading to the dissection of different pathways and ultimately, personalized treatment.
|
15
|
Zapata C. On the uses and applications of the most commonly used measures of linkage disequilibrium from the comparative analysis of their statistical properties. Hum Hered 2011; 71:186-95. [PMID: 21778738 DOI: 10.1159/000327732] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 11/18/2010] [Accepted: 03/22/2011] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND/OBJECTIVE The analysis of linkage disequilibrium is relevant for the exploration of the structure and evolution of genomes and for the gene mapping of quantitative characters and human diseases. The strength of linkage disequilibrium between diallelic loci is commonly measured by the coefficients D' and r. Recent studies suggest that r is more useful than D' as a general measure of the strength of disequilibrium because it provides much more precise (lower sampling variance) and accurate (lower bias) estimates of disequilibrium. We compared for the first time the statistical properties of D' and r taking into account their differences in range. METHODS The sampling properties of D' and r were evaluated by simulation under a variety of realistic population conditions and varying sample sizes using standardised statistics that allow for comparisons of the precision, accuracy and efficiency of estimates with different ranges. RESULTS Simulations revealed that estimates of r do not tend to be significantly more precise, accurate or efficient than those of D' when compared by means of standardised statistics. CONCLUSION The supposed advantage of r over D' based on direct comparisons of their sampling distributions is more apparent than real. The obtained results are useful to assess the uses and applications of these widely used disequilibrium measures.
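For readers unfamiliar with the two coefficients compared in this abstract, the following is a minimal sketch of how D, D' and r are conventionally computed from allele and haplotype frequencies at two diallelic loci. The function and argument names are illustrative assumptions and are not taken from the cited paper; the sketch returns signed values of D' and r.

```python
def ld_measures(p_a, p_b, p_ab):
    """Return (D, D', r) for two diallelic loci.

    p_a  : frequency of allele A at the first locus (assumed name)
    p_b  : frequency of allele B at the second locus (assumed name)
    p_ab : frequency of the A-B haplotype (assumed name)
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")

    # Raw disequilibrium coefficient: observed minus expected haplotype frequency.
    d = p_ab - p_a * p_b

    # Largest magnitude D can reach given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))

    d_prime = d / d_max
    # Correlation coefficient between the alleles at the two loci.
    r = d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b)) ** 0.5
    return d, d_prime, r
```

With allele frequencies 0.2 and 0.5 and every A allele sitting on a B haplotype (p_ab = 0.2), D' reaches its maximum while r stays well below 1, which illustrates the difference in range that the abstract refers to.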
Affiliation(s)
- Carlos Zapata
- Departamento de Genética, Universidad de Santiago, Santiago de Compostela, Spain.
|
16
|
Carregaro F, Carta A, Cordeiro JA, Lobo SM, Silva EHTD, Leopoldino AM. Polymorphisms IL10-819 and TLR-2 are potentially associated with sepsis in Brazilian patients. Mem Inst Oswaldo Cruz 2010; 105:649-56. [DOI: 10.1590/s0074-02762010000500008] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 11/09/2009] [Accepted: 05/12/2010] [Indexed: 11/21/2022] Open
|
17
|
Stapley J, Birkhead TR, Burke T, Slate J. Pronounced inter- and intrachromosomal variation in linkage disequilibrium across the zebra finch genome. Genome Res 2010; 20:496-502. [PMID: 20357051 DOI: 10.1101/gr.102095.109] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 01/30/2023]
Abstract
The extent of nonrandom association of alleles at two or more loci, termed linkage disequilibrium (LD), can reveal much about population demography, selection, and recombination rate, and is a key consideration when designing association mapping studies. Here, we describe a genome-wide analysis of LD in the zebra finch (Taeniopygia guttata) using 838 single nucleotide polymorphisms and present LD maps for all assembled chromosomes. We found that LD declined with physical distance approximately five times faster on the microchromosomes compared to macrochromosomes. The distribution of LD across individual macrochromosomes also varied in a distinct pattern. In the center of the macrochromosomes there were large blocks of markers, sometimes spanning tens of mega bases, in strong LD whereas on the ends of macrochromosomes LD declined more rapidly. Regions of high LD were not simply the result of suppressed recombination around the centromere and this pattern has not been observed previously in other taxa. We also found evidence that this pattern of LD has remained stable across many generations. The variability in LD between and within chromosomes has important implications for genome wide association studies in birds and for our understanding of the distribution of recombination events and the processes that govern them.
Affiliation(s)
- Jessica Stapley
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK.
|
18
|
Abstract
Linkage disequilibrium (LD), the association in populations between genes at linked loci, has achieved a high degree of prominence in recent years, primarily because of its use in identifying and cloning genes of medical importance. The field has recently been reviewed by Slatkin (2008). The present article is largely devoted to a review of the theory of LD in populations, including historical aspects.
Affiliation(s)
- John A Sved
- School of Biological Sciences, University of Sydney, Australia.
|
19
|
Terwilliger JD, Hiekkalinna T. An utter refutation of the "fundamental theorem of the HapMap". Eur J Hum Genet 2006; 14:426-37. [PMID: 16479260 DOI: 10.1038/sj.ejhg.5201583] [Citation(s) in RCA: 107] [Impact Index Per Article: 7.1] [Indexed: 12/12/2022] Open
Abstract
The International HapMap Project was proposed in order to quantify linkage disequilibrium (LD) relationships among human DNA polymorphisms in an assortment of populations, in order to facilitate the process of selecting a minimal set of markers that could capture most of the signal from the untyped markers in a genome-wide association study. The central dogma can be summarized by the argument that if a marker is in tight LD with a polymorphism that directly impacts disease risk, as measured by the metric r², then one would be able to detect an association between the marker and disease with sample size that was increased by a factor of 1/r² over that needed to detect the effect of the functional variant directly. This "fundamental theorem" holds, however, only if one assumes that the LD between loci and the etiological effect of the functional variant are independent of each other, that they are statistically independent of all other etiological factors (in exposure and action), that sampling is prospective, and that the estimates of r² are accurate. None of these are standard operating assumptions, however. We describe the ramifications of these implicit assumptions, and provide simple examples in which the effects of a functional variant could be unequivocally detected if it were directly genotyped, even as markers in high LD with the functional variant would never show association with disease, even in infinite sample sizes. Both theoretical and empirical refutation of the central dogma of genome-wide association studies is thus presented.
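The sample-size argument that this abstract sets out to refute can be written as a one-line calculation. The sketch below only illustrates the claimed 1/r-squared inflation factor; the function name and the example numbers are assumptions, and the abstract's point is precisely that this rule holds only under restrictive conditions.

```python
import math


def marker_sample_size(n_direct, r_squared):
    """Sample size claimed to be needed at a marker in LD with a causal variant.

    n_direct  : sample size needed if the causal variant were genotyped directly
    r_squared : squared correlation between marker and causal variant (0 < r^2 <= 1)

    Implements only the 1/r^2 rule described in the abstract, which assumes
    (among other things) that r^2 is estimated accurately.
    """
    if not 0.0 < r_squared <= 1.0:
        raise ValueError("r_squared must lie in (0, 1]")
    return math.ceil(n_direct / r_squared)
```

Under this rule, a study needing 1000 subjects at the causal variant would need 2000 at a marker with an r-squared of 0.5.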
|
20
|
Lu TT, Lao O, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, Holmlund G, Kouvatsi A, Macek M, Mollet I, Nielsen F, Parson W, Palo J, Ploski R, Sajantila A, Tagliabracci A, Gether U, Werge T, Rivadeneira F, Hofman A, Uitterlinden AG, Gieger C, Wichmann HE, Ruether A, Schreiber S, Becker C, Nürnberg P, Nelson MR, Kayser M, Krawczak M. An evaluation of the genetic-matched pair study design using genome-wide SNP data from the European population. Eur J Hum Genet 2009; 17:967-75. [PMID: 19156175 DOI: 10.1038/ejhg.2008.266] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022] Open
Abstract
Genetic matching potentially provides a means to alleviate the effects of incomplete Mendelian randomization in population-based gene-disease association studies. We therefore evaluated the genetic-matched pair study design on the basis of genome-wide SNP data (309,790 markers; Affymetrix GeneChip Human Mapping 500K Array) from 2457 individuals, sampled at 23 different recruitment sites across Europe. Using pair-wise identity-by-state (IBS) as a matching criterion, we tried to derive a subset of markers that would allow identification of the best overall matching (BOM) partner for a given individual, based on the IBS status for the subset alone. However, our results suggest that, by following this approach, the prediction accuracy is only notably improved by the first 20 markers selected, and increases proportionally to the marker number thereafter. Furthermore, in a considerable proportion of cases (76.0%), the BOM of a given individual, based on the complete marker set, came from a different recruitment site than the individual itself. A second marker set, specifically selected for ancestry sensitivity using singular value decomposition, performed even more poorly and was no more capable of predicting the BOM than randomly chosen subsets. This leads us to conclude that, at least in Europe, the utility of the genetic-matched pair study design depends critically on the availability of comprehensive genotype information for both cases and controls.
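To make the matching criterion concrete, here is a minimal sketch of pair-wise identity-by-state (IBS) matching. It assumes genotypes coded as minor-allele counts (0, 1 or 2) with no missing data, and scores IBS as the mean proportion of alleles shared per marker; the abstract does not specify the exact IBS definition used, so the coding, the scoring and the function names are all assumptions.

```python
def ibs_similarity(g1, g2):
    """Mean proportion of alleles shared identical-by-state between two individuals.

    g1, g2 : equal-length sequences of genotypes coded 0, 1 or 2 (assumed coding).
    """
    if len(g1) != len(g2) or len(g1) == 0:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    # At each marker two individuals share 2, 1 or 0 alleles by state.
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return shared / (2 * len(g1))


def best_overall_match(genotypes, index):
    """Return (partner_index, ibs) of the best-matching partner for one individual."""
    target = genotypes[index]
    scores = [
        (ibs_similarity(target, other), j)
        for j, other in enumerate(genotypes)
        if j != index
    ]
    best_score, best_j = max(scores, key=lambda pair: pair[0])
    return best_j, best_score
```

Restricting `genotypes` to a subset of marker columns and checking whether the same partner is returned is one way to reproduce the kind of comparison the abstract describes.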
Affiliation(s)
- Timothy Tehua Lu
- Institut für Medizinische Informatik und Statistik, Christian-Albrechts-Universität Kiel, Kiel, Germany
|
21
|
Gorroochurn P. Perils in the Use of Linkage Disequilibrium for Fine Gene Mapping: Simple Insights from Population Genetics. Cancer Epidemiol Biomarkers Prev 2008; 17:3292-7. [DOI: 10.1158/1055-9965.epi-08-0717] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022] Open
|
22
|
Allelic association: linkage disequilibrium structure and gene mapping. Mol Biotechnol 2008; 41:83-9. [PMID: 18841501 DOI: 10.1007/s12033-008-9110-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 09/10/2008] [Accepted: 09/12/2008] [Indexed: 10/21/2022]
Abstract
The linkage disequilibrium (LD) structure of the human genome is now well understood and characterised for a number of human populations. The LD structure underpins the design and execution of candidate gene and genome-wide association mapping studies. Successful association mapping studies completed to date provide vital new insights into the genetic influences on common diseases, such as diabetes, some cancers and heart disease. The LD structure also presents new avenues of research into the genetic history of human populations, the effects of natural selection and the impact of recombination on the genomic landscape. This review introduces this exciting and complex field by encompassing this range of topics.
|
23
|
Pattaro C, Ruczinski I, Fallin DM, Parmigiani G. Haplotype block partitioning as a tool for dimensionality reduction in SNP association studies. BMC Genomics 2008; 9:405. [PMID: 18759977 PMCID: PMC2547855 DOI: 10.1186/1471-2164-9-405] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 10/25/2007] [Accepted: 08/29/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Identification of disease-related genes in association studies is challenged by the large number of SNPs typed. To address the dilution of power caused by high dimensionality, and to generate results that are biologically interpretable, it is critical to take into consideration spatial correlation of SNPs along the genome. With the goal of identifying true genetic associations, partitioning the genome according to spatial correlation can be a powerful and meaningful way to address this dimensionality problem. RESULTS We developed and validated an MCMC Algorithm To Identify blocks of Linkage DisEquilibrium (MATILDE) for clustering contiguous SNPs, and a statistical testing framework to detect association using partitions as units of analysis. We compared its ability to detect true SNP associations to that of the most commonly used algorithm for block partitioning, as implemented in the Haploview and HapBlock software. Simulations were based on artificially assigning phenotypes to individuals with SNPs corresponding to region 14q11 of the HapMap database. When block partitioning is performed using MATILDE, the ability to correctly identify a disease SNP is higher, especially for small effects, than it is with the alternatives considered. Advantages can be both in terms of true positive findings and limiting the number of false discoveries. Finer partitions provided by LD-based methods or by marker-by-marker analysis are efficient only for detecting big effects, or in presence of large sample sizes. The probabilistic approach we propose offers several additional advantages, including: a) adapting the estimation of blocks to the population, technology, and sample size of the study; b) probabilistic assessment of uncertainty about block boundaries and about whether any two SNPs are in the same block; c) user selection of the probability threshold for assigning SNPs to the same block. 
CONCLUSION We demonstrate that, in realistic scenarios, our adaptive, study-specific block partitioning approach is as or more efficient than currently available LD-based approaches in guiding the search for disease loci.
Affiliation(s)
- Cristian Pattaro
- Unit of Genetic Epidemiology and Biostatistics, Institute of Genetic Medicine, European Academy, Viale Druso 1, I-39100, Bolzano, Italy.
|
24
|
VanLiere JM, Rosenberg NA. Mathematical properties of the r² measure of linkage disequilibrium. Theor Popul Biol 2008; 74:130-7. [PMID: 18572214 DOI: 10.1016/j.tpb.2008.05.006] [Citation(s) in RCA: 113] [Impact Index Per Article: 7.1] [Received: 02/22/2008] [Revised: 05/14/2008] [Accepted: 05/14/2008] [Indexed: 11/28/2022]
Abstract
Statistics for linkage disequilibrium (LD), the non-random association of alleles at two loci, depend on the frequencies of the alleles at the loci under consideration. Here, we examine the r² measure of LD and its mathematical relationship to allele frequencies, quantifying the constraints on its maximum value. Assuming independent uniform distributions for the allele frequencies of two biallelic loci, we find that the mean maximum value of r² is approximately 0.43051, and that r² can exceed a threshold of 4/5 in only approximately 14.232% of the allele frequency space. If one locus is assumed to have known allele frequencies (the situation in an association study in which LD between a known marker locus and an unknown trait locus is of interest), we find that the mean maximum value of r² is greatest when the known locus has a minor allele frequency of approximately 0.30131. We find that in 1/4 of the space of allowed values of minor allele frequencies and haplotype frequencies at a pair of loci, the unconstrained maximum r² allowing for the possibility of recombination between the loci exceeds the constrained maximum assuming that no recombination has occurred. Finally, we use r²_max to examine the connection between r² and the D' measure of linkage disequilibrium, finding that r²/r²_max = D'² for approximately 72.683% of the space of allowed values of (p_a, p_b, p_ab). Our results concerning the properties of r² have the potential to inform the interpretation of unusual LD behavior and to assist in the design of LD-based association-mapping studies.
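The frequency constraint on r-squared can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' derivation: it takes the largest admissible |D| for a pair of allele frequencies, converts it to the maximum r-squared, and averages that maximum over a midpoint grid as a stand-in for independent uniform allele frequencies. Function names and the grid size are assumptions.

```python
def r2_max(p_a, p_b):
    """Largest r^2 attainable at two biallelic loci with allele frequencies p_a, p_b."""
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    # Bounds on D follow from requiring all four haplotype frequencies to be >= 0.
    d_pos = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)   # largest positive D
    d_neg = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))   # largest |D| for negative D
    d_best = max(d_pos, d_neg)
    return d_best ** 2 / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


def mean_r2_max(n=200):
    """Average r2_max over an n-by-n midpoint grid on the unit square."""
    step = 1.0 / n
    mids = [(i + 0.5) * step for i in range(n)]
    return sum(r2_max(a, b) for a in mids for b in mids) / (n * n)
```

The grid average comes out close to the mean maximum of about 0.43 quoted in the abstract, and `r2_max(0.2, 0.5)` shows how mismatched allele frequencies cap r-squared at 0.25 even when the loci are in complete disequilibrium.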
Affiliation(s)
- Jenna M VanLiere
- Center for Computational Medicine and Biology, University of Michigan, Ann Arbor, MI 48109, USA.
|
25
|
Yang HC, Hsieh HY, Fann CSJ. Kernel-based association test. Genetics 2008; 179:1057-68. [PMID: 18558654 PMCID: PMC2429859 DOI: 10.1534/genetics.107.084616] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 11/15/2007] [Accepted: 03/23/2008] [Indexed: 11/18/2022] Open
Abstract
Association mapping (i.e., linkage disequilibrium mapping) is a powerful tool for positional cloning of disease genes. We propose a kernel-based association test (KBAT), which is a composite function of "P-values of single-locus association tests" and "kernel weights related to intermarker distances and/or linkage disequilibria." The KBAT is a general form of some current test statistics. This method can be applied to the study of candidate genes and can scan each chromosome using a moving average procedure. We evaluated the performance of the KBAT through simulation studies that considered evolutionary parameters, disease models, sample sizes, kernel functions, test statistics, window attributes, empirical P-value estimations, and genetic/physical maps. The results showed that the KBAT had a well-controlled false positive rate and high power compared to existing methods. In addition, the KBAT was also applied to analyze a genomewide data set from the Collaborative Study on the Genetics of Alcoholism. Important genes associated with alcoholism dependence were identified. In summary, the merits of the KBAT are multifold: the KBAT is robust against the inclusion of nuisance markers, is invariant to the map scale, and accommodates different types of genomic data, study designs, and study purposes. The proposed methods are packaged in the user-friendly software, KBAT, available at http://www.stat.sinica.edu.tw/hsinchou/genetics/association/KBAT.htm.
Affiliation(s)
- Hsin-Chou Yang
- Institute of Statistical Science, Academia Sinica, 128 Academia Rd., Sec. 2, Nankang, Taipei, Taiwan 115.
|
26
|
Khatkar MS, Nicholas FW, Collins AR, Zenger KR, Cavanagh JAL, Barris W, Schnabel RD, Taylor JF, Raadsma HW. Extent of genome-wide linkage disequilibrium in Australian Holstein-Friesian cattle based on a high-density SNP panel. BMC Genomics 2008; 9:187. [PMID: 18435834 PMCID: PMC2386485 DOI: 10.1186/1471-2164-9-187] [Citation(s) in RCA: 160] [Impact Index Per Article: 10.0] [Received: 01/16/2008] [Accepted: 04/24/2008] [Indexed: 01/19/2023] Open
Abstract
BACKGROUND The extent of linkage disequilibrium (LD) within a population determines the number of markers that will be required for successful association mapping and marker-assisted selection. Most studies on LD in cattle reported to date are based on microsatellite markers or small numbers of single nucleotide polymorphisms (SNPs) covering one or only a few chromosomes. This is the first comprehensive study on the extent of LD in cattle by analyzing data on 1,546 Holstein-Friesian bulls genotyped for 15,036 SNP markers covering all regions of all autosomes. Furthermore, most studies in cattle have used relatively small sample sizes and, consequently, may have had biased estimates of measures commonly used to describe LD. We examine minimum sample sizes required to estimate LD without bias and loss in accuracy. Finally, relatively little information is available on comparative LD structures including other mammalian species such as human and mouse, and we compare LD structure in cattle with public-domain data from both human and mouse. RESULTS We computed three LD estimates, D', Dvol and r², for 1,566,890 syntenic SNP pairs and a sample of 365,400 non-syntenic pairs. Mean D' is 0.189 among syntenic SNPs, and 0.105 among non-syntenic SNPs; mean r² is 0.024 among syntenic SNPs and 0.0032 among non-syntenic SNPs. All three measures of LD for syntenic pairs decline with distance; the decline is much steeper for r² than for D' and Dvol. The values of D' and Dvol are quite similar. Significant LD in cattle extends to 40 kb (when estimated as r²) and 8.2 Mb (when estimated as D'). The mean values for LD at large physical distances are close to those for non-syntenic SNPs. Minor allelic frequency threshold affects the distribution and extent of LD. For unbiased and accurate estimates of LD across marker intervals spanning < 1 kb to > 50 Mb, minimum sample sizes of 400 (for D') and 75 (for r²) are required. The bias due to small sample sizes increases with inter-marker interval. LD in cattle is much less extensive than in a mouse population created from crossing inbred lines, and more extensive than in humans. CONCLUSION For association mapping in Holstein-Friesian cattle, for a given design, at least one SNP is required for each 40 kb, giving a total requirement of at least 75,000 SNPs for a low power whole-genome scan (median r² > 0.19) and up to 300,000 markers at 10 kb intervals for a high power genome scan (median r² > 0.62). For estimation of LD by D' and Dvol with sufficient precision, a sample size of at least 400 is required, whereas for r² a minimum sample of 75 is adequate.
Affiliation(s)
- Mehar S Khatkar
- Centre for Advanced Technologies in Animal Genetics and Reproduction (ReproGen), University of Sydney, Camden, NSW 2570, Australia.
|
27
|
Li N. The promise of composite likelihood methods for addressing computationally intensive challenges. Adv Genet 2008; 60:637-654. [PMID: 18358335 DOI: 10.1016/s0065-2660(07)00422-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/26/2023]
Abstract
High-dimensional genetic data, due to its complex correlation structure, poses an enormous challenge to standard likelihood-based methods for making statistical inference. As an approximation, composite likelihood has proved to be a successful strategy for some genetic applications. It has the potential to see even wider application and much research is needed. We first give a brief description of composite likelihood. The advantage of this method and potential challenges in inference are noted. Next, its applications in genetic studies are reviewed, specifically in estimating population genetics parameters such as recombination rate, and in multi-locus linkage disequilibrium mapping of disease genes with some discussion about future research directions.
Affiliation(s)
- Na Li
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
|
28
|
Abstract
Although single chi-square analysis of the North American Rheumatoid Arthritis Consortium (NARAC) data identifies many single-nucleotide polymorphisms (SNPs) with p-values less than 0.05, none remain significant after Bonferroni correction. In contrast, CHROMSCAN evades heavy Bonferroni correction and auto-correlation between SNPs by using composite likelihood to model association across all markers in a region and permutation to assess significance. Analysis by CHROMSCAN identifies a 36-kb interval that includes the most significant SNP (msSNP) observed in a 10-Mb target suggested by linkage. Unexpectedly, stratification by gender and age of onset shows that association evidence comes almost entirely from females with age of onset less than 40. Combining evidence from a meta-analysis of linkage studies and three subsets of the NARAC data provides significant evidence for a determinant of rheumatoid arthritis in a 36-kb interval and illustrates the principle that estimates of location and its information are more powerful than estimates of p-values alone.
Affiliation(s)
- William Tapper
- Human Genetics Division, University of Southampton, Southampton General Hospital, Tremona Road, Southampton, Hampshire SO16 6YD, UK.
|
29
|
Marquard V, Beckmann L, Bermejo JL, Fischer C, Chang-Claude J. Comparison of measures for haplotype similarity. BMC Proc 2007; 1 Suppl 1:S128. [PMID: 18466470 PMCID: PMC2367614 DOI: 10.1186/1753-6561-1-s1-s128] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022] Open
Abstract
Measuring the association of haplotype similarities with phenotype similarities has been used to develop statistical tests of genetic association. Previously, we applied the general approach of Mantel statistics to correlate genetic and phenotype similarity, where genetic similarity was defined by the number of intervals flanked by markers identical by state for pairs of haplotypes. Here we investigated in the case-control study design the effect on power of the Mantel statistics for five different measures of genetic similarity based on haplotypes: 1) the number of shared intervals, 2) the physical length of the shared intervals, 3) the genetic length of the shared intervals in centimorgans, 4) the genetic length of the shared intervals in linkage disequilibrium units (LDU) and 5) Yu's measure that attaches more weight to the sharing of rare than common alleles. With prior knowledge of the answers of Genetic Analysis Workshop 15 Problem 3, we analyzed the simulated data sets in two genomic regions surrounding the disease loci on chromosomes 6 and 18. For the dense map on chromosome 6, all methods showed a very high power of comparable magnitude. For chromosome 18, we observed a power between 19% and 99% at the pointwise 5% significance level using 1000 cases and 1000 controls for all methods except Yu's measure. While it yielded a much lower power, Yu's measure had 80% power around the disease locus.
Affiliation(s)
- Vivien Marquard
- Cancer Epidemiology, German Cancer Research Center DKFZ, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- Lars Beckmann
- Cancer Epidemiology, German Cancer Research Center DKFZ, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- Justo L Bermejo
- Molecular Genetic Epidemiology, German Cancer Research Center DKFZ, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- Christine Fischer
- Institute of Human Genetics, University of Heidelberg, Im Neuenheimer Feld 366, 69120 Heidelberg, Germany
- Jenny Chang-Claude
- Cancer Epidemiology, German Cancer Research Center DKFZ, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
|
30
|
CHROMSCAN: genome-wide association using a linkage disequilibrium map. J Hum Genet 2007; 53:121-126. [DOI: 10.1007/s10038-007-0226-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 09/04/2007] [Accepted: 11/07/2007] [Indexed: 10/22/2022]
|
31
|
Linkage disequilibrium maps and location databases. Methods Mol Biol 2007. [PMID: 17984536 DOI: 10.1007/978-1-59745-389-9_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Effective application of association mapping for complex traits requires characterization of linkage disequilibrium (LD) patterns that reflect the dominant process of recombination and its duration in addition to the more subtle influences of mutation, selection, and genetic drift. Maps expressed in linkage disequilibrium units (LDUs) reflect the influences of these factors with the use of a modified version of Malecot's isolation-by-distance model. As a result, LDU maps are analogous to linkage maps in so far as their provision of an additive metric that is related to recombination and facilitates association-mapping studies. However, unlike linkage maps, LDUs also reflect the partly cumulative effects of multiple historical bottlenecks that account for substantial variations in LD patterns between populations. This chapter provides an overview of the data requirements and methodology used to construct LDU maps, their applications outside association mapping, and their integration into location databases.
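As a rough illustration of the additive metric described in this abstract, the sketch below uses the form of the Malecot model commonly given in the LD-map literature, rho(d) = (1 - L) * M * exp(-epsilon * d) + L, and accumulates epsilon times distance over marker intervals to obtain positions in LDU. The chapter's own parameterisation is not reproduced in the abstract, so the formula, the parameter names and the example values here are assumptions.

```python
import math


def malecot_rho(d, epsilon, M=1.0, L=0.0):
    """Expected association at distance d under the assumed Malecot form.

    epsilon : exponential decline of association with distance
    M       : association at zero distance
    L       : residual association at large distance
    """
    return (1.0 - L) * M * math.exp(-epsilon * d) + L


def ldu_positions(interval_lengths, epsilons):
    """Cumulative LDU map positions for consecutive markers.

    interval_lengths : physical length of each interval between adjacent markers
    epsilons         : interval-specific decline parameters (one per interval)
    """
    if len(interval_lengths) != len(epsilons):
        raise ValueError("need one epsilon per interval")
    positions = [0.0]
    for d, eps in zip(interval_lengths, epsilons):
        # Each interval contributes epsilon * d LDU, which makes the map additive.
        positions.append(positions[-1] + eps * d)
    return positions
```

An interval with epsilon equal to zero adds no LDU, so it appears as a flat step on the map, which is how blocks of strong LD show up in this representation.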
|
32
|
Chiu YF, Liang KY, Chuang LM, Beaty TH. Incorporation of covariates into multipoint linkage disequilibrium mapping in case-control studies. Genet Epidemiol 2007; 32:143-51. [PMID: 17968989 DOI: 10.1002/gepi.20271] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/05/2022]
Abstract
Case-control designs are commonly adopted in genetic epidemiological studies because they are cost effective and offer powerful tests for genetic and environmental risk factors, as well as their interactions. Previously, we proposed an association mapping approach to estimate the position of an unobserved disease locus as well as measuring its genetic effect on risk. The method provides a confidence interval for the estimated map position to help narrow the chromosomal region potentially harboring a disease locus. However, concerns often arise about case-control designs including possible false positives or bias due to confounders, heterogeneity or interactions among genes and between genes and environments. In the present work, we extended the multipoint linkage disequilibrium mapping approach for case-control studies to incorporate information about factors influencing the effect of causal genes to improve precision and efficiency of the estimated location. The efficiency, bias and coverage probability of this extended approach for locating a disease locus using case-control data with and without additional information on a covariate were compared through simulation. An example of a case-control study for type 2 diabetes was used to illustrate this extended method. In this study, a strong association between diabetes and a candidate gene, SLC2A10, was detected among nonobese subjects, whereas no evidence of association was found for either obese subjects or the whole sample when obesity was ignored. Simulation studies and these diabetes data both demonstrate how the efficiency of the estimated location of a disease gene can be improved substantially by incorporating information on covariates.
Affiliation(s)
- Yen-Feng Chiu
- Division of Biostatistics and Bioinformatics, National Health Research Institutes, Zhunan, Taiwan
|
33
|
Wollstein A, Herrmann A, Wittig M, Nothnagel M, Franke A, Nürnberg P, Schreiber S, Krawczak M, Hampe J. Efficacy assessment of SNP sets for genome-wide disease association studies. Nucleic Acids Res 2007; 35:e113. [PMID: 17726055 PMCID: PMC2034459 DOI: 10.1093/nar/gkm621] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 12/21/2022] Open
Abstract
The power of a genome-wide disease association study depends critically upon the properties of the marker set used, particularly the number and physical spacing of markers, and the level of inter-marker association due to linkage disequilibrium. Extending our previously devised theoretical framework for the entropy-based selection of genetic markers, we have developed a local measure of the efficacy of a marker set, relative to including a maximally polymorphic single nucleotide polymorphism (SNP) at the map position of interest. Using this quantitative criterion, we evaluated five currently available SNP sets, namely Affymetrix 100K and 500K, and Illumina 100K, 300K and 550K in the CEU, YRI and JPT + CHB HapMap populations. At 50% relative efficacy, the commercial marker sets cover between 19 and 68% of the human genome, depending upon the population under study. An optimal technology-independent 500K marker set constructed from HapMap for Caucasians, in contrast, would achieve 73% coverage at the same relative efficacy.
Affiliation(s)
- Andreas Wollstein, Alexander Herrmann, Michael Wittig, Michael Nothnagel, Andre Franke, Peter Nürnberg, Stefan Schreiber, Michael Krawczak, Jochen Hampe*
- Cologne Center for Genomics, Cologne, Institute of Clinical Molecular Biology, Christian-Albrechts University, Ist Department of Medicine and Institute of Medical Informatics and Statistics, Christian-Albrechts University, University Hospital Schleswig-Holstein Campus Kiel, Kiel, Germany
- *To whom correspondence should be addressed. +49 431 597 1246; +49 431 597 1842
|
34
|
Angius A, Hyland FCL, Persico I, Pirastu N, Woodage T, Pirastu M, De la Vega FM. Patterns of linkage disequilibrium between SNPs in a Sardinian population isolate and the selection of markers for association studies. Hum Hered 2007; 65:9-22. [PMID: 17652959 DOI: 10.1159/000106058] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 09/27/2006] [Accepted: 04/30/2007] [Indexed: 11/19/2022]
Abstract
OBJECTIVE In isolated populations, 'background' linkage disequilibrium (LD) has been shown to extend over large genetic distances. This and their reduced environmental and genetic heterogeneity has stimulated interest in their potential for association mapping. We compared LD unit map distances with pair-wise measurements of LD in a dense single nucleotide polymorphism (SNP) set. METHODS We genotyped 771 SNPs in an 8 Mb segment of chromosome 22 on 101 individuals from the isolated village of Talana, Sardinia, and compared with outbred European populations. RESULTS Heterozygosity was remarkably similar in both populations. In contrast, the extent of LD observed was quite different. The decay of LD with distance is slower in the isolate. The differences in LD map lengths suggest that useful LD extends up to three times farther in the Sardinian population; smaller differences are seen with pairwise LD metrics. While LD map length slightly decreases with average relatedness, cryptic relatedness does not explain the decrease in LD map length. Haplotypes, block boundaries, and patterns of LD are similar in both populations, suggesting a shared distribution of recombination hotspots. CONCLUSIONS About 15% fewer haplotype tagging SNPs need to be genotyped in the isolate, and possibly 70% fewer if selecting SNPs evenly spaced on the metric LD map.
|
35
|
Sebastiani P, Abad-Grau MM. Bayesian estimates of linkage disequilibrium. BMC Genet 2007; 8:36. [PMID: 17592642 PMCID: PMC1924864 DOI: 10.1186/1471-2156-8-36] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 02/04/2007] [Accepted: 06/25/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND The maximum likelihood estimator of D', a standard measure of linkage disequilibrium, is biased toward disequilibrium, and the bias is particularly evident in small samples and rare haplotypes. RESULTS This paper proposes a Bayesian estimation of D' to address this problem. The reduction of the bias is achieved by using a prior distribution on the pair-wise associations between single nucleotide polymorphisms (SNPs) that increases the likelihood of equilibrium with increasing physical distances between pairs of SNPs. We show how to compute the Bayesian estimate using a stochastic estimation based on MCMC methods, and also propose a numerical approximation to the Bayesian estimates that can be used to estimate patterns of LD in large datasets of SNPs. CONCLUSION Our Bayesian estimator of D' corrects the bias toward disequilibrium that affects the maximum likelihood estimator. A consequence of this feature is a more objective view about the extent of linkage disequilibrium in the human genome, and a more realistic number of tagging SNPs to fully exploit the power of genome wide association studies.
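As a point of reference for the quantities named in this abstract, the following minimal sketch computes the plug-in values of D, |D'| and r^2 from known two-locus haplotype and allele frequencies. It is not the Bayesian estimator the authors propose; the function name `pairwise_ld` and its arguments are illustrative assumptions, and phase is assumed to be known.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B (hypothetical inputs)
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    # Raw disequilibrium coefficient: observed minus expected under equilibrium.
    d = p_ab - p_a * p_b
    # Largest |D| attainable given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r2


# A rare allele that always sits on the same background: |D'| is 1 while r^2 is small.
print(pairwise_ld(0.1, 0.1, 0.5))
```

With sample frequencies plugged in, |D'| reaches 1 whenever one of the four haplotypes is unobserved, which is one intuition for why small samples and rare haplotypes push the estimate toward disequilibrium.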
Affiliation(s)
- Paola Sebastiani
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
- María M Abad-Grau
- Software Engineering Department, University of Granada, Granada 18071, Spain
|
36
|
Müller-Steinhardt M, Ebel B, Härtel C. The impact of interleukin-6 promoter -597/-572/-174 genotype on interleukin-6 production after lipopolysaccharide stimulation. Clin Exp Immunol 2007; 147:339-45. [PMID: 17223976 PMCID: PMC1810465 DOI: 10.1111/j.1365-2249.2006.03273.x] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/28/2022]
Abstract
Interleukin (IL)-6 is a pleiotropic cytokine, produced by different cells. There is accumulating evidence that IL-6 promoter polymorphisms impact substantially on various diseases and we identified kidney transplant recipients carrying the IL-6 GGG/GGG (-597/-572/-174) genotype to have superior graft survival. To prove a functional impact on gene expression, we analysed systematically IL-6 production in healthy individuals with respect to the IL-6 (-597/-572/-174) genotype. IL-6 was determined in 100 healthy blood donors at protein and mRNA levels upon specific stimulation in monocytes and T lymphocytes under whole blood conditions. GGG/GGG individuals showed a lower IL-6 secretion upon lipopolysaccharide (LPS)-stimulation versus all others (P = 0.039). This link was even stronger when (-597) and (-174) GG genotypes were reanalysed separately (P = 0.008, P = 0.017). However, we found neither a difference at the mRNA level or percentage of CD14(+) cells nor after T cell stimulation. We found evidence for the IL-6 (-597/-572/-174) genotype to affect IL-6 synthesis, i.e. lower levels of IL-6 protein upon LPS-stimulation in GGG/GGG individuals. Further studies are needed in kidney transplant recipients to investigate the potential link between the GGG/GGG genotype and graft survival. In line with this, determination of the genetic risk profiles might be promising to improve the transplant outcome in the individual patient.
Affiliation(s)
- M Müller-Steinhardt
- Institute of Transfusion Medicine and Immunology, Faculty of Medicine Mannheim, University of Heidelberg, Germany.
|
37
|
Morton N, Maniatis N, Zhang W, Ennis S, Collins A. Genome scanning by composite likelihood. Am J Hum Genet 2007; 80:19-28. [PMID: 17160891 PMCID: PMC1785319 DOI: 10.1086/510401] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 08/18/2006] [Accepted: 10/24/2006] [Indexed: 01/22/2023]
Abstract
Ambitious programs have recently been advocated or launched to create genomewide databases for meta-analysis of association between DNA markers and phenotypes of medical and/or social concern. A necessary but not sufficient condition for success in association mapping is that the data give accurate estimates of both genomic location and its standard error, which are provided for multifactorial phenotypes by composite likelihood. That class includes the Malecot model, which we here apply with an illustrative example. This preliminary analysis leads to five inferences: permutation of cases and controls provides a test of association free of autocorrelation; two hypotheses give similar estimates, but one is consistently more accurate; estimation of the false-discovery rate is extended to causal genes in a small proportion of regions; the minimal data for successful meta-analysis are inferred; and power is robust for all genomic factors except minor-allele frequency. An extension to meta-analysis is proposed. Other approaches to genome scanning and meta-analysis should, if possible, be similarly extended so that their operating characteristics can be compared.
Affiliation(s)
- Newton Morton
- Human Genetics Division, University of Southampton, Southampton General Hospital, Southampton, SO16 6YD, UK.
|
38
|
Abstract
Over the last few years, association mapping of disease genes has developed into one of the most dynamic research areas of human genetics. It focuses on identifying functional polymorphisms that predispose to complex diseases. Population-based approaches are concerned with exploiting linkage disequilibrium (LD) between single-nucleotide polymorphisms (SNPs) and disease-predisposing loci. The utility of SNPs in association mapping is now well established and interest in this field has escalated with the discovery of millions of SNPs across the genome. This chapter reviews an association-mapping method that utilizes metric LD maps in LD units and employs a composite likelihood approach to combine information from all single SNP tests. It applies a model that incorporates a parameter for the location of the causal polymorphism. A proof-of-principle application of this method to a small region is given and its potential application to large-scale datasets is discussed.
|
39
|
Abstract
The basis for recent developments on the characterization of the linkage-disequilibrium structure of the genome and the application of association mapping to genes for common human diseases is described. Patterns of linkage disequilibrium are now understood, for a number of human populations, in unprecedented detail. This information not only provides a vital resource for the design and execution of powerful association-mapping studies, but opens new avenues of research into the genetic history of human populations and the effects of natural selection, mutation, and recombination on the genomic landscape.
|
40
|
LDMAP: the construction of high-resolution linkage disequilibrium maps of the human genome. Methods Mol Biol 2007; 376:47-57. [PMID: 17984537 DOI: 10.1007/978-1-59745-389-9_4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 12/24/2022]
Abstract
The precise characterization of the linkage disequilibrium (LD) landscape from high-density single-nucleotide polymorphism (SNP) data underpins the association mapping of diseases and other studies. We describe the algorithm and implementation of a powerful approach for constructing LD genetic maps with meaningful map distances. The computational problems posed by the enormous number of SNPs typed in the HapMap data are addressed by developing segmental map construction with the potential for parallelization, which we are developing. There is remarkably little loss of information (1-2%) through this approach, but the computation times are dramatically reduced (more than fourfold for sequential map assembly). These developments enable the construction of very high-density genome-wide LD maps using data from more than 3 million SNPs in HapMap. We anticipate that a whole-genome LD map will be useful for disease gene mapping, genomic research, and population genetics.
|
41
|
Morton NE. A history of association mapping. Methods Mol Biol 2007; 376:17-21. [PMID: 17984535 DOI: 10.1007/978-1-59745-389-9_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
The current exciting developments in association mapping are founded on theory, which has been developed since the beginning of the last century. I hereby review these developments in their historical context.
|
42
|
Menon R, Fortunato SJ, Thorsen P, Williams S. Genetic associations in preterm birth: a primer of marker selection, study design, and data analysis. J Soc Gynecol Investig 2006; 13:531-41. [PMID: 17088082 DOI: 10.1016/j.jsgi.2006.09.006] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Received: 02/21/2006] [Indexed: 01/16/2023]
Abstract
Spontaneous preterm birth (PTB; delivery before 37 weeks gestation) is a primary risk factor for infant morbidity and mortality. The etiology is unclear, but there is evidence that there is a genetic predisposition to PTB. Armed with the suggestion of genetic risk factors and the failure to identify useful biomarkers, investigators are starting to actively pursue the role of genetic predisposition in PTB. Several studies have been done to date assessing the role of single gene variants. However, positive findings have failed to replicate. We argue that heterogeneity in study designs, definition of phenotype, single-nucleotide polymorphism (SNP) selection, population selection, and sample size makes data interpretation difficult in complex phenotypes such as PTB. In this review, we introduce general concepts of study designs in genetic epidemiology, selection of candidate genes and markers for analysis, and analytical methodologies. We also introduce how the concept of gene-gene interactions (biologic epistasis) and gene-environment interactions may affect the predisposition to PTB.
|
43
|
Abstract
Linkage maps have been invaluable for the positional cloning of many genes involved in severe human diseases. Standard genetic linkage maps have been constructed for this purpose from the Centre d'Etude du Polymorphisme Humain and other panels, and have been widely used. Now that attention has shifted towards identifying genes predisposing to common disorders using linkage disequilibrium (LD) and maps of single nucleotide polymorphisms (SNPs), it is of interest to consider a standard LD map which is somewhat analogous to the corresponding map for linkage. We have constructed and evaluated a cosmopolitan LD map by combining samples from a small number of populations using published data from a 10-megabase region on chromosome 20. In support of a pilot study, which examined a number of small genomic regions with a lower density of markers, we have found that a cosmopolitan map, which serves all populations when appropriately scaled, recovers 91 to 95 per cent of the information within population-specific maps. Recombination hot spots appear to have a dominant role in shaping patterns of LD. The success of the cosmopolitan map might be attributed to the co-localisation of hot spots in all populations. Although there must be finer scale differences between populations due to other processes (mutation, drift, selection), the results suggest that a whole-genome standard LD map would indeed be a useful resource for disease gene mapping.
Affiliation(s)
- Jane Gibson, William Tapper, Weihua Zhang, Newton Morton, Andrew Collins
- Department of Human Genetics, School of Medicine, University of Southampton, Southampton, SO16 6YD, UK
|
44
|
Khatkar MS, Collins A, Cavanagh JAL, Hawken RJ, Hobbs M, Zenger KR, Barris W, McClintock AE, Thomson PC, Nicholas FW, Raadsma HW. A first-generation metric linkage disequilibrium map of bovine chromosome 6. Genetics 2006; 174:79-85. [PMID: 16816421 PMCID: PMC1569786 DOI: 10.1534/genetics.106.060418] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022]
Abstract
We constructed a metric linkage disequilibrium (LD) map of bovine chromosome 6 (BTA6) on the basis of data from 220 SNPs genotyped on 433 Australian dairy bulls. This metric LD map has distances in LD units (LDUs) that are analogous to centimorgans in linkage maps. The LD map of BTA6 has a total length of 8.9 LDUs. Within the LD map, regions of high LD (represented as blocks) and regions of low LD (steps) are observed, when plotted against the integrated map in kilobases. At the most stringent block definition, namely a set of loci with zero LDU increase over the span of these markers, BTA6 comprises 40 blocks, accounting for 41% of the chromosome. At a slightly lower stringency of block definition (a set of loci covering a maximum of 0.2 LDUs on the LD map), up to 81% of BTA6 is spanned by 46 blocks and with 13 steps that are likely to reflect recombination hot spots. The mean swept radius (the distance over which LD is likely to be useful for mapping) is 13.3 Mb, confirming extensive LD in Holstein-Friesian dairy cattle, which makes such populations ideal for whole-genome association studies.
Affiliation(s)
- Mehar S Khatkar
- Centre for Advanced Technologies in Animal Genetics and Reproduction (ReproGen), University of Sydney and CRC for Innovative Dairy Products, Camden NSW 2570, Australia.
|
45
|
Service S, DeYoung J, Karayiorgou M, Roos JL, Pretorious H, Bedoya G, Ospina J, Ruiz-Linares A, Macedo A, Palha JA, Heutink P, Aulchenko Y, Oostra B, van Duijn C, Jarvelin MR, Varilo T, Peddle L, Rahman P, Piras G, Monne M, Murray S, Galver L, Peltonen L, Sabatti C, Collins A, Freimer N. Magnitude and distribution of linkage disequilibrium in population isolates and implications for genome-wide association studies. Nat Genet 2006; 38:556-60. [PMID: 16582909 DOI: 10.1038/ng1770] [Citation(s) in RCA: 183] [Impact Index Per Article: 10.2] [Received: 11/08/2005] [Accepted: 02/28/2006] [Indexed: 11/09/2022]
Abstract
The genome-wide distribution of linkage disequilibrium (LD) determines the strategy for selecting markers for association studies, but it varies between populations. We assayed LD in large samples (200 individuals) from each of 11 well-described population isolates and an outbred European-derived sample, using SNP markers spaced across chromosome 22. Most isolates show substantially higher levels of LD than the outbred sample and many fewer regions of very low LD (termed 'holes'). Young isolates known to have had relatively few founders show particularly extensive LD with very few holes; these populations offer substantial advantages for genome-wide association mapping.
Affiliation(s)
- Susan Service
- Center for Neurobehavioral Genetics, University of California, Los Angeles, California 90095, USA
|
46
|
Wang Y, Zhao LP, Dudoit S. A fine-scale linkage-disequilibrium measure based on length of haplotype sharing. Am J Hum Genet 2006; 78:615-28. [PMID: 16532392 PMCID: PMC1424683 DOI: 10.1086/502632] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 10/14/2005] [Accepted: 01/23/2006] [Indexed: 11/03/2022]
Abstract
High-throughput genotyping technologies for SNPs have enabled the recent completion of the International HapMap Project (phase I), which has stimulated much interest in studying genomewide linkage-disequilibrium (LD) patterns. Conventional LD measures, such as D' and r(2), are two-point measurements, and their relationship with physical distance is highly noisy. We propose a new LD measure, Delta, defined in terms of the correlation coefficient for shared haplotype lengths around two loci, thereby borrowing information from multiple loci. A U-statistic-based estimator of Delta, which takes into consideration the dependence structure of the observed data, is developed and compared with an estimator based on the usual empirical correlation coefficient. Furthermore, we propose methods for inferring LD-decay rates and recombination hotspots on the basis of Delta. The results from coalescent-simulation studies and analysis of HapMap SNP data demonstrate that the proposed estimators of Delta are superior to the two most popular conventional LD measures, in terms of their close relationship with physical distance and recombination rate, their small variability, and their strong robustness to marker-allele frequencies. These merits may offer new opportunities for mapping complex disease genes and for investigating recombination mechanisms on the basis of better-quantified LD.
Affiliation(s)
- Yan Wang
- Division of Biostatistics, University of California, Berkeley, USA.
|
47
|
Pe’er I, Chretien YR, de Bakker PIW, Barrett JC, Daly MJ, Altshuler DM. Biases and reconciliation in estimates of linkage disequilibrium in the human genome. Am J Hum Genet 2006; 78:588-603. [PMID: 16532390 PMCID: PMC1424697 DOI: 10.1086/502803] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Received: 10/12/2005] [Accepted: 01/20/2006] [Indexed: 01/07/2023]
Abstract
Genetic association studies of common disease often rely on linkage disequilibrium (LD) along the human genome and in the population under study. Although understanding the characteristics of this correlation has been the focus of many large-scale surveys (culminating in genomewide haplotype maps), the results of different studies have yielded wide-ranging estimates. Since understanding these differences (and whether they can be reconciled) has important implications for whole-genome association studies, in this article we dissect biases in these estimations that are due to known aspects of study design and analytic methodology. In particular, we document in the empirical data that the long-known complicating effects of allele frequency, marker density, and sample size largely reconcile all large-scale surveys. Two exceptions are an underappraisal of redundancy among single-nucleotide polymorphisms (SNPs) when evaluation is limited to short regions (as in candidate-gene resequencing studies) and an inflation in the extent of LD in HapMap phase I, which is likely due to oversampling of specific haplotypes in the creation of the public SNP map. Understanding these factors can guide the understanding of empirical LD surveys and has implications for genetic association studies.
Affiliation(s)
- Itsik Pe’er, Yves R. Chretien, Paul I. W. de Bakker, Jeffrey C. Barrett, Mark J. Daly, David M. Altshuler
- Center for Human Genetic Research, Department of Molecular Biology, and Diabetes Unit, Massachusetts General Hospital, and Departments of Genetics and Medicine, Harvard Medical School, Boston; Broad Institute of M.I.T. and Harvard and Harvard-M.I.T. Division of Health Sciences and Technology, Cambridge, MA; and Wellcome Trust Genome Campus, Oxford, United Kingdom
|
48
|
Morton NE. Fifty years of genetic epidemiology, with special reference to Japan. J Hum Genet 2006; 51:269-277. [PMID: 16479316 DOI: 10.1007/s10038-006-0366-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 12/13/2005] [Accepted: 12/18/2005] [Indexed: 10/25/2022]
Abstract
Genetic epidemiology deals with etiology, distribution, and control of disease in groups of relatives and with inherited causes of disease in populations. It took its first steps before its recognition as a discipline, and did not reach its present scope until the Human Genome Project succeeded. The intimate relationship between genetics and epidemiology was discussed by Neel and Schull (1954), just a year after Watson and Crick reported the DNA double helix, and 2 years before human cytogenetics and the Japan Society of Human Genetics were founded. It is convenient to divide the next half-century into three phases. The first of these (1956-1979) was before DNA polymorphisms were typed, and so the focus was on segregation and linkage of major genes, cytogenetics, population studies, and biochemical genetics. The next phase (1980-2001) progressively identified DNA polymorphisms and their application to complex inheritance. The last phase began with a reliable sequence of the human genome (2002), followed by exploration of genomic diversity. Linkage continues to be useful to study recombination and to map major genes, but association mapping gives much greater resolution and enables studies of complex inheritance. The generation now entering human genetics will have collaborative opportunities undreamed of a few years ago, without the independence that led to great advances during the past half-century.
Affiliation(s)
- Newton E Morton
- Human Genetics Division, Southampton General Hospital, School of Medicine, University of Southampton, Duthie Building (MP 808), SO16 6YD, Southampton, UK.
|
49
|
Liu Z, Lin S. Multilocus LD measure and tagging SNP selection with generalized mutual information. Genet Epidemiol 2006; 29:353-64. [PMID: 16173096 PMCID: PMC2596944 DOI: 10.1002/gepi.20092] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Indexed: 11/10/2022]
Abstract
Linkage disequilibrium (LD) plays a central role in fine mapping of disease genes and, more recently, in characterizing haplotype blocks. Classical LD measures, such as D' and r(2), are frequently used to quantify relationship between two loci. A pairwise "distance" matrix among a set of loci can be constructed using such a measure, and based upon which a number of haplotype block detection and tagging single nucleotide polymorphism (SNP) selection algorithms have been devised. Although successful in many applications, the pairwise nature of these measures does not provide a direct characterization of joint linkage disequilibrium among multiple loci. Consequently, applications based on them may lead to loss of important information. In this report, we propose a multilocus LD measure based on generalized mutual information, which is also known as relative entropy or Kullback-Leibler distance. In essence, this measure seeks to quantify the distance between the observed haplotype distribution and the expected distribution assuming linkage equilibrium. We can show that this measure is approximately equal to r(2) in the special case with two loci. Based on this multilocus LD measure and an entropy measure that characterizes haplotype diversity, we propose a class of stepwise tagging SNP selection algorithms. This represents a unified approach for SNP selection in that it takes into account both the haplotype diversity and linkage disequilibrium objectives. Applications to both simulated and real data demonstrate the utility of the proposed methods for handling a large number of SNPs. The results indicate that multilocus LD patterns can be captured well, and informative and nonredundant SNPs can be selected effectively from a large set of loci.
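The core idea in this abstract, a distance between the observed haplotype distribution and the distribution expected under linkage equilibrium, can be sketched as a plain Kullback-Leibler divergence. This is only an illustration under stated assumptions: the function name `multilocus_ld_kl`, the dictionary input format, and the use of an unnormalized divergence in nats are choices made here, and the authors' generalized mutual information may be defined or scaled differently.

```python
import math


def multilocus_ld_kl(hap_freqs):
    """KL divergence (nats) from the equilibrium expectation to the observed
    haplotype distribution.

    hap_freqs: dict mapping a haplotype (string or tuple of alleles, one per
    locus) to its frequency; frequencies are assumed to sum to 1.
    """
    n_loci = len(next(iter(hap_freqs)))
    # Per-locus allele frequencies obtained by summing over haplotypes.
    marginals = [{} for _ in range(n_loci)]
    for hap, p in hap_freqs.items():
        for i, allele in enumerate(hap):
            marginals[i][allele] = marginals[i].get(allele, 0.0) + p
    kl = 0.0
    for hap, p in hap_freqs.items():
        if p > 0.0:
            # Expected haplotype frequency if all loci were independent.
            expected = math.prod(marginals[i][a] for i, a in enumerate(hap))
            kl += p * math.log(p / expected)
    return kl


# Two loci in complete LD versus two loci in equilibrium.
print(multilocus_ld_kl({"AB": 0.5, "ab": 0.5}))
print(multilocus_ld_kl({"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25}))
```

Because the haplotype key can have any length, the same function applies unchanged to three or more loci, which is the joint characterization that pairwise measures do not give directly.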
Affiliation(s)
- Zhenqiu Liu
- Department of Statistics, Ohio State University, Columbus, Ohio 43210-1247, USA
|
50
|
Franke A, Wollstein A, Teuber M, Wittig M, Lu T, Hoffmann K, Nürnberg P, Krawczak M, Schreiber S, Hampe J. GENOMIZER: an integrated analysis system for genome-wide association data. Hum Mutat 2006; 27:583-8. [PMID: 16652332 DOI: 10.1002/humu.20306] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/09/2022]
Abstract
Genome-wide association analysis appears to be a promising way to identify heritable susceptibility factors for complex human disorders. However, the feasibility of large-scale genotyping experiments is currently limited by an incomplete marker coverage of the genome, a restricted understanding of the functional role of given genomic regions, and the small sample sizes used. Thus, genome-wide association analysis will be a screening tool to facilitate subsequent gene discovery rather than a means to completely resolve individual genetic risk profiles. The validation of association findings will continue to rely upon the replication of "leads" in independent samples from either the same or different populations. Even under such pragmatic conditions, the timely analysis of the large data sets in question poses serious technical challenges. We have therefore developed public-domain software, GENOMIZER, that implements the workflow of an association experiment, including data management, single-point and haplotype analysis, "lead" definition, and data visualization. GENOMIZER (www.ikmb.uni-kiel.de/genomizer) comes with a complete user manual, and is open-source software licensed under the GNU Lesser General Public License. We suggest that the use of this software will facilitate the handling and interpretation of the currently emerging genome-wide association data.
Affiliation(s)
- Andre Franke
- Institute of Clinical Molecular Biology, Kiel Center of the German National Genotyping Platform, Christian-Albrechts-University, Kiel, Germany
|