1. Ali AT, Liebert A, Lau W, Maniatis N, Swallow DM. The hazards of genotype imputation in chromosomal regions under selection: A case study using the Lactase gene region. Ann Hum Genet 2021; 86:24-33. [PMID: 34523124 DOI: 10.1111/ahg.12444] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/15/2021] [Revised: 07/15/2021] [Accepted: 08/11/2021] [Indexed: 11/30/2022]
Abstract
Although imputation of missing SNP results has been widely used in genetic studies, claims about the quality and usefulness of imputation have outnumbered the few studies that have questioned its limitations. But it is becoming clear that these limitations are real; for example, disease association signals can be missed in regions of LD breakdown. Here, as a case study, using the chromosomal region of the well-known lactase gene, LCT, we address the issue of imputation in the context of variants that have become frequent in a limited number of modern population groups only recently, due to selection. We study SNPs in a 500 bp region covering the enhancer of LCT, and compare imputed genotypes with directly genotyped data. We examine the haplotype pairs of all individuals with discrepant and missing genotypes. We highlight the nonrandom nature of the allelic errors and show that most incorrect imputations and missing data result from long haplotypes that are evolutionarily closely related to those carrying the derived alleles, while some relate to rare and recombinant haplotypes. We conclude that bias of incorrectly imputed and missing genotypes can decrease the accuracy of imputed results substantially.
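The comparison of imputed with directly genotyped data described above can be illustrated with a minimal sketch. The function name `compare_genotypes`, the 0/1/2 genotype coding, and the use of `None` for a missing call are assumptions made for illustration; the abstract does not specify how the authors encoded or scored their data.

```python
from typing import Dict, Optional, Sequence


def compare_genotypes(
    typed: Sequence[Optional[int]],
    imputed: Sequence[Optional[int]],
) -> Dict[str, float]:
    """Tally agreement between directly typed and imputed genotypes at one SNP.

    Genotypes are coded as 0, 1 or 2 copies of the derived allele
    (a hypothetical coding); None marks a missing call.
    """
    if len(typed) != len(imputed):
        raise ValueError("both sequences must cover the same individuals")

    compared = matches = missing = 0
    for t, i in zip(typed, imputed):
        if t is None:
            continue  # no direct genotype to compare against
        if i is None:
            missing += 1  # imputation gave no call for this individual
            continue
        compared += 1
        if t == i:
            matches += 1

    return {
        "compared": compared,
        "missing": missing,
        "discordant": compared - matches,
        "concordance": matches / compared if compared else float("nan"),
    }


if __name__ == "__main__":
    typed = [0, 1, 2, 1, None, 0]
    imputed = [0, 1, 1, None, 2, 0]
    print(compare_genotypes(typed, imputed))
```

Counting missing calls separately from discordant ones matters here because the abstract reports that both kinds of failure are nonrandom; a single overall concordance figure would hide that.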
Affiliation(s)
- Aminah T Ali
  - University College London Research Department of Genetics Evolution and Environment, London, UK
- Anke Liebert
  - University College London Research Department of Genetics Evolution and Environment, London, UK
- Winston Lau
  - University College London Research Department of Genetics Evolution and Environment, London, UK
- Nikolas Maniatis
  - University College London Research Department of Genetics Evolution and Environment, London, UK
- Dallas M Swallow
  - University College London Research Department of Genetics Evolution and Environment, London, UK

2. Liebert A, López S, Jones BL, Montalva N, Gerbault P, Lau W, Thomas MG, Bradman N, Maniatis N, Swallow DM. World-wide distributions of lactase persistence alleles and the complex effects of recombination and selection. Hum Genet 2017; 136:1445-1453. [PMID: 29063188 PMCID: PMC5702378 DOI: 10.1007/s00439-017-1847-y] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 07/22/2017] [Accepted: 10/07/2017] [Indexed: 01/17/2023]
Abstract
The genetic trait of lactase persistence (LP) is associated with at least five independent functional single nucleotide variants in a regulatory region about 14 kb upstream of the lactase gene [−13910*T (rs4988235), −13907*G (rs41525747), −13915*G (rs41380347), −14009*G (rs869051967) and −14010*C (rs145946881)]. These alleles have been inferred to have spread recently and present-day frequencies have been attributed to positive selection for the ability of adult humans to digest lactose without risk of symptoms of lactose intolerance. One of the inferential approaches used to estimate the level of past selection has been to determine the extent of haplotype homozygosity (EHH) of the sequence surrounding the SNP of interest. We report here new data on the frequencies of the known LP alleles in the ‘Old World’ and their haplotype lineages. We examine and confirm EHH of each of the LP alleles in relation to their distinct lineages, but also show marked EHH for one of the older haplotypes that does not carry any of the five LP alleles. The region of EHH of this (B) haplotype exactly coincides with a region of suppressed recombination that is detectable in families as well as in population data, and the results show how such suppression may have exaggerated haplotype-based measures of past selection.
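As a rough illustration of the EHH measure mentioned above, the sketch below uses the commonly cited definition: among chromosomes carrying a chosen core allele, the proportion of chromosome pairs that are identical at every marker from the core out to a given marker. The function name `ehh`, its parameters, and the toy haplotypes are assumptions for illustration only and are not taken from the paper.

```python
from collections import Counter
from math import comb
from typing import Sequence


def ehh(haplotypes: Sequence[str], core_index: int, core_allele: str, end_index: int) -> float:
    """Homozygosity of core-allele carriers over the markers from core to end_index.

    Each haplotype is a string with one character per marker (hypothetical
    encoding). Returns the fraction of carrier pairs that are identical
    across the whole interval, inclusive of both ends.
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # fewer than two carriers: no pairs to compare

    lo, hi = sorted((core_index, end_index))
    # Group carriers by their allele pattern over the interval.
    counts = Counter(h[lo:hi + 1] for h in carriers)
    identical_pairs = sum(comb(c, 2) for c in counts.values())
    return identical_pairs / comb(n, 2)


if __name__ == "__main__":
    toy = ["TAAG", "TAAG", "TAGG", "CAAG", "TAAC"]
    for end in (1, 2, 3):
        print(end, ehh(toy, core_index=0, core_allele="T", end_index=end))
```

The value decays as the interval grows and recombination or mutation breaks up the carried haplotype. A region of suppressed recombination slows that decay whether or not selection has acted, which is the caution the abstract raises.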
Affiliation(s)
- Anke Liebert
  - Research Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
  - Department of Paediatrics, University of Cambridge, Box 116, Level 8, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
- Saioa López
  - Research Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
- Bryony Leigh Jones
  - Research Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
- Nicolas Montalva
  - Research Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
  - UCL Department of Anthropology, Human Evolutionary Ecology Group, University College London, 14 Taviton Street, London, WC1H 0BW, UK
  - Departmento de Antropología, Facultad de Ciencias Sociales y Jurídicas, Universidad de Tarapacá, 384 Calle Cardenal Caro, Arica, Chile
- Pascale Gerbault
  - Research Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
  - Department of Life Sciences, Faculty of Science and Technology, University of Westminster, 115 New Cavendish Street, London, W1W 6UW, UK
- Winston Lau
  - Research Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
- Mark G Thomas
  - Research Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
- Neil Bradman
  - Henry Stewart Group, 28/30 Little Russell Street, London, WC1A 2HN, UK
- Nikolas Maniatis
  - Research Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
- Dallas M Swallow
  - Research Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK

3. Lau W, Andrew T, Maniatis N. High-Resolution Genetic Maps Identify Multiple Type 2 Diabetes Loci at Regulatory Hotspots in African Americans and Europeans. Am J Hum Genet 2017; 100:803-816. [PMID: 28475862 DOI: 10.1016/j.ajhg.2017.04.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 12/12/2016] [Accepted: 04/11/2017] [Indexed: 10/19/2022] Open
Abstract
Interpretation of results from genome-wide association studies for T2D is challenging. Only very few loci have been replicated in African ancestry populations and the identification of the implicated functional genes remains largely undefined. We used genetic maps that capture detailed linkage disequilibrium information in European and African Americans and applied these to large T2D case-control samples in order to estimate locations for putative functional variants in both populations. Replicated T2D locations were tested for evidence of being regulatory hotspots using adipose expression. We validated a sample of our co-location intervals using next generation sequencing and functional annotation, including enhancers, transcription, and chromatin modifications. We identified 111 additional disease-susceptibility locations, 93 of which are cosmopolitan and 18 of which are European specific. We show that many previously known signals are also risk loci in African Americans. The majority of the disease locations appear to confer risk of T2D via the regulation of expression levels for a large number (266) of cis-regulated genes, the majority of which are not the nearest genes to the disease loci. Sequencing three cosmopolitan locations provided candidate functional variants that precisely co-locate with cell-specific chromatin domains and pancreatic islet enhancers. These variants have large effect sizes and are common across populations. Results show that disease-associated loci in different populations, gene expression, and cell-specific regulatory annotation can be effectively integrated by localizing these effects on high-resolution genetic maps. The cis-regulated genes provide insights into the complex molecular pathways involved and can be used as targets for sequencing and functional molecular studies.

4. Collier DA, Eastwood BJ, Malki K, Mokrab Y. Advances in the genetics of schizophrenia: toward a network and pathway view for drug discovery. Ann N Y Acad Sci 2016; 1366:61-75. [DOI: 10.1111/nyas.13066] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 09/16/2015] [Revised: 03/15/2016] [Accepted: 03/18/2016] [Indexed: 11/28/2022]
Affiliation(s)
- David A. Collier
  - Discovery Neuroscience Research; Eli Lilly and Company Ltd; Windlesham Surrey United Kingdom
- Brian J. Eastwood
  - Discovery Neuroscience Research; Eli Lilly and Company Ltd; Windlesham Surrey United Kingdom
- Karim Malki
  - Discovery Neuroscience Research; Eli Lilly and Company Ltd; Windlesham Surrey United Kingdom
- Younes Mokrab
  - Discovery Neuroscience Research; Eli Lilly and Company Ltd; Windlesham Surrey United Kingdom
  - Sidra Medical and Research Center; Doha Qatar

5. Elding H, Lau W, Swallow D, Maniatis N. Refinement in localization and identification of gene regions associated with Crohn disease. Am J Hum Genet 2013; 92:107-13. [PMID: 23246291 PMCID: PMC3542460 DOI: 10.1016/j.ajhg.2012.11.004] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 07/16/2012] [Revised: 08/03/2012] [Accepted: 11/05/2012] [Indexed: 12/13/2022] Open
Abstract
The risk of Crohn disease (CD) has a large genetic component. A recent meta-analysis of 6 genome-wide association studies reported 71 chromosomal intervals but does not account for all of the known genetic contribution. Here, we refine localization of the previously reported intervals and also identify additional CD susceptibility genes using a mapping approach that localizes causal variants based on genetic maps in linkage disequilibrium units (LDU maps). Using 2 of the 6 cohorts, 66 of the 71 previously reported loci are confirmed and more precise location estimates for these intervals are given. We identify 78 additional gene regions that pass genome-wide significance, providing strong evidence for 144 genes. Additionally, 56 nominally significant signals, but with more stringent and precise colocalization, are identified. In total, we provide evidence for 200 gene regions confirming that CD is truly multifactorial and complex in nature. Many identified genes have functions that are compatible with involvement in immune/inflammatory processes and seem to have a large effect in individuals with extra ileal as well as ileal inflammation. The precise locations and the evidence that some genes reflect phenotypic subgroups will help identify functional variants and will lead to greater insight of CD etiology.
Affiliation(s)
- Heather Elding
  - Research Department of Genetics Evolution & Environment, University College London, Gower Street, London WC1E 6BT, UK
- Winston Lau
  - Research Department of Genetics Evolution & Environment, University College London, Gower Street, London WC1E 6BT, UK
- Dallas M. Swallow
  - Research Department of Genetics Evolution & Environment, University College London, Gower Street, London WC1E 6BT, UK
- Nikolas Maniatis
  - Research Department of Genetics Evolution & Environment, University College London, Gower Street, London WC1E 6BT, UK

6. Elding H, Lau W, Swallow D, Maniatis N. Dissecting the genetics of complex inheritance: linkage disequilibrium mapping provides insight into Crohn disease. Am J Hum Genet 2011; 89:798-805. [PMID: 22152681 PMCID: PMC3234369 DOI: 10.1016/j.ajhg.2011.11.006] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 09/19/2011] [Revised: 10/24/2011] [Accepted: 11/08/2011] [Indexed: 12/21/2022] Open
Abstract
Family studies for Crohn disease (CD) report extensive linkage on chromosome 16q and pinpoint NOD2 as a possible causative locus. However, linkage is also observed in families that do not bear the most frequent NOD2 causative mutations, but no other signals on 16q have been found so far in published genome-wide association studies. Our aim is to identify this missing genetic contribution. We apply a powerful genetic mapping approach to the Wellcome Trust Case-Control Consortium and the National Institute of Diabetes and Digestive and Kidney Diseases genome-wide association data on CD. This method takes into account the underlying structure of linkage disequilibrium (LD) by using genetic distances from LD maps and provides a location for the causal agent. We find genetic heterogeneity within the NOD2 locus and also show an independent and unsuspected involvement of the neighboring gene, CYLD. We find associations with the IRF8 region and the region containing CDH1 and CDH3, as well as substantial phenotypic and genetic heterogeneity for CD itself. The genes are known to be involved in inflammation and immune dysregulation. These findings provide insight into the genetics of CD and suggest promising directions for understanding disease heterogeneity. The application of this method thus paves the way for understanding complex inheritance in general, leading to the dissection of different pathways and ultimately, personalized treatment.

7. Politopoulos I, Gibson J, Tapper W, Ennis S, Eccles D, Collins A. Composite likelihood-based meta-analysis of breast cancer association studies. J Hum Genet 2011; 56:377-82. [DOI: 10.1038/jhg.2011.23] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/16/2022]

8. Borzani I, Tola MR, Caniatti L, Collins A, De Santis G, Luiselli D, Mamolini E, Scapoli C. The interleukin-1 cluster gene region is associated with multiple sclerosis in an Italian Caucasian population. Eur J Neurol 2010; 17:930-8. [DOI: 10.1111/j.1468-1331.2010.02952.x] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/30/2022]

9. Individual disease risk and multimetric analysis of Crohn disease. Proc Natl Acad Sci U S A 2008; 105:15843-7. [PMID: 18843111 DOI: 10.1073/pnas.0808009105] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/11/2022] Open
Abstract
Rare dominant genes with high penetrance can be identified by linkage without inbreeding, whereas rare recessive genes with high penetrance are most efficiently recognized by autozygosity mapping of homozygotes in pedigrees with preferential inbreeding. On the contrary, complex inheritance is characterized by common genes with low penetrance, for which family studies and inbreeding are inefficient. Here, we develop the Fisherian theory for diallelic cases and controls, show that it compares favorably with Bayesian estimates, and evaluate their currently low power for discriminating cases and controls in Crohn disease (CD). Significance is enhanced by inclusion of composite likelihood, but identification of causal loci is delayed by low recognition of gene function. Clearly, association mapping is not yet optimal, and so strenuous effort is justified to develop a more inclusive gene map and association tests more powerful than single markers and the current use of composite likelihood. Because of its relatively high heritability and the correspondingly large number of detected causal loci, CD presents an ideal test system to determine the power and flaws of competing methods of whole-genome case/control association analysis in publicly available data. Until such a test is exploited by competing statisticians, their Herculean efforts will be inconclusive, and the costly advances from increased sample size will be suboptimal and disappointing.

10. Andrew T, Maniatis N, Carbonaro F, Liew SHM, Lau W, Spector TD, Hammond CJ. Identification and replication of three novel myopia common susceptibility gene loci on chromosome 3q26 using linkage and linkage disequilibrium mapping. PLoS Genet 2008; 4:e1000220. [PMID: 18846214 PMCID: PMC2556391 DOI: 10.1371/journal.pgen.1000220] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Received: 06/23/2008] [Accepted: 09/10/2008] [Indexed: 12/22/2022] Open
Abstract
Refractive error is a highly heritable quantitative trait responsible for considerable morbidity. Following an initial genome-wide linkage study using microsatellite markers, we confirmed evidence for linkage to chromosome 3q26 and then conducted fine-scale association mapping using high-resolution linkage disequilibrium unit (LDU) maps. We used a preliminary discovery marker set across the 30-Mb region with an average SNP density of 1 SNP/15 kb (Map 1). Map 1 was divided into 51 LDU windows and additional SNPs were genotyped for six regions (Map 2) that showed preliminary evidence of multi-marker association using composite likelihood. A total of 575 cases and controls selected from the tails of the trait distribution were genotyped for the discovery sample. Malecot model estimates indicate three loci with putative common functional variants centred on MFN1 (180,566 kb; 95% confidence interval 180,505-180,655 kb), approximately 156 kb upstream from alternate-splicing SOX2OT (182,595 kb; 95% CI 182,533-182,688 kb) and PSARL (184,386 kb; 95% CI 184,356-184,411 kb), with the loci showing modest to strong evidence of association for the Map 2 discovery samples (p < 10^-7, p < 10^-10, and p = 0.01, respectively). Using an unselected independent sample of 1,430 individuals, results replicated for the MFN1 (p = 0.006), SOX2OT (p = 0.0002), and PSARL (p = 0.0005) gene regions. MFN1 and PSARL both interact with OPA1 to regulate mitochondrial fusion and the inhibition of mitochondrial-led apoptosis, respectively. That two mitochondrial regulatory processes in the retina are implicated in the aetiology of myopia is surprising and is likely to provide novel insight into the molecular genetic basis of common myopia.
Affiliation(s)
- Toby Andrew
  - Twin Research and Genetic Epidemiology, King's College London, St Thomas' Hospital, London, UK.

11. Li N. The promise of composite likelihood methods for addressing computationally intensive challenges. ADVANCES IN GENETICS 2008; 60:637-654. [PMID: 18358335 DOI: 10.1016/s0065-2660(07)00422-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/26/2023]
Abstract
High-dimensional genetic data, due to its complex correlation structure, poses an enormous challenge to standard likelihood-based methods for making statistical inference. As an approximation, composite likelihood has proved to be a successful strategy for some genetic applications. It has the potential to see even wider application and much research is needed. We first give a brief description of composite likelihood. The advantage of this method and potential challenges in inference are noted. Next, its applications in genetic studies are reviewed, specifically in estimating population genetics parameters such as recombination rate, and in multi-locus linkage disequilibrium mapping of disease genes with some discussion about future research directions.
Affiliation(s)
- Na Li
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA

12. [Toward a non-empirical treatment for rheumatoid arthritis based on its molecular pathology]. ACTA ACUST UNITED AC 2008; 4:19-31. [PMID: 21794490 DOI: 10.1016/s1699-258x(08)71791-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/16/2007] [Accepted: 11/29/2007] [Indexed: 11/21/2022]
Abstract
Rheumatoid arthritis (RA) is a chronic, disabling disease that affects individuals during the productive years of their lives. Modern treatment for RA includes the so-called "biologic" therapy, which is based on recombinant proteins that modify the biologic processes. These agents have potent therapeutic effects and different mechanisms of action. Nevertheless, therapeutic failure still prevails. Treatment that prevents disability in RA must be started early, before the development of complications and, ideally, with a minimum possibility of therapeutic failure. As yet, there are no clinical or laboratory criteria to identify those patients with a higher probability of responding to particular types of therapy, which delays control of RA and affects the prevention of incapacity. Research into gene diversity through single-nucleotide polymorphisms (SNPs) by means of microarray systems allows the detailed analysis of gene factors associated with a given disease. SNPs have recently been applied to the study of RA, where the major polymorphisms associated with RA occur primarily in genes that code for proteins related to the initiation of an immune response and/or the control of cellular activity in the immune system, in addition to genes related to tissue repair. Research into the specific meaning of these findings is in its initial stages. On the other hand, proteomics relates to the analysis of protein expression profiles at multiple levels. Both types of studies will contribute to the knowledge of patterns of gene expression in RA compared to the general population, and will allow an understanding of the pathogenesis of RA. Moreover, proteomic and genomic profiles can be employed to design probes that identify individuals at risk of developing RA, to predict individual response to different therapeutic modalities (pharmacogenomics), and to follow up the biologic response to therapy.

13.
Abstract
Association methods based on linkage disequilibrium (LD) offer a promising approach for detecting genetic variations that are responsible for complex human diseases. Although methods based on individual single nucleotide polymorphisms (SNPs) may lead to significant findings, methods based on haplotypes comprising multiple SNPs on the same inherited chromosome may provide additional power for mapping disease genes and also provide insight on factors influencing the dependency among genetic markers. Such insights may provide information essential for understanding human evolution and also for identifying cis-interactions between two or more causal variants. Because obtaining haplotype information directly from experiments can be cost prohibitive in most studies, especially in large scale studies, haplotype analysis presents many unique challenges. In this chapter, we focus on two main issues: haplotype inference and haplotype-association analysis. We first provide a detailed review of methods for haplotype inference using unrelated individuals as well as related individuals from pedigrees. We then cover a number of statistical methods that employ haplotype information in association analysis. In addition, we discuss the advantages and limitations of different methods.
Affiliation(s)
- Nianjun Liu
  - Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA

14.
Abstract
Over the last few years, association mapping of disease genes has developed into one of the most dynamic research areas of human genetics. It focuses on identifying functional polymorphisms that predispose to complex diseases. Population-based approaches are concerned with exploiting linkage disequilibrium (LD) between single-nucleotide polymorphisms (SNPs) and disease-predisposing loci. The utility of SNPs in association mapping is now well established and interest in this field has been heightened by the discovery of millions of SNPs across the genome. This chapter reviews an association-mapping method that utilizes metric LD maps in LD units and employs a composite likelihood approach to combine information from all single SNP tests. It applies a model that incorporates a parameter for the location of the causal polymorphism. A proof-of-principle application of this method to a small region is given and its potential application to large-scale datasets is discussed.
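The abstract does not write out the model with the location parameter. As a hedged reminder, the Malecot model used elsewhere in this literature is usually given in roughly the following form; the symbols are the conventional ones and are an assumption here, not taken from this abstract.

```latex
\rho_i = (1 - L)\, M\, e^{-\varepsilon \lvert S_i - S \rvert} + L
```

Here \rho_i is the association between SNP i and the disease, S_i is the location of SNP i on the LD map, S is the estimated location of the causal polymorphism, M is the association at zero distance, L is the residual association at large distance, and \varepsilon is the rate at which association declines with map distance.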