1
Fu Y, Kenttämies A, Ruotsalainen S, Pirinen M, Tukiainen T. Role of X chromosome and dosage-compensation mechanisms in complex trait genetics. Am J Hum Genet 2025:S0002-9297(25)00145-4. [PMID: 40359939 DOI: 10.1016/j.ajhg.2025.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2025] [Revised: 04/16/2025] [Accepted: 04/16/2025] [Indexed: 05/15/2025]
Abstract
The X chromosome (chrX) is often excluded from genome-wide association studies due to its unique biology complicating the analysis and interpretation of genetic data. Consequently, the influence of chrX on human complex traits remains debated. Here, we systematically assessed the relevance of chrX and the effect of its biology on complex traits by analyzing 48 quantitative traits in 343,695 individuals in UK Biobank with replication in 412,181 individuals from FinnGen. We show that, in the general population, chrX contributes to complex trait heritability at a rate of 3% of the autosomal heritability, consistent with the amount of genetic variation observed in chrX. We find that a pronounced male bias in chrX heritability supports the presence of near-complete dosage compensation between sexes through X chromosome inactivation (XCI). However, we also find subtle yet plausible evidence of escape from XCI contributing to human height. Assuming full XCI, the observed chrX contribution to complex trait heritability in both sexes is greater than expected given the presence of only a single active copy of chrX, mirroring potential dosage compensation between chrX and the autosomes. We find this enhanced contribution attributable to systematically larger active allele effects from chrX compared to autosomes in both sexes, independent of allele frequency and variant deleteriousness. Together, these findings support a model in which the two dosage-compensation mechanisms work in concert to balance the influence of chrX across the population while preserving sex-specific differences at a manageable level. Overall, our study advocates for more comprehensive locus discovery efforts in chrX.
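The male bias in chrX heritability can be illustrated with a textbook genotype-coding argument, shown here as a minimal Python sketch rather than the study's estimation method. It assumes a single biallelic X-linked SNP with the same allele frequency p and the same per-allele effect in both sexes, and Hardy-Weinberg equilibrium in females. Under full dosage compensation a hemizygous male is expected to resemble a female homozygote, so his genotype is coded 0/2. Without compensation (as for genes escaping XCI) it is coded 0/1. The function names are hypothetical, and the sketch does not reproduce any heritability estimate from the abstract.

```python
# Hypothetical sketch: per-SNP genotype variance for one X-linked biallelic SNP.
# Assumptions: equal allele frequency p in both sexes, HWE in females,
# and a common per-allele effect, so variance explained is proportional to Var(genotype).


def female_variance(p: float) -> float:
    """Variance of the 0/1/2 allele count in females under HWE."""
    return 2 * p * (1 - p)


def male_variance(p: float, full_dosage_compensation: bool) -> float:
    """Variance of the single male allele coded 0/2 (full XCI) or 0/1 (escape)."""
    scale = 2 if full_dosage_compensation else 1
    # Var(c * Bernoulli(p)) = c^2 * p * (1 - p)
    return scale ** 2 * p * (1 - p)


def male_to_female_ratio(p: float, full_dosage_compensation: bool) -> float:
    """Ratio of per-SNP variance explained in males versus females."""
    return male_variance(p, full_dosage_compensation) / female_variance(p)


if __name__ == "__main__":
    for p in (0.05, 0.2, 0.5):
        print(p, male_to_female_ratio(p, True), male_to_female_ratio(p, False))
```

Under these assumptions the per-SNP variance explained is twice as large in males as in females with full dosage compensation and half as large without it, independent of p, which is why the sex ratio of chrX heritability carries information about dosage compensation.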
Affiliation(s)
- Yu Fu
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, 00014 Helsinki, Finland
- Aino Kenttämies
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, 00014 Helsinki, Finland
- Sanni Ruotsalainen
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, 00014 Helsinki, Finland
- Matti Pirinen
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, 00014 Helsinki, Finland; Department of Public Health, University of Helsinki, 00014 Helsinki, Finland; Department of Mathematics and Statistics, University of Helsinki, 00014 Helsinki, Finland
- Taru Tukiainen
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, 00014 Helsinki, Finland.
2
Subramanian S. The Abundance of Harmful Rare Homozygous Variants in Children of Consanguineous Parents. BIOLOGY 2025; 14:310. [PMID: 40136566 PMCID: PMC11940780 DOI: 10.3390/biology14030310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2024] [Revised: 03/16/2025] [Accepted: 03/17/2025] [Indexed: 03/27/2025]
Abstract
Children born of consanguineous unions have been found to have a higher incidence of recessive genetic diseases than the offspring of unrelated parents. This was predicted to result from a larger number of deleterious rare homozygous genetic variants in the former compared with the latter. However, the magnitude of this difference is unknown. Using more than 2500 whole genomes, we show here that individuals born of unions between double (paternal and maternal) first cousins had 20 times more deleterious rare homozygous single nucleotide variants (SNVs) than those with unrelated parents. Furthermore, the children of first cousins had 10 times, and the children of second cousins two times, more of these SNVs than the offspring of unrelated parents. Differences of similar magnitude were found for nonsynonymous deleterious rare homozygous SNVs (19, 10, and 2 times, respectively). In contrast, the differences in the number of deleterious low-frequency and common homozygous variants between the children of cousins and those of unrelated parents were 1-3 times and 1-7%, respectively. These results suggest that the offspring of consanguineous unions could have a 20 times higher risk of recessive autosomal diseases caused by rare variants. Conversely, consanguinity appears to have little effect on the risk of common diseases. These findings have implications for future clinical research aimed at identifying genetic variants associated with inherited diseases. Furthermore, the magnitude of the elevated risk revealed in this study could be useful in genetic counseling and, in public health, for raising awareness.
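The contrast between rare and common variants follows from Wright's classical inbreeding model, sketched below in Python as an illustration rather than the study's whole-genome counting method. With inbreeding coefficient F, the expected frequency of homozygotes for an allele at frequency q is q²(1 - F) + qF, so the fold increase over an outbred individual is 1 - F + F/q, which grows rapidly as q falls. The F values are the standard pedigree coefficients for the offspring of second cousins (1/64), first cousins (1/16), and double first cousins (1/8); the allele frequencies in the usage loop are arbitrary examples, and the function names are hypothetical.

```python
from fractions import Fraction

# Standard pedigree inbreeding coefficients (Wright's F) of the offspring.
INBREEDING_F = {
    "unrelated": Fraction(0),
    "second cousins": Fraction(1, 64),
    "first cousins": Fraction(1, 16),
    "double first cousins": Fraction(1, 8),
}


def homozygote_frequency(q: float, f: float) -> float:
    """Expected homozygote frequency for an allele at frequency q with inbreeding F."""
    return q * q * (1 - f) + q * f


def fold_increase(q: float, f: float) -> float:
    """Homozygote frequency relative to an outbred (F = 0) individual."""
    return homozygote_frequency(q, f) / homozygote_frequency(q, 0.0)


if __name__ == "__main__":
    for q in (0.001, 0.01, 0.3):
        for relation, f in INBREEDING_F.items():
            print(f"q={q:<6} {relation:<21} fold={fold_increase(q, float(f)):.2f}")
```

Because the F/q term dominates for rare alleles, the fold increase is large for q = 0.001 and modest for q = 0.3. Observed ratios need not match these numbers, since "rare" spans a range of frequencies and realized inbreeding varies around its pedigree expectation.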
Affiliation(s)
- Sankar Subramanian
- Centre for Bioinnovation, School of Science, Technology, and Engineering, The University of the Sunshine Coast, Moreton Bay, QLD 4502, Australia
3
Wang J, Zhang Z, Lu Z, Mancuso N, Gazal S. Genes with differential expression across ancestries are enriched in ancestry-specific disease effects likely due to gene-by-environment interactions. Am J Hum Genet 2024; 111:2117-2128. [PMID: 39191255 PMCID: PMC11480800 DOI: 10.1016/j.ajhg.2024.07.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 07/26/2024] [Accepted: 07/30/2024] [Indexed: 08/29/2024]
Abstract
Multi-ancestry genome-wide association studies (GWASs) have highlighted the existence of variants with ancestry-specific effect sizes. Understanding where and why these ancestry-specific effects occur is fundamental to understanding the genetic basis of human diseases and complex traits. Here, we characterized genes differentially expressed across ancestries (ancDE genes) at the cell-type level by leveraging single-cell RNA-sequencing data in peripheral blood mononuclear cells for 21 individuals with East Asian (EAS) ancestry and 23 individuals with European (EUR) ancestry (172,385 cells); then, we tested whether variants surrounding those genes were enriched in disease variants with ancestry-specific effect sizes by leveraging ancestry-matched GWASs of 31 diseases and complex traits (average n ∼ 90,000 and ∼ 267,000 in EAS and EUR, respectively). We observed that ancDE genes tended to be cell-type specific and enriched in genes interacting with the environment and in variants with ancestry-specific disease effect sizes, which suggests cell-type-specific, gene-by-environment interactions shared between regulatory and disease architectures. Finally, we illustrated how different environments might have led to ancestry-specific myeloid cell leukemia 1 (MCL1) expression in B cells and ancestry-specific allele effect sizes in lymphocyte count GWASs for variants surrounding MCL1. Our results imply that large single-cell and GWAS datasets from diverse ancestries are required to improve our understanding of human diseases.
Affiliation(s)
- Juehan Wang
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
- Zixuan Zhang
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Zeyun Lu
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Nicholas Mancuso
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Steven Gazal
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
4
Joseph J. Increased Positive Selection in Highly Recombining Genes Does not Necessarily Reflect an Evolutionary Advantage of Recombination. Mol Biol Evol 2024; 41:msae107. [PMID: 38829800 PMCID: PMC11173204 DOI: 10.1093/molbev/msae107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 04/08/2024] [Accepted: 05/28/2024] [Indexed: 06/05/2024]
Abstract
It is commonly thought that the long-term advantage of meiotic recombination is to dissipate genetic linkage, allowing natural selection to act independently on different loci. It is thus theoretically expected that genes with higher recombination rates evolve under more effective selection. On the other hand, recombination is often associated with GC-biased gene conversion (gBGC), which theoretically interferes with selection by promoting the fixation of deleterious GC alleles. To test these predictions, several studies assessed whether selection was more effective in highly recombining genes (due to dissipation of genetic linkage) or less effective (due to gBGC), assuming a fixed distribution of fitness effects (DFE) for all genes. In this study, I directly derive the DFE from a gene's evolutionary history (shaped by mutation, selection, drift, and gBGC) under empirical fitness landscapes. I show that genes that have experienced high levels of gBGC are less fit and thus have more opportunities for beneficial mutations. A small decrease in the genome-wide intensity of gBGC is enough to cause the fixation of these beneficial mutations, particularly in highly recombining genes. This results in increased positive selection in highly recombining genes that is not caused by more effective selection. Additionally, I show that the death of a recombination hotspot can lead to a higher dN/dS than its birth, but with substitution patterns biased towards AT, and only at selected positions. Controlling for a substitution bias towards GC is therefore not sufficient to rule out the contribution of gBGC to signatures of accelerated evolution. Finally, although gBGC does not affect the fixation probability of GC-conservative mutations, I show that by altering the DFE, gBGC can also significantly affect nonsynonymous GC-conservative substitution patterns.
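How gBGC can favor deleterious GC alleles while leaving GC-conservative mutations untouched can be illustrated with the standard diffusion approximation, in which gBGC acts like selection on GC alleles. The Python sketch below is a simplification, not the fitness-landscape model used in the study. S = 4Ns is the population-scaled selection coefficient, B = 4Nb is an assumed population-scaled gBGC strength, and the mutation-class labels and function names are hypothetical choices for illustration.

```python
import math


def relative_fixation_rate(scaled_s: float) -> float:
    """Fixation probability of a new mutation relative to a neutral one.

    Diffusion approximation S / (1 - exp(-S)) with S = 4*N*s; the limit at S = 0 is 1.
    """
    if abs(scaled_s) < 1e-12:
        return 1.0
    # -expm1(-S) equals 1 - exp(-S) but is numerically stable for small S.
    return scaled_s / -math.expm1(-scaled_s)


def effective_scaled_s(scaled_s: float, scaled_b: float, mutation: str) -> float:
    """Combine selection S with gBGC strength B for a mutation class (hypothetical labels)."""
    if mutation == "AT->GC":
        return scaled_s + scaled_b  # gBGC pushes GC alleles toward fixation
    if mutation == "GC->AT":
        return scaled_s - scaled_b  # gBGC opposes AT alleles
    if mutation == "GC-conservative":  # A<->T or G<->C: no direct gBGC effect
        return scaled_s
    raise ValueError(f"unknown mutation class: {mutation}")


if __name__ == "__main__":
    s, b = -1.0, 3.0  # illustrative values: mildly deleterious mutation, moderate gBGC
    for kind in ("AT->GC", "GC->AT", "GC-conservative"):
        print(kind, round(relative_fixation_rate(effective_scaled_s(s, b, kind)), 3))
```

With S = -1 and B = 3, the mildly deleterious AT->GC mutation fixes faster than a neutral one, whereas the same selective effect on a GC-conservative mutation is unaffected by B. The abstract's further point, that gBGC reshapes the DFE itself, is not captured by this fixed-S sketch.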
Affiliation(s)
- Julien Joseph
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, UMR 5558, Villeurbanne, France
5
Laurent R, Gineau L, Utge J, Lafosse S, Phoeung CL, Hegay T, Olaso R, Boland A, Deleuze JF, Toupance B, Heyer E, Leutenegger AL, Chaix R. Measuring the Efficiency of Purging by non-random Mating in Human Populations. Mol Biol Evol 2024; 41:msae094. [PMID: 38839045 PMCID: PMC11184347 DOI: 10.1093/molbev/msae094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 05/07/2024] [Accepted: 05/10/2024] [Indexed: 06/07/2024]
Abstract
Human populations harbor a high concentration of deleterious genetic variants. Here, we tested the hypothesis that non-random mating practices affect the distribution of these variants, through exposure in the homozygous state, leading to their purging from the population gene pool. To do so, we produced whole-genome sequencing data for two pairs of Asian populations exhibiting different alliance rules and rates of inbreeding, but with similar effective population sizes. The results show that populations with higher rates of inbred matings do not purge deleterious variants more efficiently. Purging therefore has a low efficiency in human populations, and different mating practices lead to a similar mutational load.
Affiliation(s)
- Romain Laurent
- Eco-anthropologie (EA), Muséum National d'Histoire Naturelle, CNRS, Université Paris Cité, 75016 Paris, France
- Laure Gineau
- IRD, MERIT, Université Paris Cité, 75006 Paris, France
- José Utge
- Eco-anthropologie (EA), Muséum National d'Histoire Naturelle, CNRS, Université Paris Cité, 75016 Paris, France
- Sophie Lafosse
- Eco-anthropologie (EA), Muséum National d'Histoire Naturelle, CNRS, Université Paris Cité, 75016 Paris, France
- Tatyana Hegay
- Laboratory of Genome-cell technology, Institute of Immunology and Human genomics, Academy of Sciences, Tashkent, Uzbekistan
- Robert Olaso
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Université Paris-Saclay, 91057, Evry, France
- Anne Boland
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Université Paris-Saclay, 91057, Evry, France
- Jean-François Deleuze
- Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Université Paris-Saclay, 91057, Evry, France
- Bruno Toupance
- Eco-anthropologie (EA), Muséum National d'Histoire Naturelle, CNRS, Université Paris Cité, 75016 Paris, France
- Eco-Anthropologie, Université Paris Cité, 75006 Paris, France
- Evelyne Heyer
- Eco-anthropologie (EA), Muséum National d'Histoire Naturelle, CNRS, Université Paris Cité, 75016 Paris, France
- Raphaëlle Chaix
- Eco-anthropologie (EA), Muséum National d'Histoire Naturelle, CNRS, Université Paris Cité, 75016 Paris, France
6
Aw AJ, Spence JP, Song YS. A SIMPLE AND FLEXIBLE TEST OF SAMPLE EXCHANGEABILITY WITH APPLICATIONS TO STATISTICAL GENOMICS. Ann Appl Stat 2024; 18:858-881. [PMID: 38784669 PMCID: PMC11115382 DOI: 10.1214/23-aoas1817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
In scientific studies involving analyses of multivariate data, basic but important questions often arise for the researcher: Is the sample exchangeable, meaning that the joint distribution of the sample is invariant to the ordering of the units? Are the features independent of one another, or can the features perhaps be grouped so that the groups are mutually independent? In statistical genomics, these considerations are fundamental to downstream tasks such as demographic inference and the construction of polygenic risk scores. We propose a non-parametric approach, which we call the V test, to address these two questions, namely, a test of sample exchangeability given the dependency structure of features, and a test of feature independence given sample exchangeability. Our test is conceptually simple, yet fast and flexible. It controls the Type I error across realistic scenarios, and handles data of arbitrary dimensions by leveraging large-sample asymptotics. Through extensive simulations and a comparison against unsupervised tests of stratification based on random matrix theory, we find that our test compares favorably in various scenarios of interest. We apply the test to data from the 1000 Genomes Project, demonstrating how it can be employed to assess exchangeability of the genetic sample, or find optimal linkage disequilibrium (LD) splits for downstream analysis. For exchangeability assessment, we find that removing rare variants can substantially increase the p-value of the test statistic. For optimal LD splitting, the V test reports different optimal splits than previous approaches not relying on hypothesis testing. Software for our methods is available in R (CRAN: flintyR) and Python (PyPI: flintyPy).
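The test of sample exchangeability given the dependency structure of features can be sketched as a Monte Carlo permutation test in plain Python: permuting samples independently within each mutually independent feature block yields draws from the null, and a statistic sensitive to sample structure is compared against them. This is not the flintyR or flintyPy API. The function names are hypothetical, the choice of statistic (variance of pairwise Hamming distances between samples) is an assumption, and the sketch relies on permutations only rather than the large-sample asymptotics the abstract mentions.

```python
import random
from itertools import combinations
from statistics import pvariance


def pairwise_distance_variance(data):
    """Variance of Hamming distances over all pairs of samples (rows)."""
    dists = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(data, 2)]
    return pvariance(dists)


def permute_blocks(data, blocks, rng):
    """Shuffle samples independently within each feature block.

    Each block keeps its internal column structure; only the assignment of a
    block's values to samples changes. Columns outside every block stay in place.
    """
    n = len(data)
    out = [list(row) for row in data]
    for block in blocks:
        order = list(range(n))
        rng.shuffle(order)
        for i, src in enumerate(order):
            for j in block:
                out[i][j] = data[src][j]
    return out


def exchangeability_pvalue(data, blocks, n_perm=500, seed=0):
    """Monte Carlo p-value; a large distance variance suggests non-exchangeable samples."""
    rng = random.Random(seed)
    observed = pairwise_distance_variance(data)
    exceed = sum(
        pairwise_distance_variance(permute_blocks(data, blocks, rng)) >= observed
        for _ in range(n_perm)
    )
    return (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    # Two groups with opposite genotypes at 20 features, each feature its own block.
    structured = [[1] * 20 for _ in range(10)] + [[0] * 20 for _ in range(10)]
    blocks = [[j] for j in range(20)]
    print(exchangeability_pvalue(structured, blocks, n_perm=200))
    rng = random.Random(42)
    unstructured = [[rng.randint(0, 1) for _ in range(20)] for _ in range(20)]
    print(exchangeability_pvalue(unstructured, blocks, n_perm=200))
```

In the structured example, permuting each feature separately destroys the group structure, so the observed distance variance should exceed essentially every permuted value and the p-value should sit near its minimum of 1/(n_perm + 1).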
Affiliation(s)
- Alan J Aw
- Department of Statistics, University of California, Berkeley
- Yun S Song
- Department of Statistics and Computer Science Division, University of California, Berkeley
7
Kerdoncuff E, Skov L, Patterson N, Zhao W, Lueng YY, Schellenberg GD, Smith JA, Dey S, Ganna A, Dey AB, Kardia SL, Lee J, Moorjani P. 50,000 years of Evolutionary History of India: Insights from ~2,700 Whole Genome Sequences. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.15.580575. [PMID: 38405782 PMCID: PMC10888882 DOI: 10.1101/2024.02.15.580575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
India has been underrepresented in whole genome sequencing studies. We generated 2,762 high-coverage genomes from India, including individuals from most geographic regions, speakers of all major languages, and tribal and caste groups, providing a comprehensive survey of genetic variation in India. With these data, we reconstruct the evolutionary history of India through space and time at fine scales. We show that most Indians derive ancestry from three ancestral groups related to ancient Iranian farmers, Eurasian Steppe pastoralists and South Asian hunter-gatherers. We uncover a common source of Iranian-related ancestry from early Neolithic cultures of Central Asia into the ancestors of Ancestral South Indians (ASI), Ancestral North Indians (ANI), Austro-asiatic-related and East Asian-related groups in India. Following these admixtures, India experienced a major demographic shift towards endogamy, resulting in extensive homozygosity and identity-by-descent sharing among individuals. At deep time scales, Indians derive around 1-2% of their ancestry from gene flow from archaic hominins (Neanderthals and Denisovans). By assembling the surviving fragments of archaic ancestry in modern Indians, we recover ~1.5 Gb (or 50%) of the introgressing Neanderthal and ~0.6 Gb (or 20%) of the introgressing Denisovan genomes, more than any previous study of archaic ancestry. Moreover, Indians have the largest variation in Neanderthal ancestry, as well as the highest amount of population-specific Neanderthal segments among worldwide groups. Finally, we demonstrate that most of the genetic variation in Indians stems from a single major migration out of Africa that occurred around 50,000 years ago, with minimal contribution from earlier migration waves. Together, these analyses provide a detailed view of the population history of India and underscore the value of expanding genomic surveys to diverse groups outside Europe.
Affiliation(s)
- Elise Kerdoncuff
- Department of Molecular and Cell Biology, University of California, Berkeley, United States of America
- Laurits Skov
- Department of Molecular and Cell Biology, University of California, Berkeley, United States of America
- Nick Patterson
- Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Wei Zhao
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
- Yuk Yee Lueng
- Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, United States of America
- Gerard D. Schellenberg
- Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, United States of America
- Jennifer A. Smith
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
- Sharmistha Dey
- Department of Biophysics, All India Institute of Medical Sciences, New Delhi, India
- Andrea Ganna
- Institute for Molecular Medicine Finland, Helsinki, Finland
- AB Dey
- Department of Geriatric Medicine, All India Institute of Medical Sciences, New Delhi, India
- Sharon L.R. Kardia
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
- Jinkook Lee
- Department of Economics, and Center for Economic & Social Research, University of Southern California, Los Angeles, California, United States of America
- Priya Moorjani
- Department of Molecular and Cell Biology, University of California, Berkeley, United States of America
- Center for Computational Biology, University of California, Berkeley, United States of America
8
Ye Z, Wei W, Pfrender ME, Lynch M. Evolutionary Insights from a Large-Scale Survey of Population-Genomic Variation. Mol Biol Evol 2023; 40:msad233. [PMID: 37863047 PMCID: PMC10630549 DOI: 10.1093/molbev/msad233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 09/11/2023] [Accepted: 10/03/2023] [Indexed: 10/22/2023]
Abstract
The field of genomics has ushered in new methods for studying molecular-genetic variation in natural populations. However, most population-genomic studies still rely on small sample sizes (typically, <100 individuals) from single time points, leaving considerable uncertainties with respect to the behavior of relatively young (and rare) alleles and, owing to the large sampling variance of measures of variation, to the specific gene targets of unusually strong selection. Genomic sequences of ∼1,700 haplotypes distributed over a 10-year period from a natural population of the microcrustacean Daphnia pulex reveal evolutionary-genomic features at a refined scale, including previously hidden information on the behavior of rare alleles predicted by recent theory. Background selection, resulting from the recurrent introduction of deleterious alleles, appears to strongly influence the dynamics of neutral alleles, inducing indirect negative selection on rare variants and positive selection on common variants. Temporally fluctuating selection increases the persistence of nonsynonymous alleles with intermediate frequencies, while reducing standing levels of variation at linked silent sites. Combined with the results from an equally large metapopulation survey of the study species, classes of genes that are under strong positive selection can now be confidently identified in this key model organism. Most notable among rapidly evolving Daphnia genes are those associated with ribosomes, mitochondrial functions, sensory systems, and lifespan determination.
Affiliation(s)
- Zhiqiang Ye
- Hubei Key Laboratory of Genetic Regulation & Integrative Biology, School of Life Sciences, Central China Normal University, Wuhan 430079, China
- Wen Wei
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287, USA
- Michael E Pfrender
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA
- Michael Lynch
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287, USA
9
Wang J, Gazal S. Ancestry-specific regulatory and disease architectures are likely due to cell-type-specific gene-by-environment interactions. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.10.20.23297214. [PMID: 37905038 PMCID: PMC10615008 DOI: 10.1101/2023.10.20.23297214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Multi-ancestry genome-wide association studies (GWAS) have highlighted the existence of variants with ancestry-specific effect sizes. Understanding where and why these ancestry-specific effects occur is fundamental to understanding the genetic basis of human diseases and complex traits. Here, we characterized genes differentially expressed across ancestries (ancDE genes) at the cell-type level by leveraging single-cell RNA-seq data in peripheral blood mononuclear cells for 21 individuals with East Asian (EAS) ancestry and 23 individuals with European (EUR) ancestry (172K cells); then, we tested whether variants surrounding those genes were enriched in disease variants with ancestry-specific effect sizes by leveraging ancestry-matched GWAS of 31 diseases and complex traits (average N = 90K and 267K in EAS and EUR, respectively). We observed that ancDE genes tend to be cell-type-specific and to be enriched both in genes interacting with the environment and in variants with ancestry-specific disease effect sizes, suggesting that cell-type-specific gene-by-environment (GxE) interactions are shared between regulatory and disease architectures. Finally, we illustrated how GxE interactions might have led to ancestry-specific MCL1 expression in B cells, and ancestry-specific allele effect sizes in lymphocyte count GWAS for variants surrounding MCL1. Our results imply that large single-cell and GWAS datasets in diverse populations are required to improve our understanding of the effect of genetic variants on human diseases.
Affiliation(s)
- Juehan Wang
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Steven Gazal
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
10
González-Peñas J, de Hoyos L, Díaz-Caneja CM, Andreu-Bernabeu Á, Stella C, Gurriarán X, Fañanás L, Bobes J, González-Pinto A, Crespo-Facorro B, Martorell L, Vilella E, Muntané G, Molto MD, Gonzalez-Piqueras JC, Parellada M, Arango C, Costas J. Recent natural selection conferred protection against schizophrenia by non-antagonistic pleiotropy. Sci Rep 2023; 13:15500. [PMID: 37726359 PMCID: PMC10509162 DOI: 10.1038/s41598-023-42578-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/16/2022] [Accepted: 09/12/2023] [Indexed: 09/21/2023]
Abstract
Schizophrenia is a debilitating psychiatric disorder associated with reduced fertility and decreased life expectancy, yet common predisposing variation contributes substantially to its onset, which poses an evolutionary paradox. Previous research has suggested balancing selection, a mechanism by which schizophrenia risk alleles could also provide advantages in certain environments, as a plausible explanation. However, recent studies have shown strong evidence against positive selection of predisposing loci. Furthermore, evolutionary pressures on schizophrenia risk alleles could have changed throughout human history as new environments emerged. In this study, we used 1000 Genomes Project data to explore, on a genome-wide scale, the relationship between schizophrenia predisposing loci and recent natural selection (RNS) signatures after the human diaspora out of Africa around 100,000 years ago. We found evidence of significant enrichment of RNS markers in derived alleles that arose during human evolution and confer protection against schizophrenia. Moreover, both partitioned heritability and gene set enrichment analyses of genes mapped from schizophrenia predisposing loci subject to RNS revealed lower involvement in brain- and neuron-related functions than for loci not subject to RNS. Taken together, our results suggest non-antagonistic pleiotropy as a likely mechanism behind RNS that could explain the persistence of common schizophrenia predisposing variation in human populations through its association with other, non-psychiatric phenotypes.
Affiliation(s)
- Javier González-Peñas
- Department of Child and Adolescent Psychiatry, Institute of Psychiatry and Mental Health, Hospital General Universitario Gregorio Marañón, Calle Ibiza, 43, 28009, Madrid, Spain.
- Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), Madrid, Spain.
- CIBERSAM, Centro Investigación Biomédica en Red Salud Mental, Madrid, Spain.
- Lucía de Hoyos
- Department of Child and Adolescent Psychiatry, Institute of Psychiatry and Mental Health, Hospital General Universitario Gregorio Marañón, Calle Ibiza, 43, 28009, Madrid, Spain
- Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), Madrid, Spain
- Language and Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Covadonga M Díaz-Caneja
- Department of Child and Adolescent Psychiatry, Institute of Psychiatry and Mental Health, Hospital General Universitario Gregorio Marañón, Calle Ibiza, 43, 28009, Madrid, Spain
- Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), Madrid, Spain
- CIBERSAM, Centro Investigación Biomédica en Red Salud Mental, Madrid, Spain
- School of Medicine, Universidad Complutense, Madrid, Spain
- Álvaro Andreu-Bernabeu
- Department of Child and Adolescent Psychiatry, Institute of Psychiatry and Mental Health, Hospital General Universitario Gregorio Marañón, Calle Ibiza, 43, 28009, Madrid, Spain
- Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), Madrid, Spain
- School of Medicine, Universidad Complutense, Madrid, Spain
- Carol Stella
- Department of Child and Adolescent Psychiatry, Institute of Psychiatry and Mental Health, Hospital General Universitario Gregorio Marañón, Calle Ibiza, 43, 28009, Madrid, Spain
- Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), Madrid, Spain
- Xaquín Gurriarán
- Department of Child and Adolescent Psychiatry, Institute of Psychiatry and Mental Health, Hospital General Universitario Gregorio Marañón, Calle Ibiza, 43, 28009, Madrid, Spain
- Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), Madrid, Spain
- Lourdes Fañanás
- CIBERSAM, Centro Investigación Biomédica en Red Salud Mental, Madrid, Spain
- Department of Evolutionary Biology, Ecology and Environmental Sciences, Faculty of Biology, University of Barcelona, Barcelona, Spain
- Julio Bobes
- CIBERSAM, Centro Investigación Biomédica en Red Salud Mental, Madrid, Spain
- Faculty of Medicine and Health Sciences - Psychiatry, Universidad de Oviedo, ISPA, INEUROPA, Oviedo, Spain
- Ana González-Pinto
- CIBERSAM, Centro Investigación Biomédica en Red Salud Mental, Madrid, Spain
- BIOARABA Health Research Institute, OSI Araba, University Hospital, University of the Basque Country, Vitoria, Spain
- Benedicto Crespo-Facorro
- CIBERSAM, Centro Investigación Biomédica en Red Salud Mental, Madrid, Spain
- Department of Psychiatry, Hospital Universitario Virgen del Rocío, Universidad de Sevilla, Seville, Spain
- Lourdes Martorell
- CIBERSAM, Centro Investigación Biomédica en Red Salud Mental, Madrid, Spain
- Hospital Universitari Institut Pere Mata, IISPV, Universitat Rovira I Virgili, Reus, Spain
- Elisabet Vilella
- CIBERSAM, Centro Investigación Biomédica en Red Salud Mental, Madrid, Spain
- Hospital Universitari Institut Pere Mata, IISPV, Universitat Rovira I Virgili, Reus, Spain
- Gerard Muntané
- CIBERSAM, Centro Investigación Biomédica en Red Salud Mental, Madrid, Spain
- Hospital Universitari Institut Pere Mata, IISPV, Universitat Rovira I Virgili, Reus, Spain
- María Dolores Molto
- CIBERSAM, Centro Investigación Biomédica en Red Salud Mental, Madrid, Spain
- Department of Genetics, University of Valencia, Campus of Burjassot, Valencia, Spain
- Department of Medicine, Universitat de València, Valencia, Spain
- Jose Carlos Gonzalez-Piqueras
- CIBERSAM, Centro Investigación Biomédica en Red Salud Mental, Madrid, Spain
- Department of Medicine, Universitat de València, Valencia, Spain
- Fundación Investigación Hospital Clínico de Valencia, INCLIVA, 46010, Valencia, Spain
- Mara Parellada
- Department of Child and Adolescent Psychiatry, Institute of Psychiatry and Mental Health, Hospital General Universitario Gregorio Marañón, Calle Ibiza, 43, 28009, Madrid, Spain
- Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), Madrid, Spain
- CIBERSAM, Centro Investigación Biomédica en Red Salud Mental, Madrid, Spain
- School of Medicine, Universidad Complutense, Madrid, Spain
- Celso Arango
- Department of Child and Adolescent Psychiatry, Institute of Psychiatry and Mental Health, Hospital General Universitario Gregorio Marañón, Calle Ibiza, 43, 28009, Madrid, Spain
- Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), Madrid, Spain
- CIBERSAM, Centro Investigación Biomédica en Red Salud Mental, Madrid, Spain
- School of Medicine, Universidad Complutense, Madrid, Spain
- Javier Costas
- Instituto de Investigación Sanitaria (IDIS) de Santiago de Compostela, Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), Servizo Galego de Saúde (SERGAS), Santiago de Compostela, Galicia, Spain
11
Ye Z, Wei W, Pfrender M, Lynch M. Evolutionary Insights from a Large-scale Survey of Population-genomic Variation. bioRxiv 2023:2023.05.03.539276. [PMID: 37205430 PMCID: PMC10187179 DOI: 10.1101/2023.05.03.539276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Results from data on > 1000 haplotypes distributed over a nine-year period from a natural population of the microcrustacean Daphnia pulex reveal evolutionary-genomic features at a refined scale, including key population-genetic properties that are obscured in studies with smaller sample sizes. Background selection, resulting from the recurrent introduction of deleterious alleles, appears to strongly influence the dynamics of neutral alleles, inducing indirect negative selection on rare variants and positive selection on common variants. Fluctuating selection increases the persistence of nonsynonymous alleles with intermediate frequencies, while reducing standing levels of variation at linked silent sites. Combined with the results from an equally large metapopulation survey of the study species, regions of gene structure that are under strong purifying selection and classes of genes that are under strong positive selection in this key species can be confidently identified. Most notable among rapidly evolving Daphnia genes are those associated with ribosomes, mitochondrial functions, sensory systems, and lifespan determination.
Affiliation(s)
- Zhiqiang Ye
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287
- Wen Wei
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287
- Michael Pfrender
- Department of Biological Sciences, Notre Dame University, Notre Dame, IN 46556
- Michael Lynch
- Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85287
12
Papageorgiou L, Papakonstantinou E, Diakou I, Pierouli K, Dragoumani K, Bacopoulou F, Chrousos GP, Eliopoulos E, Vlachakis D. Semantic and Population Analysis of the Genetic Targets Related to COVID-19 and Its Association with Genes and Diseases. Adv Exp Med Biol 2023; 1423:59-78. [PMID: 37525033 DOI: 10.1007/978-3-031-31978-5_6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 08/02/2023]
Abstract
SARS-CoV-2 is a coronavirus responsible for one of the most serious modern worldwide pandemics, with lasting and multifaceted effects. By late 2021, SARS-CoV-2 had infected more than 180 million people and had killed more than 3 million. The virus gains entry to human cells by binding to ACE2 via its surface spike protein and causes a complex disease of the respiratory system, termed COVID-19. Vaccination efforts are under way to hinder viral spread, and therapeutics are under development. Toward this goal, scientific attention is shifting toward variants and SNPs that affect aspects of the disease such as susceptibility and severity. This genomic grammar, closely tied to the dark part of our genome, can be explored through modern methods such as natural language processing. We present a semantic analysis of SARS-CoV-2-related publications, which yielded a repertoire of SNPs, genes, and disease ontologies. Population data from the 1000 Genomes Project were subsequently integrated into the pipeline. Data mining approaches of this scale have the potential to elucidate the complex interaction between COVID-19 pathogenesis and host genetic variation; the resulting knowledge can facilitate the management of high-risk groups and aid the efforts toward precision medicine.
Affiliation(s)
- Louis Papageorgiou
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
- Eleni Papakonstantinou
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
- Io Diakou
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
- Katerina Pierouli
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
- Konstantina Dragoumani
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
- Flora Bacopoulou
- University Research Institute of Maternal and Child Health & Precision Medicine, National and Kapodistrian University of Athens, "Aghia Sophia" Children's Hospital, Athens, Greece
- George P Chrousos
- University Research Institute of Maternal and Child Health & Precision Medicine, National and Kapodistrian University of Athens, "Aghia Sophia" Children's Hospital, Athens, Greece
- Elias Eliopoulos
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
- Dimitrios Vlachakis
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
- University Research Institute of Maternal and Child Health & Precision Medicine, National and Kapodistrian University of Athens, "Aghia Sophia" Children's Hospital, Athens, Greece
- Division of Endocrinology and Metabolism, Center of Clinical, Experimental Surgery and Translational Research, Biomedical Research Foundation of the Academy of Athens, Athens, Greece
13
Li B, Aouizerat BE, Cheng Y, Anastos K, Justice AC, Zhao H, Xu K. Incorporating local ancestry improves identification of ancestry-associated methylation signatures and meQTLs in African Americans. Commun Biol 2022; 5:401. [PMID: 35488087 PMCID: PMC9054854 DOI: 10.1038/s42003-022-03353-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/17/2021] [Accepted: 04/11/2022] [Indexed: 12/03/2022]
Abstract
Here we report three epigenome-wide association studies (EWAS) of DNA methylation on self-reported race, global genetic ancestry, and local genetic ancestry in admixed Americans from three sets of samples, including internal and external replications (total N = 1,224). Our EWAS on local ancestry (LA) identified the largest number of ancestry-associated DNA methylation sites and also featured the highest replication rate. Furthermore, by incorporating ancestry origins of genetic variations, we identified 36 methylation quantitative trait loci (meQTL) clumps for LA-associated CpGs that cannot be captured by a model that assumes identical genetic effects across ancestry origins. Lead SNPs at 152 meQTL clumps had significantly different genetic effects in the context of an African or European ancestry background. Local ancestry information enables superior capture of ancestry-associated methylation signatures and identification of ancestry-specific genetic effects on DNA methylation. These findings highlight the importance of incorporating local ancestry for EWAS in admixed samples from multi-ancestry cohorts.
Affiliation(s)
- Boyang Li
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, United States
- VA Connecticut Healthcare System, US Department of Veterans Affairs, West Haven, CT, United States
- Bradley E Aouizerat
- Bluestone Center for Clinical Research, New York University, New York, NY, United States
- Department of Oral and Maxillofacial Surgery, New York University, New York, NY, United States
- Youshu Cheng
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, United States
- Kathryn Anastos
- Division of General Internal Medicine, Albert Einstein College of Medicine, Montefiore Health System, Bronx, NY, United States
- Amy C Justice
- VA Connecticut Healthcare System, US Department of Veterans Affairs, West Haven, CT, United States
- Department of Health Policy and Management, Yale University, New Haven, CT, United States
- Hongyu Zhao
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, United States
- VA Connecticut Healthcare System, US Department of Veterans Affairs, West Haven, CT, United States
- Ke Xu
- VA Connecticut Healthcare System, US Department of Veterans Affairs, West Haven, CT, United States
- Department of Psychiatry, School of Medicine, Yale University, New Haven, CT, United States
14
Colomer-Vilaplana A, Murga-Moreno J, Canalda-Baltrons A, Inserte C, Soto D, Coronado-Zamora M, Barbadilla A, Casillas S. PopHumanVar: an interactive application for the functional characterization and prioritization of adaptive genomic variants in humans. Nucleic Acids Res 2022; 50:D1069-D1076. [PMID: 34664660 PMCID: PMC8728255 DOI: 10.1093/nar/gkab925] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/14/2021] [Revised: 09/17/2021] [Accepted: 09/28/2021] [Indexed: 12/22/2022]
Abstract
Adaptive challenges that humans faced as they expanded across the globe left specific molecular footprints that can be decoded in present-day genomes. Different sets of metrics are used to identify genomic regions that have undergone selection. However, fewer methods are capable of pinpointing the allele ultimately responsible for this selection. Here, we present PopHumanVar, an interactive online application designed to facilitate the exploration and thorough analysis of candidate genomic regions by integrating the functional and population genomics data currently available. PopHumanVar generates useful summary reports of prioritized variants that are putatively causal of recent selective sweeps. It compiles data and graphically represents different layers of information, including natural selection statistics, functional annotations and genealogical estimations of variant age, for biallelic single nucleotide variants (SNVs) of the 1000 Genomes Project phase 3. Specifically, PopHumanVar amasses SNV-based information from the GEVA, SnpEff, GWAS Catalog, ClinVar, RegulomeDB and DisGeNET resources, as well as accurate estimations of iHS, nSL and iSAFE statistics. Notably, PopHumanVar can successfully identify known causal variants of frequently reported candidate selection regions, including EDAR in East Asians, ACKR1 (DARC) in Africans and LCT/MCM6 in Europeans. PopHumanVar is open and freely available at https://pophumanvar.uab.cat.
Affiliation(s)
- Aina Colomer-Vilaplana
- Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Jesús Murga-Moreno
- Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Aleix Canalda-Baltrons
- Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Clara Inserte
- Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Daniel Soto
- Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Marta Coronado-Zamora
- Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Antonio Barbadilla
- Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Sònia Casillas
- Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
15
Zhang QS, Goudet J, Weir BS. Rank-invariant estimation of inbreeding coefficients. Heredity (Edinb) 2022; 128:1-10. [PMID: 34824382 PMCID: PMC8733021 DOI: 10.1038/s41437-021-00471-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/18/2020] [Revised: 09/05/2021] [Accepted: 09/05/2021] [Indexed: 11/18/2022]
Abstract
The two alleles an individual carries at a locus are identical by descent (ibd) if they have descended from a single ancestral allele in a reference population, and the probability of such identity is the inbreeding coefficient of the individual. Inbreeding coefficients can be predicted from pedigrees with founders constituting the reference population, but estimation from genetic data is not possible without data from the reference population. Most inbreeding estimators that make explicit use of sample allele frequencies as estimates of allele probabilities in the reference population are confounded by average kinships with other individuals. This means that the ranking of those estimates depends on the scope of the study sample and we show the variation in rankings for common estimators applied to different subdivisions of 1000 Genomes data. Allele-sharing estimators of within-population inbreeding relative to average kinship in a study sample, however, do have invariant rankings across all studies including those individuals. They are unbiased with a large number of SNPs. We discuss how allele sharing estimates are the relevant quantities for a range of empirical applications.
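The rank-invariance argument can be illustrated with a minimal Python sketch of an allele-sharing estimate of the form F̂_j = (M_jj − M_S)/(1 − M_S), where M_jj is the allele-matching proportion within individual j and M_S is the average matching proportion over all pairs of distinct individuals in the sample. The function name, the 0/1/2 dosage coding and the equal weighting of biallelic loci are assumptions made for illustration; the authors' exact weighting across loci may differ. Because M_S is one sample-wide constant and 1 − M_S > 0, each estimate is an increasing function of M_jj alone, so the ordering of individuals cannot change when the sample is enlarged or subdivided.

```python
import numpy as np


def allele_sharing_inbreeding(genotypes):
    """Allele-sharing inbreeding estimates relative to the sample's average kinship.

    genotypes: (n_individuals, n_loci) array of biallelic allele dosages coded 0/1/2.
    Illustrative sketch only (hypothetical helper); loci are weighted equally.
    """
    x = np.asarray(genotypes, dtype=float) - 1.0  # recode dosages to -1/0/+1
    n, n_loci = x.shape
    # Within-individual matching: 1 at homozygous loci, 0 at heterozygous loci, averaged over loci
    m_within = (x ** 2).mean(axis=1)
    # Between-individual matching for a pair (j, k) at one locus is (1 + x_j * x_k) / 2
    m_between = (1.0 + (x @ x.T) / n_loci) / 2.0
    # Average matching over all pairs of distinct individuals (diagonal excluded)
    m_s = m_between[~np.eye(n, dtype=bool)].mean()
    return (m_within - m_s) / (1.0 - m_s)


genotypes = np.array([
    [0, 0, 2, 2, 1, 0],
    [1, 1, 0, 2, 1, 0],
    [2, 2, 2, 0, 0, 2],
    [1, 1, 1, 1, 0, 2],
    [0, 1, 2, 1, 2, 0],
])
full = allele_sharing_inbreeding(genotypes)
subset = allele_sharing_inbreeding(genotypes[[0, 2, 3]])
# Individuals 0, 2 and 3 keep the same relative ranking in both samples
print(np.argsort(full[[0, 2, 3]]), np.argsort(subset))  # [2 0 1] [2 0 1]
```

The absolute values shift with the sample (they are relative to that sample's average kinship), but the ranks do not, which is the property the abstract contrasts with frequency-based estimators.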
Affiliation(s)
- Qian S Zhang
- Department of Biostatistics, University of Washington, Seattle, WA, 98195-1617, USA
- Jérôme Goudet
- Department of Ecology and Evolution, University of Lausanne, CH-1015, Lausanne, Switzerland
- Bruce S Weir
- Department of Biostatistics, University of Washington, Seattle, WA, 98195-1617, USA
16
Garcia JA, Lohmueller KE. Negative linkage disequilibrium between amino acid changing variants reveals interference among deleterious mutations in the human genome. PLoS Genet 2021; 17:e1009676. [PMID: 34319975 PMCID: PMC8351996 DOI: 10.1371/journal.pgen.1009676] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 03/02/2020] [Revised: 08/09/2021] [Accepted: 06/22/2021] [Indexed: 11/18/2022]
Abstract
Evolutionary forces like Hill-Robertson interference and negative epistasis can lead to deleterious mutations being found on distinct haplotypes. However, the extent to which these forces depend on the selection and dominance coefficients of deleterious mutations and shape genome-wide patterns of linkage disequilibrium (LD) in natural populations with complex demographic histories has not been tested. In this study, we first used forward-in-time simulations to predict how negative selection impacts LD. Under models where deleterious mutations have additive effects on fitness, deleterious variants less than 10 kb apart tend to be carried on different haplotypes relative to pairs of synonymous SNPs. In contrast, for recessive mutations, there is no consistent ordering of how selection coefficients affect LD decay, due to the complex interplay of different evolutionary effects. We then examined empirical data from modern humans in the 1000 Genomes Project. LD between derived alleles at nonsynonymous SNPs is lower than between pairs of derived synonymous variants, suggesting that nonsynonymous derived alleles occur on different haplotypes more often than synonymous variants do. This result holds when controlling for potential confounding factors by matching SNPs for frequency in the sample (allele count), physical distance, magnitude of background selection, and genetic distance between pairs of variants. Lastly, we introduce a new statistic, HR(j), which allows us to detect interference using unphased genotypes. Application of this approach to high-coverage human genome sequences confirms our finding that nonsynonymous derived alleles tend to be located on different haplotypes more often than are synonymous derived alleles. Our findings suggest that interference may play a pervasive role in shaping patterns of LD between deleterious variants in the human genome, and consequently influence genome-wide patterns of LD.
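The comparison above depends on the sign of linkage disequilibrium between derived alleles: D = p_AB − p_A·p_B is negative when two derived alleles share a haplotype less often than expected under independence, whereas r² = D²/[p_A(1 − p_A)p_B(1 − p_B)] discards the sign. The Python sketch below computes both from phased 0/1 haplotype vectors (1 = derived allele). It uses the textbook definitions and is not the authors' HR(j) statistic, which works on unphased genotypes; the function name and toy haplotypes are hypothetical.

```python
import numpy as np


def derived_allele_ld(hap_a, hap_b):
    """Signed D and r^2 between two biallelic sites (hypothetical helper).

    hap_a, hap_b: equal-length 0/1 arrays over the same phased haplotypes,
    with 1 marking the derived allele. Both sites must be polymorphic.
    """
    a = np.asarray(hap_a, dtype=float)
    b = np.asarray(hap_b, dtype=float)
    p_a, p_b = a.mean(), b.mean()   # derived-allele frequencies
    p_ab = (a * b).mean()           # frequency of haplotypes carrying both derived alleles
    d = p_ab - p_a * p_b
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


# Toy example: the two derived alleles never share a haplotype (repulsion)
d, r2 = derived_allele_ld([1, 1, 0, 0, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0, 0, 0])
print(d, r2)  # D is negative (-0.0625); r^2 is positive
```

Averaging signed D over matched pairs of nonsynonymous and synonymous derived alleles is one simple way to see the direction of the effect; r² alone could not distinguish repulsion from coupling.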
Affiliation(s)
- Jesse A. Garcia
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
- Kirk E. Lohmueller
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, United States of America
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
17
Zabad S, Ragsdale AP, Sun R, Li Y, Gravel S. Assumptions about frequency-dependent architectures of complex traits bias measures of functional enrichment. Genet Epidemiol 2021; 45:621-632. [PMID: 34157784 DOI: 10.1002/gepi.22388] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/10/2020] [Revised: 04/10/2021] [Accepted: 05/04/2021] [Indexed: 11/05/2022]
Abstract
Linkage-Disequilibrium Score Regression (LDSC) is a popular framework for analyzing Genome-wide Association Studies (GWAS) summary statistics that allows for estimating single nucleotide polymorphism heritability, confounding, and functional enrichment of genetic variants with different annotations. Recent work has highlighted the influence of implicit and explicit assumptions of the model on the biological interpretation of the results. In this study, we explored a formulation of LDSC that replaces the r² measure of LD with a recently proposed unbiased estimator of the D² statistic. In addition to modest statistical differences across estimators, this derivation highlighted implicit and unrealistic assumptions about the relationship between allele frequency, effect size, and annotation status. We carried out a systematic comparison of alternative LDSC formulations by applying them to summary statistics from 47 GWAS traits. Our results show that commonly used models likely underestimate functional enrichment. These results highlight the importance of calibrating the LDSC model to achieve a more robust understanding of polygenic traits.
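To make the r²-versus-D² distinction concrete, the following minimal Python sketch computes, from phased two-locus haplotype counts, the plug-in r², the plug-in D², and an unbiased D² obtained by replacing each product of haplotype frequencies in D² = (p_AB·p_ab − p_Ab·p_aB)² with its falling-factorial counterpart, whose expectation under multinomial sampling equals that product exactly. This construction is an assumption chosen for illustration, not necessarily the estimator used in the study; the function name and counts are hypothetical.

```python
def two_locus_ld(n_AB, n_Ab, n_aB, n_ab):
    """Plug-in r^2, plug-in D^2 and unbiased D^2 from phased haplotype counts (n >= 4)."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB, p_Ab, p_aB, p_ab = (c / n for c in (n_AB, n_Ab, n_aB, n_ab))
    d = p_AB * p_ab - p_Ab * p_aB            # equivalent to p_AB - p_A * p_B
    p_A, p_B = p_AB + p_Ab, p_AB + p_aB      # allele frequencies at each locus
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d ** 2 / denom if denom > 0 else float("nan")  # undefined at monomorphic sites
    # Replace p_i^2 p_j^2 and p_AB p_Ab p_aB p_ab with unbiased falling-factorial ratios
    numerator = (n_AB * (n_AB - 1) * n_ab * (n_ab - 1)
                 + n_Ab * (n_Ab - 1) * n_aB * (n_aB - 1)
                 - 2 * n_AB * n_Ab * n_aB * n_ab)
    d2_unbiased = numerator / (n * (n - 1) * (n - 2) * (n - 3))
    return r2, d ** 2, d2_unbiased


r2, d2_plugin, d2_unbiased = two_locus_ld(4, 1, 2, 3)
# With only 10 haplotypes, plug-in D^2 (about 0.01) is well above the unbiased value (about 0.0048)
print(r2, d2_plugin, d2_unbiased)
```

The gap between the plug-in and unbiased values shrinks as the number of haplotypes grows, which is why the choice of estimator matters most for small samples and rare variants.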
Affiliation(s)
- Shadi Zabad
- Department of Computer Science, McGill University, Montreal, Quebec, Canada
- Aaron P Ragsdale
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada
- Rosie Sun
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada
- Yue Li
- Department of Computer Science, McGill University, Montreal, Quebec, Canada
- Quantitative Life Science, McGill University, Montreal, Quebec, Canada
- Simon Gravel
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada
- Quantitative Life Science, McGill University, Montreal, Quebec, Canada
18
Tilot AK, Khramtsova EA, Liang D, Grasby KL, Jahanshad N, Painter J, Colodro-Conde L, Bralten J, Hibar DP, Lind PA, Liu S, Brotman SM, Thompson PM, Medland SE, Macciardi F, Stranger BE, Davis LK, Fisher SE, Stein JL. The Evolutionary History of Common Genetic Variants Influencing Human Cortical Surface Area. Cereb Cortex 2021; 31:1873-1887. [PMID: 33290510 PMCID: PMC7945014 DOI: 10.1093/cercor/bhaa327] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/18/2020] [Revised: 10/09/2020] [Accepted: 10/09/2020] [Indexed: 12/15/2022]
Abstract
Structural brain changes along the lineage leading to modern Homo sapiens contributed to our distinctive cognitive and social abilities. However, the evolutionarily relevant molecular variants impacting key aspects of neuroanatomy are largely unknown. Here, we integrate evolutionary annotations of the genome at diverse timescales with common variant associations from large-scale neuroimaging genetic screens. We find that alleles with evidence of recent positive polygenic selection over the past 2000-3000 years are associated with increased surface area (SA) of the entire cortex, as well as specific regions, including those involved in spoken language and visual processing. Therefore, polygenic selective pressures impact the structure of specific cortical areas even over relatively recent timescales. Moreover, common sequence variation within human-gained enhancers active in the prenatal cortex is associated with postnatal global SA. We show that such variation modulates the function of a regulatory element of the developmentally relevant transcription factor HEY2 in human neural progenitor cells and is associated with structural changes in the inferior frontal cortex. These results indicate that non-coding genomic regions active during prenatal cortical development are involved in the evolution of human brain structure and identify novel regulatory elements and genes impacting modern human brain structure.
Affiliation(s)
- Amanda K Tilot
- Language and Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen, 6500 AH, Netherlands
- Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Marina del Rey, CA 90292, USA
- Ekaterina A Khramtsova
- Department of Medicine, Section of Genetic Medicine & Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL 60637, USA
- Computational Sciences, Janssen Pharmaceuticals, Spring House, PA 19477, USA
- Dan Liang
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina, Chapel Hill, NC 27599, USA
- Katrina L Grasby
- Psychiatric Genetics, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4006, Australia
- Neda Jahanshad
- Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Marina del Rey, CA 90292, USA
- Jodie Painter
- Psychiatric Genetics, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4006, Australia
- Lucía Colodro-Conde
- Psychiatric Genetics, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4006, Australia
- Janita Bralten
- Radboud University Medical Center, 6525 XZ Nijmegen, Netherlands
- Penelope A Lind
- Psychiatric Genetics, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4006, Australia
- Siyao Liu
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina, Chapel Hill, NC 27599, USA
- Sarah M Brotman
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina, Chapel Hill, NC 27599, USA
- Paul M Thompson
- Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Marina del Rey, CA 90292, USA
- Sarah E Medland
- Psychiatric Genetics, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4006, Australia
- Fabio Macciardi
- Department of Psychiatry and Human Behavior, University of California, Irvine, CA 92697, USA
- Barbara E Stranger
- Department of Medicine, Section of Genetic Medicine & Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL 60637, USA
- Department of Pharmacology, Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Lea K Davis
- Department of Medicine, Division of Medical Genetics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Vanderbilt University Medical Center, Vanderbilt Genetics Institute, Nashville, TN 37232, USA
- Simon E Fisher
- Language and Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen, 6500 AH, Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, 6500 HB, Netherlands
- Jason L Stein
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina, Chapel Hill, NC 27599, USA
19
Napolioni V, Scelsi MA, Khan RR, Altmann A, Greicius MD. Recent Consanguinity and Outbred Autozygosity Are Associated With Increased Risk of Late-Onset Alzheimer's Disease. Front Genet 2021; 11:629373. [PMID: 33584820 PMCID: PMC7879576 DOI: 10.3389/fgene.2020.629373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2020] [Accepted: 12/31/2020] [Indexed: 11/13/2022]
Abstract
Prior work in late-onset Alzheimer's disease (LOAD) has resulted in discrepant findings as to whether recent consanguinity and outbred autozygosity are associated with LOAD risk. In the current study, we tested the association of consanguinity and outbred autozygosity with LOAD in the largest such analysis to date, in which 20 LOAD GWAS datasets were retrieved through public databases. Our analyses were restricted to eight distinct ethnic groups: African-Caribbean, Ashkenazi-Jewish European, European-Caribbean, French-Canadian, Finnish European, North-Western European, South-Eastern European, and Yoruba African, for a total of 21,492 unrelated subjects (11,196 LOAD cases and 10,296 controls). Recent consanguinity was determined using FSuite v1.0.3, according to subjects' ancestral background. The level of autozygosity in the outbred population was assessed by calculating inbreeding estimates based on the proportion (FROH) and the number (NROH) of runs of homozygosity (ROHs). We analyzed all eight ethnic groups using a fixed-effect meta-analysis, which showed a significant association of recent consanguinity with LOAD (N = 21,481; OR = 1.262, P = 3.6 × 10⁻⁴), independently of APOE*4 (N = 21,468; OR = 1.237, P = 0.002) and years of education (N = 9,257; OR = 1.274, P = 0.020). Autozygosity in the outbred population was also associated with an increased risk of LOAD, for both FROH (N = 20,237; OR = 1.204, P = 0.030) and NROH (N = 20,237; OR = 1.019, P = 0.006), independently of APOE*4 [(FROH, N = 20,225; OR = 1.222, P = 0.029) (NROH, N = 20,225; OR = 1.019, P = 0.007)]. By leveraging the Alzheimer's Disease Sequencing Project (ADSP) whole-exome sequencing (WES) data, we determined that LOAD subjects do not show an enrichment of rare, risk-enhancing minor homozygote variants compared to the control population.
A two-stage recessive GWAS using ADSP data from 201 consanguineous subjects in the discovery phase, followed by validation in 10,469 subjects, led to the identification of RPH3AL p.A303V (rs117190076) as a rare minor homozygote variant increasing the risk of LOAD [discovery: Genotype Relative Risk (GRR) = 46, P = 2.16 × 10⁻⁶; validation: GRR = 1.9, P = 8.0 × 10⁻⁴]. These results confirm that recent consanguinity and autozygosity in the outbred population increase risk for LOAD. Subsequent work with larger sample sizes of consanguineous subjects should accelerate the discovery of non-additive genetic effects in LOAD.
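FROH and NROH summarize an individual's runs of homozygosity as the fraction of the genome they cover and as their count. A minimal Python sketch, assuming ROH segments have already been called and are given as non-overlapping (start, end) base-pair intervals; the 1 Mb minimum length and the genome length passed in are illustrative assumptions rather than the parameters used in the study, and the function name is hypothetical.

```python
def roh_summary(segments, genome_length_bp, min_length_bp=1_000_000):
    """Return (F_ROH, N_ROH) for one individual (hypothetical helper).

    segments: non-overlapping (start, end) ROH intervals in base pairs.
    genome_length_bp: length of the genome (e.g. autosomes) the ROH were called on.
    min_length_bp: assumed minimum ROH length; shorter runs are ignored.
    """
    kept = [(start, end) for start, end in segments if end - start >= min_length_bp]
    total_roh_bp = sum(end - start for start, end in kept)
    f_roh = total_roh_bp / genome_length_bp   # proportion of the genome in ROH
    n_roh = len(kept)                         # number of ROH
    return f_roh, n_roh


# Hypothetical individual: the 0.5 Mb run falls below the threshold and is dropped
print(roh_summary([(0, 2_000_000), (5_000_000, 5_500_000), (10_000_000, 13_000_000)],
                  genome_length_bp=100_000_000))  # (0.05, 2)
```

Because the length threshold changes both quantities, studies comparing FROH or NROH across cohorts need to call ROH with the same minimum length.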
Affiliation(s)
- Valerio Napolioni
- Genomic and Molecular Epidemiology (GAME) Lab, School of Biosciences and Veterinary Medicine, University of Camerino, Camerino, Italy
- Marzia A. Scelsi
- Computational Biology in Imaging and Genetics (COMBINE) Lab, Centre for Medical Image Computing, Department of Medical Physics and Biomedical Engineering, University College London, London, United Kingdom
| | - Raiyan R. Khan
- Department of Computer Science, Columbia University, New York, NY, United States
| | - Andre Altmann
- Computational Biology in Imaging and Genetics (COMBINE) Lab, Centre for Medical Image Computing, Department of Medical Physics and Biomedical Engineering, University College London, London, United Kingdom
| | - Michael D. Greicius
- Functional Imaging in Neuropsychiatric Disorders (FIND) Lab, Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, CA, United States
20. Svensson D, Rentoft M, Dahlin AM, Lundholm E, Olason PI, Sjödin A, Nylander C, Melin BS, Trygg J, Johansson E. A whole-genome sequenced control population in northern Sweden reveals subregional genetic differences. PLoS One 2020; 15:e0237721. [PMID: 32915809 PMCID: PMC7485808 DOI: 10.1371/journal.pone.0237721] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/02/2020] [Accepted: 07/31/2020] [Indexed: 12/30/2022]
Abstract
The number of national reference populations that have been whole-genome sequenced is rapidly increasing. This development is partly driven by the fact that genetic disease studies benefit from knowing the genetic variation typical of the geographical area of interest. A whole-genome sequenced Swedish national reference population (n = 1000) was recently published, but it includes few samples from northern Sweden. In the present study, we whole-genome sequenced a control population (n = 300; ACpop) from Västerbotten County, a sparsely populated region in northern Sweden previously shown to be genetically different from southern Sweden. The aggregated variant frequencies within ACpop are publicly available (DOI 10.17044/NBIS/G000005) as a basic resource for clinical genetics and genetic studies. Our analysis of ACpop, which represents approximately 0.11% of the population of Västerbotten, indicates the presence of genetic substructure within the county. Furthermore, a demographic analysis showed that the population from which the samples were drawn was largely geographically stationary, a finding corroborated by the genetic analysis down to the municipality level. Including ACpop in the reference population when imputing unknown variants in a Västerbotten cohort strongly increased the number of high-confidence imputed variants (by up to 81% for variants with minor allele frequency < 5%). ACpop was initially designed for cancer studies, but its genetic structure will be of general interest for all genetic disease studies in northern Sweden.
Affiliation(s)
- Daniel Svensson
- Department of Chemistry, Computational Life Science Cluster, Umeå University, Umeå, Sweden
- Matilda Rentoft
- Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, Sweden
- Anna M. Dahlin
- Department of Radiation Sciences, Oncology, Umeå University, Umeå, Sweden
- Emma Lundholm
- Centre for Demography and Ageing, Umeå University, Umeå, Sweden
- Pall I. Olason
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Andreas Sjödin
- Department of Chemistry, Computational Life Science Cluster, Umeå University, Umeå, Sweden
- Division of CBRN Security and Defence, FOI–Swedish Defence Research Agency, Umeå, Sweden
- Carin Nylander
- Department of Radiation Sciences, Oncology, Umeå University, Umeå, Sweden
- Beatrice S. Melin
- Department of Radiation Sciences, Oncology, Umeå University, Umeå, Sweden
- Johan Trygg
- Department of Chemistry, Computational Life Science Cluster, Umeå University, Umeå, Sweden
- Erik Johansson
- Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, Sweden
21. Ullah E, Aupetit M, Das A, Patil A, Al Muftah N, Rawi R, Saad M, Bensmail H. KinVis: a visualization tool to detect cryptic relatedness in genetic datasets. Bioinformatics 2020; 35:2683-2685. [PMID: 30590437 DOI: 10.1093/bioinformatics/bty1028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/10/2018] [Revised: 10/20/2018] [Accepted: 12/17/2018] [Indexed: 11/12/2022]
Abstract
MOTIVATION In genome-wide association studies, individual relatedness must be characterized in terms of both familial relationships and underlying population structure for downstream analyses to be correct. This characterization becomes vital if the cohort is to be used as a reference panel in other studies for association tests and for identifying ethnic diversity. In this paper, we propose a kinship visualization tool to detect cryptic relatedness between subjects. We use multidimensional scaling, bar charts, heat maps and node-link visualizations to enable analysis of relatedness information. AVAILABILITY AND IMPLEMENTATION Available online, and for download, at http://shiny-vis.qcri.org/public/kinvis/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
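KinVis itself is a web application; purely as an illustration of the underlying idea, the Python sketch below turns pairwise kinship coefficients into relationship degrees and groups related samples into clusters, which is the information a node-link view would draw. The cut-offs follow the KING-style convention of splitting at powers of 2 between the expected kinship values of adjacent degrees; they, the sample IDs and the kinship values are assumptions, not taken from KinVis.

```python
# Assumed KING-style cut-offs on the kinship coefficient phi (not KinVis's own rules).
CUTOFFS = [
    (2 ** -1.5, "duplicate/MZ twin"),  # ~0.354
    (2 ** -2.5, "1st degree"),         # ~0.177
    (2 ** -3.5, "2nd degree"),         # ~0.088
    (2 ** -4.5, "3rd degree"),         # ~0.044
]


def classify(phi):
    """Map a kinship coefficient to a relationship degree."""
    for cutoff, label in CUTOFFS:
        if phi > cutoff:
            return label
    return "unrelated"


def related_edges(kinship):
    """kinship: dict {(id_a, id_b): phi}. Returns sorted node-link edges for related pairs."""
    edges = []
    for (a, b), phi in kinship.items():
        label = classify(phi)
        if label != "unrelated":
            edges.append((min(a, b), max(a, b), label))
    return sorted(edges)


def family_clusters(edges):
    """Connected components of the relatedness graph, found with union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _ in edges:
        root_a, root_b = find(a), find(b)
        parent[root_a] = root_b
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical kinship estimates.
kin = {("S1", "S2"): 0.25, ("S1", "S3"): 0.01, ("S2", "S3"): 0.12, ("S4", "S5"): 0.05}
edges = related_edges(kin)
print(edges)
print(family_clusters(edges))
```

Clusters such as `["S1", "S2", "S3"]` are the cryptic families one would then prune (for example, keep one member each) before an association analysis.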
Affiliation(s)
- Ehsan Ullah
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Michaël Aupetit
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Arun Das
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Abhishek Patil
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Noora Al Muftah
- Department of Computational Biology and Quantitative Genetics, Harvard School of Public Health, Boston, MA, USA
- Reda Rawi
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Mohamad Saad
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Halima Bensmail
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
22. Mehrpour S, Rodrigues CR, Ferreira RC, Briones MRDS, Oliveira ASB. Hardy-Weinberg Equilibrium in different mitochondrial haplogroups of four genes associated with neuroprotection and neurodegeneration. Arquivos de Neuro-Psiquiatria 2020; 78:269-276. [PMID: 32490968 DOI: 10.1590/0004-282x20200002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2019] [Accepted: 12/09/2019] [Indexed: 11/21/2022]
Abstract
BACKGROUND Malfunctioning or damaged mitochondria result in altered energy metabolism, redox equilibrium, and cellular dynamics, and are central to the pathogenesis of neurological disorders such as Alzheimer's disease, Parkinson's disease, Huntington's disease and amyotrophic lateral sclerosis. Therefore, it is of utmost importance to identify mitochondrial genetic susceptibility markers for neurodegenerative diseases. Potential markers include the respiratory chain enzymes riboflavin kinase (RFK), flavin adenine dinucleotide synthetase (FAD), succinate dehydrogenase B subunit (SDHB), and cytochrome C1 (CYC1). These enzymes are associated with neuroprotection and neurodegeneration. OBJECTIVE To test whether variants in the genes RFK, FAD, SDHB and CYC1 deviate from Hardy-Weinberg equilibrium (HWE) in different human mitochondrial haplogroups. METHODS Sequence variants in RFK, FAD, SDHB and CYC1 from 2,504 unaffected individuals of the 1000 Genomes Project were used for mitochondrial haplogroup assessment and for HWE calculations within each haplogroup. RESULTS RFK variants deviate from HWE in haplogroups G, H, L, V and W; FAD variants in haplogroups B, J, L, U, and C; SDHB variants in haplogroups C, W, and A; and CYC1 variants in haplogroups B, L, U, D, and T. HWE deviation indicates the action of selective pressures and genetic drift. CONCLUSIONS Deviation of particular variants from HWE, relative to HWE in the global population, could be at least partly associated with the differential susceptibility of specific populations and ethnicities to neurodegenerative diseases. Our data might contribute to epidemiology and to diagnostic/prognostic methods for neurodegenerative diseases.
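Testing HWE "in different mitochondrial haplogroups" amounts to stratifying the nuclear-gene genotypes by haplogroup and running a separate test in each stratum. The sketch below assumes a Pearson chi-square goodness-of-fit test with 1 degree of freedom (the abstract does not say which HWE test was used); the function names and genotype data are hypothetical.

```python
import math
from collections import defaultdict


def hwe_chi2(n_hom_ref, n_het, n_hom_alt):
    """Pearson chi-square goodness-of-fit test for HWE at one biallelic site (1 df)."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


def hwe_by_haplogroup(samples):
    """samples: iterable of (haplogroup, genotype), genotype = alt-allele count in {0, 1, 2}.

    Runs the HWE test separately within each mitochondrial haplogroup.
    """
    counts = defaultdict(lambda: [0, 0, 0])
    for haplogroup, genotype in samples:
        counts[haplogroup][genotype] += 1
    return {hg: hwe_chi2(*c) for hg, c in sorted(counts.items())}


# Hypothetical genotypes for one nuclear variant, labelled by mtDNA haplogroup.
data = [("H", 0)] * 25 + [("H", 1)] * 50 + [("H", 2)] * 25 + [("L", 1)] * 40 + [("L", 0)] * 10
for hg, (chi2, p) in hwe_by_haplogroup(data).items():
    print(hg, round(chi2, 2), f"{p:.2g}")
```

With many variants and haplogroups tested, some correction for multiple testing would normally be applied before calling a deviation significant.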
Affiliation(s)
- Sheida Mehrpour
- Departamento de Neurologia e Neurocirurgia, Universidade Federal de São Paulo, São Paulo, SP, Brazil
- Camila Ronqui Rodrigues
- Departamento de Microbiologia, Imunologia e Parasitologia, Universidade Federal de São Paulo, São Paulo, SP, Brazil
- Renata Carmona Ferreira
- Departamento de Neurologia e Neurocirurgia, Universidade Federal de São Paulo, São Paulo, SP, Brazil
- Bridges Genomics, São Paulo SP, Brazil
23. Garcia-Erill G, Albrechtsen A. Evaluation of model fit of inferred admixture proportions. Mol Ecol Resour 2020; 20:936-949. [PMID: 32323416 DOI: 10.1111/1755-0998.13171] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 12/05/2019] [Revised: 03/11/2020] [Accepted: 04/15/2020] [Indexed: 12/12/2022]
Abstract
Model-based methods for genetic clustering of individuals, such as those implemented in structure or ADMIXTURE, allow the user to infer individual ancestries and study population structure. The underlying model makes several assumptions about the demographic history that shaped the analysed genetic data. One assumption is that all individuals descend from K homogeneous ancestral populations that are all well represented in the data; another is that no drift occurred after the admixture event. The histories of many real-world populations do not conform to this model, and in such cases taking the inferred admixture proportions at face value may be misleading. We propose a method to evaluate the fit of admixture models by estimating the correlation of the residuals, that is, the differences between the true genotypes and the genotypes predicted by the model. When the model assumptions are not violated, the residuals of a pair of individuals are uncorrelated. With a poorly fitting admixture model, individuals with similar demographic histories have positively correlated residuals. Using simulated and real data, we show that the method can detect a poor fit of inferred admixture proportions caused by using an insufficient number of clusters K or by demographic histories that deviate significantly from the admixture model assumptions, such as admixture from ghost populations, drift after admixture events and non-discrete ancestral populations. We have implemented the method as open-source software that can be applied to both unphased genotypes and low-depth sequencing data.
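The core quantity described above, the correlation between individuals of genotype residuals, can be sketched as follows, assuming genotypes coded as alt-allele counts and a fitted model given as admixture proportions `q` and ancestral allele frequencies `f`. This is a naive illustration: the published software may apply additional corrections that are omitted here, and all function names and inputs are hypothetical.

```python
import math


def expected_genotypes(q, f):
    """E[g_ij] = 2 * sum_k q[i][k] * f[k][j].

    q: individuals x K admixture proportions; f: K x sites ancestral (alt) allele frequencies.
    """
    n_sites = len(f[0])
    return [[2 * sum(qi[k] * f[k][j] for k in range(len(qi))) for j in range(n_sites)]
            for qi in q]


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residual_correlation(g, q, f):
    """Naive pairwise correlation matrix of residuals r_ij = g_ij - E[g_ij]."""
    pred = expected_genotypes(q, f)
    res = [[gij - eij for gij, eij in zip(gi, ei)] for gi, ei in zip(g, pred)]
    n = len(res)
    return [[pearson(res[a], res[b]) for b in range(n)] for a in range(n)]


# Hypothetical K = 1 fit that ignores structure: individuals 0 and 1 share a pattern.
g = [[2, 0, 2, 0], [2, 0, 2, 0], [0, 2, 0, 2]]
q = [[1.0], [1.0], [1.0]]
f = [[0.5, 0.5, 0.5, 0.5]]
corr = residual_correlation(g, q, f)
print([[round(c, 2) for c in row] for row in corr])
```

Under a well-fitting model the off-diagonal entries should scatter around zero; blocks of positive correlation among individuals from the same population point to a poor fit.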
Affiliation(s)
- Genís Garcia-Erill
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
- Anders Albrechtsen
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
24. Murga-Moreno J, Coronado-Zamora M, Hervas S, Casillas S, Barbadilla A. iMKT: the integrative McDonald and Kreitman test. Nucleic Acids Res 2020; 47:W283-W288. [PMID: 31081014 PMCID: PMC6602517 DOI: 10.1093/nar/gkz372] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/06/2019] [Revised: 04/18/2019] [Accepted: 05/03/2019] [Indexed: 01/07/2023]
Abstract
The McDonald and Kreitman test (MKT) is one of the most powerful and widely used methods to detect and quantify recurrent natural selection using DNA sequence data. Here we present iMKT (integrative McDonald and Kreitman test), a novel web-based service that performs four distinct types of MKT. It allows the detection and estimation of four different selection regimes (adaptive, neutral, strongly deleterious and weakly deleterious) acting on any genomic sequence. iMKT can analyze both users' own population genomic data and pre-loaded Drosophila melanogaster and human protein-coding gene sequences obtained from the largest population genomic datasets to date. Advanced options on the website allow testing complex hypotheses, such as the application example shown here: do genes located in high-recombination regions undergo higher rates of adaptation? We hope that iMKT will become a reference tool for the study of evolutionary adaptation in massive population genomics datasets, especially in Drosophila and humans. iMKT is a free online resource at https://imkt.uab.cat.
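iMKT offers several MKT variants; the sketch below shows only the standard test, in which alpha = 1 - (Ds·Pn)/(Dn·Ps) estimates the fraction of nonsynonymous substitutions driven by positive selection, and a two-sided Fisher exact test on the 2×2 table of polymorphism (P) and divergence (D) counts assesses significance. The counts are hypothetical and the function names are illustrative, not iMKT's API.

```python
import math


def hypergeom_pmf(a, row1, col1, n):
    """P(top-left cell = a) in a 2x2 table with fixed margins."""
    return math.comb(col1, a) * math.comb(n - col1, row1 - a) / math.comb(n, row1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [hypergeom_pmf(x, row1, col1, n) for x in range(lo, hi + 1)]
    # Sum the probabilities of all tables no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def standard_mkt(pn, ps, dn, ds):
    """Standard MKT from nonsynonymous/synonymous polymorphism (Pn, Ps) and divergence (Dn, Ds)."""
    ni = (pn / ps) / (dn / ds)  # neutrality index
    alpha = 1.0 - ni            # equals 1 - (Ds * Pn) / (Dn * Ps)
    p_value = fisher_exact_two_sided(pn, dn, ps, ds)
    return {"alpha": alpha, "NI": ni, "fisher_p": p_value}


print(standard_mkt(pn=2, ps=42, dn=7, ds=17))  # hypothetical counts
```

A positive alpha with a small p-value indicates an excess of nonsynonymous divergence relative to polymorphism, the classic MKT signature of adaptive evolution.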
Affiliation(s)
- Jesús Murga-Moreno
- Institut de Biotecnologia i de Biomedicina and Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Marta Coronado-Zamora
- Institut de Biotecnologia i de Biomedicina and Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Sergi Hervas
- Institut de Biotecnologia i de Biomedicina and Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Sònia Casillas
- Institut de Biotecnologia i de Biomedicina and Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Antonio Barbadilla
- Institut de Biotecnologia i de Biomedicina and Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
25. Abramovs N, Brass A, Tassabehji M. Hardy-Weinberg Equilibrium in the Large Scale Genomic Sequencing Era. Front Genet 2020; 11:210. [PMID: 32231685 PMCID: PMC7083100 DOI: 10.3389/fgene.2020.00210] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 12/02/2019] [Accepted: 02/21/2020] [Indexed: 12/21/2022]
Abstract
Hardy-Weinberg equilibrium (HWE) is used to estimate the number of homozygous and heterozygous variant carriers from a variant's allele frequency in populations that are not evolving. Deviations from HWE in large population databases have been used to detect genotyping errors, which can produce extreme heterozygote excess (HetExc). However, HetExc might also be a sign of natural selection: recessive disease-causing variants should occur less frequently in the homozygous state in the population, but may reach high allele frequency in the heterozygous state, especially if they are advantageous. We developed a filtering strategy to detect these variants and applied it to genome data from 137,842 individuals. The main limitations of this approach were the quality of genotype calls and insufficient population sizes, whereas population structure and inbreeding can reduce sensitivity, but not precision, in certain populations. Nevertheless, we identified 161 HetExc variants in 149 genes, most of which were specific to African/African American populations (∼79.5%). Although the majority of them were not associated with known diseases or were classified as clinically "benign," they were enriched in genes associated with autosomal recessive diseases. The resulting dataset also contained two known recessive disease-causing variants with evidence of heterozygote advantage, in HBB (sickle-cell anemia) and CFTR (cystic fibrosis). Finally, we provide supporting in silico evidence of a novel heterozygote-advantageous variant in the chromodomain helicase DNA binding protein 6 gene (CHD6; involved in influenza virus replication). We anticipate that our approach will aid the detection of rare recessive disease-causing variants in the future.
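The basic ingredient of a heterozygote-excess filter is an HWE test on the heterozygote count. The sketch below assumes the standard exact conditional test (the probability of each heterozygote count given the allele counts, as in Wigginton et al., 2005) and reports a one-sided p-value for excess. It is not the authors' full filtering strategy, and the genotype counts are hypothetical.

```python
import math


def _log_factorial(k):
    return math.lgamma(k + 1)


def log_het_prob(n_het, n_ind, n_allele_a):
    """log P(n_het heterozygotes | n_ind individuals, n_allele_a copies of allele A) under HWE."""
    n_allele_b = 2 * n_ind - n_allele_a
    n_hom_a = (n_allele_a - n_het) // 2
    n_hom_b = (n_allele_b - n_het) // 2
    return (n_het * math.log(2)
            + _log_factorial(n_ind) - _log_factorial(n_hom_a)
            - _log_factorial(n_het) - _log_factorial(n_hom_b)
            + _log_factorial(n_allele_a) + _log_factorial(n_allele_b)
            - _log_factorial(2 * n_ind))


def het_distribution(n_ind, n_allele_a):
    """Exact null distribution of the heterozygote count, conditional on allele counts.

    Valid heterozygote counts have the same parity as n_allele_a.
    """
    n_allele_b = 2 * n_ind - n_allele_a
    return {h: math.exp(log_het_prob(h, n_ind, n_allele_a))
            for h in range(n_allele_a % 2, min(n_allele_a, n_allele_b) + 1, 2)}


def het_excess_pvalue(n_hom_a, n_het, n_hom_b):
    """One-sided exact p-value: P(heterozygote count >= observed) under HWE."""
    n_ind = n_hom_a + n_het + n_hom_b
    n_allele_a = 2 * n_hom_a + n_het
    dist = het_distribution(n_ind, n_allele_a)
    return min(1.0, sum(p for h, p in dist.items() if h >= n_het))


# Hypothetical genotype counts (AA, AB, BB).
print(het_excess_pvalue(25, 50, 25))  # close to HWE expectations
print(het_excess_pvalue(0, 100, 0))   # all heterozygotes: extreme excess
```

Working in log space via `math.lgamma` keeps the factorials from overflowing at biobank-scale sample sizes.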
Affiliation(s)
- Nikita Abramovs
- School of Computer Science, University of Manchester, Manchester, United Kingdom
- Faculty of Biology, Medicine and Health, School of Biological Sciences, University of Manchester, Manchester, United Kingdom
- Andrew Brass
- School of Computer Science, University of Manchester, Manchester, United Kingdom
- Faculty of Biology, Medicine and Health, School of Health Sciences, University of Manchester, Manchester, United Kingdom
- May Tassabehji
- Faculty of Biology, Medicine and Health, School of Biological Sciences, University of Manchester, Manchester, United Kingdom
- Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester Academic Health Sciences Centre (MAHSC), Manchester, United Kingdom
26. Dandine-Roulland C, Laurent R, Dall'Ara I, Toupance B, Chaix R. Genomic evidence for MHC disassortative mating in humans. Proc Biol Sci 2020; 286:20182664. [PMID: 30890093 DOI: 10.1098/rspb.2018.2664] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 01/09/2023]
Abstract
Although major histocompatibility complex (MHC) disassortative mating is pervasive in many animal species, the evidence for it in humans remains inconsistent across studies. To revisit this issue, we analyse dense genotype data for 883 European and Middle Eastern couples. To distinguish MHC-specific effects from socio-cultural confounders, the pattern of relatedness between spouses in the MHC region is compared with that in the rest of the genome. Couples from Israel exhibit no significant pattern of relatedness across the MHC region, whereas across the genome they are more similar than random pairs of individuals, which may reflect social homogamy and/or cousin marriages. In contrast, couples from the Netherlands, and more generally from Northern Europe, are significantly more MHC-dissimilar than random pairs of individuals, and this pattern of dissimilarity is extreme when compared with the rest of the genome. Our findings support the hypothesis that the MHC influences mate choice in humans in a context-dependent way: MHC-driven preferences may exist in all populations, but in some populations social constraints on mate choice may reduce the ability of individuals to rely on such biological cues when choosing their mates.
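The comparison of spouses with random pairs of individuals can be sketched as a permutation test that re-pairs husbands and wives at random. This sketch assumes genotypes coded 0/1/2 and uses mean identity-by-state (IBS) as a stand-in for the relatedness estimator used in the study; running it separately on MHC-region and genome-wide genotypes gives the two statistics to contrast, but the paper's exact comparison procedure is not reproduced. All names and data are hypothetical.

```python
import random


def ibs_similarity(g1, g2):
    """Mean identity-by-state between two individuals; genotypes coded 0/1/2 per site."""
    return sum(1 - abs(a - b) / 2 for a, b in zip(g1, g2)) / len(g1)


def mean_pair_similarity(men, women):
    return sum(ibs_similarity(m, w) for m, w in zip(men, women)) / len(men)


def spouse_dissimilarity_test(husbands, wives, n_perm=2000, seed=1):
    """Compare observed spouse similarity with random re-pairings of the same individuals.

    Returns (observed mean similarity, one-sided permutation p-value for couples being
    LESS similar than random pairs).
    """
    rng = random.Random(seed)
    observed = mean_pair_similarity(husbands, wives)
    shuffled = list(wives)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_pair_similarity(husbands, shuffled) <= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)


# Hypothetical MHC-region genotypes: each wife is the genotype complement of her husband.
husbands = [[0, 0, 2, 2], [2, 2, 0, 0], [0, 2, 0, 2], [2, 0, 2, 0]]
wives = [[2 - x for x in h] for h in husbands]
obs, p = spouse_dissimilarity_test(husbands, wives)
print(obs, round(p, 3))
```

Permuting partners within a population keeps each individual's genotypes fixed, so the null distribution reflects that population's own allele frequencies and structure.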
Affiliation(s)
- Claire Dandine-Roulland
- Eco-Anthropologie, UMR 7206, CNRS, MNHN, Université Paris Diderot, Sorbonne Paris Cité, Paris, France
- Romain Laurent
- Eco-Anthropologie, UMR 7206, CNRS, MNHN, Université Paris Diderot, Sorbonne Paris Cité, Paris, France
- Irene Dall'Ara
- Eco-Anthropologie, UMR 7206, CNRS, MNHN, Université Paris Diderot, Sorbonne Paris Cité, Paris, France
- Bruno Toupance
- Eco-Anthropologie, UMR 7206, CNRS, MNHN, Université Paris Diderot, Sorbonne Paris Cité, Paris, France
- Raphaëlle Chaix
- Eco-Anthropologie, UMR 7206, CNRS, MNHN, Université Paris Diderot, Sorbonne Paris Cité, Paris, France
27. The genetic history of France. Eur J Hum Genet 2020; 28:853-865. [PMID: 32042083 DOI: 10.1038/s41431-020-0584-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/23/2019] [Revised: 11/25/2019] [Accepted: 01/28/2020] [Indexed: 12/15/2022]
Abstract
The study of the genetic structure of different countries within Europe has provided significant insights into their demographic history and population structure. Although France occupies a particular location in western Europe, at the crossroads of migration routes, few population genetic studies have so far been conducted with genome-wide data. In this study, we analyzed SNP-chip genetic data from 2184 individuals born in France who were enrolled in two independent population cohorts. Using FineSTRUCTURE, we found six genetic clusters of individuals that were highly consistent between the two cohorts. These clusters correspond closely to geographic, historical, and linguistic divisions of France and contain different proportions of ancestry from Stone and Bronze Age populations. By modeling the relationship between genetics and geography using EEMS, we detected gene-flow barriers that are similar across the two cohorts and correspond to major rivers and mountain ranges. Estimates of effective population size also revealed very similar patterns in both cohorts, with a rapid increase in effective population size over the last 150 generations, similar to other European countries. A marked bottleneck is also consistently seen in the two datasets, starting in the 14th century when the Black Death raged in Europe. In conclusion, by performing the first exhaustive study of the genetic structure of France, we fill a gap in genetic studies of Europe that will be useful to medical geneticists, historians, and archeologists.
28. Shen F, Kidd JM. Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2. Genes (Basel) 2020; 11:genes11020141. [PMID: 32013076 PMCID: PMC7073954 DOI: 10.3390/genes11020141] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/11/2020] [Revised: 01/21/2020] [Accepted: 01/24/2020] [Indexed: 12/22/2022]
Abstract
Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely on reads that map uniquely to a reference genome and cannot distinguish among duplicated sequences. Specialized approaches that interrogate specific paralogs are comparatively slow and computationally complex, which limits their application to emerging population-scale data sets. We present QuicK-mer2, a self-contained, mapping-free approach that enables the rapid construction of paralog-specific copy-number maps from short-read sequence data. The approach is based on tabulating unique k-mer sequences from short-read data sets and can analyze a 20X-coverage human genome in approximately 20 min. We applied it to newly released sequence data from the 1000 Genomes Project, constructed paralog-specific copy-number maps for 2457 unrelated individuals, and uncovered copy-number variation of paralogous genes. We identified nine genes for which none of the analyzed samples has a copy number of two and 92 genes for which the majority of samples have a copy number other than two, and we describe rare copy-number variation affecting multiple genes at the APOBEC3 locus.
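The idea of mapping-free, k-mer-based copy-number estimation can be illustrated by counting paralog-specific (unique) k-mers in reads on both strands and scaling their mean count by the count observed at k-mers assumed to be present in two copies. This is a simplified sketch: QuicK-mer2 itself includes further normalization steps not shown here, and `diploid_depth`, the reads and the k-mer set are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_target_kmers(reads, k, targets):
    """Count occurrences of target k-mers in reads, on either strand."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in targets:
                counts[kmer] += 1
            elif (rc := revcomp(kmer)) in targets:
                counts[rc] += 1
    return counts


def estimate_copy_number(region_kmers, counts, diploid_depth):
    """2 * (mean count of the region's unique k-mers) / (mean count at assumed 2-copy k-mers)."""
    mean_depth = sum(counts[k] for k in region_kmers) / len(region_kmers)
    return 2.0 * mean_depth / diploid_depth


# Hypothetical toy data: one paralog-specific 3-mer, reads from both strands.
counts = count_target_kmers(["AACAAC", "GTT"], 3, {"AAC"})
print(counts, estimate_copy_number(["AAC"], counts, diploid_depth=1.5))
```

Because only k-mers unique to one paralog are counted, reads from other copies of a duplicated gene do not inflate the estimate, which is what makes the approach paralog-sensitive.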
Affiliation(s)
- Feichen Shen
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Jeffrey M. Kidd
- Department of Human Genetics and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
29. Ly G, Laurent R, Lafosse S, Monidarin C, Diffloth G, Bourdier F, Evrard O, Toupance B, Pavard S, Chaix R. From matrimonial practices to genetic diversity in Southeast Asian populations: the signature of the matrilineal puzzle. Philos Trans R Soc Lond B Biol Sci 2019; 374:20180434. [PMID: 31303171 PMCID: PMC6664126 DOI: 10.1098/rstb.2018.0434] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Accepted: 04/17/2019] [Indexed: 01/27/2023]
Abstract
In matrilineal populations, descent-group affiliation is transmitted by women, whereas socio-political power frequently remains in the hands of men. This situation, called the 'matrilineal puzzle', is expected to promote local endogamy as a coping mechanism that allows men to maintain their decision-making power over their natal descent group. In this paper, we revisit the 'matrilineal puzzle' from a population-genetics perspective. Such a tendency toward local endogamy in matrilineal populations is expected to increase their genetic inbreeding and generate isolation-by-distance patterns between villages. To test this hypothesis, we collected ethno-demographic data for 3261 couples and high-density genetic data for 675 individuals from 11 Southeast Asian populations with a wide range of social organizations: matrilineal and matrilocal populations (M), patrilineal and patrilocal populations (P), and cognatic populations with predominantly matrilocal residence (C). We observed that M and C populations have higher levels of village endogamy than P populations, and that this higher village endogamy leads to higher genetic inbreeding. M populations also exhibit isolation-by-distance patterns between villages. We interpret these genetic patterns as the signature of the 'matrilineal puzzle'. Notably, our results suggest that any form of matrilocal marriage (whatever the descent rule) increases village endogamy. These findings suggest that male dominance, when combined with matrilocality, constrains inter-village migration and constitutes an underexplored cultural process shaping genetic patterns in human populations. This article is part of the theme issue 'The evolution of female-biased kinship in humans and other mammals'.
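Isolation by distance between villages is commonly assessed with a Mantel test, which correlates a matrix of pairwise genetic distances with a matrix of geographic distances and obtains significance by permutation. The sketch below is a generic Mantel test under that assumption; the abstract does not state which test the authors used, and the village positions and distances are hypothetical.

```python
import math
import random


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def mantel_test(genetic, geographic, n_perm=999, seed=0):
    """Correlate the upper triangles of two symmetric distance matrices.

    The one-sided p-value comes from jointly permuting rows and columns of the
    geographic matrix (i.e. relabelling villages at random).
    """
    n = len(genetic)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    gen = [genetic[i][j] for i, j in pairs]

    def geo_vector(order):
        return [geographic[order[i]][order[j]] for i, j in pairs]

    identity = list(range(n))
    r_obs = pearson(gen, geo_vector(identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)
        if pearson(gen, geo_vector(order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical villages along a transect: genetic distance grows with geographic distance.
positions = [0, 3, 5, 9, 12, 20]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.002 * d for d in row] for row in geo]
r, p = mantel_test(gen, geo)
print(round(r, 3), p)
```

Pairwise distances are not independent observations, which is why significance is assessed by permuting village labels rather than with a standard correlation test.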
Affiliation(s)
- Goki Ly
- Unité Eco-anthropologie (EA), Muséum National d'Histoire Naturelle, CNRS, Université de Paris, 17 place du Trocadéro, 75016 Paris, France
- Romain Laurent
- Unité Eco-anthropologie (EA), Muséum National d'Histoire Naturelle, CNRS, Université de Paris, 17 place du Trocadéro, 75016 Paris, France
- Sophie Lafosse
- Unité Eco-anthropologie (EA), Muséum National d'Histoire Naturelle, CNRS, Université de Paris, 17 place du Trocadéro, 75016 Paris, France
- Chou Monidarin
- Rodolphe Merieux Laboratory and Faculty of Pharmacy of University of Health Sciences, Phnom Penh, Cambodia
- Frédéric Bourdier
- Unité 201 Développement et Sociétés (DEVSOC), IEDES/IRD, Panthéon Sorbonne, Paris, France
- Olivier Evrard
- Unité Patrimoines Locaux et Gouvernance (PALOC), Muséum National d'Histoire Naturelle, CNRS, IRD, 75006 Paris, France
- Bruno Toupance
- Unité Eco-anthropologie (EA), Muséum National d'Histoire Naturelle, CNRS, Université de Paris, 17 place du Trocadéro, 75016 Paris, France
- Samuel Pavard
- Unité Eco-anthropologie (EA), Muséum National d'Histoire Naturelle, CNRS, Université de Paris, 17 place du Trocadéro, 75016 Paris, France
- Raphaëlle Chaix
- Unité Eco-anthropologie (EA), Muséum National d'Histoire Naturelle, CNRS, Université de Paris, 17 place du Trocadéro, 75016 Paris, France
30. Casillas S, Mulet R, Villegas-Mirón P, Hervas S, Sanz E, Velasco D, Bertranpetit J, Laayouni H, Barbadilla A. PopHuman: the human population genomics browser. Nucleic Acids Res 2019; 46:D1003-D1010. [PMID: 29059408 PMCID: PMC5753332 DOI: 10.1093/nar/gkx943] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 08/11/2017] [Accepted: 10/04/2017] [Indexed: 12/17/2022]
Abstract
The 1000 Genomes Project (1000GP) represents the most comprehensive worldwide nucleotide variation data set in humans so far, providing the sequences and analysis of 2504 genomes from 26 populations and reporting >84 million variants. These sequence data are an invaluable resource for population genomics studies of the human lineage, allowing molecular population genetics hypotheses to be tested and, eventually, the evolutionary dynamics of genetic variation in human populations to be understood. Here we present PopHuman, a new population genomics-oriented genome browser based on JBrowse that allows the interactive visualization and retrieval of an extensive inventory of population genetics metrics. Efficient and reliable parameter estimates were computed using a novel pipeline that addresses the unique features and limitations of the 1000GP data; they include a battery of nucleotide variation measures, divergence and linkage disequilibrium parameters, and several tests of neutrality, estimated in non-overlapping windows along the chromosomes and in annotated genes for all 26 populations of the 1000GP. PopHuman is open and freely available at http://pophuman.uab.cat.
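As an illustration of the windowed neutrality statistics that PopHuman displays, the sketch below computes nucleotide diversity (π), Watterson's θ and Tajima's D in non-overlapping windows from derived-allele counts. It assumes complete data (no missing genotypes) and the same number n of sampled chromosomes at every site, simplifications a real 1000GP pipeline cannot make; the window size and inputs are hypothetical, and π and θ are totals per window rather than per-base values.

```python
import math
from collections import defaultdict


def tajima_stats(n, derived_counts):
    """Return (Tajima's D, pi, Watterson's theta) for one window.

    n: number of sampled chromosomes; derived_counts: derived-allele count at each
    segregating site (0 < k < n). D is None when the window has no segregating sites.
    """
    s = len(derived_counts)
    if s == 0:
        return None, 0.0, 0.0
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in derived_counts)
    theta_w = s / a1
    d = (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return d, pi, theta_w


def windowed_neutrality(sites, n, window):
    """sites: iterable of (position, derived_count). Groups segregating sites into
    non-overlapping windows [start, start + window) and computes stats per window."""
    bins = defaultdict(list)
    for pos, k in sites:
        if 0 < k < n:  # monomorphic sites do not contribute
            bins[(pos // window) * window].append(k)
    return {start: tajima_stats(n, ks) for start, ks in sorted(bins.items())}


# Hypothetical data: 4 chromosomes; a singleton in window 0, an intermediate variant in window 1000.
stats = windowed_neutrality([(10, 1), (1500, 2)], n=4, window=1000)
for start, (d, pi, theta_w) in stats.items():
    print(start, round(d, 3), round(pi, 3), round(theta_w, 3))
```

A negative D (excess of rare variants, as in window 0) and a positive D (excess of intermediate-frequency variants, as in window 1000) are the two departures from neutrality this statistic is designed to flag.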
Affiliation(s)
- Sònia Casillas
- Institut de Biotecnologia i de Biomedicina and Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- To whom correspondence should be addressed: Sònia Casillas. Tel: +34 93 5868958; Fax: +34 93 5812011. Correspondence may also be addressed to Antonio Barbadilla.
- Roger Mulet
- Institut de Biotecnologia i de Biomedicina and Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Pablo Villegas-Mirón
- Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, Doctor Aiguader 88 (PRBB), 08003 Barcelona, Catalonia, Spain
- Sergi Hervas
- Institut de Biotecnologia i de Biomedicina and Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Esteve Sanz
- Servei de Genòmica i Bioinformàtica, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Daniel Velasco
- Institut de Biotecnologia i de Biomedicina and Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Jaume Bertranpetit
- Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, Doctor Aiguader 88 (PRBB), 08003 Barcelona, Catalonia, Spain
- Hafid Laayouni
- Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, Doctor Aiguader 88 (PRBB), 08003 Barcelona, Catalonia, Spain
- Bioinformatics Studies, ESCI-UPF, Pg. Pujades 1, 08003 Barcelona, Spain
- Antonio Barbadilla
- Institut de Biotecnologia i de Biomedicina and Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Servei de Genòmica i Bioinformàtica, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
31. Frei O, Holland D, Smeland OB, Shadrin AA, Fan CC, Maeland S, O'Connell KS, Wang Y, Djurovic S, Thompson WK, Andreassen OA, Dale AM. Bivariate causal mixture model quantifies polygenic overlap between complex traits beyond genetic correlation. Nat Commun 2019; 10:2417. [PMID: 31160569 PMCID: PMC6547727 DOI: 10.1038/s41467-019-10310-0] [Citation(s) in RCA: 231] [Impact Index Per Article: 38.5] [Received: 11/30/2018] [Accepted: 04/29/2019] [Indexed: 12/13/2022]
Abstract
Accumulating evidence from genome-wide association studies (GWAS) suggests an abundance of shared genetic influences among complex human traits and disorders, such as mental disorders. Here we introduce a statistical tool, MiXeR, which quantifies polygenic overlap irrespective of genetic correlation, using GWAS summary statistics. MiXeR results are presented as a Venn diagram of unique and shared polygenic components across traits. At 90% of SNP-heritability explained for each phenotype, MiXeR estimates that 8.3K variants causally influence schizophrenia and 6.4K influence bipolar disorder. Of these variants, 6.2K are shared between the two disorders, which have a high genetic correlation. MiXeR also uncovers polygenic overlap between schizophrenia and educational attainment: despite a genetic correlation close to zero, the two phenotypes share 8.3K causal variants, while 2.5K additional variants influence only educational attainment. By considering the polygenicity, discoverability and heritability of complex phenotypes, MiXeR analysis may improve our understanding of cross-trait genetic architectures.
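The Venn-diagram components follow from simple set arithmetic on the estimates quoted in this abstract (thousands of causal variants, rounded as reported). Writing SCZ, BD and EA for schizophrenia, bipolar disorder and educational attainment, a worked example that only restates those numbers and does not reproduce MiXeR's model fitting:

```latex
\begin{aligned}
\text{SCZ only} &= 8.3 - 6.2 = 2.1\text{K}, \qquad \text{BD only} = 6.4 - 6.2 = 0.2\text{K},\\
|\text{SCZ} \cup \text{BD}| &= 8.3 + 6.4 - 6.2 = 8.5\text{K},\\
\frac{\text{shared}}{\text{BD}} &= \frac{6.2}{6.4} \approx 0.97, \qquad \frac{\text{shared}}{\text{SCZ}} = \frac{6.2}{8.3} \approx 0.75,\\
\text{EA total} &= 8.3 + 2.5 = 10.8\text{K}.
\end{aligned}
```

Nearly all BD variants fall inside the SCZ set, consistent with the high genetic correlation, whereas the SCZ and EA overlap is large despite a near-zero genetic correlation, which is the distinction MiXeR is designed to expose.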
Affiliation(s)
- Oleksandr Frei
- NORMENT, KG Jebsen Centre for Psychosis Research, Institute of Clinical Medicine, University of Oslo, 0424, Oslo, Norway.
- Dominic Holland
- Center for Multimodal Imaging and Genetics, University of California at San Diego, La Jolla, CA, 92037, USA; Department of Neurosciences, University of California, San Diego, La Jolla, CA, 92093, USA
- Olav B Smeland
- NORMENT, KG Jebsen Centre for Psychosis Research, Institute of Clinical Medicine, University of Oslo, 0424, Oslo, Norway; Division of Mental Health and Addiction, Oslo University Hospital, 0407, Oslo, Norway
- Alexey A Shadrin
- NORMENT, KG Jebsen Centre for Psychosis Research, Institute of Clinical Medicine, University of Oslo, 0424, Oslo, Norway
- Chun Chieh Fan
- Center for Multimodal Imaging and Genetics, University of California at San Diego, La Jolla, CA, 92037, USA; Department of Cognitive Sciences, University of California at San Diego, La Jolla, CA, 92093, USA; Department of Radiology, University of California, San Diego, La Jolla, CA, 92093, USA
- Steffen Maeland
- NORMENT, KG Jebsen Centre for Psychosis Research, Institute of Clinical Medicine, University of Oslo, 0424, Oslo, Norway
- Kevin S O'Connell
- NORMENT, KG Jebsen Centre for Psychosis Research, Institute of Clinical Medicine, University of Oslo, 0424, Oslo, Norway
- Yunpeng Wang
- NORMENT, KG Jebsen Centre for Psychosis Research, Institute of Clinical Medicine, University of Oslo, 0424, Oslo, Norway; Center for Multimodal Imaging and Genetics, University of California at San Diego, La Jolla, CA, 92037, USA; Department of Radiology, University of California, San Diego, La Jolla, CA, 92093, USA
- Srdjan Djurovic
- Department of Medical Genetics, Oslo University Hospital, 0424, Oslo, Norway; NORMENT, KG Jebsen Centre for Psychosis Research, Department of Clinical Science, University of Bergen, 5020, Bergen, Norway
- Wesley K Thompson
- Department of Family Medicine and Public Health, University of California, San Diego, La Jolla, CA, 92093, USA; Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Capital Region of Denmark, Roskilde, 4000, Denmark
- Ole A Andreassen
- NORMENT, KG Jebsen Centre for Psychosis Research, Institute of Clinical Medicine, University of Oslo, 0424, Oslo, Norway; Division of Mental Health and Addiction, Oslo University Hospital, 0407, Oslo, Norway
- Anders M Dale
- Center for Multimodal Imaging and Genetics, University of California at San Diego, La Jolla, CA, 92037, USA; Department of Neurosciences, University of California, San Diego, La Jolla, CA, 92093, USA; Department of Radiology, University of California, San Diego, La Jolla, CA, 92093, USA; Department of Psychiatry, University of California, San Diego, La Jolla, CA, 92093, USA.
32
Hujoel MLA, Gazal S, Hormozdiari F, van de Geijn B, Price AL. Disease Heritability Enrichment of Regulatory Elements Is Concentrated in Elements with Ancient Sequence Age and Conserved Function across Species. Am J Hum Genet 2019; 104:611-624. [PMID: 30905396 PMCID: PMC6451699 DOI: 10.1016/j.ajhg.2019.02.008] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 09/17/2018] [Accepted: 02/05/2019] [Indexed: 02/06/2023]
Abstract
Regulatory elements, e.g., enhancers and promoters, have been widely reported to be enriched for disease and complex trait heritability. We investigated how this enrichment varies with the age of the underlying genome sequence, the conservation of regulatory function across species, and the target gene of the regulatory element. We estimated heritability enrichment by applying stratified LD score regression to summary statistics from 41 independent diseases and complex traits (average N = 320K) and meta-analyzing results across traits. Enrichment of human putative enhancers and promoters was larger in elements with older sequence age, assessed via alignment with other species irrespective of conserved functionality: putative enhancer elements with ancient sequence age (older than the split between marsupial and placental mammals) were 8.8× enriched (versus 2.5× for all putative enhancers; p = 3e-14), and promoter elements with ancient sequence age were 13.5× enriched (versus 5.1× for all promoters; p = 5e-16). Enrichment of human putative enhancers and promoters was also larger in elements whose regulatory function was conserved across species, e.g., human putative enhancers that were enhancers in ≥5 of 9 other mammals were 4.6× enriched (p = 5e-12 versus all putative enhancers). Enrichment of human promoters was larger in promoters of loss-of-function intolerant genes: 12.0× enrichment (p = 8e-15 versus all promoters). The mean value of several measures of negative selection within these genomic annotations mirrored all of these findings. Notably, the annotations with these excess heritability enrichments were jointly significant conditional on each other and on our baseline-LD model, which includes a broad set of coding, conserved, regulatory, and LD-related annotations.
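The fold-enrichment figures quoted above (for example 8.8× for enhancers with ancient sequence age) follow the usual stratified LD score regression definition: the proportion of SNP-heritability attributed to an annotation divided by the proportion of SNPs the annotation contains. The minimal sketch below uses made-up inputs, not values from the study, and the function name is illustrative only.

```python
def heritability_enrichment(h2_annot: float, h2_total: float,
                            n_snps_annot: int, n_snps_total: int) -> float:
    """Fold enrichment = (share of SNP-heritability) / (share of SNPs) for one annotation."""
    if h2_total <= 0 or n_snps_total <= 0:
        raise ValueError("totals must be positive")
    prop_h2 = h2_annot / h2_total            # fraction of heritability in the annotation
    prop_snps = n_snps_annot / n_snps_total  # fraction of SNPs in the annotation
    return prop_h2 / prop_snps


# Hypothetical annotation: 2% of SNPs explaining 17.6% of SNP-heritability.
print(round(heritability_enrichment(0.044, 0.25, 20_000, 1_000_000), 2))  # 8.8
```

An enrichment of 1 means the annotation explains heritability exactly in proportion to its size. Values well above 1, such as the 13.5× reported for ancient promoters, indicate that heritability is concentrated in that annotation.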
Affiliation(s)
- Margaux L A Hujoel
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Division of Biostatistics, Dana-Farber Cancer Institute, Boston, MA 02215, USA.
- Steven Gazal
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Farhad Hormozdiari
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Bryce van de Geijn
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Alkes L Price
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
33
Genotype Fingerprints Enable Fast and Private Comparison of Genetic Testing Results for Research and Direct-to-Consumer Applications. Genes (Basel) 2018; 9:genes9100481. [PMID: 30287784 PMCID: PMC6209914 DOI: 10.3390/genes9100481] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/31/2018] [Revised: 09/29/2018] [Accepted: 10/02/2018] [Indexed: 11/18/2022]
Abstract
Genetic testing has expanded out of the research laboratory into medical practice and the direct-to-consumer market. Rapid analysis of the resulting genotype data now has a significant impact. We present a method for summarizing personal genotypes as ‘genotype fingerprints’ that meets these needs. Genotype fingerprints can be derived from any single nucleotide polymorphism-based assay and remain comparable as chip designs evolve to higher marker densities. We demonstrate that these fingerprints support distinguishing between types of relationships among closely related individuals, and distinguishing closely related individuals from other individuals of the same background population, as well as high-throughput identification of identical genotypes, individuals in known background populations, and de novo separation of subpopulations within a large cohort through extremely rapid comparisons. Although fingerprints do not preserve anonymity, they provide a useful degree of privacy by summarizing a genotype while preventing reconstruction of individual marker states. Genotype fingerprints are therefore well suited as a format for public aggregation of genetic information to support ancestry and relatedness determination without revealing personal health risk status.
34
Wallace AD, Wendt GA, Barcellos LF, de Smith AJ, Walsh KM, Metayer C, Costello JF, Wiemels JL, Francis SS. To ERV Is Human: A Phenotype-Wide Scan Linking Polymorphic Human Endogenous Retrovirus-K Insertions to Complex Phenotypes. Front Genet 2018; 9:298. [PMID: 30154825 PMCID: PMC6102640 DOI: 10.3389/fgene.2018.00298] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 04/27/2018] [Accepted: 07/16/2018] [Indexed: 12/13/2022]
Abstract
Approximately 8% of the human genome consists of endogenous retroviral insertions (ERVs) originating from historic retroviral integration into germ cells. The function of ERVs as regulators of gene expression is well established. Less well studied are insertional polymorphisms of ERVs and their contribution to the heritability of complex phenotypes. HERV-K, the most recently integrated ERV, is expressed in a range of complex human conditions, from cancer to neurologic diseases. Using an in-house computational pipeline and whole-genome sequencing data from the diverse 1,000 Genomes Phase 3 population (n = 2,504), we identified 46 polymorphic HERV-K insertions that are tagged by adjacent single nucleotide polymorphisms (SNPs). To test the potential role of polymorphic HERV-K in the heritability of complex diseases, existing databases were queried for enrichment of established relationships between the HERV-K insertion-associated SNPs (hiSNPs) and tissue-specific gene expression and disease phenotypes. Overall, hiSNPs for the 46 polymorphic HERV-K sites were statistically enriched (p < 1.0E-16) for eQTLs across 44 human tissues. Fifteen of the 46 HERV-K insertions had hiSNPs annotated in the EMBL-EBI GWAS Catalog that were cumulatively associated with >100 phenotypes. Experimental factor ontology enrichment analysis suggests that polymorphic HERV-K insertions specifically contribute to neurologic and immunologic disease phenotypes, including traits related to intracranial volume (FDR 2.00E-09), Parkinson's disease (FDR 1.80E-09), and autoimmune diseases (FDR 1.80E-09). These results provide strong candidates for context-specific study of polymorphic HERV-K insertions in disease-related traits, serving as a roadmap for future studies of the heritability of complex disease.
Affiliation(s)
- Amelia D Wallace
- Division of Epidemiology, School of Public Health, University of California, Berkeley, Berkeley, CA, United States
- George A Wendt
- Division of Epidemiology, School of Community Health Sciences, University of Nevada, Reno, NV, United States
- Lisa F Barcellos
- Division of Epidemiology, School of Public Health, University of California, Berkeley, Berkeley, CA, United States
- Adam J de Smith
- Department of Epidemiology and Biostatistics, Helen Diller Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, United States
- Kyle M Walsh
- Department of Neurosurgery, Duke University, Durham, NC, United States
- Catherine Metayer
- Division of Epidemiology, School of Public Health, University of California, Berkeley, Berkeley, CA, United States
- Joseph F Costello
- Department of Neurosurgery, Helen Diller Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, United States
- Joseph L Wiemels
- Department of Epidemiology and Biostatistics, Helen Diller Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, United States; Department of Neurosurgery, Helen Diller Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, United States
- Stephen S Francis
- Division of Epidemiology, School of Community Health Sciences, University of Nevada, Reno, NV, United States; Department of Epidemiology and Biostatistics, Helen Diller Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, United States
35
Marchi N, Mennecier P, Georges M, Lafosse S, Hegay T, Dorzhu C, Chichlo B, Ségurel L, Heyer E. Close inbreeding and low genetic diversity in Inner Asian human populations despite geographical exogamy. Sci Rep 2018; 8:9397. [PMID: 29925873 PMCID: PMC6010435 DOI: 10.1038/s41598-018-27047-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 07/05/2017] [Accepted: 04/13/2018] [Indexed: 01/12/2023]
Abstract
When closely related individuals mate, they produce inbred offspring, which often have lower fitness than outbred ones. Geographical exogamy, by favouring matings between distant individuals, is thought to be an inbreeding avoidance mechanism; however, no data has clearly tested this prediction. Here, we took advantage of the diversity of matrimonial systems in humans to explore the impact of geographical exogamy on genetic diversity and inbreeding. We collected ethno-demographic data for 1,344 individuals in 16 populations from two Inner Asian cultural groups with contrasting dispersal behaviours (Turko-Mongols and Indo-Iranians) and genotyped genome-wide single nucleotide polymorphisms in 503 individuals. We estimated the population exogamy rate and confirmed the expected dispersal differences: Turko-Mongols are geographically more exogamous than Indo-Iranians. Unexpectedly, across populations, exogamy patterns correlated neither with the proportion of inbred individuals nor with their genetic diversity. Even more surprisingly, among Turko-Mongols, descendants from exogamous couples were significantly more inbred than descendants from endogamous couples, except for large distances (>40 km). Overall, 37% of the descendants from exogamous couples were closely inbred. This suggests that in Inner Asia, geographical exogamy is neither efficient in increasing genetic diversity nor in avoiding inbreeding, which might be due to kinship endogamy despite the occurrence of dispersal.
Affiliation(s)
- Nina Marchi
- Eco-anthropologie et Ethnobiologie, UMR 7206 CNRS, MNHN, Univ Paris Diderot, Sorbonne Paris Cité, Sorbonne Universités, 75016, Paris, France.
- Philippe Mennecier
- Eco-anthropologie et Ethnobiologie, UMR 7206 CNRS, MNHN, Univ Paris Diderot, Sorbonne Paris Cité, Sorbonne Universités, 75016, Paris, France
- Myriam Georges
- Eco-anthropologie et Ethnobiologie, UMR 7206 CNRS, MNHN, Univ Paris Diderot, Sorbonne Paris Cité, Sorbonne Universités, 75016, Paris, France; LM2E-UMR6197, Laboratoire de Microbiologie des Environnements Extrêmes, Institut Universitaire Européen de la Mer, Technopôle Brest-Iroise, Plouzane, 29280, France
- Sophie Lafosse
- Eco-anthropologie et Ethnobiologie, UMR 7206 CNRS, MNHN, Univ Paris Diderot, Sorbonne Paris Cité, Sorbonne Universités, 75016, Paris, France
- Tatyana Hegay
- Republican Scientific Center of Immunology, Ministry of Public Health, Tashkent, 100060, Uzbekistan
- Choduraa Dorzhu
- Department of biology and ecology, Tuvan State University, Kyzyl, 667000, Russia
- Boris Chichlo
- Eco-anthropologie et Ethnobiologie, UMR 7206 CNRS, MNHN, Univ Paris Diderot, Sorbonne Paris Cité, Sorbonne Universités, 75016, Paris, France
- Laure Ségurel
- Eco-anthropologie et Ethnobiologie, UMR 7206 CNRS, MNHN, Univ Paris Diderot, Sorbonne Paris Cité, Sorbonne Universités, 75016, Paris, France
- Evelyne Heyer
- Eco-anthropologie et Ethnobiologie, UMR 7206 CNRS, MNHN, Univ Paris Diderot, Sorbonne Paris Cité, Sorbonne Universités, 75016, Paris, France.
36
Schlauch D, Fier H, Lange C. Identification of genetic outliers due to sub-structure and cryptic relationships. Bioinformatics 2017; 33:1972-1979. [PMID: 28334167 DOI: 10.1093/bioinformatics/btx109] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 10/22/2016] [Accepted: 02/21/2017] [Indexed: 01/29/2023]
Abstract
Motivation: In order to minimize the effects of genetic confounding on the analysis of high-throughput genetic association studies, e.g. (whole-genome) sequencing (WGS) studies and genome-wide association studies (GWAS), we propose a general framework to assess and formally test for genetic heterogeneity among study subjects. As the approach fully utilizes the recent ancestor information captured by rare variants, it is especially powerful in WGS studies. Even for relatively moderate sample sizes, the proposed testing framework is able to identify study subjects that are genetically too similar, e.g. cryptic relationships, or genetically too different, e.g. population substructure. The approach is computationally fast, enabling its application to whole-genome sequencing data, and straightforward to implement. Results: Simulation studies illustrate the overall performance of our approach. In an application to the 1000 Genomes Project, we outline an analysis/cleaning pipeline that utilizes our approach to formally assess whether study subjects are related and whether population substructure is present. In the analysis of the 1000 Genomes Project data, our approach revealed subjects that are most likely related but had previously passed standard QC filters. Availability and Implementation: An implementation of our method, Similarity Test for Estimating Genetic Outliers (STEGO), is available in the R package stego from GitHub at https://github.com/dschlauch/stego. Contact: dschlauch@fas.harvard.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Daniel Schlauch
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA 02115, USA; Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02115, USA
- Heide Fier
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA 02115, USA; Department of Genomic Mathematics, University of Bonn, Bonn, Germany
- Christoph Lange
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA 02115, USA; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
37
Abstract
The cavity system of the inner ear (the so-called bony labyrinth) houses the senses of balance and hearing. This structure is embedded in dense petrous bone, fully formed by birth and generally well preserved in human skeletal remains, thus providing a rich source of morphological information about past populations. Here we show that labyrinthine morphology tracks genetic distances and geography in an isolation-by-distance model with dispersal from Africa. Because petrous bones have become prime targets of ancient DNA recovery, we propose that all destructive studies first acquire high-resolution 3D computed-tomography data prior to any invasive sampling. Such data will constitute an important archive of morphological variation in past and present populations, and will permit individual-based genotype–phenotype comparisons.
The dispersal of modern humans from Africa is now well documented with genetic data that track population history, as well as gene flow between populations. Phenetic skeletal data, such as cranial and pelvic morphologies, also exhibit a dispersal-from-Africa signal, which, however, tends to be blurred by the effects of local adaptation and in vivo phenotypic plasticity, and is often degraded by postmortem damage to skeletal remains. These complexities raise the question of which skeletal structures most effectively track neutral population history. The cavity system of the inner ear (the so-called bony labyrinth) is a good candidate structure for such analyses. It is already fully formed by birth, which minimizes postnatal phenotypic plasticity, and it is generally well preserved in archaeological samples. Here we use morphometric data of the bony labyrinth to show that it is a surprisingly good marker of the global dispersal of modern humans from Africa. Labyrinthine morphology tracks genetic distances and geography in accordance with an isolation-by-distance model with dispersal from Africa.
Our data further indicate that the neutral-like pattern of variation is compatible with stabilizing selection on labyrinth morphology. Given the increasingly important role of the petrous bone for ancient DNA recovery from archaeological specimens, we encourage researchers to acquire 3D morphological data of the inner ear structures before any invasive sampling. Such data will constitute an important archive of phenotypic variation in present and past populations, and will permit individual-based genotype–phenotype comparisons.
38
Grant RC, Denroche RE, Borgida A, Virtanen C, Cook N, Smith AL, Connor AA, Wilson JM, Peterson G, Roberts NJ, Klein AP, Grimmond SM, Biankin A, Cleary S, Moore M, Lemire M, Zogopoulos G, Stein L, Gallinger S. Exome-Wide Association Study of Pancreatic Cancer Risk. Gastroenterology 2018; 154:719-722.e3. [PMID: 29074453 PMCID: PMC5811358 DOI: 10.1053/j.gastro.2017.10.015] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 07/10/2017] [Revised: 10/04/2017] [Accepted: 10/12/2017] [Indexed: 12/20/2022]
Abstract
We conducted a case-control exome-wide association study to discover germline variants in coding regions that affect risk for pancreatic cancer, combining data from 5 studies. We analyzed exome and genome sequencing data from 437 patients with pancreatic cancer (cases) and 1922 individuals not known to have cancer (controls). In the primary analysis, BRCA2 had the strongest enrichment for rare inactivating variants (17/437 cases vs 3/1922 controls) (P = 3.27 × 10⁻⁶; exome-wide statistical significance threshold P < 2.5 × 10⁻⁶). Cases had more rare inactivating variants in DNA repair genes than controls, even after excluding 13 genes known to predispose to pancreatic cancer (adjusted odds ratio, 1.35; P = .045). At the suggestive threshold (P < .001), 6 genes were enriched for rare damaging variants (UHMK1, AP1G2, DNTA, CHST6, FGFR3, and EPHA1) and 7 genes had associations with pancreatic cancer risk, based on the sequence-kernel association test. We confirmed variants in BRCA2 as the most common high-penetrance genetic factor associated with pancreatic cancer, and we also identified candidate pancreatic cancer genes. Large collaborations and novel approaches are needed to overcome the genetic heterogeneity of pancreatic cancer predisposition.
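The BRCA2 result is easy to misread. Its reported P value (3.27 × 10⁻⁶) is small but still lies above the stated exome-wide threshold (2.5 × 10⁻⁶), so it did not formally reach exome-wide significance. The sketch below makes that comparison explicit. It assumes that the threshold is a Bonferroni correction of α = 0.05 over roughly 20,000 genes. That derivation is common for gene-based tests, but the abstract does not state it. The sketch also computes a crude, unadjusted odds ratio from the quoted carrier counts. Neither calculation is the study's own analysis, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that bounds the family-wise error rate at alpha."""
    return alpha / n_tests


def crude_odds_ratio(case_carriers: int, n_cases: int,
                     control_carriers: int, n_controls: int) -> float:
    """Unadjusted odds ratio from a 2x2 carrier table (no covariates, no continuity correction)."""
    a, b = case_carriers, n_cases - case_carriers            # carriers / non-carriers among cases
    c, d = control_carriers, n_controls - control_carriers   # carriers / non-carriers among controls
    return (a * d) / (b * c)


threshold = bonferroni_threshold(0.05, 20_000)  # assumption: about 20,000 genes tested
brca2_p = 3.27e-6                               # P value reported in the abstract
print(f"threshold = {threshold:.1e}")
print("BRCA2 passes exome-wide threshold:", brca2_p < threshold)
print(f"crude BRCA2 odds ratio = {crude_odds_ratio(17, 437, 3, 1922):.1f}")
```

The crude odds ratio (about 26) ignores any covariates or study-level adjustment, so it gives only a rough sense of effect size for a variant class carried by very few controls.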
Affiliation(s)
- Natalie Cook
- Princess Margaret Genomics Centre, Toronto, Canada
- Alyssa L Smith
- Research Institute of the McGill University Health Centre, Montreal, Canada
- Gloria Peterson
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
- Nicholas J Roberts
- Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland
- Alison P Klein
- Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland; Department of Pathology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins Medical Institutions, Baltimore, Maryland
- Sean M Grimmond
- University of Melbourne Centre for Cancer Research, Victorian Comprehensive Cancer Centre, Melbourne, Australia
- Andrew Biankin
- Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Glasgow, United Kingdom; West of Scotland Pancreatic Unit, Glasgow Royal Infirmary, Glasgow, United Kingdom; South Western Sydney Clinical School, Faculty of Medicine, University of NSW, Liverpool, Australia
- Sean Cleary
- Ontario Institute for Cancer Research, Toronto, Canada; Ontario Pancreas Cancer Study, Toronto, Canada
- George Zogopoulos
- Research Institute of the McGill University Health Centre, Montreal, Canada
- Lincoln Stein
- Ontario Institute for Cancer Research, Toronto, Canada
- Steven Gallinger
- Ontario Institute for Cancer Research, Toronto, Canada; Ontario Pancreas Cancer Study, Toronto, Canada.
39
Sohail M, Vakhrusheva OA, Sul JH, Pulit SL, Francioli LC, van den Berg LH, Veldink JH, de Bakker PIW, Bazykin GA, Kondrashov AS, Sunyaev SR. Negative selection in humans and fruit flies involves synergistic epistasis. Science 2017; 356:539-542. [PMID: 28473589 DOI: 10.1126/science.aah5238] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Received: 07/08/2016] [Revised: 11/28/2016] [Accepted: 04/14/2017] [Indexed: 12/22/2022]
Abstract
Negative selection against deleterious alleles produced by mutation influences within-population variation as the most pervasive form of natural selection. However, it is not known whether deleterious alleles affect fitness independently, so that cumulative fitness loss depends exponentially on the number of deleterious alleles, or synergistically, so that each additional deleterious allele results in a larger decrease in relative fitness. Negative selection with synergistic epistasis should produce negative linkage disequilibrium between deleterious alleles and, therefore, an underdispersed distribution of the number of deleterious alleles in the genome. Indeed, we detected underdispersion of the number of rare loss-of-function alleles in eight independent data sets from human and fly populations. Thus, selection against rare protein-disrupting alleles is characterized by synergistic epistasis, which may explain how human and fly populations persist despite high genomic mutation rates.
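The logic behind the underdispersion test can be illustrated with a small calculation. If deleterious alleles at different sites were independent (linkage equilibrium), the variance of each individual's total count would equal the sum of the per-site variances. Negative linkage disequilibrium, the signature of synergistic epistasis described above, pulls the observed variance below that sum. The sketch below computes the ratio of the two quantities from a genotype matrix. It is a simplified illustration of this principle, not the authors' exact statistic or pipeline, and the function name is hypothetical.

```python
import random
from statistics import variance


def dispersion_ratio(genotypes: list[list[int]]) -> float:
    """Observed variance of per-individual burden divided by the sum of per-site variances.

    genotypes: one row per individual, one column per site; entries are counts
    (0, 1 or 2) of the deleterious allele. A ratio below 1 means net negative
    covariance (negative LD) between deleterious alleles, i.e. underdispersion.
    """
    burdens = [sum(row) for row in genotypes]        # total deleterious alleles per person
    observed = variance(burdens)                     # sample variance of the totals
    per_site = [variance(site) for site in zip(*genotypes)]
    expected = sum(per_site)                         # variance of the total if sites were independent
    if expected == 0:
        raise ValueError("all sites are monomorphic; the ratio is undefined")
    return observed / expected


if __name__ == "__main__":
    # Simulated independent sites (allele frequency 0.05): the ratio should be close to 1.
    rng = random.Random(0)
    sim = [[(rng.random() < 0.05) + (rng.random() < 0.05) for _ in range(200)]
           for _ in range(1000)]
    print(f"independent sites: {dispersion_ratio(sim):.3f}")
```

Because the sample variance of a sum equals the sum of the sample variances plus twice the sum of pairwise sample covariances, the ratio falls below 1 exactly when the covariances between sites are, on balance, negative.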
40
Blant A, Kwong M, Szpiech ZA, Pemberton TJ. Weighted likelihood inference of genomic autozygosity patterns in dense genotype data. BMC Genomics 2017; 18:928. [PMID: 29191164 PMCID: PMC5709839 DOI: 10.1186/s12864-017-4312-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 05/23/2017] [Accepted: 11/16/2017] [Indexed: 12/14/2022]
Abstract
Background: Genomic regions of autozygosity (ROA) arise when an individual is homozygous for haplotypes inherited identical-by-descent from ancestors shared by both parents. Over the past decade, they have gained importance for understanding evolutionary history and the genetic basis of complex diseases and traits. However, methods to infer ROA in dense genotype data have not evolved in step with advances in genome technology that now enable us to rapidly create large high-resolution genotype datasets, limiting our ability to investigate their constituent ROA patterns. Methods: We report a weighted likelihood approach for inferring ROA in dense genotype data that accounts for autocorrelation among genotyped positions and the possibilities of unobserved mutation and recombination events, and variability in the confidence of individual genotype calls in whole genome sequence (WGS) data. Results: Forward-time genetic simulations under two demographic scenarios that reflect situations where inbreeding and its effect on fitness are of interest suggest this approach is better powered than existing state-of-the-art methods to infer ROA at marker densities consistent with WGS and popular microarray genotyping platforms used in human and non-human studies. Moreover, we present evidence that suggests this approach is able to distinguish ROA arising via consanguinity from ROA arising via endogamy. Using subsets of The 1000 Genomes Project Phase 3 data we show that, relative to WGS, intermediate and long ROA are captured robustly with popular microarray platforms, while detection of short ROA is more variable and improves with marker density. Worldwide ROA patterns inferred from WGS data are found to accord well with those previously reported on the basis of microarray genotype data. Finally, we highlight the potential of this approach to detect genomic regions enriched for autozygosity signals in one group relative to another based upon comparisons of per-individual autozygosity likelihoods instead of inferred ROA frequencies. Conclusions: This weighted likelihood ROA inference approach can assist population- and disease-geneticists working with a wide variety of data types and species to explore ROA patterns and to identify genomic regions with differential ROA signals among groups, thereby advancing our understanding of evolutionary history and the role of recessive variation in phenotypic variation and disease. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-4312-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Alexandra Blant
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada
- Michelle Kwong
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada
- Zachary A Szpiech
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
- Trevor J Pemberton
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada.
41
Crawford NG, Kelly DE, Hansen MEB, Beltrame MH, Fan S, Bowman SL, Jewett E, Ranciaro A, Thompson S, Lo Y, Pfeifer SP, Jensen JD, Campbell MC, Beggs W, Hormozdiari F, Mpoloka SW, Mokone GG, Nyambo T, Meskel DW, Belay G, Haut J, Rothschild H, Zon L, Zhou Y, Kovacs MA, Xu M, Zhang T, Bishop K, Sinclair J, Rivas C, Elliot E, Choi J, Li SA, Hicks B, Burgess S, Abnet C, Watkins-Chow DE, Oceana E, Song YS, Eskin E, Brown KM, Marks MS, Loftus SK, Pavan WJ, Yeager M, Chanock S, Tishkoff SA. Loci associated with skin pigmentation identified in African populations. Science 2017; 358:eaan8433. [PMID: 29025994 PMCID: PMC5759959 DOI: 10.1126/science.aan8433] [Citation(s) in RCA: 213] [Impact Index Per Article: 26.6] [Received: 05/27/2017] [Accepted: 10/03/2017] [Indexed: 12/13/2022]
Abstract
Despite the wide range of skin pigmentation in humans, little is known about its genetic basis in global populations. Examining ethnically diverse African genomes, we identify variants in or near SLC24A5, MFSD12, DDB1, TMEM138, OCA2, and HERC2 that are significantly associated with skin pigmentation. Genetic evidence indicates that the light pigmentation variant at SLC24A5 was introduced into East Africa by gene flow from non-Africans. At all other loci, variants associated with dark pigmentation in Africans are identical by descent in South Asian and Australo-Melanesian populations. Functional analyses indicate that MFSD12 encodes a lysosomal protein that affects melanogenesis in zebrafish and mice, and that mutations in melanocyte-specific regulatory regions near DDB1/TMEM138 correlate with expression of ultraviolet response genes under selection in Eurasians.
Affiliation(s)
- Nicholas G Crawford
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Derek E Kelly
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA 19104, USA
- Matthew E B Hansen
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Marcia H Beltrame
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Shaohua Fan
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Shanna L Bowman
- Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia Research Institute, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine and Department of Physiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Ethan Jewett
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94704, USA
- Department of Statistics, University of California, Berkeley, Berkeley, CA 94704, USA
- Alessia Ranciaro
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Simon Thompson
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Yancy Lo
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Susanne P Pfeifer
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
- Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
- Michael C Campbell
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Biology, Howard University, Washington, DC 20059, USA
- William Beggs
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Farhad Hormozdiari
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA
- Gaonyadiwe George Mokone
- Department of Biomedical Sciences, University of Botswana School of Medicine, Gaborone, Botswana
- Thomas Nyambo
- Department of Biochemistry, Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania
- Gurja Belay
- Department of Biology, Addis Ababa University, Addis Ababa, Ethiopia
- Jake Haut
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Harriet Rothschild
- Stem Cell Program, Division of Hematology and Oncology, Pediatric Hematology Program, Boston Children's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA
- Leonard Zon
- Stem Cell Program, Division of Hematology and Oncology, Pediatric Hematology Program, Boston Children's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA 02115, USA
- Yi Zhou
- Stem Cell Program, Division of Hematology and Oncology, Pediatric Hematology Program, Boston Children's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA
- Harvard Stem Cell Institute, Harvard University, Cambridge, MA 02138, USA
- Michael A Kovacs
- Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Mai Xu
- Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Tongwu Zhang
- Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Kevin Bishop
- Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Jason Sinclair
- Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Cecilia Rivas
- Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Eugene Elliot
- Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Jiyeon Choi
- Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Shengchao A Li
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20892, USA
- Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc., Frederick, MD 21701, USA
- Belynda Hicks
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20892, USA
- Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc., Frederick, MD 21701, USA
- Shawn Burgess
- Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Christian Abnet
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20892, USA
- Dawn E Watkins-Chow
- Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Elena Oancea
- Department of Molecular Pharmacology, Physiology and Biotechnology, Brown University, Providence, RI 02912, USA
- Yun S Song
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94704, USA
- Department of Statistics, University of California, Berkeley, Berkeley, CA 94704, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
- Department of Biology, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Eleazar Eskin
- Department of Computer Science and Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Kevin M Brown
- Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Michael S Marks
- Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia Research Institute, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine and Department of Physiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Stacie K Loftus
- Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- William J Pavan
- Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Meredith Yeager
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20892, USA
- Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc., Frederick, MD 21701, USA
- Stephen Chanock
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20892, USA
- Sarah A Tishkoff
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Biology, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA 19104, USA
42
Chen B, Cole JW, Grond-Ginsbach C. Departure from Hardy Weinberg Equilibrium and Genotyping Error. Front Genet 2017; 8:167. [PMID: 29163635 PMCID: PMC5671567 DOI: 10.3389/fgene.2017.00167] [Citation(s) in RCA: 91] [Impact Index Per Article: 11.4] [Received: 08/03/2017] [Accepted: 10/16/2017] [Indexed: 01/03/2023] Open
Abstract
Objective: Departure from Hardy-Weinberg Equilibrium (HWE) may occur due to a variety of causes, including purifying selection, inbreeding, population substructure, copy number variation or genotyping error. We searched for specific characteristics of HWE-departure due to genotyping error. Methods: Genotypes of a random set of genetic variants were obtained from the Exome Aggregation Consortium (ExAC) database. Variants with <80% successful genotypes or with minor allele frequency (MAF) <1% were excluded. HWE-departure (d-HWE) was considered significant at p < 10E-05 and classified as d-HWE with loss of heterozygosity (LoH d-HWE) or d-HWE with excess heterozygosity (gain of heterozygosity: GoH d-HWE). Missing genotypes, variant type (single nucleotide polymorphism (SNP) vs. insertion/deletion), MAF, standard deviation (SD) of MAF across populations (MAF-SD) and copy number variation were evaluated for association with HWE-departure. Results: The study sample comprised 3,204 genotype distributions. HWE-departure was observed in 134 variants: LoH d-HWE in 41 (1.3%) and GoH d-HWE in 93 (2.9%) variants. LoH d-HWE was more likely in variants located within deletion polymorphisms (p < 0.001) and in variants with higher MAF-SD (p = 0.0077). GoH d-HWE was associated with low genotyping rate, with variants of insertion/deletion type and with high MAF (all at p < 0.001). In a sub-sample of 2,196 variants with genotyping rate >98%, LoH d-HWE was found in 29 (1.3%) variants, but no GoH d-HWE was detected. The non-random distribution of HWE-violating SNPs along the chromosome, the association with common deletion polymorphisms and indel variant type, and the excess of heterozygotes in genomic regions prone to cross-hybridization were confirmed in a large sample of short variants from the 1000 Genomes Project. Conclusions: We differentiated between two types of HWE-departure. GoH d-HWE was suggestive of genotyping error. LoH d-HWE, by contrast, pointed to natural causes such as population substructure or common deletion polymorphisms.
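As a rough illustration of the LoH/GoH distinction described above, the sketch below tests the genotype counts of one biallelic variant against Hardy-Weinberg proportions and labels the direction of any departure by comparing observed with expected heterozygotes. It uses a 1-df chi-square goodness-of-fit test, which is an assumption for illustration only: the abstract does not say which HWE test the authors used, and exact tests are usually preferred for rare variants. The helper name `classify_hwe` and the default threshold `alpha=1e-5` are likewise hypothetical.

```python
import math


def classify_hwe(n_aa: int, n_ab: int, n_bb: int, alpha: float = 1e-5):
    """Chi-square (1 df) HWE test on biallelic genotype counts.

    Hypothetical helper for illustration. Returns (chi2, p_value, label),
    where label is "HWE" (no significant departure), "LoH d-HWE"
    (heterozygote deficit) or "GoH d-HWE" (heterozygote excess).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip empty expected classes (monomorphic variant) to avoid dividing by zero
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    if p_value >= alpha:
        label = "HWE"
    elif n_ab < expected[1]:
        label = "LoH d-HWE"  # fewer heterozygotes than expected
    else:
        label = "GoH d-HWE"  # more heterozygotes than expected
    return chi2, p_value, label


print(classify_hwe(250, 500, 250))  # counts in Hardy-Weinberg proportions
print(classify_hwe(400, 200, 400))  # heterozygote deficit
print(classify_hwe(100, 800, 100))  # heterozygote excess
```

Because the direction is read from the heterozygote class alone, the same test statistic can flag either a deficit, which the abstract associates with natural causes, or an excess, which it associates with genotyping error.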
Affiliation(s)
- Bowang Chen
- Department of Biology, Southern University of Science and Technology, Shenzhen, China
- John W. Cole
- Department of Neurology, Baltimore VA Medical Center (VHA), University of Maryland School of Medicine, Baltimore, MD, United States
43
Glusman G, Mauldin DE, Hood LE, Robinson M. Ultrafast Comparison of Personal Genomes via Precomputed Genome Fingerprints. Front Genet 2017; 8:136. [PMID: 29018478 PMCID: PMC5623000 DOI: 10.3389/fgene.2017.00136] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/14/2017] [Accepted: 09/12/2017] [Indexed: 01/01/2023] Open
Abstract
We present an ultrafast method for comparing personal genomes. We transform the standard genome representation (lists of variants relative to a reference) into “genome fingerprints” via locality sensitive hashing. The resulting genome fingerprints can be meaningfully compared even when the input data were obtained using different sequencing technologies, processed using different pipelines, represented in different data formats and relative to different reference versions. Furthermore, genome fingerprints are robust to up to 30% missing data. Because of their reduced size, computation on the genome fingerprints is fast and requires little memory. For example, we could compute all-against-all pairwise comparisons among the 2504 genomes in the 1000 Genomes data set in 67 s at high quality (21 μs per comparison, on a single processor), and achieved a lower quality approximation in just 11 s. Efficient computation enables scaling up a variety of important genome analyses, including quantifying relatedness, recognizing duplicate sequenced genomes in a set, population reconstruction, and many others. The original genome representation cannot be reconstructed from its fingerprint, effectively decoupling genome comparison from genome interpretation; the method thus has significant implications for privacy-preserving genome analytics.
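The general idea of a fixed-size, hash-based summary that can be compared quickly can be sketched with MinHash, a standard locality-sensitive hashing scheme for estimating the Jaccard similarity of two sets. This is a swapped-in technique chosen for illustration, not the fingerprinting procedure of Glusman et al. Representing each genome as a set of variant strings such as `"chr1:12345:A:G"` is an assumption, and the helper names are hypothetical.

```python
import hashlib
from typing import Iterable, List


def _seeded_hash(item: str, seed: int) -> int:
    """64-bit hash of item; each seed acts as an independent hash function."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_fingerprint(variants: Iterable[str], num_hashes: int = 128) -> List[int]:
    """Fixed-length MinHash signature of a non-empty set of variant strings."""
    items = set(variants)
    if not items:
        raise ValueError("need at least one variant")
    return [min(_seeded_hash(v, seed) for v in items) for seed in range(num_hashes)]


def estimated_jaccard(fp1: List[int], fp2: List[int]) -> float:
    """Fraction of agreeing slots estimates |A & B| / |A | B|."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have equal length")
    return sum(a == b for a, b in zip(fp1, fp2)) / len(fp1)


# Two hypothetical genomes, each with 100 variants, sharing 50 of them
genome_a = {f"chr1:{pos}:A:G" for pos in range(0, 100)}
genome_b = {f"chr1:{pos}:A:G" for pos in range(50, 150)}
fa, fb = minhash_fingerprint(genome_a), minhash_fingerprint(genome_b)
print(estimated_jaccard(fa, fb))  # the true Jaccard similarity is 50/150
```

Once fingerprints are precomputed, comparing two genomes costs O(`num_hashes`) regardless of how many variants each carries, which is the property that makes all-against-all comparison cheap.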
Affiliation(s)
- Leroy E Hood
- Institute for Systems Biology, Seattle, WA, United States
- Max Robinson
- Institute for Systems Biology, Seattle, WA, United States
44
Ko A, Nielsen R. Composite likelihood method for inferring local pedigrees. PLoS Genet 2017; 13:e1006963. [PMID: 28827797 PMCID: PMC5578687 DOI: 10.1371/journal.pgen.1006963] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 01/10/2017] [Revised: 08/31/2017] [Accepted: 08/07/2017] [Indexed: 12/21/2022] Open
Abstract
Pedigrees contain information about the genealogical relationships among individuals and are of fundamental importance in many areas of genetic studies. However, pedigrees are often unknown and must be inferred from genetic data. Despite the importance of pedigree inference, existing methods are limited to inferring only close relationships or analyzing a small number of individuals or loci. We present a simulated annealing method for estimating pedigrees in large samples of otherwise seemingly unrelated individuals using genome-wide SNP data. The method supports complex pedigree structures such as polygamous families, multi-generational families, and pedigrees in which many of the member individuals are missing. Computational speed is greatly enhanced by the use of a composite likelihood function which approximates the full likelihood. We validate our method on simulated data and show that it can infer distant relatives more accurately than existing methods. Furthermore, we illustrate the utility of the method on a sample of Greenlandic Inuit.
Author summary: Pedigrees contain information about the genealogical relationships among individuals. This information can be used in many areas of genetic studies, such as disease association studies, conservation efforts, and inference of the demographic history and social structure of a population. Despite their importance, pedigrees are often unknown and must be estimated from genetic information. However, pedigree inference remains a difficult problem due to the high cost of likelihood computation and the enormous number of possible pedigrees that must be considered. These difficulties limit the ability of existing methods to infer pedigrees when the sample size or the number of markers is large, or when the sample contains only distant relatives. In this report, we present a method that circumvents these computational challenges in order to infer pedigrees of complex structure for a large number of individuals. Using simulations, we find that the method can infer distant relatives much more accurately than existing methods. Furthermore, we show that even pairwise inferences of relatedness can be improved substantially by considering the pedigree structure with other related individuals in the sample.
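The search strategy described above, simulated annealing over candidate structures scored by a composite likelihood, can be illustrated with a deliberately simplified toy. Here a "pedigree" is reduced to a hypothetical assignment of individuals to family labels, and the composite log-likelihood is a sum of made-up pairwise log-likelihoods for "related" versus "unrelated". Both simplifications, the cooling schedule and all names are assumptions; Ko and Nielsen's actual pedigree moves and likelihood terms are not reproduced.

```python
import itertools
import math
import random


def composite_loglik(labels, pair_ll):
    """Composite log-likelihood: sum of pairwise terms, treated as independent."""
    total = 0.0
    for i, j in itertools.combinations(range(len(labels)), 2):
        relation = "related" if labels[i] == labels[j] else "unrelated"
        total += pair_ll[(i, j)][relation]
    return total


def anneal(n, pair_ll, n_iter=5000, t0=5.0, t_min=1e-3, seed=0):
    """Simulated annealing over family-label assignments for n individuals."""
    rng = random.Random(seed)
    labels = [0] * n  # start with everyone in a single family
    ll = composite_loglik(labels, pair_ll)
    best, best_ll = list(labels), ll
    for step in range(n_iter):
        # Geometric cooling from t0 down to t_min
        temp = t0 * (t_min / t0) ** (step / max(1, n_iter - 1))
        proposal = list(labels)
        proposal[rng.randrange(n)] = rng.randrange(n)  # move one individual
        new_ll = composite_loglik(proposal, pair_ll)
        # Metropolis rule: accept improvements; accept worse states with prob exp(delta / T)
        if new_ll >= ll or rng.random() < math.exp((new_ll - ll) / temp):
            labels, ll = proposal, new_ll
            if ll > best_ll:
                best, best_ll = list(labels), ll
    return best, best_ll


# Hypothetical pairwise log-likelihoods: pairs (0, 1) and (2, 3) look related
pair_ll = {}
for i, j in itertools.combinations(range(4), 2):
    if (i, j) in {(0, 1), (2, 3)}:
        pair_ll[(i, j)] = {"related": -1.0, "unrelated": -10.0}
    else:
        pair_ll[(i, j)] = {"related": -10.0, "unrelated": -1.0}

best, best_ll = anneal(4, pair_ll)
print(best, best_ll)
```

Summing pairwise terms avoids evaluating one joint likelihood over all individuals at once, which is the kind of saving a composite likelihood is meant to provide; the price is that dependencies among pairs are ignored.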
Affiliation(s)
- Amy Ko
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
- Rasmus Nielsen
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
- Department of Statistics, University of California, Berkeley, Berkeley, California, United States of America
- Museum of Natural History, University of Copenhagen, Copenhagen, Denmark
45
Population structure and coalescence in pedigrees: Comparisons to the structured coalescent and a framework for inference. Theor Popul Biol 2017; 115:1-12. [DOI: 10.1016/j.tpb.2017.01.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/23/2016] [Revised: 01/02/2017] [Accepted: 01/18/2017] [Indexed: 01/08/2023]
46
Messina F, Finocchio A, Akar N, Loutradis A, Michalodimitrakis EI, Brdicka R, Jodice C, Novelletto A. Spatially Explicit Models to Investigate Geographic Patterns in the Distribution of Forensic STRs: Application to the North-Eastern Mediterranean. PLoS One 2016; 11:e0167065. [PMID: 27898725 PMCID: PMC5127579 DOI: 10.1371/journal.pone.0167065] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 08/03/2016] [Accepted: 11/08/2016] [Indexed: 11/18/2022] Open
Abstract
Human forensic STRs used for individual identification have been reported to have little power for inter-population analyses. Several methods have been developed that incorporate information on the spatial distribution of individuals to describe how diversity is arranged. We genotyped a large population sample, obtained from many locations in Italy, Greece and Turkey, at 16 forensic STRs. These three countries are crucial to understanding discontinuities at the European/Asian junction and the genetic legacy of ancient migrations, but have seldom been represented together in previous studies. Using spatial PCA on the full dataset, we detected patterns of population affinities in the area. Additionally, we devised objective criteria for reducing the overall complexity to a set of smaller datasets. Independent spatially explicit methods applied to these reduced datasets converged in showing that information on long- to medium-range geographical trends and structuring can be extracted from the overall diversity. All analyses returned the picture of a background clinal variation, with regional discontinuities captured by each of the reduced datasets. Several aspects of our results are confirmed in external STR datasets and replicate those of genome-wide SNP typings. Coalescent simulations inferred high levels of gene flow within the main continental areas. These results are promising from a microevolutionary perspective, in view of the fast pace at which forensic data are being accumulated for many locales. It is foreseeable that this will allow an invaluable genotypic resource, assembled for other (forensic) purposes, to be exploited to clarify important aspects of the formation of local gene pools.
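Spatial PCA, as used above, looks for axes of allele-frequency variation that are also spatially autocorrelated. The sketch below follows the general sPCA formulation of Jombart and colleagues. It eigen-decomposes `X^T (W + W^T) X / (2n)` for centred frequencies `X` and a row-standardised spatial weight matrix `W`. The k-nearest-neighbour connection network, the choice `k=3`, the function name and the toy cline are all assumptions; they are not the settings or data used by Messina et al.

```python
import numpy as np


def spatial_pca(freqs: np.ndarray, coords: np.ndarray, k: int = 3):
    """Sketch of spatial PCA on a (locations x alleles) frequency matrix.

    Returns eigenvalues (largest first) and location scores. Large positive
    eigenvalues indicate broad-scale ("global") spatial structure, while
    negative eigenvalues indicate local, neighbour-to-neighbour contrasts.
    """
    X = freqs - freqs.mean(axis=0)  # centre each allele column
    n = X.shape[0]
    # Pairwise Euclidean distances between sampling locations
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a location is not its own neighbour
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(d[i])[:k]] = 1.0  # connect to the k nearest locations
    W /= W.sum(axis=1, keepdims=True)  # row-standardise the weights
    # Symmetric matrix whose eigenvalues combine variance and spatial autocorrelation
    M = X.T @ (W + W.T) @ X / (2 * n)
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], X @ eigvecs[:, order]


# Hypothetical cline: 10 locations along a transect, one allele rising west to east
x = np.arange(10, dtype=float)
coords = np.column_stack([x, np.zeros_like(x)])
freqs = np.column_stack([x / 10, 1 - x / 10])
eigvals, scores = spatial_pca(freqs, coords)
print(eigvals[0], np.corrcoef(scores[:, 0], x)[0, 1])
```

On this toy cline the first axis has a positive eigenvalue and its scores track position along the transect, which is the kind of "background clinal variation" the abstract describes.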
Affiliation(s)
- Nejat Akar
- Pediatrics Department, TOBB-Economy and Technology University Hospital, Ankara, Turkey
- Radim Brdicka
- Institute of Haematology and Blood Transfusion, Praha, Czech Republic
- Carla Jodice
- Department of Biology, University "Tor Vergata", Rome, Italy
- Andrea Novelletto
- Department of Biology, University "Tor Vergata", Rome, Italy
47
Endogamy, consanguinity and the health implications of changing marital choices in the UK Pakistani community. J Biosoc Sci 2016; 49:435-446. [DOI: 10.1017/s0021932016000419] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 12/15/2022]
Abstract
The biraderi (brotherhood) is a long-established, widely prevalent dimension of social stratification in Pakistani communities worldwide. Alongside consanguinity, it offers a route for cementing social solidarities and so has strong socio-biological significance. A detailed breakdown of biraderi affiliation among participants in an ongoing birth cohort study in the northern English city of Bradford is presented. Intra-biraderi marriage has shown historical resilience, but its prevalence has declined over time across all biraderi, with considerable reductions in some. While a majority of marriages in all biraderi are consanguineous, the prevalence varies from over 80% to under 60%. In consanguineous unions, first cousin marriages account for more than 50% in five of the fifteen biraderi and for more than 40% in six others. Within-biraderi marriage and consanguinity enhance genetic stratification, thereby increasing rates of genomic homozygosity and the expression of recessive genetic disorders. The trends reported constitute putative signals of generational change in the marital choices in this community.
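The link drawn above between consanguinity, homozygosity and recessive disease can be made concrete with the standard inbreeding-coefficient formula. For a recessive allele at frequency q and an offspring inbreeding coefficient F (also the expected fraction of the genome that is homozygous by descent), the probability of an affected child is q^2(1 - F) + qF. The sketch below uses textbook F values for common parental relationships. The allele frequency 0.01 is purely hypothetical and is not taken from the Bradford study.

```python
# Textbook inbreeding coefficients (F) of the offspring of related parents
INBREEDING_F = {
    "unrelated": 0.0,
    "uncle-niece": 1 / 8,
    "first cousins": 1 / 16,
    "first cousins once removed": 1 / 32,
    "second cousins": 1 / 64,
}


def recessive_risk(q: float, f: float) -> float:
    """P(child is homozygous for a recessive allele of frequency q).

    With probability F the two alleles are identical by descent (risk q);
    otherwise they are drawn independently (risk q**2).
    """
    return q * q * (1 - f) + q * f


q = 0.01  # hypothetical recessive allele frequency
baseline = recessive_risk(q, INBREEDING_F["unrelated"])
for relation, f in INBREEDING_F.items():
    risk = recessive_risk(q, f)
    print(f"{relation:28s} risk={risk:.6f}  ({risk / baseline:.1f}x baseline)")
```

For q = 0.01, first-cousin parents raise the risk about sevenfold over the unrelated baseline (0.00071875 versus 0.0001). The relative increase, (1 - F) + F/q, grows as the allele becomes rarer.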
48
Fujikura K. Global Carrier Rates of Rare Inherited Disorders Using Population Exome Sequences. PLoS One 2016; 11:e0155552. [PMID: 27219052 PMCID: PMC4878778 DOI: 10.1371/journal.pone.0155552] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 01/09/2016] [Accepted: 04/29/2016] [Indexed: 12/22/2022] Open
Abstract
Exome sequencing has revealed the causative mutations behind numerous rare, inherited disorders, but it is challenging to find reliable epidemiological values for rare disorders. Here, I provide a genetic epidemiology method to identify the causative mutations behind rare, inherited disorders using population exome sequences from two projects (1000 Genomes and NHLBI). I created global maps of carrier rate distribution for 18 recessive disorders in 16 diverse ethnic populations. Of a total of 161 mutations associated with these 18 recessive disorders, I detected 24 in either or both exome studies. The genetic mapping revealed strong international spatial heterogeneity in the carrier patterns of the inherited disorders. I next validated this methodology by statistically evaluating the carrier rate of one well-understood disorder, sickle cell anemia (SCA). The population exome-based epidemiology of SCA [African (allele frequency (AF) = 0.0454, N = 2447), Asian (AF = 0, N = 286), European (AF = 0.000214, N = 4677), and Hispanic (AF = 0.0111, N = 362)] was not significantly different from that obtained from a clinical prevalence survey. A pair-wise proportion test revealed no significant differences between the two exome projects in terms of AF (46/48 cases; P > 0.05). I conclude that population exome-based carrier rates can form the foundation for a prospectively maintained database of use to clinical geneticists. Similar modeling methods can be applied to many inherited disorders.
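Converting a population allele frequency into carrier and affected rates, as done above, assumes Hardy-Weinberg proportions. With risk-allele frequency q, heterozygous carriers occur at 2q(1 - q) and affected homozygotes at q^2. The sketch below applies this to the African SCA allele frequency quoted in the abstract. It also adds a pooled two-proportion z-test as one plausible form of the "pair-wise proportion test". The exact test used in the paper, the helper names and the example counts are assumptions.

```python
import math


def hwe_carrier_and_affected(q: float):
    """Carrier (2pq) and affected (q^2) frequencies under Hardy-Weinberg proportions."""
    p = 1.0 - q
    return 2 * p * q, q * q


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("both samples are all 0 or all 1; z is undefined")
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return z, math.erfc(abs(z) / math.sqrt(2))


# African SCA allele frequency quoted in the abstract
carrier, affected = hwe_carrier_and_affected(0.0454)
print(f"carrier rate ~ {carrier:.4f}, affected ~ {affected:.5f}")

# Hypothetical allele counts in two projects (not data from the paper)
z, p = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

For an allele frequency of 0.0454 this gives a carrier rate of roughly 8.7% and an expected affected rate of about 0.2%, under the Hardy-Weinberg assumption.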
Affiliation(s)
- Kohei Fujikura
- Kobe University School of Medicine, 7-5-1, Kusunoki-cho, Chuo-ku, Kobe, 650-0017, Japan
49
Vieira FG, Albrechtsen A, Nielsen R. Estimating IBD tracts from low coverage NGS data. Bioinformatics 2016; 32:2096-102. [DOI: 10.1093/bioinformatics/btw212] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 11/23/2015] [Accepted: 04/12/2016] [Indexed: 11/13/2022] Open