1. Dinh KN, Vázquez-García I, Chan A, Malhotra R, Weiner A, McPherson AW, Tavaré S. CINner: Modeling and simulation of chromosomal instability in cancer at single-cell resolution. PLoS Comput Biol 2025; 21:e1012902. [PMID: 40179124 PMCID: PMC11990800 DOI: 10.1371/journal.pcbi.1012902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 04/11/2025] [Accepted: 02/24/2025] [Indexed: 04/05/2025]
Abstract
Cancer development is characterized by chromosomal instability, manifesting in frequent occurrences of different genomic alteration mechanisms ranging in extent and impact. Mathematical modeling can help evaluate the role of each mutational process during tumor progression; however, existing frameworks can only capture certain aspects of chromosomal instability (CIN). We present CINner, a mathematical framework for modeling genomic diversity and selection during tumor evolution. The main advantage of CINner is its flexibility to incorporate many genomic events that directly impact cellular fitness, from driver gene mutations to copy number alterations (CNAs), including focal amplifications and deletions, missegregations and whole-genome duplication (WGD). We apply CINner to find chromosome-arm selection parameters that drive tumorigenesis in the absence of WGD in chromosomally stable cancer types from the Pan-Cancer Analysis of Whole Genomes (PCAWG, [Formula: see text]). We found that the selection parameters predict WGD prevalence among different chromosomally unstable tumors, hinting that the selective advantage of WGD cells hinges on their tolerance for aneuploidy and escape from nullisomy. Analysis of inference results using CINner across cancer types in The Cancer Genome Atlas ([Formula: see text]) further reveals that the inferred selection parameters reflect the bias between tumor suppressor genes and oncogenes on specific genomic regions. Direct application of CINner to model the WGD proportion and fraction of genome altered (FGA) in PCAWG uncovers the increase in CNA probabilities associated with WGD in each cancer type. CINner can also be utilized to study chromosomally stable cancer types, by applying a selection model based on driver gene mutations and focal amplifications or deletions (chronic lymphocytic leukemia in PCAWG, [Formula: see text]). Finally, we used CINner to analyze the impact of CNA probabilities, chromosome selection parameters, tumor growth dynamics and population size on cancer fitness and heterogeneity. We expect that CINner will provide a powerful modeling tool for the oncology community to quantify the impact of newly uncovered genomic alteration mechanisms on shaping tumor progression and adaptation.
Affiliation(s)
- Khanh N. Dinh
- Irving Institute for Cancer Dynamics, Columbia University, New York, New York, United States of America
- Department of Statistics, Columbia University, New York, New York, United States of America
- Ignacio Vázquez-García
- Irving Institute for Cancer Dynamics, Columbia University, New York, New York, United States of America
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
- Department of Pathology, Krantz Family Center for Cancer Research, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Andrew Chan
- Case Western Reserve University, Cleveland, Ohio, United States of America
- Rhea Malhotra
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
- Stanford University, Palo Alto, California, United States of America
- Adam Weiner
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
- Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, New York, United States of America
- Andrew W. McPherson
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
- Simon Tavaré
- Irving Institute for Cancer Dynamics, Columbia University, New York, New York, United States of America
- Department of Statistics, Columbia University, New York, New York, United States of America
2. Pivirotto A, Peles N, Hey J. Allele age estimators designed for whole genome datasets show only a moderate reduction in performance when applied to whole exome datasets. bioRxiv 2025:2024.02.01.578465. [PMID: 38370640 PMCID: PMC10871225 DOI: 10.1101/2024.02.01.578465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Personalized genomics in the healthcare system is becoming increasingly accessible as the cost of sequencing decreases. With the increase in the number of genomes, larger numbers of rare variants are being discovered, leading to important initiatives in identifying the functional impacts in relation to disease phenotypes. One way to characterize these variants is to estimate the time the mutation entered the population. However, allele age estimators such as those implemented in the programs Relate, Genealogical Estimator of Variant Age (GEVA), and Runtc were developed based on the assumption that datasets include the entire genome. We examined the performance of each of these estimators on simulated exome data under a neutral constant population size model, as well as under population expansion and background selection models. We found that each provides usable estimates of allele age from whole-exome datasets. Relate performs the best amongst all three estimators with Pearson coefficients of 0.83 and 0.73 (with respect to true simulated values, for neutral constant and expansion population model, respectively) with a 12 percent and 20 percent decrease in correlation between whole genome and whole exome estimations. Of the three estimators, Relate is best able to parallelize to yield quick results with few resources; however, Relate is currently only able to scale to thousands of samples, making it unable to match the hundreds of thousands of samples currently being released. While more work is needed to expand the capabilities of current methods of estimating allele age, these methods show a modest decrease in performance in the estimation of the age of mutations.
Affiliation(s)
- Alyssa Pivirotto
- Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA USA
- Noah Peles
- Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA USA
- Jody Hey
- Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA USA
3. Arbore R, Barbosa S, Brejcha J, Ogawa Y, Liu Y, Nicolaï MPJ, Pereira P, Sabatino SJ, Cloutier A, Poon ESK, Marques CI, Andrade P, Debruyn G, Afonso S, Afonso R, Roy SG, Abdu U, Lopes RJ, Mojzeš P, Maršík P, Sin SYW, White MA, Araújo PM, Corbo JC, Carneiro M. A molecular mechanism for bright color variation in parrots. Science 2024; 386:eadp7710. [PMID: 39480920 PMCID: PMC7617403 DOI: 10.1126/science.adp7710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Accepted: 09/05/2024] [Indexed: 11/02/2024]
Abstract
Parrots produce stunning plumage colors through unique pigments called psittacofulvins. However, the mechanism underlying their ability to generate a spectrum of vibrant yellows, reds, and greens remains enigmatic. We uncover a unifying chemical basis for a wide range of parrot plumage colors, which result from the selective deposition of red aldehyde- and yellow carboxyl-containing psittacofulvin molecules in developing feathers. Through genetic mapping, biochemical assays, and single-cell genomics, we identified a critical player in this process, the aldehyde dehydrogenase ALDH3A2, which oxidizes aldehyde psittacofulvins into carboxyl forms in late-differentiating keratinocytes during feather development. The simplicity of the underlying molecular mechanism, in which a single enzyme influences the balance of red and yellow pigments, offers an explanation for the exceptional evolutionary lability of parrot coloration.
Affiliation(s)
- Roberto Arbore
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA
- Soraia Barbosa
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Jindřich Brejcha
- Department of Philosophy and History of Science, Faculty of Science, Charles University in Prague, Praha, Czech Republic
- Yohey Ogawa
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA
- Yu Liu
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA
- Michaël P. J. Nicolaï
- Evolution and Optics of Nanostructures Group, Biology Department, Ghent University, Ghent, Belgium
- Paulo Pereira
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Departamento de Biologia, Faculdade de Ciências da Universidade do Porto, Porto, Portugal
- Stephen J. Sabatino
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Alison Cloutier
- School of Biological Sciences, The University of Hong Kong, Hong Kong
- Cristiana I. Marques
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Departamento de Biologia, Faculdade de Ciências da Universidade do Porto, Porto, Portugal
- Pedro Andrade
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Gerben Debruyn
- Evolution and Optics of Nanostructures Group, Biology Department, Ghent University, Ghent, Belgium
- Sandra Afonso
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Rita Afonso
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Departamento de Biologia, Faculdade de Ciências da Universidade do Porto, Porto, Portugal
- Shatadru Ghosh Roy
- Department of Life Sciences, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
- Uri Abdu
- Department of Life Sciences, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
- Ricardo J. Lopes
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- MHNC-UP, Natural History and Science Museum of the University of Porto, Porto, Portugal
- cE3c – Center for Ecology, Evolution and Environmental Change & CHANGE, Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Peter Mojzeš
- Institute of Physics, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
- Petr Maršík
- Department of Food Science, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, Prague, Czech Republic
- Simon Yung Wa Sin
- School of Biological Sciences, The University of Hong Kong, Hong Kong
- Michael A. White
- Edison Family Center for Systems Biology and Genome Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Pedro M. Araújo
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- University of Coimbra, MARE – Marine and Environmental Sciences Centre, Department of Life Sciences, Coimbra, Portugal
- Joseph C. Corbo
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA
- Miguel Carneiro
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
4. Fan WTL, Wakeley J. Latent mutations in the ancestries of alleles under selection. Theor Popul Biol 2024; 158:1-20. [PMID: 38697365 DOI: 10.1016/j.tpb.2024.04.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 04/23/2024] [Accepted: 04/29/2024] [Indexed: 05/05/2024]
Abstract
We consider a single genetic locus with two alleles A1 and A2 in a large haploid population. The locus is subject to selection and two-way, or recurrent, mutation. Assuming the allele frequencies follow a Wright-Fisher diffusion and have reached stationarity, we describe the asymptotic behaviors of the conditional gene genealogy and the latent mutations of a sample with known allele counts, when the count n1 of allele A1 is fixed, and when either or both the sample size n and the selection strength |α| tend to infinity. Our study extends previous work under neutrality to the case of non-neutral rare alleles, asserting that when selection is not too strong relative to the sample size, even if it is strongly positive or strongly negative in the usual sense (α→-∞ or α→+∞), the number of latent mutations of the n1 copies of allele A1 follows the same distribution as the number of alleles in the Ewens sampling formula. On the other hand, very strong positive selection relative to the sample size leads to neutral gene genealogies with a single ancient latent mutation. We also demonstrate robustness of our asymptotic results against changing population sizes, when one of |α| or n is large.
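For reference, the distribution named in this abstract (the number of distinct alleles under the Ewens sampling formula) can be computed with a short recursion. The sketch below is illustrative and is not taken from the paper: the function names are hypothetical, and `theta` stands for the mutation parameter of the Ewens formula; which mutation rate plays that role in the paper's result is not stated here.

```python
def ewens_allele_count_pmf(n, theta):
    """Return a list pmf where pmf[k] = P(K = k) for k = 0..n.

    K is the number of distinct alleles in a sample of size n under the
    Ewens sampling formula with parameter theta (an assumed input).
    Uses the sequential (Hoppe urn) construction: when m copies are already
    present, the next copy is a new type with probability theta / (theta + m).
    """
    if n < 1 or theta <= 0:
        raise ValueError("need n >= 1 and theta > 0")
    pmf = [0.0, 1.0]  # a sample of size 1 always has exactly one allele
    for m in range(1, n):
        p_new = theta / (theta + m)
        nxt = [0.0] * (m + 2)
        for k in range(1, m + 1):
            nxt[k] += pmf[k] * (1.0 - p_new)  # copy joins an existing type
            nxt[k + 1] += pmf[k] * p_new      # copy founds a new type
        pmf = nxt
    return pmf


def ewens_expected_alleles(n, theta):
    """E[K] = sum over i = 0..n-1 of theta / (theta + i)."""
    return sum(theta / (theta + i) for i in range(n))
```

For example, `ewens_allele_count_pmf(n1, theta)` gives the full distribution for a sample of `n1` copies, and `ewens_expected_alleles(n1, theta)` gives its mean.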
Affiliation(s)
- Wai-Tong Louis Fan
- Department of Mathematics, Indiana University, 831 East 3rd St, Bloomington, 47405, IN, USA; Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Ave, Cambridge, 02138, MA, USA.
- John Wakeley
- Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Ave, Cambridge, 02138, MA, USA.
5. Dinh KN, Vázquez-García I, Chan A, Malhotra R, Weiner A, McPherson AW, Tavaré S. CINner: modeling and simulation of chromosomal instability in cancer at single-cell resolution. bioRxiv 2024:2024.04.03.587939. [PMID: 38617259 PMCID: PMC11014621 DOI: 10.1101/2024.04.03.587939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
Cancer development is characterized by chromosomal instability, manifesting in frequent occurrences of different genomic alteration mechanisms ranging in extent and impact. Mathematical modeling can help evaluate the role of each mutational process during tumor progression; however, existing frameworks can only capture certain aspects of chromosomal instability (CIN). We present CINner, a mathematical framework for modeling genomic diversity and selection during tumor evolution. The main advantage of CINner is its flexibility to incorporate many genomic events that directly impact cellular fitness, from driver gene mutations to copy number alterations (CNAs), including focal amplifications and deletions, missegregations and whole-genome duplication (WGD). We apply CINner to find chromosome-arm selection parameters that drive tumorigenesis in the absence of WGD in chromosomally stable cancer types. We found that the selection parameters predict WGD prevalence among different chromosomally unstable tumors, hinting that the selective advantage of WGD cells hinges on their tolerance for aneuploidy and escape from nullisomy. Direct application of CINner to model the WGD proportion and fraction of genome altered (FGA) further uncovers the increase in CNA probabilities associated with WGD in each cancer type. CINner can also be utilized to study chromosomally stable cancer types, by applying a selection model based on driver gene mutations and focal amplifications or deletions. Finally, we used CINner to analyze the impact of CNA probabilities, chromosome selection parameters, tumor growth dynamics and population size on cancer fitness and heterogeneity. We expect that CINner will provide a powerful modeling tool for the oncology community to quantify the impact of newly uncovered genomic alteration mechanisms on shaping tumor progression and adaptation.
Affiliation(s)
- Khanh N. Dinh
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
- Department of Statistics, Columbia University, New York, NY, USA
- Ignacio Vázquez-García
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Andrew Chan
- Case Western Reserve University, Cleveland, OH, USA
- Rhea Malhotra
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Stanford University, Palo Alto, CA, USA
- Adam Weiner
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY, USA
- Andrew W. McPherson
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Simon Tavaré
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
- Department of Statistics, Columbia University, New York, NY, USA
6. Assis R, Conant G, Holland B, Liberles DA, O'Reilly MM, Wilson AE. Models for the retention of duplicate genes and their biological underpinnings. F1000Res 2024; 12:1400. [PMID: 38173826 PMCID: PMC10762295 DOI: 10.12688/f1000research.141786.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 02/08/2024] [Indexed: 01/05/2024]
Abstract
Gene content in genomes changes through several different processes, with gene duplication being an important contributor to such changes. Gene duplication occurs over a range of scales from individual genes to whole genomes, and the dynamics of this process can be context dependent. Still, there are rules by which genes are retained or lost from genomes after duplication, and probabilistic modeling has enabled characterization of these rules, including their context-dependence. Here, we describe the biology and corresponding mathematical models that are used to understand duplicate gene retention and its contribution to the set of biochemical functions encoded in a genome.
Affiliation(s)
- Raquel Assis
- Florida Atlantic University, Boca Raton, Florida, USA
- Gavin Conant
- North Carolina State University, Raleigh, North Carolina, USA
7. Pivirotto AM, Platt A, Patel R, Kumar S, Hey J. Analyses of allele age and fitness impact reveal human beneficial alleles to be older than neutral controls. bioRxiv 2023:2023.10.09.561569. [PMID: 37873438 PMCID: PMC10592680 DOI: 10.1101/2023.10.09.561569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
A classic population genetic prediction is that alleles experiencing directional selection should swiftly traverse allele frequency space, leaving detectable reductions in genetic variation in linked regions. However, despite this expectation, identifying clear footprints of beneficial allele passage has proven to be surprisingly challenging. We addressed the basic premise underlying this expectation by estimating the ages of large numbers of beneficial and deleterious alleles in a human population genomic data set. Deleterious alleles were found to be young, on average, given their allele frequency. However, beneficial alleles were older on average than non-coding, non-regulatory alleles of the same frequency. This finding is not consistent with directional selection and instead indicates some type of balancing selection. Among derived beneficial alleles, those fixed in the population show higher local recombination rates than those still segregating, consistent with a model in which new beneficial alleles experience an initial period of balancing selection due to linkage disequilibrium with deleterious recessive alleles. Alleles that ultimately fix following a period of balancing selection will leave a modest 'soft' sweep impact on the local variation, consistent with the overall paucity of species-wide 'hard' sweeps in human genomes.
Affiliation(s)
- Alexander Platt
- Temple University, Department of Biology, Philadelphia PA 19122, USA
- University of Pennsylvania, Department of Genetics, Philadelphia PA 19104, USA
- Ravi Patel
- Temple University, Department of Biology, Philadelphia PA 19122, USA
- Institute for Genomics and Evolutionary Medicine, Temple University, PA 19122, USA
- Sudhir Kumar
- Temple University, Department of Biology, Philadelphia PA 19122, USA
- Institute for Genomics and Evolutionary Medicine, Temple University, PA 19122, USA
- Jody Hey
- Temple University, Department of Biology, Philadelphia PA 19122, USA
8. Hofmeister RJ, Ribeiro DM, Rubinacci S, Delaneau O. Accurate rare variant phasing of whole-genome and whole-exome sequencing data in the UK Biobank. Nat Genet 2023:10.1038/s41588-023-01415-w. [PMID: 37386248 DOI: 10.1038/s41588-023-01415-w] [Citation(s) in RCA: 62] [Impact Index Per Article: 31.0] [Received: 10/27/2022] [Accepted: 05/04/2023] [Indexed: 07/01/2023]
Abstract
Phasing involves distinguishing the two parentally inherited copies of each chromosome into haplotypes. Here, we introduce SHAPEIT5, a new phasing method that quickly and accurately processes large sequencing datasets, and apply it to UK Biobank (UKB) whole-genome and whole-exome sequencing data. We demonstrate that SHAPEIT5 phases rare variants with low switch error rates of below 5% for variants present in just 1 sample out of 100,000. Furthermore, we outline a method for phasing singletons, which, although less precise, constitutes an important step towards future developments. We then demonstrate that the use of UKB as a reference panel improves the accuracy of genotype imputation, which is even more pronounced when phased with SHAPEIT5 compared with other methods. Finally, we screen the UKB data for loss-of-function compound heterozygous events and identify 549 genes where both gene copies are knocked out. These genes complement current knowledge of gene essentiality in the human genome.
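The switch error rate quoted in this abstract is a standard phasing metric: the fraction of consecutive heterozygous-site pairs whose relative phase disagrees with the truth. A minimal sketch of that textbook definition follows; it is not SHAPEIT5's evaluation code, and the function name and the input encoding (one 0/1 allele per heterozygous site, for a single haplotype) are assumptions made for illustration.

```python
def switch_error_rate(true_hap, inferred_hap):
    """Fraction of adjacent heterozygous-site pairs with a phase switch.

    Both inputs list the allele (0 or 1) carried by one haplotype at each
    heterozygous site of a single individual, in genomic order.
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    if len(true_hap) < 2:
        return 0.0  # no adjacent pair, so no switch is possible
    # True where the inferred haplotype carries the same allele as the truth.
    agrees = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A switch occurs whenever agreement flips between neighbouring sites.
    switches = sum(1 for a, b in zip(agrees, agrees[1:]) if a != b)
    return switches / (len(agrees) - 1)
```

Note that a haplotype inferred entirely in the opposite orientation has no switches under this definition, since only relative phase between neighbouring sites is scored.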
Affiliation(s)
- Robin J Hofmeister
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Diogo M Ribeiro
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Simone Rubinacci
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Olivier Delaneau
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
9. Puckett EE, Davis IS, Harper DC, Wakamatsu K, Battu G, Belant JL, Beyer DE, Carpenter C, Crupi AP, Davidson M, DePerno CS, Forman N, Fowler NL, Garshelis DL, Gould N, Gunther K, Haroldson M, Ito S, Kocka D, Lackey C, Leahy R, Lee-Roney C, Lewis T, Lutto A, McGowan K, Olfenbuttel C, Orlando M, Platt A, Pollard MD, Ramaker M, Reich H, Sajecki JL, Sell SK, Strules J, Thompson S, van Manen F, Whitman C, Williamson R, Winslow F, Kaelin CB, Marks MS, Barsh GS. Genetic architecture and evolution of color variation in American black bears. Curr Biol 2023; 33:86-97.e10. [PMID: 36528024 PMCID: PMC10039708 DOI: 10.1016/j.cub.2022.11.042] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 09/14/2022] [Revised: 11/08/2022] [Accepted: 11/18/2022] [Indexed: 12/23/2022]
Abstract
Color variation is a frequent evolutionary substrate for camouflage in small mammals, but the underlying genetics and evolutionary forces that drive color variation in natural populations of large mammals are mostly unexplained. The American black bear, Ursus americanus (U. americanus), exhibits a range of colors including the cinnamon morph, which has a similar color to the brown bear, U. arctos, and is found at high frequency in the American southwest. Reflectance and chemical melanin measurements showed little distinction between U. arctos and cinnamon U. americanus individuals. We used a genome-wide association for hair color as a quantitative trait in 151 U. americanus individuals and identified a single major locus (p < 10⁻¹³). Additional genomic and functional studies identified a missense alteration (R153C) in Tyrosinase-related protein 1 (TYRP1) that likely affects binding of the zinc cofactor, impairs protein localization, and results in decreased pigment production. Population genetic analyses and demographic modeling indicated that the R153C variant arose 9.36 kya in a southwestern population where it likely provided a selective advantage, spreading both northwards and eastwards by gene flow. A different TYRP1 allele, R114C, contributes to the characteristic brown color of U. arctos but is not fixed across the range.
Affiliation(s)
- Emily E Puckett
- Department of Biological Sciences, University of Memphis, Memphis, TN 38152, USA.
- Isis S Davis
- Department of Biological Sciences, University of Memphis, Memphis, TN 38152, USA
- Dawn C Harper
- Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kazumasa Wakamatsu
- Institute for Melanin Chemistry, Fujita Health University, Toyoake, Japan
- Gopal Battu
- HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Jerrold L Belant
- Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI 48824, USA
- Dean E Beyer
- Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI 48824, USA
- Colin Carpenter
- West Virginia Division of Natural Resources, Beckley, WV 25801, USA
- Anthony P Crupi
- Division of Wildlife Conservation, Alaska Department of Fish and Game, Douglas, Juneau, AK 99824, USA
- Maria Davidson
- The Louisiana Department of Wildlife and Fisheries, Baton Rouge, LA 70898, USA
- Christopher S DePerno
- Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC 27695-7646, USA
- Nicholas Forman
- New Mexico Department of Game and Fish, Santa Fe, NM 87507, USA
- Nicholas L Fowler
- Division of Wildlife Conservation, Alaska Department of Fish and Game, Douglas, Juneau, AK 99824, USA
- David L Garshelis
- Minnesota Department of Natural Resources, Grand Rapids, MN 55744, USA; IUCN SSC Bear Specialist Group
- Nicholas Gould
- Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC 27695-7646, USA
- Kerry Gunther
- National Park Service, Yellowstone National Park, WY 82190-0168, USA
- Mark Haroldson
- U.S. Geological Survey, Northern Rocky Mountain Science Center, Interagency Grizzly Bear Study Team, Bozeman, MT 59715, USA
- Shosuke Ito
- Institute for Melanin Chemistry, Fujita Health University, Toyoake, Japan
- David Kocka
- Virginia Department of Wildlife Resources, Verona, VA 24482, USA
- Carl Lackey
- Nevada Department of Wildlife, Reno, NV 89512, USA
- Ryan Leahy
- National Park Service, Yosemite National Park Wildlife Management, Yosemite, CA 95389, USA
- Caitlin Lee-Roney
- National Park Service, Yosemite National Park Wildlife Management, Yosemite, CA 95389, USA
- Tania Lewis
- National Park Service, Glacier Bay National Park, Gustavus, AK 99826, USA
- Ashley Lutto
- U.S. Fish and Wildlife Service, Kenai National Wildlife Refuge, Soldotna, AK 99669, USA
- Kelly McGowan
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94305, USA
- Mike Orlando
- Florida Fish and Wildlife Conservation Commission, Tallahassee, FL 32399, USA
- Alexander Platt
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Matthew D Pollard
- Department of Biological Sciences, University of Memphis, Memphis, TN 38152, USA
- Megan Ramaker
- HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Jaime L Sajecki
- Virginia Department of Wildlife Resources, Verona, VA 24482, USA
- Stephanie K Sell
- Division of Wildlife Conservation, Alaska Department of Fish and Game, Douglas, Juneau, AK 99824, USA
- Jennifer Strules
- Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC 27695-7646, USA
- Seth Thompson
- Virginia Department of Wildlife Resources, Verona, VA 24482, USA
- Frank van Manen
- U.S. Geological Survey, Northern Rocky Mountain Science Center, Interagency Grizzly Bear Study Team, Bozeman, MT 59715, USA
- Craig Whitman
- U.S. Geological Survey, Northern Rocky Mountain Science Center, Interagency Grizzly Bear Study Team, Bozeman, MT 59715, USA
| | - Ryan Williamson
- National Park Service, Great Smoky Mountains National Park, Gatlinburg, TN 37738, USA
| | | | - Christopher B Kaelin
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94305, USA
| | - Michael S Marks
- Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Departments of Pathology and Laboratory Medicine and of Physiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Gregory S Barsh
- HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA; Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94305, USA
|
10
|
Muktupavela RA, Petr M, Ségurel L, Korneliussen T, Novembre J, Racimo F. Modeling the spatiotemporal spread of beneficial alleles using ancient genomes. eLife 2022; 11:e73767. [PMID: 36537881 PMCID: PMC9767474 DOI: 10.7554/elife.73767] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/09/2021] [Accepted: 11/21/2022] [Indexed: 12/24/2022] Open
Abstract
Ancient genome sequencing technologies now provide the opportunity to study natural selection in unprecedented detail. Rather than making inferences from indirect footprints left by selection in present-day genomes, we can directly observe whether a given allele was present or absent in a particular region of the world at almost any period of human history within the last 10,000 years. Methods for studying selection using ancient genomes often rely on partitioning individuals into discrete time periods or regions of the world. However, a complete understanding of natural selection requires more nuanced statistical methods which can explicitly model allele frequency changes in a continuum across space and time. Here we introduce a method for inferring the spread of a beneficial allele across a landscape using two-dimensional partial differential equations. Unlike previous approaches, our framework can handle time-stamped ancient samples, as well as genotype likelihoods and pseudohaploid sequences from low-coverage genomes. We apply the method to a panel of published ancient West Eurasian genomes to produce dynamic maps showcasing the inferred spread of candidate beneficial alleles over time and space. We also provide estimates for the strength of selection and diffusion rate for each of these alleles. Finally, we highlight possible avenues of improvement for accurately tracing the spread of beneficial alleles in more complex scenarios.
Affiliation(s)
- Rasa A Muktupavela
- Lundbeck GeoGenetics Centre, GLOBE Institute, Faculty of Health, Copenhagen, Denmark
| | - Martin Petr
- Lundbeck GeoGenetics Centre, GLOBE Institute, Faculty of Health, Copenhagen, Denmark
| | - Laure Ségurel
- UMR5558 Biométrie et Biologie Evolutive, CNRS - Université Lyon 1, Villeurbanne, France
| | | | - John Novembre
- Department of Human Genetics, University of Chicago, Chicago, United States
| | - Fernando Racimo
- Lundbeck GeoGenetics Centre, GLOBE Institute, Faculty of Health, Copenhagen, Denmark
|
11
|
Johnson KE, Adams CJ, Voight BF. Identifying rare variants inconsistent with identity-by-descent in population-scale whole-genome sequencing data. Methods Ecol Evol 2022; 13:2429-2442. [PMID: 38938451 PMCID: PMC11210625 DOI: 10.1111/2041-210x.13991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Accepted: 09/12/2022] [Indexed: 12/01/2022]
Abstract
Analyses of genetic variation typically assume that rare variants within a population are inherited from a single common ancestral event, i.e., through identity-by-descent (IBD). However, there are genetic and technical processes through which rare variants in population genetic data may deviate from this simple evolutionary model, including recurrent mutations, gene conversions and genotyping error. All these processes can decrease the expected length of shared background haplotype surrounding a rare variant if that variant was inherited from a single event descending from a common ancestor. No method exists to computationally infer rare variants inconsistent with this simple model (denoted here as 'IBD-inconsistent') using unphased population sequencing data. We hypothesized that the difference in shared haplotype background length can distinguish variants consistent and inconsistent with this simple IBD transmission model in population sequencing data without pedigree information. We implemented a Bayesian hierarchical model and used Gibbs sampling to estimate the posterior probability of IBD state for rare variants, using simulated recurrent mutations to demonstrate that our approach accurately distinguishes rare variants consistent and inconsistent with a simple IBD inheritance model. Applying our method to whole-genome sequencing data from 3,621 human individuals in the UK10K consortium, we found that IBD-inconsistent variants correlated with higher local mutation rates and genomic features like replication timing. Using a heuristic to categorize IBD-inconsistent variants as gene conversions, we found that potential gene conversions had expected properties such as enriched local GC content. By identifying IBD-inconsistent variants, we can better understand the spectrum of recent mutations in human populations, a source of genetic variation driving evolution and a key factor in understanding recent demographic history.
Affiliation(s)
- Kelsey E. Johnson
- Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Christopher J. Adams
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Benjamin F. Voight
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
|
12
|
Ortega-Del Vecchyo D, Lohmueller KE, Novembre J. Haplotype-based inference of the distribution of fitness effects. Genetics 2022; 220:6501446. [PMID: 35100400 PMCID: PMC8982047 DOI: 10.1093/genetics/iyac002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/13/2021] [Accepted: 12/18/2021] [Indexed: 11/13/2022] Open
Abstract
Recent genome sequencing studies with large sample sizes in humans have discovered a vast quantity of low-frequency variants, providing an important source of information to analyze how selection is acting on human genetic variation. In order to estimate the strength of natural selection acting on low-frequency variants, we have developed a likelihood-based method that uses the lengths of pairwise identity-by-state between haplotypes carrying low-frequency variants. We show that in some non-equilibrium populations (such as those that have had recent population expansions) it is possible to distinguish between positive or negative selection acting on a set of variants. With our new framework, one can infer a fixed selection intensity acting on a set of variants at a particular frequency, or a distribution of selection coefficients for standing variants and new mutations. We show an application of our method to the UK10K phased haplotype dataset of individuals.
Affiliation(s)
- Diego Ortega-Del Vecchyo
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, 76230, México
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
| | - Kirk E Lohmueller
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, 90095, United States of America
| | - John Novembre
- Department of Human Genetics, University of Chicago, Chicago, Illinois, 60637, United States of America
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, 60637, United States of America
|
13
|
Stark TL, Liberles DA. Characterizing Amino Acid Substitution with Complete Linkage of Sites on a Lineage. Genome Biol Evol 2021; 13:6377338. [PMID: 34581792 PMCID: PMC8557849 DOI: 10.1093/gbe/evab225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/17/2021] [Indexed: 11/16/2022] Open
Abstract
Amino acid substitution models are commonly used for phylogenetic inference, for ancestral sequence reconstruction, and for the inference of positive selection. All commonly used models explicitly assume that each site evolves independently, an assumption that is violated by both linkage and protein structural and functional constraints. We introduce two new models for amino acid substitution which incorporate linkage between sites, each based on the (population-genetic) Moran model. The first model is a generalized population process tracking arbitrarily many sites which undergo mutation, with individuals replaced according to their fitnesses. This model provides a reasonably complete framework for simulations but is numerically and analytically intractable. We also introduce a second model which includes several simplifying assumptions but for which some theoretical results can be derived. We analyze the simplified model to determine conditions where linkage is likely to have meaningful effects on sitewise substitution probabilities, as well as conditions under which the effects are likely to be negligible. These findings are an important step in the generation of tractable phylogenetic models that parameterize selective coefficients for amino acid substitution while accounting for linkage of sites leading to both hitchhiking and background selection.
Affiliation(s)
- Tristan L Stark
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, USA
| | - David A Liberles
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, USA
|
14
|
Stark TL, Kaufman RS, Maltepes MA, Chi PB, Liberles DA. Detecting Selection on Segregating Gene Duplicates in a Population. J Mol Evol 2021; 89:554-564. [PMID: 34341836 DOI: 10.1007/s00239-021-10024-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/07/2021] [Accepted: 07/20/2021] [Indexed: 11/26/2022]
Abstract
Gene duplication is a fundamental process that has the potential to drive phenotypic differences between populations and species. While evolutionarily neutral changes have the potential to affect phenotypes, detecting selection acting on gene duplicates can uncover cases of adaptive diversification. Existing methods to detect selection on duplicates work mostly inter-specifically and are based upon selection on coding sequence changes; here we present a method to detect selection directly on a copy number variant segregating in a population. The method relies upon expected relationships between allele (new duplication) age and frequency in the population dependent upon the effective population size. Using both a haploid and a diploid population with a Moran model under several population sizes, the neutral baseline for copy number variants is established. The ability of the method to reject neutrality for duplicates with known age (measured in pairwise dS value) and frequency in the population is established through mathematical analysis and through simulations. Power is particularly good in the diploid case and with larger effective population sizes, as expected. With extension of this method to larger population sizes, this is a tool to analyze selection on copy number variants in any natural or experimentally evolving population. We have made an R package available at https://github.com/peterbchi/CNVSelectR/ which implements the method introduced here.
Affiliation(s)
- Tristan L Stark
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
- Discipline of Mathematics, University of Tasmania, Hobart, Tasmania, 7001, Australia
| | - Rebecca S Kaufman
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
| | - Maria A Maltepes
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
| | - Peter B Chi
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
- Department of Mathematics and Statistics, Villanova University, Villanova, PA, 19085, USA
| | - David A Liberles
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
|
15
|
Haworth SE, Nituch L, Northrup JM, Shafer ABA. Characterizing the demographic history and prion protein variation to infer susceptibility to chronic wasting disease in a naïve population of white-tailed deer (Odocoileus virginianus). Evol Appl 2021; 14:1528-1539. [PMID: 34178102 PMCID: PMC8210793 DOI: 10.1111/eva.13214] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/10/2021] [Revised: 02/12/2021] [Accepted: 03/02/2021] [Indexed: 12/17/2022] Open
Abstract
Assessments of the adaptive potential in natural populations are essential for understanding and predicting responses to environmental stressors like climate change and infectious disease. Species face a range of stressors in human-dominated landscapes, often with contrasting effects. White-tailed deer (Odocoileus virginianus; deer) are expanding in the northern part of their range following decreasing winter severity and increasing forage availability. Chronic wasting disease (CWD), a prion disease affecting deer, is likewise expanding and represents a major threat to deer and other cervids. We obtained tissue samples from free-ranging deer across their native range in Ontario, Canada, which has yet to detect CWD in wild populations. We used high-throughput sequencing to assess neutral genomic variation and variation in the prion protein gene (PRNP) that is partly responsible for the protein misfolding when deer contract CWD. Neutral variation revealed a high number of rare alleles and no population structure, and demographic models suggested a rapid historical population expansion. Allele frequencies of PRNP variants associated with CWD susceptibility and disease progression were evenly distributed across the landscape and consistent with deer populations not infected with CWD. We estimated the selection coefficient of CWD, with simulations showing an observable and rapid shift in PRNP allele frequencies that coincides with the start of a novel CWD outbreak. Sustained surveillance of genomic and PRNP variation can be a useful tool for guiding management practices, which is especially important for CWD-free regions where deer are managed for ecological and economic benefits.
Affiliation(s)
- Sarah E. Haworth
- Environmental and Life Sciences Graduate Program, Trent University, Peterborough, ON, Canada
| | - Larissa Nituch
- Wildlife Research and Monitoring Section, Ontario Ministry of Natural Resources and Forestry, Trent University, Peterborough, ON, Canada
| | - Joseph M. Northrup
- Environmental and Life Sciences Graduate Program, Trent University, Peterborough, ON, Canada
- Wildlife Research and Monitoring Section, Ontario Ministry of Natural Resources and Forestry, Trent University, Peterborough, ON, Canada
| | - Aaron B. A. Shafer
- Environmental and Life Sciences Graduate Program, Trent University, Peterborough, ON, Canada
- Department of Forensics, Trent University, Peterborough, ON, Canada
|
16
|
Biddanda A, Rice DP, Novembre J. A variant-centric perspective on geographic patterns of human allele frequency variation. eLife 2020; 9:60107. [PMID: 33350384 PMCID: PMC7755386 DOI: 10.7554/elife.60107] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 06/16/2020] [Accepted: 11/12/2020] [Indexed: 12/14/2022] Open
Abstract
A key challenge in human genetics is to understand the geographic distribution of human genetic variation. Often genetic variation is described by showing relationships among populations or individuals, drawing inferences over many variants. Here, we introduce an alternative representation of genetic variation that reveals the relative abundance of different allele frequency patterns. This approach allows viewers to easily see several features of human genetic structure: (1) most variants are rare and geographically localized, (2) variants that are common in a single geographic region are more likely to be shared across the globe than to be private to that region, and (3) where two individuals differ, it is most often due to variants that are found globally, regardless of whether the individuals are from the same region or different regions. Our variant-centric visualization clarifies the geographic patterns of human variation and can help address misconceptions about genetic differentiation among populations.
Affiliation(s)
- Arjun Biddanda
- Department of Human Genetics, University of Chicago, Chicago, United States
| | - Daniel P Rice
- Department of Human Genetics, University of Chicago, Chicago, United States
| | - John Novembre
- Department of Human Genetics, University of Chicago, Chicago, United States
|
17
|
Albers PK, McVean G. Dating genomic variants and shared ancestry in population-scale sequencing data. PLoS Biol 2020; 18:e3000586. [PMID: 31951611 PMCID: PMC6992231 DOI: 10.1371/journal.pbio.3000586] [Citation(s) in RCA: 112] [Impact Index Per Article: 22.4] [Received: 07/16/2019] [Revised: 01/30/2020] [Accepted: 01/02/2020] [Indexed: 12/31/2022] Open
Abstract
The origin and fate of new mutations within species is the fundamental process underlying evolution. However, while much attention has been focused on characterizing the presence, frequency, and phenotypic impact of genetic variation, the evolutionary histories of most variants are largely unexplored. We have developed a nonparametric approach for estimating the date of origin of genetic variants in large-scale sequencing data sets. The accuracy and robustness of the approach are demonstrated through simulation. Using data from two publicly available human genomic diversity resources, we estimated the age of more than 45 million single-nucleotide polymorphisms (SNPs) in the human genome and release the Atlas of Variant Age as a public online database. We characterize the relationship between variant age and frequency in different geographical regions and demonstrate the value of age information in interpreting variants of functional and selective importance. Finally, we use allele age estimates to power a rapid approach for inferring the ancestry shared between individual genomes and to quantify genealogical relationships at different points in the past, as well as to describe and explore the evolutionary history of modern human populations.
Affiliation(s)
- Patrick K. Albers
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
| | - Gil McVean
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
|