1
Good BH, Bhatt AS, McDonald MJ. Unraveling the tempo and mode of horizontal gene transfer in bacteria. Trends Microbiol 2025:S0966-842X(25)00100-3. [PMID: 40274494 DOI: 10.1016/j.tim.2025.03.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2024] [Revised: 02/26/2025] [Accepted: 03/20/2025] [Indexed: 04/26/2025]
Abstract
Research on horizontal gene transfer (HGT) has surged over the past two decades, revealing its critical role in accelerating evolutionary rates, facilitating adaptive innovations, and shaping pangenomes. Recent experimental and theoretical results have shown how HGT shapes the flow of genetic information within and between populations, expanding the range of possibilities for microbial evolution. These advances set the stage for a new wave of research seeking to predict how HGT shapes microbial evolution within natural communities, especially during rapid ecological shifts. In this article, we highlight these developments and outline promising research directions, emphasizing the necessity of quantifying the rates of HGT within diverse ecological contexts.
Affiliation(s)
- Benjamin H Good
- Department of Applied Physics, Stanford University, Stanford, CA, USA; Department of Biology, Stanford University, Stanford, CA, USA; Chan Zuckerberg Biohub-San Francisco, San Francisco, CA, USA.
- Ami S Bhatt
- Department of Medicine (Hematology, Blood and Marrow Transplantation), Stanford, CA, USA; Department of Genetics, Stanford University, Stanford, CA, USA
- Michael J McDonald
- ARC Centre for the Mathematical Analysis of Cellular Systems, Melbourne, Victoria, Australia; School of Biological Sciences, Monash University, Clayton, Victoria, Australia.
2
Strütt S, Excoffier L, Peischl S. A generalized structured coalescent for purifying selection without recombination. Genetics 2025; 229:iyaf013. [PMID: 39862229 DOI: 10.1093/genetics/iyaf013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2024] [Revised: 12/18/2024] [Accepted: 12/30/2024] [Indexed: 01/27/2025] Open
Abstract
Purifying selection is a critical factor in shaping genetic diversity. Current theoretical models mostly address scenarios of either very weak or strong selection, leaving a significant gap in our knowledge. The effects of purifying selection on patterns of genomic diversity remain poorly understood when selection against deleterious mutations is weak to moderate, particularly when recombination is limited or absent. In this study, we extend an existing approach, the fitness-class coalescent, to incorporate arbitrary levels of purifying selection in haploid populations. This model offers a comprehensive framework for exploring the influence of purifying selection in a wide range of demographic scenarios. Moreover, our research reveals potential sources of qualitative and quantitative biases in demographic inference, highlighting the significant risk of attributing genetic patterns to past demographic events rather than purifying selection. This work expands our understanding of the complex interplay between selection, drift, and population dynamics, and how purifying selection distorts demographic inference.
Affiliation(s)
- Stefan Strütt
- Interfaculty Bioinformatics Unit, University of Bern, Baltzerstrasse 6, Bern 3012, Switzerland
- Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, Bern 3012, Switzerland
- Laurent Excoffier
- Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Baltzerstrasse 6, Bern 3012, Switzerland
- Stephan Peischl
- Interfaculty Bioinformatics Unit, University of Bern, Baltzerstrasse 6, Bern 3012, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
3
Zhang H, Zhang P, Niu Y, Tao T, Liu G, Dong C, Zheng Z, Zhang Z, Li Y, Niu Z, Liu W, Guo Z, Hu S, Yang Y, Li M, Sun H, Renner SS, Liu J. Genetic basis of camouflage in an alpine plant and its long-term co-evolution with an insect herbivore. Nat Ecol Evol 2025; 9:628-638. [PMID: 40065027 DOI: 10.1038/s41559-025-02653-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/09/2024] [Accepted: 01/31/2025] [Indexed: 04/09/2025]
Abstract
Camouflage through colour change can involve reversible or permanent changes in response to cyclic predator or herbivore pressures. The evolution of background matching in camouflaged phenotypes partly depends on the genetics of the camouflage trait, but this has received little attention in plants. Here we clarify the genetic pathway underlying the grey-leaved morph of fumewort, Corydalis hemidicentra, of the Qinghai-Tibet Plateau that by being camouflaged escapes herbivory from caterpillars of host-specialized Parnassius butterflies. Field experiments show that camouflaged grey leaves matching the surrounding scree habitat experience reduced oviposition by female butterflies and herbivory by caterpillars, resulting in higher fruit set than that achieved by green-leaved plants. The defence is entirely visual. Multi-omics data and functional validation reveal that a 254-bp-inserted transposon causes anthocyanin accumulation in leaves, giving them a rock-like grey colour. Demographic analyses of plant and butterfly effective population sizes over the past 500 years indicate that plant populations have been more stable at sites with camouflage than at sites with only green-leaved plants. In the recent past, populations of Parnassius butterflies have declined at sites with camouflaged plants. These findings provide insights into the genetics of a plant camouflage trait and its potential role in the rapidly changing dynamics of plant-herbivore interactions.
Affiliation(s)
- Han Zhang
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Pan Zhang
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Yang Niu
- State Key Laboratory of Plant Diversity and Specialty Crops, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
- Tongzhou Tao
- Key Laboratory for Bio-resource and Eco-environment of Ministry of Education & Sichuan Zoige Alpine Wetland Ecosystem National Observation and Research Station, College of Life Science, Sichuan University, Chengdu, China
- Gang Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Congcong Dong
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Zeyu Zheng
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Zengzhu Zhang
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Ying Li
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Zhimin Niu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Wenyu Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Zemin Guo
- State Key Laboratory of Plant Diversity and Specialty Crops, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
- University of Chinese Academy of Sciences, Beijing, China
- Shaoji Hu
- Yunnan Key Laboratory of International Rivers and Transboundary Eco-security, Yunnan University, Kunming, China
- Institute of International Rivers and Eco-security, Yunnan University, Kunming, China
- Yang Yang
- Building No. 10, Anwai Xiaoguanbeili, Chaoyang District, Beijing, China
- Minjie Li
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China.
- Hang Sun
- State Key Laboratory of Plant Diversity and Specialty Crops, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China.
- Susanne S Renner
- Department of Biology, Washington University, Saint Louis, MO, USA.
- Jianquan Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China.
- Key Laboratory for Bio-resource and Eco-environment of Ministry of Education & Sichuan Zoige Alpine Wetland Ecosystem National Observation and Research Station, College of Life Science, Sichuan University, Chengdu, China.
4
Ishigohoka J, Liedvogel M. High-recombining genomic regions affect demography inference based on ancestral recombination graphs. Genetics 2025; 229:iyaf004. [PMID: 39790013 PMCID: PMC11912872 DOI: 10.1093/genetics/iyaf004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 12/23/2024] [Indexed: 01/12/2025] Open
Abstract
Multiple methods of demography inference are based on the ancestral recombination graph. This powerful approach uses observed mutations to model local genealogies changing along chromosomes by historical recombination events. However, inference of underlying genealogies is difficult in regions with high recombination rate relative to mutation rate due to the lack of mutations representing genealogies. Despite the prevalence of high-recombining genomic regions in some organisms, such as birds, its impact on demography inference based on ancestral recombination graphs has not been well studied. Here, we use population genomic simulations to investigate the impact of high-recombining regions on demography inference based on ancestral recombination graphs. We demonstrate that inference of effective population size and the time of population split events is systematically affected when high-recombining regions cover wide breadths of the chromosomes. Excluding high-recombining genomic regions can practically mitigate this impact, and population genomic inference of recombination maps is informative in defining such regions although the estimated values of local recombination rate can be biased. Finally, we confirm the relevance of our findings in empirical analysis by contrasting demography inferences applied for a bird species, the Eurasian blackcap (Sylvia atricapilla), using different parts of the genome with high and low recombination rates. Our results suggest that demography inference methods based on ancestral recombination graphs should be carried out with caution when applied in species whose genomes contain long stretches of high-recombining regions.
Affiliation(s)
- Jun Ishigohoka
- Max Planck Research Group Behavioural Genomics, Max Planck Institute for Evolutionary Biology, August-Thienemann-Straße 2, Plön 24306, Germany
- Miriam Liedvogel
- Max Planck Research Group Behavioural Genomics, Max Planck Institute for Evolutionary Biology, August-Thienemann-Straße 2, Plön 24306, Germany
- Institute of Avian Research, An der Vogelwarte 21, Wilhelmshaven 26386, Germany
- Department of Biology and Environmental Sciences, Carl von Ossietzky Universität Oldenburg, Ammerländer Heerstraße 114-118, Oldenburg 26129, Germany
5
Barroso GV, Ragsdale AP. A model for background selection in non-equilibrium populations. bioRxiv 2025:2025.02.19.639084. [PMID: 40027808 PMCID: PMC11870586 DOI: 10.1101/2025.02.19.639084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
In many taxa, levels of genetic diversity are observed to vary along their genome. The framework of background selection models this variation in terms of linkage to constrained sites, and recent applications have been able to explain a large portion of the variation in human genomes. However, these studies have also yielded conflicting results, stemming from two key limitations. First, existing models are inaccurate in the most critical region of parameter space ($N_e s \sim -1$), where the reduction in diversity is sharpest. And second, they assume a constant population size over time. Here, we develop predictions for diversity under background selection based on the Hill-Robertson system of two-locus statistics, which allows for population size changes. We treat the joint effect of multiple selected loci independently, but we show that interference among them is well captured through local rescaling of mutation, recombination and selection in an iterative procedure that converges quickly. We further accommodate existing background selection theory to non-equilibrium demography, bridging the gap between weak and strong selection. Simulations show that our predictions are accurate over the entire range of selection coefficients. We characterize the temporal dynamics of linked selection under population size changes and demonstrate that patterns of diversity can be misinterpreted by other models. Specifically, biases due to the incorrect assumption of equilibrium carry over to downstream inferences of the distribution of fitness effects and deleterious mutation rate. Jointly modeling demography and linked selection therefore improves our understanding of the genomic landscape of diversity, which will help refine inferences of linked selection in humans and other species.
Affiliation(s)
- Gustavo V. Barroso
- Department of Integrative Biology, University of Wisconsin-Madison, USA, 53706
- Aaron P. Ragsdale
- Department of Integrative Biology, University of Wisconsin-Madison, USA, 53706
6
Malinsky M, Talbi M, Zhou C, Maurer N, Sacco S, Shapiro B, Peichel CL, Seehausen O, Salzburger W, Weber JN, Bolnick DI, Green RE, Durbin R. Hi-reComb: constructing recombination maps from bulk gamete Hi-C sequencing. bioRxiv 2025:2025.03.06.641907. [PMID: 40161681 PMCID: PMC11952307 DOI: 10.1101/2025.03.06.641907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
Recombination is central to genetics and to evolution of sexually reproducing organisms. However, obtaining accurate estimates of recombination rates, and of how they vary along chromosomes, continues to be challenging. To advance our ability to estimate recombination rates, we present Hi-reComb, a new method and software for estimation of recombination maps from bulk gamete chromosome conformation capture sequencing (Hi-C). Simulations show that Hi-reComb produces robust, accurate recombination landscapes. With empirical data from sperm of five fish species we show the advantages of this approach, including joint assessment of recombination maps and large structural variants, map comparisons using bootstrap, and workflows with trio phasing vs. Hi-C phasing. With off-the-shelf library construction and a straightforward rapid workflow, our approach will facilitate routine recombination landscape estimation for a broad range of studies and model organisms in genetics and evolutionary biology. Hi-reComb is open-source and freely available at https://github.com/millanek/Hi-reComb.
Affiliation(s)
- Milan Malinsky
- Institute of Ecology and Evolution, University of Bern, 3012 Bern, Switzerland
- Department of Fish Ecology and Evolution, EAWAG, 6047 Kastanienbaum, Switzerland
- Marion Talbi
- Institute of Ecology and Evolution, University of Bern, 3012 Bern, Switzerland
- Department of Fish Ecology and Evolution, EAWAG, 6047 Kastanienbaum, Switzerland
- Chenxi Zhou
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
- Nicholas Maurer
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- UCSC Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Samuel Sacco
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- UCSC Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Beth Shapiro
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- UCSC Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Ole Seehausen
- Institute of Ecology and Evolution, University of Bern, 3012 Bern, Switzerland
- Department of Fish Ecology and Evolution, EAWAG, 6047 Kastanienbaum, Switzerland
- Walter Salzburger
- Department of Environmental Sciences, Zoological Institute, University of Basel, 4051 Basel, Switzerland
- Jesse N. Weber
- Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI 53706, USA
- Daniel I. Bolnick
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- Richard E. Green
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- UCSC Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Richard Durbin
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
7
Talbi M, Turner GF, Malinsky M. Rapid evolution of recombination landscapes during the divergence of cichlid ecotypes in Lake Masoko. Evolution 2025; 79:364-379. [PMID: 39589917 DOI: 10.1093/evolut/qpae169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 11/06/2024] [Accepted: 11/25/2024] [Indexed: 11/28/2024]
Abstract
Variation of recombination rate along the genome is of crucial importance to rapid adaptation and organismal diversification. Many unknowns remain regarding how and why recombination landscapes evolve in nature. Here, we reconstruct recombination maps based on linkage disequilibrium and use subsampling and simulations to derive a new measure of recombination landscape evolution: the Population Recombination Divergence Index (PRDI). Using PRDI, we show that fine-scale recombination landscapes differ substantially between two cichlid fish ecotypes of Astatotilapia calliptera that diverged only ~2,500 generations ago. Perhaps surprisingly, recombination landscape differences are not driven by divergence in terms of allele frequency (FST) and nucleotide diversity (Δ(π)): although there is some association, we observe positive PRDI in regions where FST and Δ(π) are zero. We found a stronger association between the evolution of recombination and 47 large haplotype blocks that are polymorphic in Lake Masoko, cover 21% of the genome, and appear to include multiple inversions. Among haplotype blocks, there is a strong and clear association between the degree of recombination divergence and differences between ecotypes in heterozygosity, consistent with recombination suppression in heterozygotes. Overall, our work provides a holistic view of changes in population recombination landscapes during the early stages of speciation with gene flow.
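For readers unfamiliar with the diversity statistic mentioned in this abstract: nucleotide diversity (π) is conventionally defined as the average number of pairwise differences per site among sampled sequences. The sketch below illustrates that textbook definition only. It assumes aligned haplotype strings of equal length, and the function name and toy data are hypothetical, not taken from the study.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Average pairwise differences per site (textbook definition of pi).

    `haplotypes` is a list of aligned sequences of equal length
    (an assumption of this sketch; no missing data handling).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    length = len(haplotypes[0])
    if length == 0 or any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be non-empty and of equal length")

    # Sum the number of mismatching sites over every unordered pair.
    total_diffs = sum(
        sum(a != b for a, b in zip(x, y))
        for x, y in combinations(haplotypes, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


# Toy example (hypothetical data): three haplotypes, four sites.
toy = ["AAAA", "AAAT", "AATT"]
pi = nucleotide_diversity(toy)
```

A between-population contrast such as Δ(π) could then be formed by computing this value separately for each population and taking the difference, although the exact definition used in the paper is not given in the abstract.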
Affiliation(s)
- Marion Talbi
- Biology Department, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
- Department of Fish Ecology and Evolution, EAWAG, Kastanienbaum, Switzerland
- George F Turner
- School of Natural & Environmental Sciences, Bangor University, Bangor, United Kingdom
- Milan Malinsky
- Biology Department, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
- Department of Fish Ecology and Evolution, EAWAG, Kastanienbaum, Switzerland
8
Lalli JL, Bortvin AN, McCoy RC, Werling DM. A T2T-CHM13 recombination map and globally diverse haplotype reference panel improves phasing and imputation. bioRxiv 2025:2025.02.24.639687. [PMID: 40060455 PMCID: PMC11888259 DOI: 10.1101/2025.02.24.639687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2025]
Abstract
The T2T-CHM13 complete human reference genome contains ~200 Mb of newly resolved sequence, improving read mapping and variant calling compared to GRCh38. However, the benefits of using complete reference genomes in other contexts are unclear. Here, we present a reference T2T-CHM13 recombination map and phased haplotype panel derived from 3202 samples from the 1000 Genomes Project (1KGP). Using published long-read based assemblies as a reference-neutral ground truth, we compared our T2T-CHM13 1KGP panel to the previously released GRCh38 1KGP phased callset. We find that alignment to T2T-CHM13 resulted in 38% fewer assembly-discordant genotypes and 16% fewer switch errors. The largest gains in panel accuracy are observed on chromosome X and in the regions flanking disease-causing CNVs. Simons Genome Diversity Project samples were more accurately imputed when using the T2T-CHM13 panel. Our study demonstrates that use of a T2T-native phased haplotype panel improves statistical phasing and imputation for samples from diverse human populations.
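The switch-error metric reported in this abstract can be illustrated with a small sketch. A switch error is commonly counted wherever the inferred phase changes orientation relative to the truth between two consecutive heterozygous sites. The code below is an illustration of that common definition under simplifying assumptions (a single sample, heterozygous sites only, no missing data). The function names and toy data are hypothetical, and this is not the evaluation pipeline used by the authors.

```python
def switch_errors(truth, inferred):
    """Count switch errors between two phasings of the same het sites.

    Each argument is a sequence of 0/1 values giving the allele carried
    on the first haplotype at each heterozygous site (sketch assumption).
    """
    if len(truth) != len(inferred):
        raise ValueError("phasings must cover the same sites")
    # Orientation is 0 where the inferred phase matches the truth
    # and 1 where the two haplotypes are swapped.
    orientation = [t ^ i for t, i in zip(truth, inferred)]
    # A switch error occurs wherever the orientation changes
    # between adjacent sites.
    return sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)


def switch_error_rate(truth, inferred):
    """Switch errors divided by the number of adjacent site pairs."""
    n_sites = len(truth)
    if n_sites < 2:
        return 0.0
    return switch_errors(truth, inferred) / (n_sites - 1)


# Toy example (hypothetical data): six heterozygous sites.
truth = [0, 1, 0, 1, 1, 0]
inferred = [0, 1, 1, 0, 0, 0]
rate = switch_error_rate(truth, inferred)
```

Note that a phasing with both haplotypes swapped everywhere scores zero switch errors under this definition, because haplotype labels are arbitrary.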
Affiliation(s)
- Joseph L Lalli
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, United States
- Andrew N Bortvin
- Department of Biology, Johns Hopkins University, Baltimore, MD, United States
- Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD, United States
- These authors jointly supervised this work
- Donna M Werling
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, United States
- These authors jointly supervised this work
9
Mowlaei ME, Li C, Jamialahmadi O, Dias R, Chen J, Jamialahmadi B, Rebbeck TR, Carnevale V, Kumar S, Shi X. STICI: Split-Transformer with integrated convolutions for genotype imputation. Nat Commun 2025; 16:1218. [PMID: 39890780 PMCID: PMC11785734 DOI: 10.1038/s41467-025-56273-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2024] [Accepted: 01/08/2025] [Indexed: 02/03/2025] Open
Abstract
Despite advances in sequencing technologies, genome-scale datasets often contain missing bases and genomic segments, hindering downstream analyses. Genotype imputation addresses this issue and has been a cornerstone pre-processing step in genetic and genomic studies. Although various methods have been widely adopted for genotype imputation, it remains challenging to impute certain genomic regions and large structural variants. Here, we present a transformer-based framework, named STICI, for accurate genotype imputation. STICI models automatically learn genome-wide patterns of linkage disequilibrium, evidenced by much higher imputation accuracy in regions with highly linked variants. Our imputation results on the human 1000 Genomes Project and non-human genomes show that STICI can achieve high imputation accuracy comparable to the state-of-the-art genotype imputation methods, with the additional capability to impute multi-allelic variants and various types of genetic variants. STICI can be trained for any collection of genomes automatically using self-supervision. Moreover, STICI shows excellent performance without needing any special presuppositions about the underlying patterns in collections of non-human genomes, pointing to adaptability and applications of STICI to impute missing genotypes in any species.
Affiliation(s)
- Mohammad Erfan Mowlaei
- Computer & Information Sciences, College of Science and Technology, Temple University, Philadelphia, PA, USA
- Chong Li
- Computer & Information Sciences, College of Science and Technology, Temple University, Philadelphia, PA, USA
- Oveis Jamialahmadi
- Department of Molecular and Clinical Medicine, Institute of Medicine, Sahlgrenska Academy, Wallenberg Laboratory, University of Gothenburg, Gothenburg, Sweden
- Raquel Dias
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
- Junjie Chen
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Benyamin Jamialahmadi
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada
- Timothy Richard Rebbeck
- Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Vincenzo Carnevale
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Institute for Computational Molecular Science, Temple University, Philadelphia, PA, USA
- Sudhir Kumar
- Computer & Information Sciences, College of Science and Technology, Temple University, Philadelphia, PA, USA
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Department of Biology, Temple University, Philadelphia, PA, USA
- Xinghua Shi
- Computer & Information Sciences, College of Science and Technology, Temple University, Philadelphia, PA, USA.
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA.
10
van den Belt S, Alachiotis N. Fast and accurate deep learning scans for signatures of natural selection in genomes using FASTER-NN. Commun Biol 2025; 8:58. [PMID: 39814854 PMCID: PMC11735897 DOI: 10.1038/s42003-025-07480-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2024] [Accepted: 01/07/2025] [Indexed: 01/18/2025] Open
Abstract
Deep learning classification models based on Convolutional Neural Networks (CNNs) are increasingly used in population genetic inference for detecting signatures of natural selection. Prevailing detection methods treat the design of the classifier as a discrete phase, assuming that high classification accuracy is the sole prerequisite for precise detection. This frequently steers method development toward classification-driven optimizations that can inadvertently impede detection. We present FASTER-NN, a CNN classifier designed specifically for the precise detection of natural selection. It has higher sensitivity than state-of-the-art CNN classifiers while only processing allele frequencies and genomic positions through dilated convolutions to maximize data reuse. As a result, execution time is invariant to the sample size and the chromosome length, creating a highly suitable solution for large-scale, whole-genome scans. Furthermore, FASTER-NN can accurately identify selective sweeps in recombination hotspots, which is a highly challenging detection problem with very limited theoretical treatment to date.
11
Topaloudis A, Cumer T, Lavanchy E, Ducrest AL, Simon C, Machado AP, Paposhvili N, Roulin A, Goudet J. The recombination landscape of the barn owl, from families to populations. Genetics 2025; 229:1-50. [PMID: 39545468 PMCID: PMC11708917 DOI: 10.1093/genetics/iyae190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2024] [Accepted: 11/01/2024] [Indexed: 11/17/2024] Open
Abstract
Homologous recombination is a meiotic process that generates diversity along the genome and interacts with all evolutionary forces. Despite its importance, studies of recombination landscapes are lacking due to methodological limitations and limited data. Frequently used approaches include linkage mapping based on familial data that provides sex-specific broad-scale estimates of realized recombination and inferences based on population linkage disequilibrium that reveal a more fine-scale resolution of the recombination landscape, albeit dependent on the effective population size and the selective forces acting on the population. In this study, we use a combination of these 2 methods to elucidate the recombination landscape for the Afro-European barn owl (Tyto alba). We find subtle differences in crossover placement between sexes that lead to differential effective shuffling of alleles. Linkage disequilibrium-based estimates of recombination are concordant with family-based estimates and identify large variation in recombination rates within and among linkage groups. Larger chromosomes show variation in recombination rates, while smaller chromosomes have a universally high rate that shapes the diversity landscape. We find that recombination rates are correlated with gene content, genetic diversity, and GC content. We find no conclusive differences in the recombination landscapes between populations. Overall, this comprehensive analysis enhances our understanding of recombination dynamics, genomic architecture, and sex-specific variation in the barn owl, contributing valuable insights to the broader field of avian genomics.
Affiliation(s)
- Alexandros Topaloudis
- Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Tristan Cumer
- Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Eléonore Lavanchy
- Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Anne-Lyse Ducrest
- Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
- Celine Simon
- Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
- Ana Paula Machado
- Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
- Nika Paposhvili
- Institute of Ecology, Ilia State University, Tbilisi 0162, Georgia
- Alexandre Roulin
- Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
- Jérôme Goudet
- Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
12
Soni V, Versoza CJ, Terbot JW, Jensen JD, Pfeifer SP. Inferring fine-scale mutation and recombination rate maps in aye-ayes (Daubentonia madagascariensis). bioRxiv 2024:2024.12.28.630620. [PMID: 39763842 PMCID: PMC11703150 DOI: 10.1101/2024.12.28.630620] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/20/2025]
Abstract
The rate of input of new genetic mutations, and the rate at which that variation is reshuffled, are key evolutionary processes shaping genomic diversity. Importantly, these rates vary not just across populations and species, but also across individual genomes. Although previous studies have demonstrated that failing to account for rate heterogeneity across the genome can bias the inference of both selective and neutral population genetic processes, mutation and recombination rate maps have to date been generated for only a relatively small number of organisms. Here, we infer such fine-scale maps for the aye-aye (Daubentonia madagascariensis), a highly endangered strepsirrhine that represents one of the earliest splits in the primate clade and thus stands as an important outgroup to the more commonly studied haplorrhines, utilizing a recently released fully annotated genome combined with high-quality population sequencing data. We compare our indirectly inferred rates to previous pedigree-based estimates, finding further evidence of relatively low mutation and recombination rates in aye-ayes compared to other primates.
|
13
|
Schield DR, Carter JK, Scordato ESC, Levin II, Wilkins MR, Mueller SA, Gompert Z, Nosil P, Wolf JBW, Safran RJ. Sexual selection promotes reproductive isolation in barn swallows. Science 2024; 386:eadj8766. [PMID: 39666856 DOI: 10.1126/science.adj8766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 06/25/2024] [Accepted: 10/11/2024] [Indexed: 12/14/2024]
Abstract
Despite the well-known effects of sexual selection on phenotypes, links between this evolutionary process and reproductive isolation, genomic divergence, and speciation have been difficult to establish. We unravel the genetic basis of sexually selected plumage traits to investigate their effects on reproductive isolation in barn swallows. The genetic architecture of sexual traits is characterized by 12 loci on two autosomes and the Z chromosome. Sexual trait loci exhibit signatures of divergent selection in geographic isolation and barriers to gene flow in secondary contact. Linkage disequilibrium between these genes has been maintained by selection in hybrid zones beyond what would be expected under admixture alone. Our findings reveal that selection on coupled sexual trait loci promotes reproductive isolation, providing key empirical evidence for the role of sexual selection in speciation.
Affiliation(s)
- Drew R Schield
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, USA
- Department of Biology, University of Virginia, Charlottesville, VA, USA
| | - Javan K Carter
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, USA
| | - Elizabeth S C Scordato
- Department of Biological Sciences, California State Polytechnic University, Pomona, CA, USA
| | - Iris I Levin
- Department of Biology, Kenyon College, Gambier, OH, USA
| | - Matthew R Wilkins
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, USA
- Galactic Polymath Education Studio, Minneapolis, MN, USA
| | - Sarah A Mueller
- Division of Evolutionary Biology, Faculty of Biology, Ludwig Maximilian University of Munich, Munich, Germany
| | - Patrik Nosil
- CEFE, Université Montpellier, CNRS, EPHE, IRD, Montpellier, France
| | - Jochen B W Wolf
- Division of Evolutionary Biology, Faculty of Biology, Ludwig Maximilian University of Munich, Munich, Germany
| | - Rebecca J Safran
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, USA
|
14
|
Ishigohoka J, Bascón-Cardozo K, Bours A, Fuß J, Rhie A, Mountcastle J, Haase B, Chow W, Collins J, Howe K, Uliano-Silva M, Fedrigo O, Jarvis ED, Pérez-Tris J, Illera JC, Liedvogel M. Distinct patterns of genetic variation at low-recombining genomic regions represent haplotype structure. Evolution 2024; 78:1916-1935. [PMID: 39208288 DOI: 10.1093/evolut/qpae117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 07/26/2024] [Accepted: 09/24/2024] [Indexed: 09/04/2024]
Abstract
Genomic regions sometimes show patterns of genetic variation distinct from the genome-wide population structure. Such deviations have often been interpreted to represent effects of selection. However, systematic investigation of whether and how non-selective factors, such as recombination rates, can affect distinct patterns has been limited. Here, we associate distinct patterns of genetic variation with reduced recombination rates in a songbird, the Eurasian blackcap (Sylvia atricapilla), using a new reference genome assembly, whole-genome resequencing data and recombination maps. We find that distinct patterns of genetic variation reflect haplotype structure at genomic regions with different prevalence of reduced recombination rate across populations. At low-recombining regions shared in most populations, distinct patterns reflect conspicuous haplotypes segregating in multiple populations. At low-recombining regions found only in a few populations, distinct patterns represent variance among cryptic haplotypes within the low-recombining populations. With simulations, we confirm that these distinct patterns evolve neutrally by reduced recombination rate, on which the effects of selection can be overlaid. Our results highlight that distinct patterns of genetic variation can emerge through evolutionary reduction of local recombination rate. The recombination landscape as an evolvable trait therefore plays an important role determining the heterogeneous distribution of genetic variation along the genome.
Affiliation(s)
- Jun Ishigohoka
- Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Andrea Bours
- Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Janina Fuß
- Institute of Clinical Molecular Biology (IKMB), Kiel University, Kiel, Germany
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Jacquelyn Mountcastle
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Bettina Haase
- The Vertebrate Genome Lab, Rockefeller University, New York, NY, USA
| | - Olivier Fedrigo
- The Vertebrate Genome Lab, Rockefeller University, New York, NY, USA
| | - Erich D Jarvis
- The Vertebrate Genome Lab, Rockefeller University, New York, NY, USA
- Laboratory of Neurogenetics of Language, Rockefeller University, New York, NY, USA
- The Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Javier Pérez-Tris
- Department of Biodiversity, Ecology and Evolution, Complutense University of Madrid, Madrid, Spain
| | - Juan Carlos Illera
- Biodiversity Research Institute (CSIC-Oviedo University-Principality of Asturias), Oviedo University, Mieres, Spain
| | - Miriam Liedvogel
- Max Planck Institute for Evolutionary Biology, Plön, Germany
- Institute of Avian Research, Wilhelmshaven, Germany
- Department of Biology and Environmental Sciences, Carl von Ossietzky Universität Oldenburg, Germany
|
15
|
Amorim CEG, Di C, Lin M, Marsden C, Del Carpio CA, Mah JC, Robinson J, Kim BY, Mooney JA, Cornejo OE, Lohmueller KE. Evolutionary consequences of domestication on the selective effects of new amino acid changing mutations in canids. bioRxiv 2024:2024.11.13.623529. [PMID: 39605619 PMCID: PMC11601280 DOI: 10.1101/2024.11.13.623529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
The domestication of wild canids led to dogs no longer living in the wild but instead residing alongside humans. Extreme changes in behavior and diet associated with domestication may have led to the relaxation of the selective pressure on traits that may be less important in the domesticated context. Thus, here we hypothesize that strongly deleterious mutations may have become less deleterious in domesticated populations. We test this hypothesis by estimating the distribution of fitness effects (DFE) for new amino acid changing mutations using whole-genome sequence data from 24 gray wolves and 61 breed dogs. We find that the DFE is strikingly similar across canids, with 26-28% of new amino acid changing mutations being neutral/nearly neutral (|s| < 1e-5), and 41-48% under strong purifying selection (|s| > 1e-2). Our results are robust to different model assumptions suggesting that the DFE is stable across short evolutionary timescales, even in the face of putative drastic changes in the selective pressure caused by artificial selection during domestication and breed formation. On par with previous works describing DFE evolution, our data indicate that the DFE of amino acid changing mutations depends more strongly on genome structure and organismal characteristics, and less so on shifting selective pressures or environmental factors. Given the constant DFE and previous data showing that genetic variants that differentiate wolf and dog populations are enriched in regulatory elements, we speculate that domestication may have had a larger impact on regulatory variation than on amino acid changing mutations.
Affiliation(s)
| | - Chenlu Di
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, 90095, USA
| | - Meixi Lin
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California, 94720, USA
| | - Clare Marsden
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, 90095, USA
- Serology/DNA unit, Forensic Science Division, Los Angeles Police Department, Los Angeles CA 90032
| | - Christina A. Del Carpio
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, 90095, USA
| | - Jonathan C. Mah
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, 90095, USA
| | - Jacqueline Robinson
- Institute for Human Genetics, University of California San Francisco, San Francisco CA 94143
| | - Bernard Y. Kim
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
| | - Jazlyn A. Mooney
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, 90089, USA
| | - Omar E. Cornejo
- Ecology & Evolutionary Biology Department, University of California, Santa Cruz, California, 95060, USA
| | - Kirk E. Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California, 90095, USA
|
16
|
Lyulina AS, Liu Z, Good BH. Linkage equilibrium between rare mutations. Genetics 2024; 228:iyae145. [PMID: 39222343 PMCID: PMC11538400 DOI: 10.1093/genetics/iyae145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Accepted: 08/20/2024] [Indexed: 09/04/2024] Open
Abstract
Recombination breaks down genetic linkage by reshuffling existing variants onto new genetic backgrounds. These dynamics are traditionally quantified by examining the correlations between alleles, and how they decay as a function of the recombination rate. However, the magnitudes of these correlations are strongly influenced by other evolutionary forces like natural selection and genetic drift, making it difficult to tease out the effects of recombination. Here, we introduce a theoretical framework for analyzing an alternative family of statistics that measure the homoplasy produced by recombination. We derive analytical expressions that predict how these statistics depend on the rates of recombination and recurrent mutation, the strength of negative selection and genetic drift, and the present-day frequencies of the mutant alleles. We find that the degree of homoplasy can strongly depend on this frequency scale, which reflects the underlying timescales over which these mutations occurred. We show how these scaling properties can be used to isolate the effects of recombination and discuss their implications for the rates of horizontal gene transfer in bacteria.
Affiliation(s)
- Anastasia S Lyulina
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
| | - Zhiru Liu
- Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
| | - Benjamin H Good
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
- Chan Zuckerberg Biohub – San Francisco, San Francisco, CA 94158, USA
|
17
|
Zhang J, Lyu H, Chen J, Cao X, Du R, Ma L, Wang N, Zhu Z, Rao J, Wang J, Zhong K, Lyu Y, Wang Y, Lin T, Zhou Y, Zhou Y, Zhu G, Fei Z, Klee H, Huang S. Releasing a sugar brake generates sweeter tomato without yield penalty. Nature 2024; 635:647-656. [PMID: 39537922 PMCID: PMC11578880 DOI: 10.1038/s41586-024-08186-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/21/2024] [Accepted: 10/09/2024] [Indexed: 11/16/2024]
Abstract
In tomato, sugar content is highly correlated with consumer preferences, with most consumers preferring sweeter fruit [1-4]. However, the sugar content of commercial varieties is generally low, as it is inversely correlated with fruit size, and growers prioritize yield over flavour quality [5-7]. Here we identified two genes, tomato (Solanum lycopersicum) calcium-dependent protein kinase 27 (SlCDPK27; also known as SlCPK27) and its paralogue SlCDPK26, that control fruit sugar content. They act as sugar brakes by phosphorylating a sucrose synthase, which promotes degradation of the sucrose synthase. Gene-edited SlCDPK27 and SlCDPK26 knockouts increased glucose and fructose contents by up to 30%, enhancing perceived sweetness without fruit weight or yield penalty. Although there are fewer, lighter seeds in the mutants, they exhibit normal germination. Together, these findings provide insight into the regulatory mechanisms controlling fruit sugar accumulation in tomato and offer opportunities to increase sugar content in large-fruited cultivars without sacrificing size and yield.
Affiliation(s)
- Jinzhe Zhang
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Hongjun Lyu
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Shandong Key Laboratory of Bulk Open-Field Vegetable Breeding, Ministry of Agriculture and Rural Affairs Key Laboratory of Huang Huai Protected Horticulture Engineering, Institute of Vegetables, Shandong Academy of Agricultural Sciences, Jinan, China
| | - Jie Chen
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Xue Cao
- State Key Laboratory for Quality Ensurance and Sustainable Use of Dao-di Herbs, Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
| | - Ran Du
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Liang Ma
- State Key Laboratory of Plant Environmental Resilience (SKLPER), College of Biological Sciences, China Agricultural University, Beijing, China
| | - Nan Wang
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Zhiguo Zhu
- School of Life Sciences, Yunnan Key Laboratory of Potato Biology, Yunnan Normal University, Southwest United Graduate School, Kunming, China
| | - Jianglei Rao
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Jie Wang
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Kui Zhong
- Agriculture and Food Standardization Institute, China National Institute of Standardization, Beijing, China
| | - Yaqing Lyu
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Yanling Wang
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Tao Lin
- College of Horticulture, China Agricultural University, Beijing, China
| | - Yao Zhou
- Key Laboratory of Plant Molecular Physiology, Institute of Botany, Chinese Academy of Sciences, China University of Chinese Academy of Sciences, Beijing, China
| | - Yongfeng Zhou
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Guangtao Zhu
- School of Life Sciences, Yunnan Key Laboratory of Potato Biology, Yunnan Normal University, Southwest United Graduate School, Kunming, China
| | - Zhangjun Fei
- Boyce Thompson Institute, Cornell University, Ithaca, NY, USA
| | - Harry Klee
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Sanwen Huang
- National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China.
- National Key Laboratory of Tropical Crop Breeding, Chinese Academy of Tropical Agricultural Sciences, Haikou, China.
|
18
|
Escuer P, Guirao-Rico S, Arnedo MA, Sánchez-Gracia A, Rozas J. Population Genomics of Adaptive Radiations: Exceptionally High Levels of Genetic Diversity and Recombination in an Endemic Spider From the Canary Islands. Mol Ecol 2024; 33:e17547. [PMID: 39400446 DOI: 10.1111/mec.17547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Revised: 08/26/2024] [Accepted: 09/24/2024] [Indexed: 10/15/2024]
Abstract
The spider genus Dysdera has undergone a remarkable diversification in the oceanic archipelago of the Canary Islands, with ~60 endemic species having originated during the 20 million years since the origin of the archipelago. This evolutionary radiation has been accompanied by substantial dietary shifts, often characterised by phenotypic modifications encompassing morphological, metabolic and behavioural changes. Hence, these endemic spiders represent an excellent model for understanding the evolutionary drivers and pinpointing the genomic determinants underlying adaptive radiations. Recently, we achieved the first chromosome-level genome assembly of one of the endemic species, D. silvatica, providing a high-quality reference sequence for evolutionary genomics studies. Here, we conducted a low-coverage resequencing study of a natural population of D. silvatica from La Gomera island. Taking advantage of the new high-quality genome, we characterised genome-wide levels of nucleotide polymorphism, divergence and linkage disequilibrium, and inferred the demographic history of this population. We also performed comprehensive genome-wide scans for recent positive selection. Our findings uncovered exceptionally high levels of nucleotide diversity and recombination in this geographically restricted endemic species, indicative of large historical effective population sizes. We also identified several candidate genomic regions that are potentially under positive selection, highlighting relevant biological processes, such as vision and nitrogen extraction, as potential adaptation targets. These processes may ultimately drive species diversification in this genus. This pioneering study of spiders that are endemic to an oceanic archipelago lays the groundwork for broader population genomics analyses aimed at understanding the genetic mechanisms driving adaptive radiation in island ecosystems.
Affiliation(s)
- Paula Escuer
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Barcelona, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
| | - Sara Guirao-Rico
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Barcelona, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
| | - Miquel A Arnedo
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Departament de Biologia Evolutiva, Ecologia i Ciències Ambientals, Universitat de Barcelona, Barcelona, Spain
| | - Alejandro Sánchez-Gracia
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Barcelona, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
| | - Julio Rozas
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Barcelona, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
|
19
|
van den Belt S, Zhao H, Alachiotis N. Scalable CNN-based classification of selective sweeps using derived allele frequencies. Bioinformatics 2024; 40:ii29-ii36. [PMID: 39230693 PMCID: PMC11373383 DOI: 10.1093/bioinformatics/btae385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024] Open
Abstract
MOTIVATION: Selective sweeps can successfully be distinguished from neutral genetic data using summary statistics and likelihood-based methods that analyze single nucleotide polymorphisms (SNPs). However, these methods are sensitive to confounding factors, such as severe population bottlenecks and old migration. By virtue of machine learning, and specifically convolutional neural networks (CNNs), new accurate classification models that are robust to confounding factors have been recently proposed. However, such methods are more computationally expensive than summary-statistic-based ones, rendering them impractical for processing large-scale genomic data. Moreover, SNP data are frequently preprocessed to improve classification accuracy, further exacerbating the long analysis times. RESULTS: To this end, we propose a 1D CNN-based model, dubbed FAST-NN, that does not require any preprocessing while using only derived allele frequencies instead of summary statistics or raw SNP data, thereby yielding a sample-size-invariant, scalable solution. We evaluated several data fusion approaches to account for the variance of the density of genetic diversity across genomic regions (a selective sweep signature), and performed an extensive neural architecture search based on a state-of-the-art reference network architecture (SweepNet). The resulting model, FAST-NN, outperforms the reference architecture by up to 12% inference accuracy over all challenging evolutionary scenarios with confounding factors that were evaluated. Moreover, FAST-NN is between 30× and 259× faster on a single CPU core, and between 2.0× and 6.2× faster on a GPU, when processing sample sizes between 128 and 1000 samples. Our work paves the way for the practical use of CNNs in large-scale selective sweep detection. AVAILABILITY AND IMPLEMENTATION: https://github.com/SjoerdvandenBelt/FAST-NN.
Affiliation(s)
- Sjoerd van den Belt
- Department of Computer Science, Faculty of EEMCS, University of Twente, 7522NB Enschede, The Netherlands
| | - Hanqing Zhao
- Department of Computer Science, Faculty of EEMCS, University of Twente, 7522NB Enschede, The Netherlands
| | - Nikolaos Alachiotis
- Department of Computer Science, Faculty of EEMCS, University of Twente, 7522NB Enschede, The Netherlands
|
20
|
Takayama J, Makino S, Funayama T, Ueki M, Narita A, Murakami K, Orui M, Ishikuro M, Obara T, Kuriyama S, Yamamoto M, Tamiya G. A fine-scale genetic map of the Japanese population. Clin Genet 2024; 106:284-292. [PMID: 38719617 DOI: 10.1111/cge.14536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 04/13/2024] [Accepted: 04/15/2024] [Indexed: 08/13/2024]
Abstract
Genetic maps are fundamental resources for linkage and association studies. A fine-scale genetic map can be constructed by inferring historical recombination events from the genome-wide structure of linkage disequilibrium (a non-random association of alleles among loci) by using population-scale sequencing data. We constructed a fine-scale genetic map and identified recombination hotspots from 10 092 551 bi-allelic high-quality autosomal markers segregating among 150 unrelated Japanese individuals whose genotypes were determined by high-coverage (30×) whole-genome sequencing, and the genotype quality was carefully controlled by using their parents' and offspring's genotypes. The pedigree information was also utilized for haplotype phasing. The resulting genome-wide recombination rate profiles were concordant with those of the worldwide population on a broad scale, and the resolution was much improved. We identified 9487 recombination hotspots and confirmed the enrichment of previously known motifs in the hotspots. Moreover, we demonstrated that the Japanese genetic map improved the haplotype phasing and genotype imputation accuracy for the Japanese population. The construction of a population-specific genetic map will help make genetics research more accurate.
Affiliation(s)
- Jun Takayama
- Department of AI and Innovative Medicine, Tohoku University School of Medicine, Sendai, Japan
- Department of Integrative Genomics, Tohoku Medical Megabank Organization (ToMMo) Tohoku University, Sendai, Japan
- Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
| | - Satoshi Makino
- Department of Integrative Genomics, Tohoku Medical Megabank Organization (ToMMo) Tohoku University, Sendai, Japan
| | - Takamitsu Funayama
- Department of Integrative Genomics, Tohoku Medical Megabank Organization (ToMMo) Tohoku University, Sendai, Japan
- Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
| | - Masao Ueki
- Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
| | - Akira Narita
- Department of Integrative Genomics, Tohoku Medical Megabank Organization (ToMMo) Tohoku University, Sendai, Japan
| | - Keiko Murakami
- Department of Preventive Medicine and Epidemiology, ToMMo, Tohoku University, Sendai, Japan
| | - Masatsugu Orui
- Department of Preventive Medicine and Epidemiology, ToMMo, Tohoku University, Sendai, Japan
- Department of Molecular Epidemiology, Tohoku University School of Medicine, Sendai, Japan
| | - Mami Ishikuro
- Department of Preventive Medicine and Epidemiology, ToMMo, Tohoku University, Sendai, Japan
- Department of Molecular Epidemiology, Tohoku University School of Medicine, Sendai, Japan
| | - Taku Obara
- Department of Preventive Medicine and Epidemiology, ToMMo, Tohoku University, Sendai, Japan
- Department of Molecular Epidemiology, Tohoku University School of Medicine, Sendai, Japan
| | - Shinichi Kuriyama
- Department of Preventive Medicine and Epidemiology, ToMMo, Tohoku University, Sendai, Japan
- Department of Molecular Epidemiology, Tohoku University School of Medicine, Sendai, Japan
| | - Masayuki Yamamoto
- Department of Integrative Genomics, Tohoku Medical Megabank Organization (ToMMo) Tohoku University, Sendai, Japan
| | - Gen Tamiya
- Department of AI and Innovative Medicine, Tohoku University School of Medicine, Sendai, Japan
- Department of Integrative Genomics, Tohoku Medical Megabank Organization (ToMMo) Tohoku University, Sendai, Japan
- Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
|
21
|
Ohadi M, Arabfard M, Khamse S, Alizadeh S, Vafadar S, Bayat H, Tajeddin N, Maddi AMA, Delbari A, Khorram Khorshid HR. Novel crossover and recombination hotspots massively spread across primate genomes. Biol Direct 2024; 19:70. [PMID: 39169390 PMCID: PMC11340189 DOI: 10.1186/s13062-024-00508-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2024] [Accepted: 07/29/2024] [Indexed: 08/23/2024] Open
Abstract
BACKGROUND The recombination landscape and subsequent natural selection have vast consequences forevolution and speciation. However, most of the crossover and recombination hotspots are yet to be discovered. We previously reported the relevance of C and G trinucleotide two-repeat units (CG-TTUs) in crossovers and recombination. METHODS On a genome-wide scale, here we mapped all combinations of A and T trinucleotide two-repeat units (AT-TTUs) in human, consisting of AATAAT, ATAATA, ATTATT, TTATTA, TATTAT, and TAATAA. We also compared a number of the colonies formed by the AT-TTUs (distance between consecutive AT-TTUs < 500 bp) in several other primates and mouse. RESULTS We found that the majority of the AT-TTUs (> 96%) resided in approximately 1.4 million colonies, spread throughout the human genome. In comparison to the CG-TTU colonies, the AT-TTU colonies were significantly more abundant and larger in size. Pure units and overlapping units of the pure units were readily detectable in the same colonies, signifying that the units were the sites of unequal crossover. We discovered dynamic sharedness of several of the colonies across the primate species studied, which mainly reached maximum complexity and size in human. CONCLUSIONS We report novel crossover and recombination hotspots of the finest molecular resolution, massively spread and shared across the genomes of human and several other primates. With respect to crossovers and recombination, these genomes are far more dynamic than previously envisioned.
Affiliation(s)
- Mina Ohadi
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
| | - Masoud Arabfard
- Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran.
| | - Safoura Khamse
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - Samira Alizadeh
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - Sara Vafadar
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - Hadi Bayat
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Biochemical Neuroendocrinology, Montreal Clinical and Research Institute (IRCM, affiliated to McGill University), Montreal, QC, H2W 1R7, Canada
| | - Nahid Tajeddin
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Department of Biology, Central Tehran Branch, Islamic Azad University, Tehran, Iran
| | - Ali M A Maddi
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
| | - Ahmad Delbari
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - Hamid R Khorram Khorshid
- Personalized Medicine and Genometabolomics Research Center, Hope Generation Foundation, Tehran, Iran
| |
|
22
|
Li L, Comi TJ, Bierman RF, Akey JM. Recurrent gene flow between Neanderthals and modern humans over the past 200,000 years. Science 2024; 385:eadi1768. [PMID: 38991054 DOI: 10.1126/science.adi1768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 05/14/2024] [Indexed: 07/13/2024]
Abstract
Although it is well known that the ancestors of modern humans and Neanderthals admixed, the effects of gene flow on the Neanderthal genome are not well understood. We develop methods to estimate the amount of human-introgressed sequences in Neanderthals and apply them to whole-genome sequence data from 2000 modern humans and three Neanderthals. We estimate that Neanderthals have 2.5 to 3.7% human ancestry, and we leverage human-introgressed sequences in Neanderthals to revise estimates of Neanderthal ancestry in modern humans, show that Neanderthal population sizes were significantly smaller than previously estimated, and identify two distinct waves of modern human gene flow into Neanderthals. Our data provide insights into the genetic legacy of recurrent gene flow between modern humans and Neanderthals.
Affiliation(s)
- Liming Li
- Department of Medical Genetics and Developmental Biology, School of Medicine, The Key Laboratory of Developmental Genes and Human Diseases, Ministry of Education, Southeast University, Nanjing 210009, China
- The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
| | - Troy J Comi
- The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
| | - Rob F Bierman
- The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
| | - Joshua M Akey
- The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
| |
|
23
|
Dutheil JY. On the estimation of genome-average recombination rates. Genetics 2024; 227:iyae051. [PMID: 38565705 PMCID: PMC11232287 DOI: 10.1093/genetics/iyae051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 03/13/2024] [Accepted: 03/20/2024] [Indexed: 04/04/2024] Open
Abstract
The rate at which recombination events occur in a population is an indicator of its effective population size and the organism's reproduction mode. It determines the extent of linkage disequilibrium along the genome and, thereby, the efficacy of both purifying and positive selection. The population recombination rate can be inferred using models of genome evolution in populations. Classic methods based on the patterns of linkage disequilibrium provide the most accurate estimates, providing large sample sizes are used and the demography of the population is properly accounted for. Here, the capacity of approaches based on the sequentially Markov coalescent (SMC) to infer the genome-average recombination rate from as little as a single diploid genome is examined. SMC approaches provide highly accurate estimates even in the presence of changing population sizes, providing that (1) within genome heterogeneity is accounted for and (2) classic maximum-likelihood optimization algorithms are employed to fit the model. SMC-based estimates proved sensitive to gene conversion, leading to an overestimation of the recombination rate if conversion events are frequent. Conversely, methods based on the correlation of heterozygosity succeed in disentangling the rate of crossing over from that of gene conversion events, but only when the population size is constant and the recombination landscape homogeneous. These results call for a convergence of these two methods to obtain accurate and comparable estimates of recombination rates between populations.
Affiliation(s)
- Julien Y Dutheil
- Max Planck Institute for Evolutionary Biology, August-Thienemann-Str. 2, Plön 24306, Germany
| |
|
24
|
Joseph J, Prentout D, Laverré A, Tricou T, Duret L. High prevalence of PRDM9-independent recombination hotspots in placental mammals. Proc Natl Acad Sci U S A 2024; 121:e2401973121. [PMID: 38809707 PMCID: PMC11161765 DOI: 10.1073/pnas.2401973121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Accepted: 04/26/2024] [Indexed: 05/31/2024] Open
Abstract
In many mammals, recombination events are concentrated in hotspots directed by a sequence-specific DNA-binding protein named PRDM9. Intriguingly, PRDM9 has been lost several times in vertebrates, and notably among mammals, it has been pseudogenized in the ancestor of canids. In the absence of PRDM9, recombination hotspots tend to occur in promoter-like features such as CpG islands. It has thus been proposed that one role of PRDM9 could be to direct recombination away from PRDM9-independent hotspots. However, the ability of PRDM9 to direct recombination hotspots has been assessed in only a handful of species, and a clear picture of how much recombination occurs outside of PRDM9-directed hotspots in mammals is still lacking. In this study, we derived an estimator of past recombination activity based on signatures of GC-biased gene conversion in substitution patterns. We quantified recombination activity in PRDM9-independent hotspots in 52 species of boreoeutherian mammals. We observe a wide range of recombination rates at these loci: several species (such as mice, humans, some felids, or cetaceans) show a deficit of recombination, while a majority of mammals display a clear peak of recombination. Our results demonstrate that PRDM9-directed and PRDM9-independent hotspots can coexist in mammals and that their coexistence appears to be the rule rather than the exception. Additionally, we show that the location of PRDM9-independent hotspots is relatively more stable than that of PRDM9-directed hotspots, but that PRDM9-independent hotspots nevertheless evolve slowly in concert with DNA hypomethylation.
Affiliation(s)
- Julien Joseph
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, UMR 5558, Villeurbanne 69100, France
| | - Djivan Prentout
- Department of Biological Sciences, Columbia University, New York, NY 10027
| | - Alexandre Laverré
- Department of Ecology and Evolution, University of Lausanne, Lausanne CH-1015, Switzerland
- Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
| | - Théo Tricou
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, UMR 5558, Villeurbanne 69100, France
| | - Laurent Duret
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, UMR 5558, Villeurbanne 69100, France
| |
|
25
|
Chen Z, Zhou M, Sun Y, Tang X, Zhang Z, Huang L. Exploration of Genome-Wide Recombination Rate Variation Patterns at Different Scales in Pigs. Animals (Basel) 2024; 14:1345. [PMID: 38731349 PMCID: PMC11083071 DOI: 10.3390/ani14091345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 04/27/2024] [Accepted: 04/28/2024] [Indexed: 05/13/2024] Open
Abstract
Meiotic recombination is a prevalent process in sexually reproducing eukaryotes that plays key roles in genetic diversity, breed selection, and species evolution. However, recombination events differ across breeds and even within breeds. In this study, we initially computed large-scale population recombination rates of both sexes using approximately 52 K SNP genotypes in a total of 3279 pigs from four different Chinese and Western breeds. We then constructed a high-resolution historical recombination map using approximately 16 million SNPs from a sample of unrelated individuals. Comparative analysis of porcine recombination events from different breeds and at different resolutions revealed the following observations: First, the 1-Mb-scale pig recombination maps of the same sex are moderately conserved among different breeds, with the similarity of recombination events between Western pigs and Chinese indigenous pigs being lower than within their respective groups. Second, we identified 3861 recombination hotspots in the genome and observed medium- to high-level correlation (0.542 to 0.683) between historical recombination rates and estimates of meiotic recombination rates. Third, we observed that recombination hotspots are significantly far from the transcription start sites of pig genes, and the in silico-predicted PRDM9 zinc finger domain DNA recognition motif is significantly enriched in the regions of recombination hotspots compared to recombination coldspots, highlighting the potential role of PRDM9 in regulating recombination hotspots in pigs. Our study analyzed the variation patterns of the pig recombination map at broad and fine scales, providing a valuable reference for genomic selection breeding and laying a crucial foundation for further understanding the molecular mechanisms of pig genome recombination.
Affiliation(s)
| | | | | | | | - Zhiyan Zhang
- National Key Laboratory for Swine Genetic Improvement and Germplasm Innovation, Jiangxi Agricultural University, Nanchang 330045, China
| | | |
|
26
|
Liu X, Koyama S, Tomizuka K, Takata S, Ishikawa Y, Ito S, Kosugi S, Suzuki K, Hikino K, Koido M, Koike Y, Horikoshi M, Gakuhari T, Ikegawa S, Matsuda K, Momozawa Y, Ito K, Kamatani Y, Terao C. Decoding triancestral origins, archaic introgression, and natural selection in the Japanese population by whole-genome sequencing. Sci Adv 2024; 10:eadi8419. [PMID: 38630824 PMCID: PMC11023554 DOI: 10.1126/sciadv.adi8419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Accepted: 03/07/2024] [Indexed: 04/19/2024]
Abstract
We generated Japanese Encyclopedia of Whole-Genome/Exome Sequencing Library (JEWEL), a high-depth whole-genome sequencing dataset comprising 3256 individuals from across Japan. Analysis of JEWEL revealed genetic characteristics of the Japanese population that were not discernible using microarray data. First, rare variant-based analysis revealed an unprecedented fine-scale genetic structure. Together with population genetics analysis, the present-day Japanese can be decomposed into three ancestral components. Second, we identified unreported loss-of-function (LoF) variants and observed that for specific genes, LoF variants appeared to be restricted to a more limited set of transcripts than would be expected by chance, with PTPRD as a notable example. Third, we identified 44 archaic segments linked to complex traits, including a Denisovan-derived segment at NKX6-1 associated with type 2 diabetes. Most of these segments are specific to East Asians. Fourth, we identified candidate genetic loci under recent natural selection. Overall, our work provided insights into genetic characteristics of the Japanese population.
Affiliation(s)
- Xiaoxi Liu
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
| | - Satoshi Koyama
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Boston, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
| | - Kohei Tomizuka
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Sadaaki Takata
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Yuki Ishikawa
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Shuji Ito
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Laboratory for Bone and Joint Diseases, RIKEN Center for Medical Sciences, Tokyo, Japan
- Department of Orthopedic Surgery, Faculty of Medicine, Shimane University, Izumo, Japan
| | - Shunichi Kosugi
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Kunihiko Suzuki
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Keiko Hikino
- Laboratory for Pharmacogenomics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Masaru Koido
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
| | - Yoshinao Koike
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Laboratory for Bone and Joint Diseases, RIKEN Center for Medical Sciences, Tokyo, Japan
- Department of Orthopedic Surgery, Hokkaido University Graduate School of Medicine, Sapporo, Japan
| | - Momoko Horikoshi
- Laboratory for Genomics of Diabetes and Metabolism, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Takashi Gakuhari
- Institute for the Study of Ancient Civilizations and Cultural Resources, College of Human and Social Sciences, Kanazawa University, Kanazawa, Japan
| | - Shiro Ikegawa
- Laboratory for Bone and Joint Diseases, RIKEN Center for Medical Sciences, Tokyo, Japan
| | - Kochi Matsuda
- Laboratory of Genome Technology, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Laboratory of Clinical Genome Sequencing, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
| | - Yukihide Momozawa
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Kaoru Ito
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Yoichiro Kamatani
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
| | - Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
| |
|
27
|
Lyulina AS, Liu Z, Good BH. Linkage equilibrium between rare mutations. bioRxiv 2024:2024.03.28.587282. [PMID: 38617331 PMCID: PMC11014483 DOI: 10.1101/2024.03.28.587282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
Recombination breaks down genetic linkage by reshuffling existing variants onto new genetic backgrounds. These dynamics are traditionally quantified by examining the correlations between alleles, and how they decay as a function of the recombination rate. However, the magnitudes of these correlations are strongly influenced by other evolutionary forces like natural selection and genetic drift, making it difficult to tease out the effects of recombination. Here we introduce a theoretical framework for analyzing an alternative family of statistics that measure the homoplasy produced by recombination. We derive analytical expressions that predict how these statistics depend on the rates of recombination and recurrent mutation, the strength of negative selection and genetic drift, and the present-day frequencies of the mutant alleles. We find that the degree of homoplasy can strongly depend on this frequency scale, which reflects the underlying timescales over which these mutations occurred. We show how these scaling properties can be used to isolate the effects of recombination, and discuss their implications for the rates of horizontal gene transfer in bacteria.
Affiliation(s)
- Anastasia S Lyulina
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
| | - Zhiru Liu
- Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
| | - Benjamin H Good
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
- Chan Zuckerberg Biohub - San Francisco, San Francisco, CA 94158, USA
| |
|
28
|
Aw AJ, Spence JP, Song YS. A simple and flexible test of sample exchangeability with applications to statistical genomics. Ann Appl Stat 2024; 18:858-881. [PMID: 38784669 PMCID: PMC11115382 DOI: 10.1214/23-aoas1817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
In scientific studies involving analyses of multivariate data, basic but important questions often arise for the researcher: Is the sample exchangeable, meaning that the joint distribution of the sample is invariant to the ordering of the units? Are the features independent of one another, or perhaps the features can be grouped so that the groups are mutually independent? In statistical genomics, these considerations are fundamental to downstream tasks such as demographic inference and the construction of polygenic risk scores. We propose a non-parametric approach, which we call the V test, to address these two questions, namely, a test of sample exchangeability given dependency structure of features, and a test of feature independence given sample exchangeability. Our test is conceptually simple, yet fast and flexible. It controls the Type I error across realistic scenarios, and handles data of arbitrary dimensions by leveraging large-sample asymptotics. Through extensive simulations and a comparison against unsupervised tests of stratification based on random matrix theory, we find that our test compares favorably in various scenarios of interest. We apply the test to data from the 1000 Genomes Project, demonstrating how it can be employed to assess exchangeability of the genetic sample, or find optimal linkage disequilibrium (LD) splits for downstream analysis. For exchangeability assessment, we find that removing rare variants can substantially increase the p-value of the test statistic. For optimal LD splitting, the V test reports different optimal splits than previous approaches not relying on hypothesis testing. Software for our methods is available in R (CRAN: flintyR) and Python (PyPI: flintyPy).
Affiliation(s)
- Alan J Aw
- Department of Statistics, University of California, Berkeley
| | | | - Yun S Song
- Department of Statistics and Computer Science Division, University of California, Berkeley
| |
|
29
|
Jiang Z, Zang W, Ericson PGP, Song G, Wu S, Feng S, Drovetski SV, Liu G, Zhang D, Saitoh T, Alström P, Edwards SV, Lei F, Qu Y. Gene flow and an anomaly zone complicate phylogenomic inference in a rapidly radiated avian family (Prunellidae). BMC Biol 2024; 22:49. [PMID: 38413944 PMCID: PMC10900574 DOI: 10.1186/s12915-024-01848-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Accepted: 02/15/2024] [Indexed: 02/29/2024] Open
Abstract
BACKGROUND Resolving the phylogeny of rapidly radiating lineages presents a challenge when building the Tree of Life. An Old World avian family Prunellidae (Accentors) comprises twelve species that rapidly diversified at the Pliocene-Pleistocene boundary. RESULTS Here we investigate the phylogenetic relationships of all species of Prunellidae using a chromosome-level de novo assembly of Prunella strophiata and 36 high-coverage resequenced genomes. We use homologous alignments of thousands of exonic and intronic loci to build the coalescent and concatenated phylogenies and recover four different species trees. Topology tests show a large degree of gene tree-species tree discordance but only 40-54% of intronic gene trees and 36-75% of exonic genic trees can be explained by incomplete lineage sorting and gene tree estimation errors. Estimated branch lengths for three successive internal branches in the inferred species trees suggest the existence of an empirical anomaly zone. The most common topology recovered for species in this anomaly zone was not similar to any coalescent or concatenated inference phylogenies, suggesting presence of anomalous gene trees. However, this interpretation is complicated by the presence of gene flow because extensive introgression was detected among these species. When exploring tree topology distributions, introgression, and regional variation in recombination rate, we find that many autosomal regions contain signatures of introgression and thus may mislead phylogenetic inference. Conversely, the phylogenetic signal is concentrated to regions with low-recombination rate, such as the Z chromosome, which are also more resistant to interspecific introgression. CONCLUSIONS Collectively, our results suggest that phylogenomic inference should consider the underlying genomic architecture to maximize the consistency of phylogenomic signal.
Affiliation(s)
- Zhiyong Jiang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Wenqing Zang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Per G P Ericson
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, PO Box 50007, Stockholm, SE-104 05, Sweden
| | - Gang Song
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Shaoyuan Wu
- Jiangsu International Joint Center of Genomics, Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, School of Life Sciences, Jiangsu Normal University, Xuzhou, 221116, Jiangsu, China
| | - Shaohong Feng
- Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Liangzhu Laboratory, Zhejiang University, 1369 West Wenyi Road, Hangzhou, 311121, China
- Innovation Center of Yangtze River Delta, Zhejiang University, Jiashan, 314102, China
| | - Sergei V Drovetski
- National Museum of Natural History, Smithsonian Institution, Washington, DC, 20004, USA
- Present address: U.S. Geological Survey, Eastern Ecological Science Center at Patuxent Research Refuge, Laurel, MD, 20708, USA
| | - Gang Liu
- Chinese Academy of Forestry, Institute of Ecological Conservation and Restoration, Beijing, 100091, China
| | - Dezhi Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Takema Saitoh
- Yamashina Institute for Ornithology, Abiko, Chiba, Japan
| | - Per Alström
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Animal Ecology, Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18 D, 752 36, Uppsala, Sweden
| | - Scott V Edwards
- Museum of Comparative Zoology and Department of Organismic & Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
| | - Fumin Lei
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Yanhua Qu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China.
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, PO Box 50007, Stockholm, SE-104 05, Sweden.
| |
|
30
|
Hoge C, de Manuel M, Mahgoub M, Okami N, Fuller Z, Banerjee S, Baker Z, McNulty M, Andolfatto P, Macfarlan TS, Schumer M, Tzika AC, Przeworski M. Patterns of recombination in snakes reveal a tug-of-war between PRDM9 and promoter-like features. Science 2024; 383:eadj7026. [PMID: 38386752 DOI: 10.1126/science.adj7026] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 07/12/2023] [Accepted: 01/04/2024] [Indexed: 02/24/2024]
Abstract
In some mammals, notably humans, recombination occurs almost exclusively where the protein PRDM9 binds, whereas in vertebrates lacking an intact PRDM9, such as birds and canids, recombination rates are elevated near promoter-like features. To determine whether PRDM9 directs recombination in nonmammalian vertebrates, we focused on an exemplar species with a single, intact PRDM9 ortholog, the corn snake (Pantherophis guttatus). Analyzing historical recombination rates along the genome and crossovers in pedigrees, we found evidence that PRDM9 specifies the location of recombination events, but we also detected a separable effect of promoter-like features. These findings reveal that the uses of PRDM9 and promoter-like features need not be mutually exclusive and instead reflect a tug-of-war that is more even in some species than others.
Affiliation(s)
- Carla Hoge
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Marc de Manuel
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Mohamed Mahgoub
- The Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
| | - Naima Okami
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Zachary Fuller
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Shreya Banerjee
- Department of Biology, Stanford University, Stanford, CA, USA
| | - Zachary Baker
- Department of Systems Biology, Columbia University, New York, NY, USA
| | - Morgan McNulty
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Peter Andolfatto
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Todd S Macfarlan
- The Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
| | - Molly Schumer
- Department of Biology, Stanford University, Stanford, CA, USA
- Howard Hughes Medical Institute, Stanford, CA, USA
| | - Athanasia C Tzika
- Laboratory of Artificial & Natural Evolution (LANE), Department of Genetics & Evolution, University of Geneva, Geneva, Switzerland
| | - Molly Przeworski
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
| |
|
31
|
Ariad D, Madjunkova S, Madjunkov M, Chen S, Abramov R, Librach C, McCoy RC. Aberrant landscapes of maternal meiotic crossovers contribute to aneuploidies in human embryos. Genome Res 2024; 34:70-84. [PMID: 38071472 PMCID: PMC10903951 DOI: 10.1101/gr.278168.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 11/21/2023] [Indexed: 12/19/2023]
Abstract
Meiotic recombination is crucial for human genetic diversity and chromosome segregation accuracy. Understanding its variation across individuals and the processes by which it goes awry are long-standing goals in human genetics. Current approaches for inferring recombination landscapes rely either on population genetic patterns of linkage disequilibrium (LD), capturing a time-averaged view, or on direct detection of crossovers in gametes or multigeneration pedigrees, which limits data set scale and availability. Here, we introduce an approach for inferring sex-specific recombination landscapes using data from preimplantation genetic testing for aneuploidy (PGT-A). This method relies on low-coverage (<0.05×) whole-genome sequencing of in vitro fertilized (IVF) embryo biopsies. To overcome the data sparsity, our method exploits its inherent relatedness structure, knowledge of haplotypes from external population reference panels, and the frequent occurrence of monosomies in embryos, whereby the remaining chromosome is phased by default. Extensive simulations show our method's high accuracy, even at coverages as low as 0.02×. Applying this method to PGT-A data from 18,967 embryos, we mapped 70,660 recombination events with ∼150 kbp resolution, replicating established sex-specific recombination patterns. We observed a reduced total length of the female genetic map in trisomies compared with disomies, as well as chromosome-specific alterations in crossover distributions. Based on haplotype configurations in pericentromeric regions, our data indicate chromosome-specific propensities for different mechanisms of meiotic error. Our results provide a comprehensive view of the role of aberrant meiotic recombination in the origins of human aneuploidies and offer a versatile tool for mapping crossovers in low-coverage sequencing data from multiple siblings.
Affiliation(s)
- Daniel Ariad
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Svetlana Madjunkova
  - CReATe Fertility Centre, Toronto, Ontario M5G 1N8, Canada
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Siwei Chen
  - CReATe Fertility Centre, Toronto, Ontario M5G 1N8, Canada
- Rina Abramov
  - CReATe Fertility Centre, Toronto, Ontario M5G 1N8, Canada
- Clifford Librach
  - CReATe Fertility Centre, Toronto, Ontario M5G 1N8, Canada
  - Department of Obstetrics and Gynecology, University of Toronto, Toronto, Ontario M5G 1E2, Canada
  - Institute of Medical Sciences, University of Toronto, Toronto, Ontario M5S 1A8, Canada
  - Department of Physiology, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Rajiv C McCoy
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland 21218, USA

32
Bascón-Cardozo K, Bours A, Manthey G, Durieux G, Dutheil JY, Pruisscher P, Odenthal-Hesse L, Liedvogel M. Fine-Scale Map Reveals Highly Variable Recombination Rates Associated with Genomic Features in the Eurasian Blackcap. Genome Biol Evol 2024; 16:evad233. [PMID: 38198800 PMCID: PMC10781513 DOI: 10.1093/gbe/evad233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/12/2023] [Indexed: 01/12/2024] Open
Abstract
Recombination is responsible for breaking up haplotypes, influencing genetic variability, and the efficacy of selection. Bird genomes lack the protein PR domain-containing protein 9, a key determinant of recombination dynamics in most metazoans. Historical recombination maps in birds show an apparent stasis in positioning recombination events. This highly conserved recombination pattern over long timescales may constrain the evolution of recombination in birds. At the same time, extensive variation in recombination rate is observed across the genome and between different species of birds. Here, we characterize the fine-scale historical recombination map of an iconic migratory songbird, the Eurasian blackcap (Sylvia atricapilla), using a linkage disequilibrium-based approach that accounts for population demography. Our results reveal variable recombination rates among and within chromosomes, which associate positively with nucleotide diversity and GC content and negatively with chromosome size. Recombination rates increased significantly at regulatory regions but not necessarily at gene bodies. CpG islands are associated strongly with recombination rates, though their specific position and local DNA methylation patterns likely influence this relationship. The association with retrotransposons varied according to specific family and location. Our results also provide evidence of heterogeneous intrachromosomal conservation of recombination maps between the blackcap and its closest sister taxon, the garden warbler. These findings highlight the considerable variability of recombination rates at different scales and the role of specific genomic features in shaping this variation. This study opens the possibility of further investigating the impact of recombination on specific population-genomic features.
Affiliation(s)
- Karen Bascón-Cardozo
  - MPRG Behavioural Genomics, Max Planck Institute for Evolutionary Biology, Plön 24306, Germany
- Andrea Bours
  - MPRG Behavioural Genomics, Max Planck Institute for Evolutionary Biology, Plön 24306, Germany
- Georg Manthey
  - Institute of Avian Research “Vogelwarte Helgoland”, Wilhelmshaven 26386, Germany
- Gillian Durieux
  - MPRG Behavioural Genomics, Max Planck Institute for Evolutionary Biology, Plön 24306, Germany
- Julien Y Dutheil
  - Department for Theoretical Biology, Max Planck Institute for Evolutionary Biology, Plön 24306, Germany
- Peter Pruisscher
  - MPRG Behavioural Genomics, Max Planck Institute for Evolutionary Biology, Plön 24306, Germany
  - Department of Zoology, Stockholm University, Stockholm SE-106 91, Sweden
- Linda Odenthal-Hesse
  - Department Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön 24306, Germany
- Miriam Liedvogel
  - MPRG Behavioural Genomics, Max Planck Institute for Evolutionary Biology, Plön 24306, Germany
  - Institute of Avian Research “Vogelwarte Helgoland”, Wilhelmshaven 26386, Germany
  - Department of Biology and Environmental Sciences, Carl von Ossietzky University of Oldenburg, Oldenburg 26129, Germany

33
Versoza CJ, Weiss S, Johal R, La Rosa B, Jensen JD, Pfeifer SP. Novel Insights into the Landscape of Crossover and Noncrossover Events in Rhesus Macaques (Macaca mulatta). Genome Biol Evol 2024; 16:evad223. [PMID: 38051960 PMCID: PMC10773715 DOI: 10.1093/gbe/evad223] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/04/2023] [Revised: 11/04/2023] [Accepted: 11/28/2023] [Indexed: 12/07/2023] Open
Abstract
Meiotic recombination landscapes differ greatly between distantly and closely related taxa, populations, individuals, sexes, and even within genomes; however, the factors driving this variation are yet to be well elucidated. Here, we directly estimate contemporary crossover rates and, for the first time, noncrossover rates in rhesus macaques (Macaca mulatta) from four three-generation pedigrees comprising 32 individuals. We further compare these results with historical, demography-aware, linkage disequilibrium-based recombination rate estimates. From paternal meioses in the pedigrees, 165 crossover events with a median resolution of 22.3 kb were observed, corresponding to a male autosomal map length of 2,357 cM, approximately 15% longer than an existing linkage map based on human microsatellite loci. In addition, 85 noncrossover events with a mean tract length of 155 bp were identified, similar to the tract lengths observed in the only other two primates in which noncrossovers have been studied to date, humans and baboons. Consistent with observations in other placental mammals with PRDM9-directed recombination, crossover (and to a lesser extent noncrossover) events in rhesus macaques clustered in intergenic regions and toward the chromosomal ends in males (a pattern in broad agreement with the historical, sex-averaged recombination rate estimates), and evidence of GC-biased gene conversion was observed at noncrossover sites.
Affiliation(s)
- Cyril J Versoza
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
  - Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Sarah Weiss
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Ravneet Johal
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Bruno La Rosa
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Jeffrey D Jensen
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
  - Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Susanne P Pfeifer
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
  - Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA

34
Dinh BL, Tang E, Taparra K, Nakatsuka N, Chen F, Chiang CWK. Recombination map tailored to Native Hawaiians may improve robustness of genomic scans for positive selection. Hum Genet 2024; 143:85-99. [PMID: 38157018 PMCID: PMC10794367 DOI: 10.1007/s00439-023-02625-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Accepted: 11/25/2023] [Indexed: 01/03/2024]
Abstract
Recombination events establish the patterns of haplotypic structure in a population and estimates of recombination rates are used in several downstream population and statistical genetic analyses. Using suboptimal maps from distantly related populations may reduce the efficacy of genomic analyses, particularly for underrepresented populations such as the Native Hawaiians. To overcome this challenge, we constructed recombination maps using genome-wide array data from two study samples of Native Hawaiians: one reflecting the current admixed state of Native Hawaiians (NH map) and one based on individuals of enriched Polynesian ancestries (PNS map) with the potential to be used for less admixed Polynesian populations such as the Samoans. We found the recombination landscape to be less correlated with those from other continental populations (e.g. Spearman's rho = 0.79 between PNS and CEU (Utah residents with Northern and Western European ancestry) compared to 0.92 between YRI (Yoruba in Ibadan, Nigeria) and CEU at 50 kb resolution), likely driven by the unique demographic history of the Native Hawaiians. PNS also shared the fewest recombination hotspots with other populations (e.g. 8% of hotspots shared between PNS and CEU compared to 27% of hotspots shared between YRI and CEU). We found that downstream analyses in the Native Hawaiian population, such as local ancestry inference, imputation, and IBD segment and relatedness detections, would achieve similar efficacy when using the NH map compared to an omnibus map. However, for genome scans of adaptive loci using integrated haplotype scores, we found several loci with apparent genome-wide significant signals (|Z-score|> 4) in Native Hawaiians that would not have been significant when analyzed using NH-specific maps. 
Population-specific recombination maps may therefore improve the robustness of haplotype-based statistics and help us better characterize the evolutionary history that may underlie Native Hawaiian-specific health conditions that persist today.
Affiliation(s)
- Bryan L Dinh
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Echo Tang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Kekoa Taparra
  - Department of Radiation Oncology, Stanford University, Palo Alto, CA, USA
- Fei Chen
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Charleston W K Chiang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA

35
Ariad D, Madjunkova S, Madjunkov M, Chen S, Abramov R, Librach C, McCoy RC. Aberrant landscapes of maternal meiotic crossovers contribute to aneuploidies in human embryos. bioRxiv 2023:2023.06.07.543910. [PMID: 37333422 PMCID: PMC10274764 DOI: 10.1101/2023.06.07.543910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
Meiotic recombination is crucial for human genetic diversity and chromosome segregation accuracy. Understanding its variation across individuals and the processes by which it goes awry are long-standing goals in human genetics. Current approaches for inferring recombination landscapes rely either on population genetic patterns of linkage disequilibrium (LD), which capture a time-averaged view, or on direct detection of crossovers in gametes or multi-generation pedigrees, which limits dataset scale and availability. Here, we introduce an approach for inferring sex-specific recombination landscapes using data from preimplantation genetic testing for aneuploidy (PGT-A). This method relies on low-coverage (<0.05×) whole-genome sequencing of in vitro fertilized (IVF) embryo biopsies. To overcome the data sparsity, our method exploits its inherent relatedness structure, knowledge of haplotypes from external population reference panels, as well as the frequent occurrence of monosomies in embryos, whereby the remaining chromosome is phased by default. Extensive simulations demonstrate our method's high accuracy, even at coverages as low as 0.02×. Applying this method to PGT-A data from 18,967 embryos, we mapped 70,660 recombination events with ~150 kbp resolution, replicating established sex-specific recombination patterns. We observed a reduced total length of the female genetic map in trisomies compared to disomies, as well as chromosome-specific alterations in crossover distributions. Based on haplotype configurations in pericentromeric regions, our data indicate chromosome-specific propensities for different mechanisms of meiotic error. Our results provide a comprehensive view of the role of aberrant meiotic recombination in the origins of human aneuploidies and offer a versatile tool for mapping crossovers in low-coverage sequencing data from multiple siblings.
Affiliation(s)
- Daniel Ariad
  - Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Svetlana Madjunkova
  - CReATe Fertility Centre, Toronto, Canada
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada
- Siwei Chen
  - CReATe Fertility Centre, Toronto, Canada
- Clifford Librach
  - CReATe Fertility Centre, Toronto, Canada
  - Department of Obstetrics and Gynecology, University of Toronto, Toronto, Canada
  - Institute of Medical Sciences, University of Toronto, Toronto, Canada
  - Department of Physiology, University of Toronto, Toronto, Canada
- Rajiv C. McCoy
  - Department of Biology, Johns Hopkins University, Baltimore, MD, USA

36
Spence JP, Zeng T, Mostafavi H, Pritchard JK. Scaling the discrete-time Wright-Fisher model to biobank-scale datasets. Genetics 2023; 225:iyad168. [PMID: 37724741 PMCID: PMC10627256 DOI: 10.1093/genetics/iyad168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 06/01/2023] [Accepted: 09/08/2023] [Indexed: 09/21/2023] Open
Abstract
The discrete-time Wright-Fisher (DTWF) model and its diffusion limit are central to population genetics. These models can describe the forward-in-time evolution of allele frequencies in a population resulting from genetic drift, mutation, and selection. Computing likelihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large samples or in the presence of strong selection. Existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here, we present a scalable algorithm that approximates the DTWF model with provably bounded error. Our approach relies on two key observations about the DTWF model. The first is that transition probabilities under the model are approximately sparse. The second is that transition distributions for similar starting allele frequencies are extremely close as distributions. Together, these observations enable approximate matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the tens of millions, paving the way for rigorous biobank-scale inference. Finally, we use our results to estimate the impact of larger samples on estimating selection coefficients for loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.
Affiliation(s)
- Jeffrey P Spence
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Tony Zeng
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Jonathan K Pritchard
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
  - Department of Biology, Stanford University, Stanford, CA 94305, USA

37
Baker Z, Przeworski M, Sella G. Down the Penrose stairs, or how selection for fewer recombination hotspots maintains their existence. eLife 2023; 12:e83769. [PMID: 37830496 PMCID: PMC10703446 DOI: 10.7554/elife.83769] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/28/2022] [Accepted: 10/12/2023] [Indexed: 10/14/2023] Open
Abstract
In many species, meiotic recombination events tend to occur in narrow intervals of the genome, known as hotspots. In humans and mice, double strand break (DSB) hotspot locations are determined by the DNA-binding specificity of the zinc finger array of the PRDM9 protein, which is rapidly evolving at residues in contact with DNA. Previous models explained this rapid evolution in terms of the need to restore PRDM9 binding sites lost to gene conversion over time, under the assumption that more PRDM9 binding always leads to more DSBs. This assumption, however, does not align with current evidence. Recent experimental work indicates that PRDM9 binding on both homologs facilitates DSB repair, and that the absence of sufficient symmetric binding disrupts meiosis. We therefore consider an alternative hypothesis: that rapid PRDM9 evolution is driven by the need to restore symmetric binding because of its role in coupling DSB formation and efficient repair. To this end, we model the evolution of PRDM9 from first principles: from its binding dynamics to the population genetic processes that govern the evolution of the zinc finger array and its binding sites. We show that the loss of a small number of strong binding sites leads to the use of a greater number of weaker ones, resulting in a sharp reduction in symmetric binding and favoring new PRDM9 alleles that restore the use of a smaller set of strong binding sites. This decrease, in turn, drives rapid PRDM9 evolutionary turnover. Our results therefore suggest that the advantage of new PRDM9 alleles is in limiting the number of binding sites used effectively, rather than in increasing net PRDM9 binding. By extension, our model suggests that the evolutionary advantage of hotspots may have been to increase the efficiency of DSB repair and/or homolog pairing.
Affiliation(s)
- Zachary Baker
  - Department of Systems Biology, Columbia University, New York, United States
  - Department of Biological Sciences, Columbia University, New York, United States
- Molly Przeworski
  - Department of Systems Biology, Columbia University, New York, United States
  - Department of Biological Sciences, Columbia University, New York, United States
  - Program for Mathematical Genomics, Columbia University, New York, United States
- Guy Sella
  - Department of Biological Sciences, Columbia University, New York, United States
  - Program for Mathematical Genomics, Columbia University, New York, United States

38
Pivirotto AM, Platt A, Patel R, Kumar S, Hey J. Analyses of allele age and fitness impact reveal human beneficial alleles to be older than neutral controls. bioRxiv 2023:2023.10.09.561569. [PMID: 37873438 PMCID: PMC10592680 DOI: 10.1101/2023.10.09.561569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
A classic population genetic prediction is that alleles experiencing directional selection should swiftly traverse allele frequency space, leaving detectable reductions in genetic variation in linked regions. However, despite this expectation, identifying clear footprints of beneficial allele passage has proven to be surprisingly challenging. We addressed the basic premise underlying this expectation by estimating the ages of large numbers of beneficial and deleterious alleles in a human population genomic data set. Deleterious alleles were found to be young, on average, given their allele frequency. However, beneficial alleles were older on average than non-coding, non-regulatory alleles of the same frequency. This finding is not consistent with directional selection and instead indicates some type of balancing selection. Among derived beneficial alleles, those fixed in the population show higher local recombination rates than those still segregating, consistent with a model in which new beneficial alleles experience an initial period of balancing selection due to linkage disequilibrium with deleterious recessive alleles. Alleles that ultimately fix following a period of balancing selection will leave a modest 'soft' sweep impact on the local variation, consistent with the overall paucity of species-wide 'hard' sweeps in human genomes.
Affiliation(s)
- Alexander Platt
  - Temple University, Department of Biology, Philadelphia PA 19122, USA
  - University of Pennsylvania, Department of Genetics, Philadelphia PA 19104, USA
- Ravi Patel
  - Temple University, Department of Biology, Philadelphia PA 19122, USA
  - Institute for Genomics and Evolutionary Medicine, Temple University, PA 19122, USA
- Sudhir Kumar
  - Temple University, Department of Biology, Philadelphia PA 19122, USA
  - Institute for Genomics and Evolutionary Medicine, Temple University, PA 19122, USA
- Jody Hey
  - Temple University, Department of Biology, Philadelphia PA 19122, USA

39
Liu X, Matsunami M, Horikoshi M, Ito S, Ishikawa Y, Suzuki K, Momozawa Y, Niida S, Kimura R, Ozaki K, Maeda S, Imamura M, Terao C. Natural Selection Signatures in the Hondo and Ryukyu Japanese Subpopulations. Mol Biol Evol 2023; 40:msad231. [PMID: 37903429 PMCID: PMC10615566 DOI: 10.1093/molbev/msad231] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/27/2023] [Revised: 09/20/2023] [Accepted: 10/06/2023] [Indexed: 11/01/2023] Open
Abstract
Natural selection signatures across Japanese subpopulations are under-explored. Here we conducted genome-wide selection scans with 622,926 single nucleotide polymorphisms for 20,366 Japanese individuals, who were recruited from the main islands of the Japanese Archipelago (Hondo) and the Ryukyu Archipelago (Ryukyu), representing two major Japanese subpopulations. The integrated haplotype score (iHS) analysis identified several signals in one or both subpopulations. We found a novel candidate locus at IKZF2, especially in Ryukyu. Significant signals were observed in the major histocompatibility complex region in both subpopulations. The lead variants differed and demonstrated substantial allele frequency differences between Hondo and Ryukyu. The lead variant in Hondo tags HLA-A*33:03-C*14:03-B*44:03-DRB1*13:02-DQB1*06:04-DPB1*04:01, a haplotype specific to Japanese and Korean populations, whereas in Ryukyu the lead variant tags DRB1*15:01-DQB1*06:02, which had been recognized as a genetic risk factor for narcolepsy. In contrast, it is reported to confer protective effects against type 1 diabetes and human T lymphotropic virus type 1-associated myelopathy/tropical spastic paraparesis. The FastSMC analysis identified 8 loci potentially affected by selection within the past 20-150 generations, including 2 novel candidate loci. The analysis also showed differences between Hondo and Ryukyu in selection patterns of ALDH2, a gene recognized to be specifically targeted by selection in East Asians. In summary, our study provided insights into the selection signatures within the Japanese population and nominated potential sources of selection pressure.
Affiliation(s)
- Xiaoxi Liu
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Masatoshi Matsunami
  - Department of Advanced Genomic and Laboratory Medicine, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
- Momoko Horikoshi
  - Laboratory for Genomics of Diabetes and Metabolism, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Shuji Ito
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Yuki Ishikawa
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Kunihiko Suzuki
  - Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Yukihide Momozawa
  - Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Shumpei Niida
  - Core Facility Administration, Research Institute, National Center for Geriatrics and Gerontology, Obu, Japan
- Ryosuke Kimura
  - Department of Human Biology and Anatomy, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
- Kouichi Ozaki
  - Medical Genome Center, Research Institute, National Center for Geriatrics and Gerontology, Obu, Japan
- Shiro Maeda
  - Department of Advanced Genomic and Laboratory Medicine, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
  - Division of Clinical Laboratory and Blood Transfusion, University of the Ryukyus Hospital, Okinawa, Japan
- Minako Imamura
  - Department of Advanced Genomic and Laboratory Medicine, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
  - Division of Clinical Laboratory and Blood Transfusion, University of the Ryukyus Hospital, Okinawa, Japan
- Chikashi Terao
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
  - The Department of Applied Genetics, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan

40
Nait Saada J, Tsangalidou Z, Stricker M, Palamara PF. Inference of Coalescence Times and Variant Ages Using Convolutional Neural Networks. Mol Biol Evol 2023; 40:msad211. [PMID: 37738175 PMCID: PMC10581698 DOI: 10.1093/molbev/msad211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 09/11/2023] [Accepted: 09/18/2023] [Indexed: 09/24/2023] Open
Abstract
Accurate inference of the time to the most recent common ancestor (TMRCA) between pairs of individuals and of the age of genomic variants is key in several population genetic analyses. We developed a likelihood-free approach, called CoalNN, which uses a convolutional neural network to predict pairwise TMRCAs and allele ages from sequencing or SNP array data. CoalNN is trained through simulation and can be adapted to varying parameters, such as demographic history, using transfer learning. Across several simulated scenarios, CoalNN matched or outperformed the accuracy of model-based approaches for pairwise TMRCA and allele age prediction. We applied CoalNN to settings for which model-based approaches are under-developed and performed analyses to gain insights into the set of features it uses to perform TMRCA prediction. We next used CoalNN to analyze 2,504 samples from 26 populations in the 1000 Genomes Project data set, inferring the age of ∼80 million variants. We observed substantial variation across populations and for variants predicted to be pathogenic, reflecting heterogeneous demographic histories and the action of negative selection. We used CoalNN's predicted allele ages to construct genome-wide annotations capturing the signature of past negative selection. We performed LD-score regression analysis of heritability using summary association statistics from 63 independent complex traits and diseases (average N=314k), observing increased annotation-specific effects on heritability compared to a previous allele age annotation. These results highlight the effectiveness of using likelihood-free, simulation-trained models to infer properties of gene genealogies in large genomic data sets.
Affiliation(s)
- Pier Francesco Palamara
  - Department of Statistics, University of Oxford, Oxford, UK
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK

41
Heraghty SD, Jackson JM, Lozier JD. Whole genome analyses reveal weak signatures of population structure and environmentally associated local adaptation in an important North American pollinator, the bumble bee Bombus vosnesenskii. Mol Ecol 2023; 32:5479-5497. [PMID: 37702957 DOI: 10.1111/mec.17125] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/08/2023] [Revised: 08/21/2023] [Accepted: 08/24/2023] [Indexed: 09/14/2023]
Abstract
Studies of species that experience environmental heterogeneity across their distributions have become an important tool for understanding mechanisms of adaptation and predicting responses to climate change. We examine population structure, demographic history and environmentally associated genomic variation in Bombus vosnesenskii, a common bumble bee in the western USA, using whole genome resequencing of populations distributed across a broad range of latitudes and elevations. We find that B. vosnesenskii exhibits minimal population structure and weak isolation by distance, confirming results from previous studies using other molecular marker types. Similarly, demographic analyses with Sequentially Markovian Coalescent models suggest that minimal population structure may have persisted since the last interglacial period, with genomes from different parts of the species range showing similar historical effective population size trajectories and relatively small fluctuations through time. Redundancy analysis revealed a small amount of genomic variation explained by bioclimatic variables. Environmental association analysis with latent factor mixed modelling (LFMM2) identified few outlier loci that were sparsely distributed throughout the genome and although a few putative signatures of selective sweeps were identified, none encompassed particularly large numbers of loci. Some outlier loci were in genes with known regulatory relationships, suggesting the possibility of weak selection, although compared with other species examined with similar approaches, evidence for extensive local adaptation signatures in the genome was relatively weak. Overall, results indicate B. vosnesenskii is an example of a generalist with a high degree of flexibility in its environmental requirements that may ultimately benefit the species under periods of climate change.
Collapse
Affiliation(s)
- Sam D Heraghty
- Department of Biological Sciences, The University of Alabama, Tuscaloosa, Alabama, USA
| | - Jason M Jackson
- Department of Biological Sciences, The University of Alabama, Tuscaloosa, Alabama, USA
| | - Jeffrey D Lozier
- Department of Biological Sciences, The University of Alabama, Tuscaloosa, Alabama, USA
| |
|
42
|
Li Z, Liu X, Wang C, Li Z, Jiang B, Zhang R, Tong L, Qu Y, He S, Chen H, Mao Y, Li Q, Pook T, Wu Y, Zan Y, Zhang H, Li L, Wen K, Chen Y. The pig pangenome provides insights into the roles of coding structural variations in genetic diversity and adaptation. Genome Res 2023; 33:1833-1847. [PMID: 37914227 PMCID: PMC10691484 DOI: 10.1101/gr.277638.122] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/31/2022] [Accepted: 09/12/2023] [Indexed: 11/03/2023]
Abstract
Structural variations have emerged as an important driving force for genome evolution and phenotypic variation in various organisms, yet their contributions to genetic diversity and adaptation in domesticated animals remain largely unknown. Here we constructed a pangenome based on 250 sequenced individuals from 32 pig breeds in Eurasia and systematically characterized coding sequence presence/absence variations (PAVs) within pigs. We identified 308.3-Mb nonreference sequences and 3438 novel genes absent from the current reference genome. Gene PAV analysis showed that 16.8% of the genes in the pangene catalog undergo PAV. A number of newly identified dispensable genes showed close associations with adaptation. For instance, several novel swine leukocyte antigen (SLA) genes discovered in nonreference sequences potentially participate in immune responses to porcine reproductive and respiratory syndrome virus (PRRSV) infection. We delineated previously unidentified features of the pig mobilome that contained 490,480 transposable element insertion polymorphisms (TIPs) resulting from recent mobilization of 970 TE families, and investigated their population dynamics along with influences on population differentiation and gene expression. In addition, several candidate adaptive TE insertions were detected to be co-opted into genes responsible for responses to hypoxia, skeletal development, regulation of heart contraction, and neuronal cell development, likely contributing to local adaptation of Tibetan wild boars. These findings enhance our understanding of hidden layers of the genetic diversity in pigs and provide novel insights into the role of SVs in the evolutionary adaptation of mammals.
Affiliation(s)
- Zhengcao Li
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, 510006 Guangzhou, China;
| | - Xiaohong Liu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, 510006 Guangzhou, China
| | - Chen Wang
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, 510006 Guangzhou, China
| | - Zhenyang Li
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, 510006 Guangzhou, China
| | - Bo Jiang
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, 510006 Guangzhou, China
| | - Ruifeng Zhang
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, 510006 Guangzhou, China
| | - Lu Tong
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, 510006 Guangzhou, China
| | - Youping Qu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, 510006 Guangzhou, China
| | - Sheng He
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, 510006 Guangzhou, China
| | - Haifan Chen
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, 510006 Guangzhou, China
| | - Yafei Mao
- Bio-X Institutes, Shanghai Jiao Tong University, 200240 Shanghai, China
| | - Qingnan Li
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, 510006 Guangzhou, China
| | - Torsten Pook
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen 6700 AH, The Netherlands
| | - Yu Wu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, 510006 Guangzhou, China
| | - Yanjun Zan
- Key Laboratory of Tobacco Improvement and Biotechnology, Tobacco Research Institute, Chinese Academy of Agricultural Sciences, Qingdao 266000, China
| | - Hui Zhang
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, 510006 Guangzhou, China
| | - Lu Li
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, 510006 Guangzhou, China
| | - Keying Wen
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, 510006 Guangzhou, China
| | - Yaosheng Chen
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, 510006 Guangzhou, China;
| |
|
43
|
Bours A, Pruisscher P, Bascón-Cardozo K, Odenthal-Hesse L, Liedvogel M. The blackcap (Sylvia atricapilla) genome reveals a recent accumulation of LTR retrotransposons. Sci Rep 2023; 13:16471. [PMID: 37777595 PMCID: PMC10542752 DOI: 10.1038/s41598-023-43090-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/14/2022] [Accepted: 09/19/2023] [Indexed: 10/02/2023] Open
Abstract
Transposable elements (TEs) are mobile genetic elements that can move around the genome, and as such are a source of genomic variability. Based on their characteristics we can annotate TEs within the host genome and classify them into specific TE types and families. The increasing number of available high-quality genome references in recent years provides an excellent resource that will enhance the understanding of the role of recently active TEs on genetic variation and phenotypic evolution. Here we showcase the use of a high-quality TE annotation to understand the distinct effect of recent and ancient TE insertions on the evolution of genomic variation, within our study species the Eurasian blackcap (Sylvia atricapilla). We investigate how these distinct TE categories are distributed along the genome and evaluate how their coverage across the genome is correlated with four genomic features: recombination rate, gene coverage, CpG island coverage and GC content. We found within the recent TE insertions an accumulation of LTRs previously not seen in birds. While the coverage of recent TE insertions was negatively correlated with both GC content and recombination rate, the correlation with recombination rate disappeared and turned positive for GC content when considering ancient TE insertions.
Affiliation(s)
- Andrea Bours
- MPRG Behavioural Genomics, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany.
| | - Peter Pruisscher
- MPRG Behavioural Genomics, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany
- Department of Evolutionary Biology, Evolutionary Biology Centre (EBC), Uppsala University, Uppsala, Sweden
| | - Karen Bascón-Cardozo
- MPRG Behavioural Genomics, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany
| | - Linda Odenthal-Hesse
- Department Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany
| | - Miriam Liedvogel
- MPRG Behavioural Genomics, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany.
- Institute of Avian Research "Vogelwarte Helgoland", 26386, Wilhelmshaven, Germany.
| |
|
44
|
Dinh BL, Tang E, Taparra K, Nakatsuka N, Chen F, Chiang CWK. Recombination map tailored to Native Hawaiians improves robustness of genomic scans for positive selection. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.12.548735. [PMID: 37503129 PMCID: PMC10370006 DOI: 10.1101/2023.07.12.548735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Recombination events establish the patterns of haplotypic structure in a population and estimates of recombination rates are used in several downstream population and statistical genetic analyses. Using suboptimal maps from distantly related populations may reduce the efficacy of genomic analyses, particularly for underrepresented populations such as the Native Hawaiians. To overcome this challenge, we constructed recombination maps using genome-wide array data from two study samples of Native Hawaiians: one reflecting the current admixed state of Native Hawaiians (NH map), and one based on individuals of enriched Polynesian ancestries (PNS map) with the potential to be used for less admixed Polynesian populations such as the Samoans. We found the recombination landscape to be less correlated with those from other continental populations (e.g. Spearman's rho = 0.79 between PNS and CEU (Utah residents with Northern and Western European ancestry) compared to 0.92 between YRI (Yoruba in Ibadan, Nigeria) and CEU at 50 kb resolution), likely driven by the unique demographic history of the Native Hawaiians. PNS also shared the fewest recombination hotspots with other populations (e.g. 8% of hotspots shared between PNS and CEU compared to 27% of hotspots shared between YRI and CEU). We found that downstream analyses in the Native Hawaiian population, such as local ancestry inference, imputation, and IBD segment and relatedness detections, would achieve similar efficacy when using the NH map compared to an omnibus map. However, for genome scans of adaptive loci using integrated haplotype scores, we found several loci with apparent genome-wide significant signals (|Z-score| > 4) in Native Hawaiians that would not have been significant when analyzed using NH-specific maps. 
Population-specific recombination maps may therefore improve the robustness of haplotype-based statistics and help us better characterize the evolutionary history that may underlie Native Hawaiian-specific health conditions that persist today.
Affiliation(s)
- Bryan L Dinh
- Department of Quantitative and Computational Biology, University of Southern California
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
| | - Echo Tang
- Department of Quantitative and Computational Biology, University of Southern California
| | - Kekoa Taparra
- Department of Radiation Oncology, Stanford University, Palo Alto, California
| | | | - Fei Chen
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
| | - Charleston W K Chiang
- Department of Quantitative and Computational Biology, University of Southern California
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
| |
|
45
|
He H, Yang H, Foo R, Chan W, Zhu F, Liu Y, Zhou X, Ma L, Wang LF, Zhai W. Population genomic analysis reveals distinct demographics and recent adaptation in the black flying fox (Pteropus alecto). J Genet Genomics 2023; 50:554-562. [PMID: 37182682 DOI: 10.1016/j.jgg.2023.05.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/21/2023] [Revised: 05/03/2023] [Accepted: 05/03/2023] [Indexed: 05/16/2023]
Abstract
As the only mammalian group capable of powered flight, bats have many unique biological traits. Previous comparative genomic studies in bats have focused on long-term evolution. However, the micro-evolutionary processes driving recent evolution are largely under-explored. Using resequencing data from 50 black flying foxes (Pteropus alecto), one of the model species for bats, we find that the black flying fox has much higher genetic diversity and lower levels of linkage disequilibrium than most mammalian species. Demographic inference reveals strong population fluctuations (>100 fold) coinciding with multiple historical events including the last glacial change and Toba super eruption, suggesting that the black flying fox is a very resilient species with strong recovery abilities. While long-term adaptation in the black flying fox is enriched in metabolic genes, recent adaptation in the black flying fox has a unique landscape where recently selected genes are not strongly enriched in any functional category. The demographic history and mode of adaptation suggest that the black flying fox might be a well-adapted species with strong evolutionary resilience. Taken together, this study unravels a vibrant landscape of recent evolution for the black flying fox and sheds light on several unique evolutionary processes for bats compared to other mammalian groups.
Affiliation(s)
- Haopeng He
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Hechuan Yang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Randy Foo
- Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore 169857, Singapore; Singhealth Duke-NUS Global Health Institute, Singapore 169857, Singapore
| | - Wharton Chan
- Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore 169857, Singapore; Singhealth Duke-NUS Global Health Institute, Singapore 169857, Singapore
| | - Feng Zhu
- Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore 169857, Singapore; Singhealth Duke-NUS Global Health Institute, Singapore 169857, Singapore
| | - Yunsong Liu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xuming Zhou
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Liang Ma
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China.
| | - Lin-Fa Wang
- Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore 169857, Singapore; Singhealth Duke-NUS Global Health Institute, Singapore 169857, Singapore.
| | - Weiwei Zhai
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan 650223, China.
| |
|
46
|
Hoge C, de Manuel M, Mahgoub M, Okami N, Fuller Z, Banerjee S, Baker Z, McNulty M, Andolfatto P, Macfarlan TS, Schumer M, Tzika AC, Przeworski M. Patterns of recombination in snakes reveal a tug of war between PRDM9 and promoter-like features. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.11.548536. [PMID: 37502971 PMCID: PMC10369914 DOI: 10.1101/2023.07.11.548536] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 07/29/2023]
Abstract
In vertebrates, there are two known mechanisms by which meiotic recombination is directed to the genome: in humans, mice, and other mammals, recombination occurs almost exclusively where the protein PRDM9 binds, while in species lacking an intact PRDM9, such as birds and canids, recombination rates are elevated near promoter-like features. To test if PRDM9 also directs recombination in non-mammalian vertebrates, we focused on an exemplar species, the corn snake (Pantherophis guttatus). Unlike birds, this species possesses a single, intact PRDM9 ortholog. By inferring historical recombination rates along the genome from patterns of linkage disequilibrium and identifying crossovers in pedigrees, we found that PRDM9 specifies the location of recombination events outside of mammals. However, we also detected an independent effect of promoter-like features on recombination, which is more pronounced on macro- than microchromosomes. Thus, our findings reveal that the uses of PRDM9 and promoter-like features are not mutually-exclusive, and instead reflect a tug of war, which varies in strength along the genome and is more lopsided in some species than others.
Affiliation(s)
- Carla Hoge
- Dept. of Biological Sciences, Columbia University
| | | | - Mohamed Mahgoub
- The Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health
| | - Naima Okami
- Dept. of Biological Sciences, Columbia University
| | | | | | | | | | | | - Todd S Macfarlan
- The Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health
| | - Molly Schumer
- Dept. of Biology, Stanford University
- Howard Hughes Medical Institute, Stanford, CA
| | - Athanasia C Tzika
- Laboratory of Artificial & Natural Evolution (LANE), Department of Genetics & Evolution, University of Geneva
| | - Molly Przeworski
- Dept. of Biological Sciences, Columbia University
- Howard Hughes Medical Institute, Stanford, CA
| |
|
47
|
Naseri A, Yue W, Zhang S, Zhi D. Fast inference of genetic recombination rates in biobank scale data. Genome Res 2023; 33:1015-1022. [PMID: 37349109 PMCID: PMC10538484 DOI: 10.1101/gr.277676.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 06/09/2023] [Indexed: 06/24/2023]
Abstract
Although rates of recombination events across the genome (genetic maps) are fundamental to genetic research, the majority of current studies only use one standard map. There is evidence suggesting population differences in genetic maps, and thus estimating population-specific maps is of interest. Although the recent availability of biobank-scale data offers such opportunities, current methods are not efficient at leveraging very large sample sizes. The most accurate methods are still linkage disequilibrium (LD)-based methods that are only tractable for a few hundred samples. In this work, we propose a fast and memory-efficient method for estimating genetic maps from population genotyping data. Our method, FastRecomb, leverages the efficient positional Burrows-Wheeler transform (PBWT) data structure for counting IBD segment boundaries as potential recombination events. We used PBWT blocks to avoid redundant counting of pairwise matches. Moreover, we used a panel-smoothing technique to reduce the noise from errors and recent mutations. Using simulation, we found that FastRecomb achieves state-of-the-art performance at 10-kb resolution, in terms of correlation coefficients between the estimated map and the ground truth. This is mainly because FastRecomb can effectively take advantage of large panels comprising more than hundreds of thousands of haplotypes. At the same time, other methods lack the efficiency to handle such data. We believe further refinement of FastRecomb would deliver more accurate genetic maps for the genetics community.
Affiliation(s)
- Ardalan Naseri
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center, Houston, Texas 77030, USA
| | - William Yue
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center, Houston, Texas 77030, USA
| | - Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, Florida 32816, USA
| | - Degui Zhi
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center, Houston, Texas 77030, USA;
| |
|
48
|
Wang N, Cao S, Liu Z, Xiao H, Hu J, Xu X, Chen P, Ma Z, Ye J, Chai L, Guo W, Larkin RM, Xu Q, Morrell PL, Zhou Y, Deng X. Genomic conservation of crop wild relatives: A case study of citrus. PLoS Genet 2023; 19:e1010811. [PMID: 37339133 DOI: 10.1371/journal.pgen.1010811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 06/01/2023] [Indexed: 06/22/2023] Open
Abstract
Conservation of crop wild relatives is critical for plant breeding and food security. The lack of clarity on the genetic factors that lead to endangered status or extinction creates difficulties when attempting to develop concrete recommendations for conserving the wild relatives of crops, such as a wild relative of citrus. Here, we evaluate the conservation of wild kumquat (Fortunella hindsii) using genomic, geographical, environmental, and phenotypic data, and forward simulations. Genome resequencing data from 73 accessions from the Fortunella genus were combined to investigate population structure, demography, inbreeding, introgression, and genetic load. Population structure was correlated with reproductive type (i.e., sexual and apomictic) and with a significant differentiation within the sexually reproducing population. The effective population size for one of the sexually reproducing subpopulations has recently declined to ~1,000, resulting in high levels of inbreeding. In particular, we found that 58% of the ecological niche overlapped between wild and cultivated populations and that there was extensive introgression into wild samples from cultivated populations. Interestingly, the introgression pattern and accumulation of genetic load may be influenced by the type of reproduction. In wild apomictic samples, the introgressed regions were primarily heterozygous, and genome-wide deleterious variants were hidden in the heterozygous state. In contrast, wild sexually reproducing samples carried a higher recessive deleterious burden. Furthermore, we also found that sexually reproducing samples were self-incompatible, which prevented the reduction of genetic diversity by selfing. Our population genomic analyses provide specific recommendations for distinct reproductive types and monitoring during conservation. This study highlights the genomic landscape of a wild relative of citrus and provides recommendations for the conservation of crop wild relatives.
Affiliation(s)
- Nan Wang
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Shuo Cao
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Zhongjie Liu
- State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Hua Xiao
- State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Jianbing Hu
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
| | - Xiaodong Xu
- State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Hubei Hongshan Laboratory, Wuhan, China
| | - Peng Chen
- Institute of Horticultural Research, Hunan Academy of Agricultural Sciences, Changsha, China
| | - Zhiyao Ma
- State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Junli Ye
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
| | - Lijun Chai
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
| | - Wenwu Guo
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- Hubei Hongshan Laboratory, Wuhan, China
| | - Robert M Larkin
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- Hubei Hongshan Laboratory, Wuhan, China
| | - Qiang Xu
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- Hubei Hongshan Laboratory, Wuhan, China
| | - Peter L Morrell
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota, United States of America
| | - Yongfeng Zhou
- State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- State Key Laboratory of Tropical Crop Breeding, Tropical Crops Genetic Resources Institute, Chinese Academy of Tropical Agricultural Sciences, Haikou, China
| | - Xiuxin Deng
- National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- Hubei Hongshan Laboratory, Wuhan, China
| |
|
49
|
Colbran LL, Ramos-Almodovar FC, Mathieson I. A gene-level test for directional selection on gene expression. Genetics 2023; 224:iyad060. [PMID: 37036411 PMCID: PMC10213495 DOI: 10.1093/genetics/iyad060] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/11/2023] [Revised: 01/11/2023] [Accepted: 03/31/2023] [Indexed: 04/11/2023] Open
Abstract
Most variants identified in human genome-wide association studies and scans for selection are noncoding. Interpretation of their effects and the way in which they contribute to phenotypic variation and adaptation in human populations is therefore limited by our understanding of gene regulation and the difficulty of confidently linking noncoding variants to genes. To overcome this, we developed a gene-wise test for population-specific selection based on combinations of regulatory variants. Specifically, we use the QX statistic to test for polygenic selection on cis-regulatory variants based on whether the variance across populations in the predicted expression of a particular gene is higher than expected under neutrality. We then applied this approach to human data, testing for selection on 17,388 protein-coding genes in 26 populations from the Thousand Genomes Project. We identified 45 genes with significant evidence (FDR<0.1) for selection, including FADS1, KHK, SULT1A2, ITGAM, and several genes in the HLA region. We further confirm that these signals correspond to plausible population-level differences in predicted expression. While the small number of significant genes (0.2%) is consistent with most cis-regulatory variation evolving under genetic drift or stabilizing selection, it remains possible that there are effects not captured in this study. Our gene-level QX score is independent of standard genomic tests for selection, and may therefore be useful in combination with traditional selection scans to specifically identify selection on regulatory variation. Overall, our results demonstrate the utility of combining population-level genomic data with functional data to understand the evolution of gene expression.
Affiliation(s)
- Laura L Colbran
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | | | - Iain Mathieson
- Corresponding author: Department of Genetics, Perelman School of Medicine, University of Pennsylvania, 405B Clinical Research Building, 415 Curie Blvd, Philadelphia, PA 19104, USA.
| |
|
50
|
Spence JP, Zeng T, Mostafavi H, Pritchard JK. Scaling the Discrete-time Wright Fisher model to biobank-scale datasets. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.19.541517. [PMID: 37293115 PMCID: PMC10245735 DOI: 10.1101/2023.05.19.541517] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/10/2023]
Abstract
The Discrete-Time Wright Fisher (DTWF) model and its large population diffusion limit are central to population genetics. These models describe the forward-in-time evolution of the frequency of an allele in a population and can include the fundamental forces of genetic drift, mutation, and selection. Computing likelihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large sample sizes or in the presence of strong selection. Unfortunately, existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here we present an algorithm that approximates the DTWF model with provably bounded error and runs in time linear in the size of the population. Our approach relies on two key observations about Binomial distributions. The first is that Binomial distributions are approximately sparse. The second is that Binomial distributions with similar success probabilities are extremely close as distributions, allowing us to approximate the DTWF Markov transition matrix as a very low rank matrix. Together, these observations enable matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the billions, paving the way for rigorous biobank-scale population genetic inference. Finally, we use our results to estimate how increasing sample sizes will improve the estimation of selection coefficients acting on loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.
Affiliation(s)
| | - Tony Zeng
- Department of Genetics, Stanford University
| | | | - Jonathan K. Pritchard
- Department of Genetics, Stanford University
- Department of Biology, Stanford University
| |
|