1
Arnab SP, Campelo dos Santos AL, Fumagalli M, DeGiorgio M. Efficient Detection and Characterization of Targets of Natural Selection Using Transfer Learning. Mol Biol Evol 2025; 42:msaf094. [PMID: 40341942 PMCID: PMC12062966 DOI: 10.1093/molbev/msaf094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2024] [Revised: 04/16/2025] [Accepted: 04/17/2025] [Indexed: 05/11/2025] Open
Abstract
Natural selection leaves detectable patterns of altered spatial diversity within genomes, and identifying affected regions is crucial for understanding species evolution. Recently, machine learning approaches applied to raw population genomic data have been developed to uncover these adaptive signatures. Convolutional neural networks (CNNs) are particularly effective for this task, as they handle large data arrays while maintaining element correlations. However, shallow CNNs may miss complex patterns due to their limited capacity, while deep CNNs can capture these patterns but require extensive data and computational power. Transfer learning addresses these challenges by utilizing a deep CNN pretrained on a large dataset as a feature extraction tool for downstream classification and evolutionary parameter prediction. This approach reduces extensive training data generation requirements and computational needs while maintaining high performance. In this study, we developed TrIdent, a tool that uses transfer learning to enhance detection of adaptive genomic regions from image representations of multilocus variation. We evaluated TrIdent across various genetic, demographic, and adaptive settings, in addition to unphased data and other confounding factors. TrIdent demonstrated improved detection of adaptive regions compared to recent methods using similar data representations. We further explored model interpretability through class activation maps and adapted TrIdent to infer selection parameters for identified adaptive candidates. Using whole-genome haplotype data from European and African populations, TrIdent effectively recapitulated known sweep candidates and identified novel cancer- and other disease-associated genes as potential sweeps.
Affiliation(s)
- Sandipan Paul Arnab
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA
- Matteo Fumagalli
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
- The Alan Turing Institute, London, UK
- Michael DeGiorgio
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA
2
van den Belt S, Alachiotis N. Fast and accurate deep learning scans for signatures of natural selection in genomes using FASTER-NN. Commun Biol 2025; 8:58. [PMID: 39814854 PMCID: PMC11735897 DOI: 10.1038/s42003-025-07480-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2024] [Accepted: 01/07/2025] [Indexed: 01/18/2025] Open
Abstract
Deep learning classification models based on Convolutional Neural Networks (CNNs) are increasingly used in population genetic inference for detecting signatures of natural selection. Prevailing detection methods treat the design of the classifier as a discrete phase, assuming that high classification accuracy is the sole prerequisite for precise detection. This frequently steers method development toward classification-driven optimizations that can inadvertently impede detection. We present FASTER-NN, a CNN classifier designed specifically for the precise detection of natural selection. It has higher sensitivity than state-of-the-art CNN classifiers while only processing allele frequencies and genomic positions through dilated convolutions to maximize data reuse. As a result, execution time is invariant to the sample size and the chromosome length, creating a highly suitable solution for large-scale, whole-genome scans. Furthermore, FASTER-NN can accurately identify selective sweeps in recombination hotspots, which is a highly challenging detection problem with very limited theoretical treatment to date.
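As a rough illustration of the dilated-convolution idea mentioned in the abstract, the sketch below applies a 1D dilated convolution to a vector of allele frequencies with NumPy. The function name `dilated_conv1d`, the kernel values, and the toy input are assumptions made for illustration; this is not FASTER-NN's architecture or code.

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D cross-correlation with `dilation` positions between taps."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # Number of input positions covered by one output value.
    span = (len(kernel) - 1) * dilation + 1
    n_out = len(signal) - span + 1
    if n_out <= 0:
        raise ValueError("signal is shorter than the dilated kernel span")
    out = np.zeros(n_out)
    # Accumulate one kernel tap at a time over shifted views of the input.
    for k, weight in enumerate(kernel):
        start = k * dilation
        out += weight * signal[start:start + n_out]
    return out

# Hypothetical allele frequencies at eight consecutive SNPs.
freqs = np.array([0.1, 0.5, 0.9, 0.2, 0.4, 0.8, 0.3, 0.6])
kernel = np.array([1.0, 0.0, -1.0])  # hypothetical weights

dense = dilated_conv1d(freqs, kernel, dilation=1)  # each output spans 3 SNPs
wide = dilated_conv1d(freqs, kernel, dilation=2)   # each output spans 5 SNPs
```

With the same three weights, the `dilation=2` pass covers five positions per output instead of three, which is the general reason dilation widens the receptive field without adding parameters.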
3
Anderson NW, Kirk L, Schraiber JG, Ragsdale AP. A path integral approach for allele frequency dynamics under polygenic selection. Genetics 2025; 229:1-63. [PMID: 39531638 PMCID: PMC12086674 DOI: 10.1093/genetics/iyae182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2024] [Revised: 10/11/2024] [Accepted: 10/16/2024] [Indexed: 11/16/2024] Open
Abstract
Many phenotypic traits have a polygenic genetic basis, making it challenging to learn their genetic architectures and predict individual phenotypes. One promising avenue to resolve the genetic basis of complex traits is through evolve-and-resequence (E&R) experiments, in which laboratory populations are exposed to some selective pressure and trait-contributing loci are identified by extreme frequency changes over the course of the experiment. However, small laboratory populations will experience substantial random genetic drift, and it is difficult to determine whether selection played a role in a given allele frequency change (AFC). Predicting AFCs under drift and selection, even for alleles contributing to simple, monogenic traits, has remained a challenging problem. Recently, there have been efforts to apply the path integral, a method borrowed from physics, to solve this problem. So far, this approach has been limited to genic selection, and is therefore inadequate to capture the complexity of quantitative, highly polygenic traits that are commonly studied. Here, we extend one of these path integral methods, the perturbation approximation, to selection scenarios that are of interest to quantitative genetics. We derive analytic expressions for the transition probability (i.e. the probability that an allele will change in frequency from x to y in time t) of an allele contributing to a trait subject to stabilizing selection, as well as that of an allele contributing to a trait rapidly adapting to a new phenotypic optimum. We use these expressions to characterize the use of AFC to test for selection, as well as explore optimal design choices for E&R experiments to uncover the genetic architecture of polygenic traits under selection.
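The transition probability defined above can be made concrete with a small exact calculation. The sketch below is not the path integral perturbation approximation from the paper: it builds the standard Wright-Fisher transition matrix with genic selection for a small haploid population and raises it to the power t. The population size, selection coefficient, and function names are assumptions chosen for illustration.

```python
import numpy as np
from math import comb

def wright_fisher_matrix(n, s=0.0):
    """One-generation transition matrix for a haploid Wright-Fisher model.

    Entry [i, j] is the probability of going from i to j copies of the
    allele among n individuals, where the allele has relative fitness 1 + s.
    """
    P = np.zeros((n + 1, n + 1))
    for i in range(n + 1):
        x = i / n
        p = x * (1 + s) / (1 + s * x)  # expected frequency after selection
        for j in range(n + 1):
            # Binomial sampling of n offspring at frequency p.
            P[i, j] = comb(n, j) * p**j * (1 - p)**(n - j)
    return P

def transition_probability(n, i, j, t, s=0.0):
    """Probability of moving from i to j copies in exactly t generations."""
    Pt = np.linalg.matrix_power(wright_fisher_matrix(n, s), t)
    return Pt[i, j]

N = 50  # hypothetical population size
# Frequency 0.2 -> 0.5 in 20 generations, without and with selection.
neutral = transition_probability(N, 10, 25, t=20, s=0.0)
selected = transition_probability(N, 10, 25, t=20, s=0.1)
```

An exact matrix like this is only practical for small populations, which is one reason analytic approximations to the transition probability are of interest.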
Affiliation(s)
- Nathan W Anderson
- Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI 53706, USA
- Lloyd Kirk
- Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI 53706, USA
- Joshua G Schraiber
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Aaron P Ragsdale
- Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI 53706, USA
4
Zhao H, Alachiotis N. Data preprocessing methods for selective sweep detection using convolutional neural networks. Methods 2025; 233:19-29. [PMID: 39550020 DOI: 10.1016/j.ymeth.2024.11.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2024] [Revised: 10/28/2024] [Accepted: 11/04/2024] [Indexed: 11/18/2024] Open
Abstract
The identification of positive selection has been framed as a classification task, with Convolutional Neural Networks (CNNs) already outperforming summary statistics and likelihood-based approaches in accuracy. Despite the prevalence of CNN-based methods that manipulate the pixels of images representing raw genomic data as a preprocessing step to improve classification accuracy, the efficacy of these pixel-rearrangement techniques remains inadequately examined, particularly in the presence of confounding factors like population bottlenecks, migration and recombination hotspots. We introduce a set of pixel rearrangement algorithms aimed at enhancing CNN classification accuracy in detecting selective sweeps. These algorithms are employed to assess the performance of four CNN models for selective sweep detection. Our findings illustrate that the judicious application of rearrangement algorithms notably enhances the overall classification accuracy of a CNN across various datasets simulating confounding factors. We observed that sorting the columns of the genomic matrices has a greater impact on CNN performance than rearranging the sequences. To some extent, these rearrangement algorithms are more robust to misspecified demographic models compared with the utilization of the default preprocessing algorithm as suggested by the respective authors of each CNN architecture. We provide the data rearrangement algorithms as a distinct package available for download at: https://github.com/Zhaohq96/Genetic-data-rearrangement.
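To make the notion of row and column rearrangement concrete, here is a minimal NumPy sketch on a binary haplotype matrix (rows are sequences, columns are SNPs). The two sorting criteria used here, derived-allele count per row and per column, are assumptions chosen for illustration; the paper's own algorithms live in the linked repository and may differ.

```python
import numpy as np

def sort_rows_by_derived_count(matrix):
    """Reorder sequences (rows) so those carrying more derived alleles come first."""
    order = np.argsort(-matrix.sum(axis=1), kind="stable")
    return matrix[order]

def sort_columns_by_frequency(matrix):
    """Reorder SNPs (columns) so those with higher derived-allele counts come first."""
    order = np.argsort(-matrix.sum(axis=0), kind="stable")
    return matrix[:, order]

# Toy matrix: 3 sequences by 4 SNPs, 1 = derived allele (hypothetical data).
haplotypes = np.array([
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
])

rows_sorted = sort_rows_by_derived_count(haplotypes)
cols_sorted = sort_columns_by_frequency(haplotypes)
```

Note that sorting columns discards the original positional order of the SNPs, whereas sorting rows only permutes the sampled sequences.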
Affiliation(s)
- Hanqing Zhao
- University of Twente, Drienerlolaan 5, Enschede, 7522 NB, Overijssel, the Netherlands.
- Nikolaos Alachiotis
- University of Twente, Drienerlolaan 5, Enschede, 7522 NB, Overijssel, the Netherlands.
5
Nocchi G, Whiting JR, Yeaman S. Repeated global adaptation across plant species. Proc Natl Acad Sci U S A 2024; 121:e2406832121. [PMID: 39705310 DOI: 10.1073/pnas.2406832121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Accepted: 11/09/2024] [Indexed: 12/22/2024] Open
Abstract
Global adaptation occurs when all populations of a species undergo selection toward a common optimum. This can occur by a hard selective sweep with the emergence of a new globally advantageous allele that spreads throughout a species' natural range until reaching fixation. This evolutionary process leaves a temporary trace in the region affected, which is detectable using population genomic methods. While selective sweeps have been identified in many species, there have been few comparative and systematic studies of the genes involved in global adaptation. Building upon recent findings showing repeated genetic basis of local adaptation across independent populations and species, we asked whether certain genes play a more significant role in driving global adaptation across plant species. To address this question, we scanned the genomes of 17 plant species to identify signals of repeated global selective sweeps. Despite the substantial evolutionary distance between the species analyzed, we identified several gene families with strong evidence of repeated positive selection. These gene families tend to be enriched for reduced pleiotropy, consistent with predictions from Fisher's evolutionary model and the cost of complexity hypothesis. We also found that genes with repeated sweeps exhibit elevated levels of gene duplication. Our findings contrast with recent observations of increased pleiotropy in genes driving local adaptation, consistent with predictions based on the theory of migration-selection balance.
Affiliation(s)
- Gabriele Nocchi
- Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- James R Whiting
- Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Samuel Yeaman
- Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
6
Tiwari M, Gujar G, Shashank CG, Ponsuksili S. Selection signatures for high altitude adaptation in livestock: A review. Gene 2024; 927:148757. [PMID: 38986751 DOI: 10.1016/j.gene.2024.148757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Revised: 07/01/2024] [Accepted: 07/05/2024] [Indexed: 07/12/2024]
Abstract
High-altitude-adapted livestock species (cattle, yak, goat, sheep, and horse) play a critical role in the human socioeconomic sphere and are a good source of animal products including milk, meat, and leather, among other things. These species sustain production and reproduction even in harsh environments on account of adaptation resulting from continued evolution of beneficial traits. Selection pressure leads to various adaptive strategies in livestock whose footprints are evident at different genomic sites as "selection signatures". Scrutiny of these signatures provides crucial insight into the evolutionary process and domestication of livestock adapted to diverse climatic conditions. These signatures have the potential to change the sphere of animal breeding and steer selection programmes in the right direction. The technological revolution and recent strides in genomic studies have opened routes for the identification of selection signatures. Numerous statistical approaches and bioinformatics tools have been developed to detect selection signatures. Consequently, studies over the years have identified candidate genes in regions under selection that are associated with numerous traits contributing to adaptation to high-altitude environments. This makes it pertinent to better understand selection signatures, the ways to identify them, and how to use them for the betterment of livestock populations as well as farmers. This review takes a closer look at the general concept, various methodologies, and bioinformatics tools commonly employed in selection signature studies and summarizes the results of recent selection signature studies related to high-altitude adaptation in various livestock species. This review will serve as an informative and useful resource for researchers and students in the field of animal breeding and evolutionary biology.
Affiliation(s)
- Manish Tiwari
- ICAR-National Dairy Research Institute, Karnal, India; U.P. Pt. Deen Dayal Upadhyaya Veterinary Science University and Cattle Research Institute, Mathura, India.
- C G Shashank
- ICAR-National Dairy Research Institute, Karnal, India
7
Soni V, Terbot JW, Versoza CJ, Pfeifer SP, Jensen JD. A whole-genome scan for evidence of recent positive and balancing selection in aye-ayes (Daubentonia madagascariensis) utilizing a well-fit evolutionary baseline model. bioRxiv 2024:2024.11.08.622667. [PMID: 39605496 PMCID: PMC11601216 DOI: 10.1101/2024.11.08.622667] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 11/29/2024]
Abstract
The aye-aye (Daubentonia madagascariensis) is one of the 25 most endangered primate species in the world, maintaining amongst the lowest genetic diversity of any primate measured to date. Characterizing patterns of genetic variation within aye-aye populations, and the relative influences of neutral and selective processes in shaping that variation, is thus important for future conservation efforts. In this study, we performed the first whole-genome scans for recent positive and balancing selection in the species, utilizing high-coverage population genomic data from newly sequenced individuals. We generated null thresholds for our genomic scans by creating an evolutionarily appropriate baseline model that incorporates the demographic history of this aye-aye population, and identified a small number of candidate genes. Most notably, a suite of genes involved in olfaction - a key trait in these nocturnal primates - were identified as experiencing long-term balancing selection. We also conducted analyses to quantify the expected statistical power to detect positive and balancing selection in this population using site frequency spectrum-based inference methods, once accounting for the potentially confounding contributions of population history, recombination and mutation rate variation, and purifying and background selection. This work, presenting the first high-quality, genome-wide polymorphism data across the functional regions of the aye-aye genome, thus provides important insights into the landscape of episodic selective forces in this highly endangered species.
Affiliation(s)
- Vivak Soni
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
- John W. Terbot
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Cyril J. Versoza
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Susanne P. Pfeifer
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Jeffrey D. Jensen
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
8
Whitehouse LS, Ray DD, Schrider DR. Tree Sequences as a General-Purpose Tool for Population Genetic Inference. Mol Biol Evol 2024; 41:msae223. [PMID: 39460991 PMCID: PMC11600592 DOI: 10.1093/molbev/msae223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 10/05/2024] [Accepted: 10/17/2024] [Indexed: 10/28/2024] Open
Abstract
As population genetic data increase in size, new methods have been developed to store genetic information in efficient ways, such as tree sequences. These data structures are computationally and storage efficient but are not interchangeable with existing data structures used for many population genetic inference methodologies such as the use of convolutional neural networks applied to population genetic alignments. To better utilize these new data structures, we propose and implement a graph convolutional network to directly learn from tree sequence topology and node data, allowing for the use of neural network applications without an intermediate step of converting tree sequences to population genetic alignment format. We then compare our approach to standard convolutional neural network approaches on a set of previously defined benchmarking tasks including recombination rate estimation, positive selection detection, introgression detection, and demographic model parameter inference. We show that tree sequences can be directly learned from using a graph convolutional network approach and can be used to perform well on these common population genetic inference tasks with accuracies roughly matching or even exceeding that of a convolutional neural network-based method. As tree sequences become more widely used in population genetic research, we foresee developments and optimizations of this work to provide a foundation for population genetic inference moving forward.
Affiliation(s)
- Logan S Whitehouse
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
- Dylan D Ray
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
- Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
9
Cheng X, Steinrücken M. Population Genomic Scans for Natural Selection and Demography. Annu Rev Genet 2024; 58:319-339. [PMID: 39227130 DOI: 10.1146/annurev-genet-111523-102651] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
Uncovering the fundamental processes that shape genomic variation in natural populations is a primary objective of population genetics. These processes include demographic effects such as past changes in effective population size or gene flow between structured populations. Furthermore, genomic variation is affected by selection on nonneutral genetic variants, for example, through the adaptation of beneficial alleles or balancing selection that maintains genetic variation. In this article, we discuss the characterization of these processes using population genetic models, and we review methods developed on the basis of these models to unravel the underlying processes from modern population genomic data sets. We briefly discuss the conditions in which these approaches can be used to infer demography or identify specific nonneutral genetic variants and cases in which caution is warranted. Moreover, we summarize the challenges of jointly inferring demography and selective processes that affect neutral variation genome-wide.
Affiliation(s)
- Xiaoheng Cheng
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA
- Matthias Steinrücken
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA
10
Whitehouse LS, Ray D, Schrider DR. Tree sequences as a general-purpose tool for population genetic inference. bioRxiv 2024:2024.02.20.581288. [PMID: 39185244 PMCID: PMC11343121 DOI: 10.1101/2024.02.20.581288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
As population genetics data increases in size, new methods have been developed to store genetic information in efficient ways, such as tree sequences. These data structures are computationally and storage efficient, but are not interchangeable with existing data structures used for many population genetic inference methodologies such as the use of convolutional neural networks (CNNs) applied to population genetic alignments. To better utilize these new data structures, we propose and implement a graph convolutional network (GCN) to directly learn from tree sequence topology and node data, allowing for the use of neural network applications without an intermediate step of converting tree sequences to population genetic alignment format. We then compare our approach to standard CNN approaches on a set of previously defined benchmarking tasks including recombination rate estimation, positive selection detection, introgression detection, and demographic model parameter inference. We show that tree sequences can be directly learned from using a GCN approach and can be used to perform well on these common population genetics inference tasks with accuracies roughly matching or even exceeding that of a CNN-based method. As tree sequences become more widely used in population genetics research, we foresee developments and optimizations of this work to provide a foundation for population genetics inference moving forward.
Affiliation(s)
- Logan S. Whitehouse
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA, 120 Mason Farm Rd, Chapel Hill, NC 27514
- Dylan Ray
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA, 120 Mason Farm Rd, Chapel Hill, NC 27514
- Daniel R. Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA, 120 Mason Farm Rd, Chapel Hill, NC 27514
11
Kaushik S, Jain K, Johri P. Genetic diversity during selective sweeps in non-recombining populations. bioRxiv 2024:2024.09.12.612756. [PMID: 39345399 PMCID: PMC11429930 DOI: 10.1101/2024.09.12.612756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
Selective sweeps, resulting from the spread of beneficial, neutral, or deleterious mutations through a population, shape patterns of genetic variation at linked neutral sites. While many theoretical, computational, and statistical advances have been made in understanding the genomic signatures of selective sweeps in recombining populations, substantially less is understood in populations with little/no recombination. We present a mathematical framework based on diffusion theory for obtaining the site frequency spectrum (SFS) at linked neutral sites immediately post and during the fixation of moderately or strongly beneficial mutations. We find that when a single hard sweep occurs, the SFS decays as 1/x for low derived allele frequencies (x), similar to the neutral SFS at equilibrium, whereas at higher derived allele frequencies, it follows a 1/x^2 power law. These power laws are universal in the sense that they are independent of the dominance and inbreeding coefficient, and also characterize the SFS during the sweep. Additionally, we find that the derived allele frequency where the SFS shifts from the 1/x to the 1/x^2 law is inversely proportional to the selection strength: thus under strong selection, the SFS follows the 1/x^2 dependence for most allele frequencies, resembling a rapidly expanding neutral population. When clonal interference is pervasive, the SFS immediately post-fixation becomes U-shaped and is better explained by the equilibrium SFS of selected sites. Our results will be important in developing statistical methods to infer the timing and strength of recent selective sweeps in asexual populations, genomic regions that lack recombination, and clonally propagating tumor populations.
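The two regimes described in this abstract can be summarized schematically as follows. The symbols $f(x)$ for the SFS, $x_c$ for the crossover frequency, and $s$ for the selection strength are notation assumed here; the abstract states only the scaling, so proportionality constants are omitted.

```latex
f(x) \propto
\begin{cases}
  x^{-1}, & x \ll x_c, \\
  x^{-2}, & x \gg x_c,
\end{cases}
\qquad x_c \propto \frac{1}{s}.
```

Under this reading, increasing $s$ lowers $x_c$, so the $x^{-2}$ regime covers a larger share of the frequency range, consistent with the abstract's statement about strong selection.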
12
van den Belt S, Zhao H, Alachiotis N. Scalable CNN-based classification of selective sweeps using derived allele frequencies. Bioinformatics 2024; 40:ii29-ii36. [PMID: 39230693 PMCID: PMC11373383 DOI: 10.1093/bioinformatics/btae385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024] Open
Abstract
MOTIVATION: Selective sweeps can successfully be distinguished from neutral genetic data using summary statistics and likelihood-based methods that analyze single nucleotide polymorphisms (SNPs). However, these methods are sensitive to confounding factors, such as severe population bottlenecks and old migration. By virtue of machine learning, and specifically convolutional neural networks (CNNs), new accurate classification models that are robust to confounding factors have been recently proposed. However, such methods are more computationally expensive than summary-statistic-based ones, rendering them impractical for processing large-scale genomic data. Moreover, SNP data are frequently preprocessed to improve classification accuracy, further exacerbating the long analysis times. RESULTS: To this end, we propose a 1D CNN-based model, dubbed FAST-NN, that does not require any preprocessing while using only derived allele frequencies instead of summary statistics or raw SNP data, thereby yielding a sample-size-invariant, scalable solution. We evaluated several data fusion approaches to account for the variance of the density of genetic diversity across genomic regions (a selective sweep signature), and performed an extensive neural architecture search based on a state-of-the-art reference network architecture (SweepNet). The resulting model, FAST-NN, outperforms the reference architecture by up to 12% inference accuracy over all challenging evolutionary scenarios with confounding factors that were evaluated. Moreover, FAST-NN is between 30× and 259× faster on a single CPU core, and between 2.0× and 6.2× faster on a GPU, when processing sample sizes between 128 and 1000 samples. Our work paves the way for the practical use of CNNs in large-scale selective sweep detection. AVAILABILITY AND IMPLEMENTATION: https://github.com/SjoerdvandenBelt/FAST-NN.
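The sample-size invariance mentioned in this abstract follows from the input representation: a derived allele frequency vector has one entry per SNP regardless of how many samples are present. The sketch below illustrates this with NumPy; the function name and the random toy matrices are assumptions and are not taken from the FAST-NN code base.

```python
import numpy as np

def derived_allele_frequencies(haplotypes):
    """Per-SNP derived allele frequency from a 0/1 matrix (rows = samples)."""
    haplotypes = np.asarray(haplotypes)
    return haplotypes.mean(axis=0)

rng = np.random.default_rng(0)
# Hypothetical data: same 200 SNPs, two different sample sizes.
small = rng.integers(0, 2, size=(128, 200))
large = rng.integers(0, 2, size=(1000, 200))

daf_small = derived_allele_frequencies(small)
daf_large = derived_allele_frequencies(large)
# Both vectors have length 200, so a model consuming them sees the same
# input size whether 128 or 1000 samples were sequenced.
```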
Affiliation(s)
- Sjoerd van den Belt
- Department of Computer Science, Faculty of EEMCS, University of Twente, 7522NB Enschede, The Netherlands
- Hanqing Zhao
- Department of Computer Science, Faculty of EEMCS, University of Twente, 7522NB Enschede, The Netherlands
- Nikolaos Alachiotis
- Department of Computer Science, Faculty of EEMCS, University of Twente, 7522NB Enschede, The Netherlands
13
Cereghino C, Michalak K, DiGiuseppe S, Guerra J, Yu D, Faraji A, Sharp AK, Brown AM, Kang L, Weger-Lucarelli J, Michalak P. Evolution at Spike protein position 519 in SARS-CoV-2 facilitated adaptation to humans. NPJ Viruses 2024; 2:29. [PMID: 40295673 PMCID: PMC11721114 DOI: 10.1038/s44298-024-00036-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/04/2024] [Accepted: 04/25/2024] [Indexed: 04/30/2025]
Abstract
As the COVID-19 pandemic enters its fourth year, the pursuit of identifying a progenitor virus to SARS-CoV-2 and understanding the mechanism of its emergence persists, albeit against the backdrop of intensified efforts to monitor the ongoing evolution of the virus and the influx of new mutations. Surprisingly, few residues hypothesized to be essential for SARS-CoV-2 emergence and adaptation to humans have been validated experimentally, despite the importance these mutations could have for the development of effective antivirals. To remedy this, we searched for genomic regions in the SARS-CoV-2 genome that show evidence of past selection around residues unique to SARS-CoV-2 compared with closely related coronaviruses. In doing so, we identified a residue at position 519 in Spike within the receptor binding domain that holds a static histidine in human-derived SARS-CoV-2 sequences but an asparagine in SARS-related coronaviruses from bats and pangolins. In experimental validation, the SARS-CoV-2 Spike protein mutant carrying the putatively ancestral H519N substitution showed reduced replication in human lung cells, suggesting that the histidine residue contributes to viral fitness in the human host. Structural analyses revealed a potential role of Spike residue 519 in mediating conformational transitions necessary for Spike prior to binding with ACE2. Pseudotyped viruses bearing the putatively ancestral N519 also demonstrated significantly reduced infectivity in cells expressing the human ACE2 receptor compared to H519. ELISA data corroborated that H519 enhances Spike binding affinity to the human ACE2 receptor compared to the putatively ancestral N519. Collectively, these findings suggest that the evolutionary transition at position 519 of the Spike protein played a critical role in SARS-CoV-2 emergence and adaptation to the human host. Additionally, this residue presents as a potential drug target for designing small molecule inhibitors tailored to this site.
Affiliation(s)
- C Cereghino
  - Department of Biomedical Sciences and Pathobiology, Virginia Tech, Blacksburg, VA, USA
  - Center for Emerging, Zoonotic and Arthropod-borne Pathogens, Virginia Tech, Blacksburg, VA, USA
- K Michalak
  - Department of Biomedical Research, Edward Via College of Osteopathic Medicine, Monroe, LA, USA
- S DiGiuseppe
  - Department of Biomedical Research, Edward Via College of Osteopathic Medicine, Monroe, LA, USA
- J Guerra
  - Department of Biomedical Research, Edward Via College of Osteopathic Medicine, Monroe, LA, USA
- D Yu
  - Department of Biomedical Research, Edward Via College of Osteopathic Medicine, Monroe, LA, USA
- A Faraji
  - Department of Biomedical Research, Edward Via College of Osteopathic Medicine, Monroe, LA, USA
- A K Sharp
  - Department of Biochemistry, Virginia Tech, Blacksburg, VA, USA
- A M Brown
  - Department of Biochemistry, Virginia Tech, Blacksburg, VA, USA
  - Research and Informatics, University Libraries, Virginia Tech, Blacksburg, VA, USA
- L Kang
  - Department of Biomedical Research, Edward Via College of Osteopathic Medicine, Monroe, LA, USA
  - College of Pharmacy, University of Louisiana Monroe, Monroe, LA, USA
  - Center for One Health Research, VA-MD College of Veterinary Medicine, Blacksburg, VA, USA
- J Weger-Lucarelli
  - Department of Biomedical Sciences and Pathobiology, Virginia Tech, Blacksburg, VA, USA
  - Center for Emerging, Zoonotic and Arthropod-borne Pathogens, Virginia Tech, Blacksburg, VA, USA
- P Michalak
  - Department of Biomedical Research, Edward Via College of Osteopathic Medicine, Monroe, LA, USA
  - Center for One Health Research, VA-MD College of Veterinary Medicine, Blacksburg, VA, USA
  - Institute of Evolution, University of Haifa, Haifa, Israel
|
14
|
Weng YM, Kavanaugh DH, Schoville SD. Evidence for Admixture and Rapid Evolution During Glacial Climate Change in an Alpine Specialist. Mol Biol Evol 2024; 41:msae130. [PMID: 38935588 PMCID: PMC11247348 DOI: 10.1093/molbev/msae130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 05/30/2024] [Accepted: 06/14/2024] [Indexed: 06/29/2024] Open
Abstract
The pace of current climate change is expected to be problematic for alpine flora and fauna, as their adaptive capacity may be limited by small population size. Yet, despite substantial genetic drift following post-glacial recolonization of alpine habitats, alpine species are notable for their success surviving in highly heterogeneous environments. Population genomic analyses demonstrating how alpine species have adapted to novel environments with limited genetic diversity remain rare, yet are important in understanding the potential for species to respond to contemporary climate change. In this study, we explored the evolutionary history of alpine ground beetles in the Nebria ingens complex, including the demographic and adaptive changes that followed the last glacier retreat. We first tested alternative models of evolutionary divergence in the species complex. Using millions of genome-wide SNP markers from hundreds of beetles, we found evidence that the N. ingens complex has been formed by past admixture of lineages responding to glacial cycles. Recolonization of alpine sites involved a distributional range shift to higher elevation, which was accompanied by a reduction in suitable habitat and the emergence of complex spatial genetic structure. We tested several possible genetic pathways involved in adaptation to heterogeneous local environments using genome scan and genotype-environment association approaches. From the identified genes, we found enriched functions associated with abiotic stress responses, with strong evidence for adaptation to hypoxia-related pathways. The results demonstrate that despite rapid demographic change, alpine beetles in the N. ingens complex underwent rapid physiological evolution.
Affiliation(s)
- Yi-Ming Weng
  - Department of Entomology, University of Wisconsin-Madison, Madison, WI, USA
  - Okinawa Institute of Science and Technology, Graduate University, Okinawa, Japan
- David H Kavanaugh
  - California Academy of Sciences, Department of Entomology, San Francisco, CA, USA
- Sean D Schoville
  - Department of Entomology, University of Wisconsin-Madison, Madison, WI, USA
|
15
|
Anderson NW, Kirk L, Schraiber JG, Ragsdale AP. A Path Integral Approach for Allele Frequency Dynamics Under Polygenic Selection. bioRxiv 2024:2024.06.14.599114. [PMID: 38915613 PMCID: PMC11195211 DOI: 10.1101/2024.06.14.599114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
Many phenotypic traits have a polygenic genetic basis, making it challenging to learn their genetic architectures and predict individual phenotypes. One promising avenue to resolve the genetic basis of complex traits is through evolve-and-resequence experiments, in which laboratory populations are exposed to some selective pressure and trait-contributing loci are identified by extreme frequency changes over the course of the experiment. However, small laboratory populations will experience substantial random genetic drift, and it is difficult to determine whether selection played a role in a given allele frequency change. Predicting how much allele frequencies change under drift and selection had remained an open problem well into the 21st century, even for alleles contributing to simple, monogenic traits. Recently, there have been efforts to apply the path integral, a method borrowed from physics, to solve this problem. So far, this approach has been limited to genic selection, and is therefore inadequate to capture the complexity of quantitative, highly polygenic traits that are commonly studied. Here we extend one of these path integral methods, the perturbation approximation, to selection scenarios that are of interest to quantitative genetics. In particular, we derive analytic expressions for the transition probability (i.e., the probability that an allele will change in frequency from x to y in time t) of an allele contributing to a trait subject to stabilizing selection, as well as that of an allele contributing to a trait rapidly adapting to a new phenotypic optimum. We use these expressions to characterize the use of allele frequency change to test for selection, as well as explore optimal design choices for evolve-and-resequence experiments to uncover the genetic architecture of polygenic traits under selection.
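The transition probability described in this abstract (the chance that an allele moves from frequency x to frequency y in time t) can also be approximated by brute-force simulation, which gives a useful point of comparison for analytic results. The following is only a minimal sketch under stated assumptions: it uses a haploid Wright-Fisher model with genic selection, which is not the polygenic setting the paper addresses, and all function names and parameter values are hypothetical rather than taken from the paper.

```python
import random
from collections import Counter

def wf_step(count, pop_size, s, rng):
    """One haploid Wright-Fisher generation with genic selection (assumed model)."""
    x = count / pop_size
    # Deterministic change from selection: the focal allele has fitness 1 + s.
    p = x * (1.0 + s) / (1.0 + s * x)
    # Binomial resampling of pop_size offspring models genetic drift.
    return sum(1 for _ in range(pop_size) if rng.random() < p)

def transition_distribution(x0, generations, pop_size=100, s=0.0,
                            replicates=1000, seed=1):
    """Monte Carlo estimate of P(frequency y at time t | frequency x0 at time 0)."""
    rng = random.Random(seed)
    start = round(x0 * pop_size)
    tally = Counter()
    for _ in range(replicates):
        c = start
        for _ in range(generations):
            c = wf_step(c, pop_size, s, rng)
        tally[c] += 1
    return {c / pop_size: k / replicates for c, k in sorted(tally.items())}

def mean_frequency(dist):
    """Expected final frequency under an estimated transition distribution."""
    return sum(y * p for y, p in dist.items())

# Illustrative values only: start at x = 0.5 and follow the allele for t = 10.
dist_neutral = transition_distribution(0.5, generations=10)
dist_selected = transition_distribution(0.5, generations=10, s=0.1)
```

Comparing `mean_frequency(dist_neutral)` with `mean_frequency(dist_selected)` shows how selection shifts the distribution, while the spread of each distribution reflects drift in a small population.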
Affiliation(s)
- Nathan W. Anderson
  - Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Lloyd Kirk
  - Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Joshua G. Schraiber
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
- Aaron P. Ragsdale
  - Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI, 53706, USA
|
16
|
Soni V, Jensen JD. Temporal challenges in detecting balancing selection from population genomic data. G3 (Bethesda) 2024; 14:jkae069. [PMID: 38551137 DOI: 10.1093/g3journal/jkae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 12/21/2023] [Accepted: 03/19/2024] [Indexed: 04/28/2024]
Abstract
The role of balancing selection in maintaining genetic variation remains an open question in population genetics. Recent years have seen numerous studies identifying candidate loci potentially experiencing balancing selection, most predominantly in human populations. There are however numerous alternative evolutionary processes that may leave similar patterns of variation, thereby potentially confounding inference, and the expected signatures of balancing selection additionally change in a temporal fashion. Here we use forward-in-time simulations to quantify expected statistical power to detect balancing selection using both site frequency spectrum- and linkage disequilibrium-based methods under a variety of evolutionarily realistic null models. We find that whilst site frequency spectrum-based methods have little power immediately after a balanced mutation begins segregating, power increases with time since the introduction of the balanced allele. Conversely, linkage disequilibrium-based methods have considerable power whilst the allele is young, and power dissipates rapidly as the time since introduction increases. Taken together, this suggests that site frequency spectrum-based methods are most effective at detecting long-term balancing selection (>25N generations since the introduction of the balanced allele) whilst linkage disequilibrium-based methods are effective over much shorter timescales (<1N generations), thereby leaving a large time frame over which current methods have little power to detect the action of balancing selection. Finally, we investigate the extent to which alternative evolutionary processes may mimic these patterns, and demonstrate the need for caution in attempting to distinguish the signatures of balancing selection from those of both neutral processes (e.g. population structure and admixture) as well as of alternative selective processes (e.g. partial selective sweeps).
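The two families of methods contrasted in this abstract start from different summaries of the same haplotype data: the site frequency spectrum and pairwise linkage disequilibrium. The sketch below shows one simple way to compute each, assuming haplotypes are encoded as equal-length strings of '0' (ancestral) and '1' (derived). The function names and the toy data are hypothetical, and the actual statistics evaluated in the study are more elaborate than these raw summaries.

```python
from collections import Counter

def site_counts(haplotypes):
    """Derived-allele count at each site across the sampled haplotypes."""
    return [sum(h[j] == "1" for h in haplotypes)
            for j in range(len(haplotypes[0]))]

def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry k is the number of sites whose derived allele
    is carried by exactly k haplotypes (k runs from 0 to n)."""
    n = len(haplotypes)
    tally = Counter(site_counts(haplotypes))
    return [tally.get(k, 0) for k in range(n + 1)]

def r_squared(haplotypes, a, b):
    """Squared allelic correlation (r^2) between sites a and b."""
    n = len(haplotypes)
    p_a = sum(h[a] == "1" for h in haplotypes) / n
    p_b = sum(h[b] == "1" for h in haplotypes) / n
    p_ab = sum(h[a] == "1" and h[b] == "1" for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either site is monomorphic")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom

# Toy sample of four haplotypes and three sites (made-up data).
toy = ["110", "100", "101", "000"]
sfs = site_frequency_spectrum(toy)
ld_12 = r_squared(toy, 1, 2)
```

In practice both summaries would be computed in windows along the genome and compared against simulations under a null model, as the abstract describes.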
Affiliation(s)
- Vivak Soni
  - School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ 85281, USA
- Jeffrey D Jensen
  - School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ 85281, USA
|
17
|
Song H, Chu J, Li W, Li X, Fang L, Han J, Zhao S, Ma Y. A Novel Approach Utilizing Domain Adversarial Neural Networks for the Detection and Classification of Selective Sweeps. Adv Sci (Weinh) 2024; 11:e2304842. [PMID: 38308186 PMCID: PMC11005742 DOI: 10.1002/advs.202304842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 01/10/2024] [Indexed: 02/04/2024]
Abstract
The identification and classification of selective sweeps are of great significance for improving the understanding of biological evolution and exploring opportunities for precision medicine and genetic improvement. Here, a domain adaptation sweep detection and classification (DASDC) method is presented to balance the alignment of two domains and the classification performance through a domain-adversarial neural network and its adversarial learning modules. DASDC effectively addresses the issue of mismatch between training data and real genomic data in deep learning models, leading to a significant improvement in its generalization capability, prediction robustness, and accuracy. The DASDC method demonstrates improved identification performance compared to existing methods and excels in classification performance, particularly in scenarios where there is a mismatch between application data and training data. The successful implementation of DASDC in real data of three distinct species highlights its potential as a useful tool for identifying crucial functional genes and investigating adaptive evolutionary mechanisms, particularly with the increasing availability of genomic data.
Affiliation(s)
- Hui Song
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Jinyu Chu
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Wangjiao Li
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Xinyun Li
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Wuhan 430070, China
- Lingzhao Fang
  - Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus 8000, Denmark
- Jianlin Han
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
  - CAAS-ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China
  - Livestock Genetics Program, International Livestock Research Institute (ILRI), Nairobi 00100, Kenya
- Shuhong Zhao
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Wuhan 430070, China
  - Lingnan Modern Agricultural Science and Technology Guangdong Laboratory, Guangzhou 510642, China
- Yunlong Ma
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Wuhan 430070, China
  - Lingnan Modern Agricultural Science and Technology Guangdong Laboratory, Guangzhou 510642, China
|
18
|
Rajawat D, Panigrahi M, Nayak SS, Bhushan B, Mishra BP, Dutt T. Dissecting the genomic regions of selection on the X chromosome in different cattle breeds. 3 Biotech 2024; 14:50. [PMID: 38268984 PMCID: PMC10803714 DOI: 10.1007/s13205-023-03905-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Accepted: 12/18/2023] [Indexed: 01/26/2024] Open
Abstract
Mammalian X and Y chromosomes independently evolved from various autosomes approximately 300 million years ago (MYA). To fully understand the relationship between genomic composition and phenotypic diversity arising over the course of evolution, we have scanned regions of selection signatures on the X chromosome in different cattle breeds. In this study, we have prepared the datasets of 184 individuals of different cattle breeds and explored the complete X chromosome by utilizing four within-population and two between-population methods. There were 23, 25, 30, 17, 17, and 12 outlier regions identified in Tajima's D, CLR, iHS, ROH, FST, and XP-EHH, respectively. Bioinformatics analysis showed that these regions harbor important candidate genes like AKAP4 for reproduction in Brown Swiss, MBTS2 for production traits in Brown Swiss and Guernsey, CXCR3 and CITED1 for health traits in Jersey and Nelore, and BMX and CD40LG for regulation of X chromosome inactivation in Nelore and Gir. We identified genes shared among multiple methods, such as TRNAC-GCA and IL1RAPL1, which appeared in Tajima's D, ROH, and iHS analyses. The gene TRNAW-CCA was found in ROH, CLR and iHS analyses. The X chromosome exhibits a distinctive interaction between demographic factors and genetic variations, and these findings may provide new insight into the X-linked selection in different cattle breeds.
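Of the within-population statistics listed in this abstract, Tajima's D is the simplest to state: it compares the mean number of pairwise differences with the number of segregating sites scaled by a harmonic number, and divides by an estimate of the standard deviation of that difference. The sketch below follows the standard formula from Tajima (1989). It assumes phased haplotypes encoded as equal-length strings of '0' and '1' for a single window; the function name and toy inputs are hypothetical, and the study's own pipeline and window settings are not reproduced here.

```python
import math

def tajimas_d(haplotypes):
    """Tajima's D for one window of biallelic sites.

    Returns NaN when the statistic is undefined (fewer than four
    haplotypes, or no segregating sites in the window).
    """
    n = len(haplotypes)
    counts = [sum(h[j] == "1" for h in haplotypes)
              for j in range(len(haplotypes[0]))]
    segregating = [k for k in counts if 0 < k < n]
    s = len(segregating)
    if n < 4 or s == 0:
        return math.nan

    # Mean number of pairwise differences (pi), summed over sites.
    pi = sum(2 * k * (n - k) for k in segregating) / (n * (n - 1))

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between the two diversity estimators, standardized.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Toy windows (made-up data): an excess of rare variants gives D < 0,
# an excess of intermediate-frequency variants gives D > 0.
d_rare = tajimas_d(["1000", "0100", "0010", "0001"])
d_intermediate = tajimas_d(["1100", "1100", "0011", "0011"])
```

Outlier windows in a genome scan are then those with unusually low or high values of D relative to the chromosome-wide distribution.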
Affiliation(s)
- Divya Rajawat
  - Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly, UP 243122 India
- Manjit Panigrahi
  - Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly, UP 243122 India
- Sonali Sonejita Nayak
  - Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly, UP 243122 India
- Bharat Bhushan
  - Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, Bareilly, UP 243122 India
- B. P. Mishra
  - ICAR-National Bureau of Animal Genetic Resources, Karnal, Karnal, India
- Triveni Dutt
  - Livestock Production and Management Section, Indian Veterinary Research Institute, Izatnagar, Bareilly, UP 243122 India
|
19
|
Ravi V, Shamim U, Khan MA, Swaminathan A, Mishra P, Singh R, Bharali P, Chauhan NS, Pandey R. Unraveling the genetic evolution of SARS-CoV-2 Recombinants using mutational dynamics across the different lineages. Front Med (Lausanne) 2024; 10:1294699. [PMID: 38288302 PMCID: PMC10823376 DOI: 10.3389/fmed.2023.1294699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 12/18/2023] [Indexed: 01/31/2024] Open
Abstract
Introduction: Recombination serves as a common strategy employed by RNA viruses for their genetic evolution. Extensive genomic surveillance during the COVID-19 pandemic has reported SARS-CoV-2 Recombinant strains indicating recombination events during the viral evolution. This study examines the phenomenon of genome recombination by tracing the footprint of prominent lineages of SARS-CoV-2 at different time points in the context of ongoing evolution and emergence of Recombinants. Method: Whole genome sequencing was carried out for 2,516 SARS-CoV-2 (discovery cohort) and 1,126 (validation cohort) using nasopharyngeal samples collected between March 2020 and August 2022, as part of the genomic surveillance program. The sequences were classified according to the different lineages of SARS-CoV-2 prevailing in India at respective time points. Results: Mutational diversity and abundance evaluation across the 12 lineages identified 58 Recombinant sequences as harboring the least number of mutations (n = 111), with 14 low-frequency unique mutations and the majority of mutations coming from BA.2. The spontaneously/dynamically increasing and decreasing trends of mutations highlight the loss of mutations in the Recombinants that were associated with the SARS-CoV-2 replication efficiency, infectivity, and disease severity, rendering them functionally low in infectivity and pathogenicity. Linkage disequilibrium (LD) analysis revealed that mutations comprising the LD blocks of BA.1, BA.2, and Recombinants were found as minor alleles or as low-frequency alleles in the LD blocks from the previous SARS-CoV-2 variant samples, especially Pre-VOC. Moreover, a dissipation in the size of LD blocks as well as LD decay along with a high negative regression coefficient (R squared) value was demonstrated in the Omicron and BA.1 and BA.2 lineages, which corroborated with the breakpoint analysis. Conclusion: Together, the findings help to understand the evolution and emergence of Recombinants after the Omicron lineages, for sustenance and adaptability, to maintain the epidemic spread of SARS-CoV-2 in the host population already high in immunity levels.
Affiliation(s)
- Varsha Ravi
  - Division of Immunology and Infectious Disease Biology, INtegrative GENomics of HOst-PathogEn (INGEN-HOPE) Laboratory, CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India
- Uzma Shamim
  - Division of Immunology and Infectious Disease Biology, INtegrative GENomics of HOst-PathogEn (INGEN-HOPE) Laboratory, CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India
- Md Abuzar Khan
  - Division of Immunology and Infectious Disease Biology, INtegrative GENomics of HOst-PathogEn (INGEN-HOPE) Laboratory, CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India
- Aparna Swaminathan
  - Division of Immunology and Infectious Disease Biology, INtegrative GENomics of HOst-PathogEn (INGEN-HOPE) Laboratory, CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India
- Pallavi Mishra
  - Division of Immunology and Infectious Disease Biology, INtegrative GENomics of HOst-PathogEn (INGEN-HOPE) Laboratory, CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India
- Rajender Singh
  - CSIR-Central Drug Research Institute, (CSIR-CDRI), Lucknow, Lucknow, India
- Pankaj Bharali
  - CSIR-North East Institute of Science and Technology (CSIR-NEIST), Jorhat, Assam, India
- Nar Singh Chauhan
  - Department of Biochemistry, Maharshi Dayanand University, Rohtak, India
- Rajesh Pandey
  - Division of Immunology and Infectious Disease Biology, INtegrative GENomics of HOst-PathogEn (INGEN-HOPE) Laboratory, CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
|
20
|
Taliadoros D, Feurtey A, Wyatt N, Barrès B, Gladieux P, Friesen TL, Stukenbrock EH. Emergence and spread of the barley net blotch pathogen coincided with crop domestication and cultivation history. PLoS Genet 2024; 20:e1010884. [PMID: 38285729 PMCID: PMC10852282 DOI: 10.1371/journal.pgen.1010884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 02/08/2024] [Accepted: 12/11/2023] [Indexed: 01/31/2024] Open
Abstract
Fungal pathogens cause devastating disease in crops. Understanding the evolutionary origin of pathogens is essential to the prediction of future disease emergence and the potential of pathogens to disperse. The fungus Pyrenophora teres f. teres causes net form net blotch (NFNB), an economically significant disease of barley. In this study, we have used 104 P. teres f. teres genomes from four continents to explore the population structure and demographic history of the fungal pathogen. We showed that P. teres f. teres is structured into populations that tend to be geographically restricted to different regions. Using Multiple Sequentially Markovian Coalescent and machine learning approaches we demonstrated that the demographic history of the pathogen correlates with the history of barley, highlighting the importance of human migration and trade in spreading the pathogen. Exploring signatures of natural selection, we identified several population-specific selective sweeps that colocalized with genomic regions enriched in putative virulence genes, and loci previously identified as determinants of virulence specificities by quantitative trait locus analyses. This reflects rapid adaptation to local hosts and environmental conditions of P. teres f. teres as it spread with barley. Our research highlights how human activities can contribute to the spread of pathogens that significantly impact the productivity of field crops.
Affiliation(s)
- Demetris Taliadoros
  - Environmental Genomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
  - Christian-Albrechts University of Kiel, Kiel, Germany
- Alice Feurtey
  - Environmental Genomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
  - Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
  - Plant Pathology, D-USYS, Zurich, Switzerland
- Nathan Wyatt
  - Cereal Crops Research Unit, Edward T. Schafer Agricultural Research Center, USDA-ARS, Fargo, North Dakota, United States of America
  - Sugar Beet and Potato Research Unit, Edward T. Schafer Agricultural Research Center, USDA-ARS, Fargo, North Dakota, United States of America
- Benoit Barrès
  - Université de Lyon, Anses, INRAE, USC CASPER, Lyon, France
- Pierre Gladieux
  - PHIM Plant Health Institute, Univ Montpellier, INRAE, CIRAD, Institut Agro, IRD, Montpellier, France
- Timothy L. Friesen
  - Cereal Crops Research Unit, Edward T. Schafer Agricultural Research Center, USDA-ARS, Fargo, North Dakota, United States of America
- Eva H. Stukenbrock
  - Environmental Genomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
  - Christian-Albrechts University of Kiel, Kiel, Germany
|
21
|
Schrider DR. Allelic gene conversion softens selective sweeps. bioRxiv 2023:2023.12.05.570141. [PMID: 38106127 PMCID: PMC10723294 DOI: 10.1101/2023.12.05.570141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
The prominence of positive selection, in which beneficial mutations are favored by natural selection and rapidly increase in frequency, is a subject of intense debate. Positive selection can result in selective sweeps, in which the haplotype(s) bearing the adaptive allele "sweep" through the population, thereby removing much of the genetic diversity from the region surrounding the target of selection. Two models of selective sweeps have been proposed: classical sweeps, or "hard sweeps", in which a single copy of the adaptive allele sweeps to fixation, and "soft sweeps", in which multiple distinct copies of the adaptive allele leave descendants after the sweep. Soft sweeps can be the outcome of recurrent mutation to the adaptive allele, or the presence of standing genetic variation consisting of multiple copies of the adaptive allele prior to the onset of selection. Importantly, soft sweeps will be common when populations can rapidly adapt to novel selective pressures, either because of a high mutation rate or because adaptive alleles are already present. The prevalence of soft sweeps is especially controversial, and it has been noted that selection on standing variation or recurrent mutations may not always produce soft sweeps. Here, we show that the inverse is true: selection on single-origin de novo mutations may often result in an outcome that is indistinguishable from a soft sweep. This is made possible by allelic gene conversion, which "softens" hard sweeps by copying the adaptive allele onto multiple genetic backgrounds, a process we refer to as a "pseudo-soft" sweep. We carried out a simulation study examining the impact of gene conversion on sweeps from a single de novo variant in models of human, Drosophila, and Arabidopsis populations. The fraction of simulations in which gene conversion had produced multiple haplotypes with the adaptive allele upon fixation was appreciable. 
Indeed, under realistic demographic histories and gene conversion rates, even if selection always acts on a single-origin mutation, sweeps involving multiple haplotypes are more likely than hard sweeps in large populations, especially when selection is not extremely strong. Thus, even when the mutation rate is low or there is no standing variation, hard sweeps are expected to be the exception rather than the rule in large populations. These results also imply that the presence of signatures of soft sweeps does not necessarily mean that adaptation has been especially rapid or is not mutation limited.
Affiliation(s)
- Daniel R Schrider
  - Department of Genetics, University of North Carolina, Chapel Hill, NC 27599
|
22
|
Panigrahi M, Rajawat D, Nayak SS, Ghildiyal K, Sharma A, Jain K, Lei C, Bhushan B, Mishra BP, Dutt T. Landmarks in the history of selective sweeps. Anim Genet 2023; 54:667-688. [PMID: 37710403 DOI: 10.1111/age.13355] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/12/2023] [Accepted: 08/28/2023] [Indexed: 09/16/2023]
Abstract
Half a century ago, a seminal article on the hitchhiking effect by Smith and Haigh inaugurated the concept of the selection signature. Selective sweeps are characterised by the rapid spread of an advantageous genetic variant through a population and hence play an important role in shaping evolution and research on genetic diversity. The process by which a beneficial allele arises and becomes fixed in a population, leading to an increase in the frequency of other linked alleles, is known as genetic hitchhiking or genetic draft. Kimura's neutral theory and hitchhiking theory are complementary, with Kimura's neutral evolution as the 'null model' and positive selection as the 'signal'. Both are widely accepted in evolution, especially with genomics enabling precise measurements. Significant advances in genomic technologies, such as next-generation sequencing, high-density SNP arrays and powerful bioinformatics tools, have made it possible to systematically investigate selection signatures in a variety of species. Although the history of selection signatures is relatively recent, progress has been made in the last two decades, owing to the increasing availability of large-scale genomic data and the development of computational methods. In this review, we embark on a journey through the history of research on selective sweeps, ranging from early theoretical work to recent empirical studies that utilise genomic data.
Affiliation(s)
- Manjit Panigrahi
  - Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Divya Rajawat
  - Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Kanika Ghildiyal
  - Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Anurodh Sharma
  - Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Karan Jain
  - Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Chuzhao Lei
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China
- Bharat Bhushan
  - Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Bishnu Prasad Mishra
  - Division of Animal Biotechnology, ICAR-National Bureau of Animal Genetic Resources, Karnal, India
- Triveni Dutt
  - Livestock Production and Management Section, Indian Veterinary Research Institute, Bareilly, India
|
23
|
Cecil RM, Sugden LA. On convolutional neural networks for selection inference: Revealing the effect of preprocessing on model learning and the capacity to discover novel patterns. PLoS Comput Biol 2023; 19:e1010979. [PMID: 38011281 PMCID: PMC10703409 DOI: 10.1371/journal.pcbi.1010979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 12/07/2023] [Accepted: 10/26/2023] [Indexed: 11/29/2023] Open
Abstract
A central challenge in population genetics is the detection of genomic footprints of selection. As machine learning tools including convolutional neural networks (CNNs) have become more sophisticated and applied more broadly, these provide a logical next step for increasing our power to learn and detect such patterns; indeed, CNNs trained on simulated genome sequences have recently been shown to be highly effective at this task. Unlike previous approaches, which rely upon human-crafted summary statistics, these methods are able to be applied directly to raw genomic data, allowing them to potentially learn new signatures that, if well-understood, could improve the current theory surrounding selective sweeps. Towards this end, we examine a representative CNN from the literature, paring it down to the minimal complexity needed to maintain comparable performance; this low-complexity CNN allows us to directly interpret the learned evolutionary signatures. We then validate these patterns in more complex models using metrics that evaluate feature importance. Our findings reveal that preprocessing steps, which determine how the population genetic data is presented to the model, play a central role in the learned prediction method. This results in models that mimic previously-defined summary statistics; in one case, the summary statistic itself achieves similarly high accuracy. For evolutionary processes that are less well understood than selective sweeps, we hope this provides an initial framework for using CNNs in ways that go beyond simply achieving high classification performance. Instead, we propose that CNNs might be useful as tools for learning novel patterns that can translate to easy-to-implement summary statistics available to a wider community of researchers.
Affiliation(s)
- Ryan M. Cecil
  - Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, Pennsylvania, United States of America
  - Department of Statistics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Lauren A. Sugden
  - Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, Pennsylvania, United States of America
|
24
|
Amin MR, Hasan M, Arnab SP, DeGiorgio M. Tensor Decomposition-based Feature Extraction and Classification to Detect Natural Selection from Genomic Data. Mol Biol Evol 2023; 40:msad216. [PMID: 37772983 PMCID: PMC10581699 DOI: 10.1093/molbev/msad216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 08/10/2023] [Accepted: 09/14/2023] [Indexed: 09/30/2023] Open
Abstract
Inferences of adaptive events are important for learning about traits, such as human digestion of lactose after infancy and the rapid spread of viral variants. Early efforts toward identifying footprints of natural selection from genomic data involved development of summary statistic and likelihood methods. However, such techniques are grounded in simple patterns or theoretical models that limit the complexity of settings they can explore. Due to the renaissance in artificial intelligence, machine learning methods have taken center stage in recent efforts to detect natural selection, with strategies such as convolutional neural networks applied to images of haplotypes. Yet, limitations of such techniques include estimation of large numbers of model parameters under nonconvex settings and feature identification without regard to location within an image. An alternative approach is to use tensor decomposition to extract features from multidimensional data while preserving the latent structure of the data, and to feed these features to machine learning models. Here, we adopt this framework and present a novel approach termed T-REx, which extracts features from images of haplotypes across sampled individuals using tensor decomposition, and then makes predictions from these features using classical machine learning methods. As a proof of concept, we explore the performance of T-REx on simulated neutral and selective sweep scenarios and find that it has high power and accuracy to discriminate sweeps from neutrality, robustness to common technical hurdles, and easy visualization of feature importance. Therefore, T-REx is a powerful addition to the toolkit for detecting adaptive processes from genomic data.
Affiliation(s)
- Md Ruhul Amin
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Mahmudul Hasan
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Sandipan Paul Arnab
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Michael DeGiorgio
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
|
25
|
Soni V, Johri P, Jensen JD. Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models. Evolution 2023; 77:2113-2127. [PMID: 37395482 PMCID: PMC10547124 DOI: 10.1093/evolut/qpad120] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 03/20/2023] [Revised: 06/15/2023] [Accepted: 06/30/2023] [Indexed: 07/04/2023]
Abstract
The detection of selective sweeps from population genomic data often relies on the premise that the beneficial mutations in question have fixed very near the sampling time. As it has been previously shown that the power to detect a selective sweep is strongly dependent on the time since fixation as well as the strength of selection, it is naturally the case that strong, recent sweeps leave the strongest signatures. However, the biological reality is that beneficial mutations enter populations at a rate, one that partially determines the mean wait time between sweep events and hence their age distribution. An important question thus remains about the power to detect recurrent selective sweeps when they are modeled by a realistic mutation rate and as part of a realistic distribution of fitness effects, as opposed to a single, recent, isolated event on a purely neutral background as is more commonly modeled. Here we use forward-in-time simulations to study the performance of commonly used sweep statistics, within the context of more realistic evolutionary baseline models incorporating purifying and background selection, population size change, and mutation and recombination rate heterogeneity. Results demonstrate the important interplay of these processes, necessitating caution when interpreting selection scans; specifically, false-positive rates are in excess of true-positive across much of the evaluated parameter space, and selective sweeps are often undetectable unless the strength of selection is exceptionally strong.
Affiliation(s)
- Vivak Soni
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Parul Johri
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Jeffrey D Jensen
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
|
26
|
Whitehouse LS, Schrider DR. Timesweeper: accurately identifying selective sweeps using population genomic time series. Genetics 2023; 224:iyad084. [PMID: 37157914 PMCID: PMC10324941 DOI: 10.1093/genetics/iyad084] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/25/2022] [Revised: 07/25/2022] [Accepted: 04/25/2023] [Indexed: 05/10/2023] Open
Abstract
Despite decades of research, identifying selective sweeps, the genomic footprints of positive selection, remains a core problem in population genetics. Of the myriad methods that have been developed to tackle this task, few are designed to leverage the potential of genomic time-series data. This is because in most population genetic studies of natural populations, only a single period of time can be sampled. Recent advancements in sequencing technology, including improvements in extracting and sequencing ancient DNA, have made repeated samplings of a population possible, allowing for more direct analysis of recent evolutionary dynamics. Serial sampling of organisms with shorter generation times has also become more feasible due to improvements in the cost and throughput of sequencing. With these advances in mind, here we present Timesweeper, a fast and accurate convolutional neural network-based tool for identifying selective sweeps in data consisting of multiple genomic samplings of a population over time. Timesweeper analyzes population genomic time-series data by first simulating training data under a demographic model appropriate for the data of interest, training a one-dimensional convolutional neural network on said simulations, and inferring which polymorphisms in this serialized data set were the direct target of a completed or ongoing selective sweep. We show that Timesweeper is accurate under multiple simulated demographic and sampling scenarios, identifies selected variants with high resolution, and estimates selection coefficients more accurately than existing methods. 
In sum, we show that more accurate inferences about natural selection are possible when genomic time-series data are available; such data will continue to proliferate in coming years due to both the sequencing of ancient samples and repeated samplings of extant populations with faster generation times, as well as experimentally evolved populations where time-series data are often generated. Methodological advances such as Timesweeper thus have the potential to help resolve the controversy over the role of positive selection in the genome. We provide Timesweeper as a Python package for use by the community.
Affiliation(s)
- Logan S Whitehouse
  - Department of Genetics, University of North Carolina, Chapel Hill, NC 27514, USA
- Daniel R Schrider
  - Department of Genetics, University of North Carolina, Chapel Hill, NC 27514, USA
|
27
|
Zhao H, Souilljee M, Pavlidis P, Alachiotis N. Genome-wide scans for selective sweeps using convolutional neural networks. Bioinformatics 2023; 39:i194-i203. [PMID: 37387128 DOI: 10.1093/bioinformatics/btad265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
Motivation: Recent methods for selective sweep detection cast the problem as a classification task and use summary statistics as features to capture region characteristics that are indicative of a selective sweep, thereby being sensitive to confounding factors. Furthermore, they are not designed to perform whole-genome scans or to estimate the extent of the genomic region that was affected by positive selection; both are required for identifying candidate genes and the time and strength of selection. Results: We present ASDEC (https://github.com/pephco/ASDEC), a neural-network-based framework that can scan whole genomes for selective sweeps. ASDEC achieves similar classification performance to other convolutional neural network-based classifiers that rely on summary statistics, but it is trained 10× faster and classifies genomic regions 5× faster by inferring region characteristics from the raw sequence data directly. Deploying ASDEC for genomic scans achieved up to 15.2× higher sensitivity, 19.4× higher success rates, and 4× higher detection accuracy than state-of-the-art methods. We used ASDEC to scan human chromosome 1 of the Yoruba population (1000 Genomes Project), identifying nine known candidate genes.
Affiliation(s)
- Hanqing Zhao
  - Faculty of EEMCS, University of Twente, Enschede, The Netherlands
- Pavlos Pavlidis
  - Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, Greece
|
28
|
Soni V, Johri P, Jensen JD. Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models. bioRxiv 2023:2023.06.15.545166. [PMID: 37398347 PMCID: PMC10312679 DOI: 10.1101/2023.06.15.545166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
The detection of selective sweeps from population genomic data often relies on the premise that the beneficial mutations in question have fixed very near the sampling time. As it has been previously shown that the power to detect a selective sweep is strongly dependent on the time since fixation as well as the strength of selection, it is naturally the case that strong, recent sweeps leave the strongest signatures. However, the biological reality is that beneficial mutations enter populations at a rate, one that partially determines the mean wait time between sweep events and hence their age distribution. An important question thus remains about the power to detect recurrent selective sweeps when they are modelled by a realistic mutation rate and as part of a realistic distribution of fitness effects (DFE), as opposed to a single, recent, isolated event on a purely neutral background as is more commonly modelled. Here we use forward-in-time simulations to study the performance of commonly used sweep statistics, within the context of more realistic evolutionary baseline models incorporating purifying and background selection, population size change, and mutation and recombination rate heterogeneity. Results demonstrate the important interplay of these processes, necessitating caution when interpreting selection scans; specifically, false positive rates are in excess of true positive across much of the evaluated parameter space, and selective sweeps are often undetectable unless the strength of selection is exceptionally strong.
Teaser Text
Outlier-based genomic scans have proven a popular approach for identifying loci that have potentially experienced recent positive selection. However, it has previously been shown that an evolutionarily appropriate baseline model that incorporates non-equilibrium population histories, purifying and background selection, and variation in mutation and recombination rates is necessary to reduce often extreme false positive rates when performing genomic scans. Here we evaluate the power to detect recurrent selective sweeps using common SFS-based and haplotype-based methods under these increasingly realistic models. We find that while these appropriate evolutionary baselines are essential to reduce false positive rates, the power to accurately detect recurrent selective sweeps is generally low across much of the biologically relevant parameter space.
Affiliation(s)
- Vivak Soni
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Parul Johri
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
  - Present address: Department of Biology, Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
|
29
|
Booker WW, Ray DD, Schrider DR. This population does not exist: learning the distribution of evolutionary histories with generative adversarial networks. Genetics 2023; 224:iyad063. [PMID: 37067864 PMCID: PMC10213497 DOI: 10.1093/genetics/iyad063] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/23/2023] [Revised: 02/23/2023] [Accepted: 04/05/2023] [Indexed: 04/18/2023] Open
Abstract
Numerous studies over the last decade have demonstrated the utility of machine learning methods when applied to population genetic tasks. More recent studies show the potential of deep-learning methods in particular, which allow researchers to approach problems without making prior assumptions about how the data should be summarized or manipulated, instead learning their own internal representation of the data in an attempt to maximize inferential accuracy. One type of deep neural network, the Generative Adversarial Network (GAN), can even be used to generate new data, and this approach has been used to create individual artificial human genomes free from privacy concerns. In this study, we further explore the application of GANs in population genetics by designing and training a network to learn the statistical distribution of population genetic alignments (i.e. data sets consisting of sequences from an entire population sample) under several diverse evolutionary histories; this is the first GAN capable of performing this task. After testing multiple different neural network architectures, we report the results of a fully differentiable Deep-Convolutional Wasserstein GAN with gradient penalty that is capable of generating artificial examples of population genetic alignments that successfully mimic key aspects of the training data, including the site-frequency spectrum, differentiation between populations, and patterns of linkage disequilibrium. We demonstrate consistent training success across various evolutionary models, including models of panmictic and subdivided populations, populations at equilibrium and experiencing changes in size, and populations experiencing either no selection or positive selection of various strengths, all without the need for extensive hyperparameter tuning.
Overall, our findings highlight the ability of GANs to learn and mimic population genetic data and suggest future areas where this work can be applied in population genetics research that we discuss herein.
Affiliation(s)
- William W Booker
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27514-2916, USA
- Dylan D Ray
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27514-2916, USA
- Daniel R Schrider
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27514-2916, USA
|
30
|
Love RR, Sikder JR, Vivero RJ, Matute DR, Schrider DR. Strong Positive Selection in Aedes aegypti and the Rapid Evolution of Insecticide Resistance. Mol Biol Evol 2023; 40:msad072. [PMID: 36971242 PMCID: PMC10118305 DOI: 10.1093/molbev/msad072] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 07/27/2022] [Revised: 02/13/2023] [Accepted: 03/23/2023] [Indexed: 03/29/2023] Open
Abstract
Aedes aegypti vectors the pathogens that cause dengue, yellow fever, Zika virus, and chikungunya and is a serious threat to public health in tropical regions. Decades of work have illuminated many aspects of Ae. aegypti's biology and global population structure and have identified insecticide resistance genes; however, the size and repetitive nature of the Ae. aegypti genome have limited our ability to detect positive selection in this mosquito. Combining new whole genome sequences from Colombia with publicly available data from Africa and the Americas, we identify multiple strong candidate selective sweeps in Ae. aegypti, many of which overlap genes linked to or implicated in insecticide resistance. We examine the voltage-gated sodium channel gene in three American cohorts and find evidence for successive selective sweeps in Colombia. The most recent sweep encompasses an intermediate-frequency haplotype containing four candidate insecticide resistance mutations that are in near-perfect linkage disequilibrium with one another in the Colombian sample. We hypothesize that this haplotype may continue to rapidly increase in frequency and perhaps spread geographically in the coming years. These results extend our knowledge of how insecticide resistance has evolved in this species and add to a growing body of evidence suggesting that Ae. aegypti has an extensive genomic capacity to rapidly adapt to insecticide-based vector control.
Affiliation(s)
- R Rebecca Love
  - Department of Genetics, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Josh R Sikder
  - Department of Genetics, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Rafael J Vivero
  - Programa de Estudio y Control de Enfermedades Tropicales, PECET, Universidad de Antioquia, Colombia
- Daniel R Matute
  - Department of Biology, College of Arts and Sciences, University of North Carolina, Chapel Hill, NC, USA
- Daniel R Schrider
  - Department of Genetics, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
|
31
|
Hamid I, Korunes KL, Schrider DR, Goldberg A. Localizing Post-Admixture Adaptive Variants with Object Detection on Ancestry-Painted Chromosomes. Mol Biol Evol 2023; 40:msad074. [PMID: 36947126 PMCID: PMC10116606 DOI: 10.1093/molbev/msad074] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/04/2022] [Revised: 03/14/2023] [Accepted: 03/20/2023] [Indexed: 03/23/2023] Open
Abstract
Gene flow between previously differentiated populations during the founding of an admixed or hybrid population has the potential to introduce adaptive alleles into the new population. If the adaptive allele is common in one source population, but not the other, then as the adaptive allele rises in frequency in the admixed population, genetic ancestry from the source containing the adaptive allele will increase nearby as well. Patterns of genetic ancestry have therefore been used to identify post-admixture positive selection in humans and other animals, including examples in immunity, metabolism, and animal coloration. A common method identifies regions of the genome that have local ancestry "outliers" compared with the distribution across the rest of the genome, considering each locus independently. However, we lack theoretical models for expected distributions of ancestry under various demographic scenarios, resulting in potential false positives and false negatives. Further, ancestry patterns between distant sites are often not independent. As a result, current methods tend to infer wide genomic regions containing many genes as under selection, limiting biological interpretation. Instead, we develop a deep learning object detection method applied to images generated from local ancestry-painted genomes. This approach preserves information from the surrounding genomic context and avoids potential pitfalls of user-defined summary statistics. We find the method is robust to a variety of demographic misspecifications using simulated data. Applied to human genotype data from Cabo Verde, we localize a known adaptive locus to a single narrow region compared with multiple or long windows obtained using two other ancestry-based methods.
Affiliation(s)
- Iman Hamid
  - Department of Evolutionary Anthropology, Duke University, Durham, NC
- Daniel R Schrider
  - Department of Genetics, University of North Carolina, Chapel Hill, NC
- Amy Goldberg
  - Department of Evolutionary Anthropology, Duke University, Durham, NC
|
32
|
Amin MR, Hasan M, Arnab SP, DeGiorgio M. Tensor decomposition based feature extraction and classification to detect natural selection from genomic data. bioRxiv 2023:2023.03.27.527731. [PMID: 37034767 PMCID: PMC10081272 DOI: 10.1101/2023.03.27.527731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Inferences of adaptive events are important for learning about traits, such as human digestion of lactose after infancy and the rapid spread of viral variants. Early efforts toward identifying footprints of natural selection from genomic data involved development of summary statistic and likelihood methods. However, such techniques are grounded in simple patterns or theoretical models that limit the complexity of settings they can explore. Due to the renaissance in artificial intelligence, machine learning methods have taken center stage in recent efforts to detect natural selection, with strategies such as convolutional neural networks applied to images of haplotypes. Yet, limitations of such techniques include estimation of large numbers of model parameters under non-convex settings and feature identification without regard to location within an image. An alternative approach is to use tensor decomposition to extract features from multidimensional data while preserving the latent structure of the data, and to feed these features to machine learning models. Here, we adopt this framework and present a novel approach termed T-REx , which extracts features from images of haplotypes across sampled individuals using tensor decomposition, and then makes predictions from these features using classical machine learning methods. As a proof of concept, we explore the performance of T-REx on simulated neutral and selective sweep scenarios and find that it has high power and accuracy to discriminate sweeps from neutrality, robustness to common technical hurdles, and easy visualization of feature importance. Therefore, T-REx is a powerful addition to the toolkit for detecting adaptive processes from genomic data.
|
33
|
The use of evolutionary analyses to predict functionally relevant traits in filamentous plant pathogens. Curr Opin Microbiol 2023; 73:102244. [PMID: 36889024 DOI: 10.1016/j.mib.2022.102244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 10/27/2022] [Accepted: 11/03/2022] [Indexed: 03/08/2023]
Abstract
Identifying traits involved in plant-pathogen interactions is one of the major objectives in molecular plant pathology. Evolutionary analyses may assist in the identification of genes encoding traits that are involved in virulence and local adaptation, including adaptation to agricultural intervention strategies. In the past decades, the number of available genome sequences of fungal plant pathogens has rapidly increased, providing a rich source for the discovery of functionally important genes as well as inference of species histories. Positive selection in the form of diversifying or directional selection leaves particular signatures in genome alignments and can be identified with statistical genetics methods. This review summarises the concepts and approaches used in evolutionary genomics and lists major discoveries related to plant-pathogen adaptive evolution. We underline the significant contribution of evolutionary genomics in discovering virulence-related traits and the study of plant-pathogen ecology and adaptive evolution.
|
34
|
Árnason E, Koskela J, Halldórsdóttir K, Eldon B. Sweepstakes reproductive success via pervasive and recurrent selective sweeps. eLife 2023; 12:80781. [PMID: 36806325 PMCID: PMC9940914 DOI: 10.7554/elife.80781] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/03/2022] [Accepted: 12/28/2022] [Indexed: 02/22/2023] Open
Abstract
Highly fecund natural populations characterized by high early mortality abound, yet our knowledge about their recruitment dynamics is somewhat rudimentary. This knowledge gap has implications for our understanding of genetic variation, population connectivity, local adaptation, and the resilience of highly fecund populations. The concept of sweepstakes reproductive success, which posits a considerable variance and skew in individual reproductive output, is key to understanding the distribution of individual reproductive success. However, it still needs to be determined whether highly fecund organisms reproduce through sweepstakes and, if they do, the relative roles of neutral and selective sweepstakes. Here, we use coalescent-based statistical analysis of population genomic data to show that selective sweepstakes likely explain recruitment dynamics in the highly fecund Atlantic cod. We show that the Kingman coalescent (modelling no sweepstakes) and the Xi-Beta coalescent (modelling random sweepstakes), including complex demography and background selection, do not provide an adequate fit for the data. The Durrett-Schweinsberg coalescent, in which selective sweepstakes result from recurrent and pervasive selective sweeps of new mutations, offers greater explanatory power. Our results show that models of sweepstakes reproduction and multiple-merger coalescents are relevant and necessary for understanding genetic diversity in highly fecund natural populations. These findings have fundamental implications for understanding the recruitment variation of fish stocks and general evolutionary genomics of high-fecundity organisms.
Affiliation(s)
- Einar Árnason
  - Institute of Life and Environmental Sciences, University of Iceland, Reykjavik, Iceland; Department of Organismal and Evolutionary Biology, Harvard University, Cambridge, United States
- Jere Koskela
  - Department of Statistics, University of Warwick, Coventry, United Kingdom
- Katrín Halldórsdóttir
  - Institute of Life and Environmental Sciences, University of Iceland, Reykjavik, Iceland
- Bjarki Eldon
  - Leibniz Institute for Evolution and Biodiversity Science, Museum für Naturkunde, Berlin, Germany
|
35
|
Kreiner JM, Latorre SM, Burbano HA, Stinchcombe JR, Otto SP, Weigel D, Wright SI. Rapid weed adaptation and range expansion in response to agriculture over the past two centuries. Science 2022; 378:1079-1085. [PMID: 36480621 DOI: 10.1126/science.abo7293] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/13/2022]
Abstract
North America has experienced a massive increase in cropland use since 1800, accompanied more recently by the intensification of agricultural practices. Through genome analysis of present-day and historical samples spanning environments over the past two centuries, we studied the effect of these changes in farming on the extent and tempo of evolution across the native range of the common waterhemp (Amaranthus tuberculatus), a now pervasive agricultural weed. Modern agriculture has imposed strengths of selection rarely observed in the wild, with notable shifts in allele frequency trajectories since agricultural intensification in the 1960s. An evolutionary response to this extreme selection was facilitated by a concurrent human-mediated range shift. By reshaping genome-wide diversity across the landscape, agriculture has driven the success of this weed in the 21st century.
Affiliation(s)
- Julia M Kreiner
  - Department of Botany, University of British Columbia, Vancouver, BC, Canada; Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Sergio M Latorre
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, UK; Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
- Hernán A Burbano
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, UK; Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
- John R Stinchcombe
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada
- Sarah P Otto
  - Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada; Department of Zoology, University of British Columbia, Vancouver, BC, Canada
- Detlef Weigel
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
- Stephen I Wright
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada
|
36
|
Sun Q, Yang Y, Rosen JD, Jiang MZ, Chen J, Liu W, Wen J, Raffield LM, Pace RG, Zhou YH, Wright FA, Blackman SM, Bamshad MJ, Gibson RL, Cutting GR, Knowles MR, Schrider DR, Fuchsberger C, Li Y. MagicalRsq: Machine-learning-based genotype imputation quality calibration. Am J Hum Genet 2022; 109:1986-1997. [PMID: 36198314 PMCID: PMC9674945 DOI: 10.1016/j.ajhg.2022.09.009] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 06/19/2022] [Accepted: 09/16/2022] [Indexed: 01/26/2023] Open
Abstract
Whole-genome sequencing (WGS) is the gold standard for fully characterizing genetic variation but is still prohibitively expensive for large samples. To reduce costs, many studies sequence only a subset of individuals or genomic regions, and genotype imputation is used to infer genotypes for the remaining individuals or regions without sequencing data. However, not all variants can be well imputed, and the current state-of-the-art imputation quality metric, denoted as standard Rsq, is poorly calibrated for lower-frequency variants. Here, we propose MagicalRsq, a machine-learning-based method that integrates variant-level imputation and population genetics statistics, to provide a better calibrated imputation quality metric. Leveraging WGS data from the Cystic Fibrosis Genome Project (CFGP), and whole-exome sequence data from UK BioBank (UKB), we performed comprehensive experiments to evaluate the performance of MagicalRsq compared to standard Rsq for partially sequenced studies. We found that MagicalRsq aligns better with true R² than standard Rsq in almost every situation evaluated, for both European and African ancestry samples. For example, when applying models trained from 1,992 CFGP sequenced samples to an independent 3,103 samples with no sequencing but TOPMed imputation from array genotypes, MagicalRsq, compared to standard Rsq, achieved net gains of 1.4 million rare, 117k low-frequency, and 18k common variants, where a net gain is the additional number of variants correctly distinguished by MagicalRsq over standard Rsq. MagicalRsq can serve as an improved post-imputation quality metric and will benefit downstream analysis by better distinguishing well-imputed variants from those poorly imputed. MagicalRsq is freely available on GitHub.
Affiliation(s)
- Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Yingxi Yang
- Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
| | - Jonathan D Rosen
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Min-Zhi Jiang
- Department of Applied Physical Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Jiawen Chen
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Weifang Liu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Jia Wen
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Laura M Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Rhonda G Pace
- Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Yi-Hui Zhou
- Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
| | - Fred A Wright
- Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA; Bioinformatics Research Center and Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
| | - Scott M Blackman
- Division of Pediatric Endocrinology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Michael J Bamshad
- Department of Pediatrics, University of Washington, Seattle, WA 98105, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Ronald L Gibson
- Department of Pediatrics, University of Washington, Seattle, WA 98105, USA
| | - Garry R Cutting
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Michael R Knowles
- Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Daniel R Schrider
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Christian Fuchsberger
- Institute for Biomedicine, Eurac Research (affiliated with the University of Lübeck), Bolzano, Italy.
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
| |
|
37
|
Park E, Poulin R. Extremely divergent COI sequences within an amphipod species complex: A possible role for endosymbionts? Ecol Evol 2022; 12:e9448. [PMID: 36311398 PMCID: PMC9609454 DOI: 10.1002/ece3.9448] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/19/2022] [Revised: 10/04/2022] [Accepted: 10/05/2022] [Indexed: 11/10/2022] Open
Abstract
Some heritable endosymbionts can affect host mtDNA evolution in various ways. Amphipods host diverse endosymbionts, but whether their mtDNA has been influenced by these endosymbionts has yet to be considered. Here, we investigated the role of endosymbionts (microsporidians and Rickettsia) in explaining highly divergent COI sequences in the Paracalliope fluviatilis species complex, the most common freshwater amphipods in New Zealand. We first contrasted phylogeographic patterns using COI, ITS, and 28S sequences. While molecular species delimitation methods based on 28S sequences supported 3-4 potential species (N, C, SA, and SB) among freshwater lineages, COI sequences supported 17-27 putative species reflecting high inter-population divergence. The deep divergence between NC and S lineages (~20%; 28S) and the substitution saturation on the 3rd codon position of COI detected even within one lineage (SA) indicate a very high level of morphological stasis. Interestingly, individuals infected and uninfected by Rickettsia comprised divergent COI lineages in one of four populations tested, suggesting a potential influence of endosymbionts in mtDNA patterns. We propose several plausible explanations for divergent COI lineages, although they would need further testing with multiple lines of evidence. Lastly, due to common morphological stasis and the presence of endosymbionts, phylogeographic patterns of amphipods based on mtDNA should be interpreted with caution.
Affiliation(s)
- Eunji Park
- Department of Zoology, University of Otago, Dunedin, New Zealand
- Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
| | - Robert Poulin
- Department of Zoology, University of Otago, Dunedin, New Zealand
| |
|
38
|
An adaptive teosinte mexicana introgression modulates phosphatidylcholine levels and is associated with maize flowering time. Proc Natl Acad Sci U S A 2022; 119:e2100036119. [PMID: 35771940 PMCID: PMC9271162 DOI: 10.1073/pnas.2100036119] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 01/01/2023] Open
Abstract
Native Americans domesticated maize (Zea mays ssp. mays) from lowland teosinte parviglumis (Zea mays ssp. parviglumis) in the warm Mexican southwest and brought it to the highlands of Mexico and South America where it was exposed to lower temperatures that imposed strong selection on flowering time. Phospholipids are important metabolites in plant responses to low-temperature and phosphorus availability and have been suggested to influence flowering time. Here, we combined linkage mapping with genome scans to identify High PhosphatidylCholine 1 (HPC1), a gene that encodes a phospholipase A1 enzyme, as a major driver of phospholipid variation in highland maize. Common garden experiments demonstrated strong genotype-by-environment interactions associated with variation at HPC1, with the highland HPC1 allele leading to higher fitness in highlands, possibly by hastening flowering. The highland maize HPC1 variant resulted in impaired function of the encoded protein due to a polymorphism in a highly conserved sequence. A meta-analysis across HPC1 orthologs indicated a strong association between the identity of the amino acid at this position and optimal growth in prokaryotes. Mutagenesis of HPC1 via genome editing validated its role in regulating phospholipid metabolism. Finally, we showed that the highland HPC1 allele entered cultivated maize by introgression from the wild highland teosinte Zea mays ssp. mexicana and has been maintained in maize breeding lines from the Northern United States, Canada, and Europe. Thus, HPC1 introgressed from teosinte mexicana underlies a large metabolic QTL that modulates phosphatidylcholine levels and has an adaptive effect at least in part via induction of early flowering time.
|
39
|
Kumar H, Panigrahi M, Panwar A, Rajawat D, Nayak SS, Saravanan KA, Kaisa K, Parida S, Bhushan B, Dutt T. Machine-Learning Prospects for Detecting Selection Signatures Using Population Genomics Data. J Comput Biol 2022; 29:943-960. [PMID: 35639362 DOI: 10.1089/cmb.2021.0447] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/29/2022] Open
Abstract
Natural selection has been given a lot of attention because it relates to the adaptation of populations to their environments, both biotic and abiotic. An allele is selected when it is favored by natural selection. Consequently, the favored allele increases in frequency in the population and neighboring linked variation diminishes, causing so-called selective sweeps. A high-throughput genomic sequence allows one to disentangle the evolutionary forces at play in populations. With the development of high-throughput genome sequencing technologies, it has become easier to detect these selective sweeps/selection signatures. Various methods can be used to detect selective sweeps, from simple implementations using summary statistics to complex statistical approaches. One of the important problems of these statistical models is the potential to provide inaccurate results when their assumptions are violated. The use of machine learning (ML) in population genetics has been introduced as an alternative method of detecting selection by treating the problem of detecting selection signatures as a classification problem. Since the availability of population genomics data is increasing, researchers may incorporate ML into these statistical models to infer signatures of selection with higher predictive accuracy and better resolution. This article describes how ML can be used to aid in detecting and studying natural selection patterns using population genomic data.
Affiliation(s)
- Harshit Kumar
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| | - Manjit Panigrahi
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| | - Anuradha Panwar
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| | - Divya Rajawat
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| | - Sonali Sonejita Nayak
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| | - K A Saravanan
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| | - Kaiho Kaisa
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| | - Subhashree Parida
- Divisions of Pharmacology and Toxicology, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| | - Bharat Bhushan
- Divisions of Animal Genetics, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| | - Triveni Dutt
- Livestock Production and Management Section, ICAR-Indian Veterinary Research Institute, Izatnagar, India
| |
|
40
|
Johri P, Aquadro CF, Beaumont M, Charlesworth B, Excoffier L, Eyre-Walker A, Keightley PD, Lynch M, McVean G, Payseur BA, Pfeifer SP, Stephan W, Jensen JD. Recommendations for improving statistical inference in population genomics. PLoS Biol 2022; 20:e3001669. [PMID: 35639797 PMCID: PMC9154105 DOI: 10.1371/journal.pbio.3001669] [Citation(s) in RCA: 72] [Impact Index Per Article: 24.0] [Indexed: 01/29/2023] Open
Abstract
The field of population genomics has grown rapidly in response to the recent advent of affordable, large-scale sequencing technologies. As opposed to the situation during the majority of the 20th century, in which the development of theoretical and statistical population genetic insights outpaced the generation of data to which they could be applied, genomic data are now being produced at a far greater rate than they can be meaningfully analyzed and interpreted. With this wealth of data has come a tendency to focus on fitting specific (and often rather idiosyncratic) models to data, at the expense of a careful exploration of the range of possible underlying evolutionary processes. For example, the approach of directly investigating models of adaptive evolution in each newly sequenced population or species often neglects the fact that a thorough characterization of ubiquitous nonadaptive processes is a prerequisite for accurate inference. We here describe the perils of these tendencies, present our consensus views on current best practices in population genomic data analysis, and highlight areas of statistical inference and theory that are in need of further attention. Thereby, we argue for the importance of defining a biologically relevant baseline model tuned to the details of each new analysis, of skepticism and scrutiny in interpreting model fitting results, and of carefully defining addressable hypotheses and underlying uncertainties.
Affiliation(s)
- Parul Johri
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
| | - Charles F. Aquadro
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America
| | - Mark Beaumont
- School of Biological Sciences, University of Bristol, Bristol, United Kingdom
| | - Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
| | - Laurent Excoffier
- Institute of Ecology and Evolution, University of Berne, Berne, Switzerland
| | - Adam Eyre-Walker
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
| | - Peter D. Keightley
- Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
| | - Michael Lynch
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
| | - Gil McVean
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
| | - Bret A. Payseur
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| | - Susanne P. Pfeifer
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
| | | | - Jeffrey D. Jensen
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
| |
|
41
|
Friedlander E, Steinrücken M. A numerical framework for genetic hitchhiking in populations of variable size. Genetics 2022; 220:6526396. [PMID: 35143667 PMCID: PMC8893261 DOI: 10.1093/genetics/iyac012] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/31/2021] [Accepted: 12/27/2021] [Indexed: 11/13/2022] Open
Abstract
Natural selection on beneficial or deleterious alleles results in an increase or decrease, respectively, of their frequency within the population. Due to chromosomal linkage, the dynamics of the selected site affect the genetic variation at nearby neutral loci in a process commonly referred to as genetic hitchhiking. Changes in population size, however, can yield patterns in genomic data that mimic the effects of selection. Accurately modeling these dynamics is thus crucial to understanding how selection and past population size changes impact observed patterns of genetic variation. Here, we model the evolution of haplotype frequencies with the Wright-Fisher diffusion to study the impact of selection on linked neutral variation. Explicit solutions are not known for the dynamics of this diffusion when selection and recombination act simultaneously. Thus, we present a method for numerically evaluating the Wright-Fisher diffusion dynamics of 2 linked loci separated by a certain recombination distance when selection is acting. We can account for arbitrary population size histories explicitly using this approach. A key step in the method is to express the moments of the associated transition density, or sampling probabilities, as solutions to ordinary differential equations. Numerically solving these differential equations relies on a novel accurate and numerically efficient technique to estimate higher order moments from lower order moments. We demonstrate how this numerical framework can be used to quantify the reduction and recovery of genetic diversity around a selected locus over time and elucidate distortions in the site-frequency-spectra of neutral variation linked to loci under selection in various demographic settings. The method can be readily extended to more general modes of selection and applied in likelihood frameworks to detect loci under selection and infer the strength of the selective pressure.
Affiliation(s)
- Eric Friedlander
- Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA; Department of Mathematics, Saint Norbert College, Green Bay, WI 54115, USA
| | - Matthias Steinrücken
- Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA; Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA; Corresponding author: Department of Ecology & Evolution, The University of Chicago, 1101 E. 57th Street, Chicago, IL 60637, USA.
| |
|
42
|
Dilber E, Terhorst J. Robust detection of natural selection using a probabilistic model of tree imbalance. Genetics 2022; 220:6511494. [PMID: 35100408 PMCID: PMC8893258 DOI: 10.1093/genetics/iyac009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Accepted: 12/16/2021] [Indexed: 01/21/2023] Open
Abstract
Neutrality tests such as Tajima's D and Fay and Wu's H are standard implements in the population genetics toolbox. One of their most common uses is to scan the genome for signals of natural selection. However, it is well understood that D and H are confounded by other evolutionary forces (in particular, population expansion) that may be unrelated to selection. Because they are not model-based, it is not clear how to deconfound these tests in a principled way. In this article, we derive new likelihood-based methods for detecting natural selection, which are robust to fluctuations in effective population size. At the core of our method is a novel probabilistic model of tree imbalance, which generalizes Kingman's coalescent to allow certain aberrant tree topologies to arise more frequently than is expected under neutrality. We derive a frequency spectrum-based estimator that can be used in place of D, and also extend to the case where genealogies are first estimated. We benchmark our methods on real and simulated data, and provide an open source software implementation.
Affiliation(s)
- Enes Dilber
- Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Jonathan Terhorst
- Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA; Corresponding author: Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA.
| |
|
43
|
Good BH. Linkage disequilibrium between rare mutations. Genetics 2022; 220:6503502. [PMID: 35100407 PMCID: PMC8982034 DOI: 10.1093/genetics/iyac004] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/07/2021] [Accepted: 12/21/2021] [Indexed: 01/13/2023] Open
Abstract
The statistical associations between mutations, collectively known as linkage disequilibrium, encode important information about the evolutionary forces acting within a population. Yet in contrast to single-site analogues like the site frequency spectrum, our theoretical understanding of linkage disequilibrium remains limited. In particular, little is currently known about how mutations with different ages and fitness costs contribute to expected patterns of linkage disequilibrium, even in simple settings where recombination and genetic drift are the major evolutionary forces. Here, I introduce a forward-time framework for predicting linkage disequilibrium between pairs of neutral and deleterious mutations as a function of their present-day frequencies. I show that the dynamics of linkage disequilibrium become much simpler in the limit that mutations are rare, where they admit a simple heuristic picture based on the trajectories of the underlying lineages. I use this approach to derive analytical expressions for a family of frequency-weighted linkage disequilibrium statistics as a function of the recombination rate, the frequency scale, and the additive and epistatic fitness costs of the mutations. I find that the frequency scale can have a dramatic impact on the shapes of the resulting linkage disequilibrium curves, reflecting the broad range of time scales over which these correlations arise. I also show that the differences between neutral and deleterious linkage disequilibrium are not purely driven by differences in their mutation frequencies and can instead display qualitative features that are reminiscent of epistasis. I conclude by discussing the implications of these results for recent linkage disequilibrium measurements in bacteria. This forward-time approach may provide a useful framework for predicting linkage disequilibrium across a range of evolutionary scenarios.
Affiliation(s)
- Benjamin H Good
- Department of Applied Physics, Stanford University, Stanford, CA 94305, USA; Corresponding author: Department of Applied Physics, Stanford University, Clark Center, 318 Campus Drive, Stanford, CA 94305, USA.
| |
|
44
|
Cheng JY, Stern AJ, Racimo F, Nielsen R. Detecting Selection in Multiple Populations by Modeling Ancestral Admixture Components. Mol Biol Evol 2022; 39:msab294. [PMID: 34626111 PMCID: PMC8763095 DOI: 10.1093/molbev/msab294] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 12/26/2022] Open
Abstract
One of the most powerful and commonly used approaches for detecting local adaptation in the genome is the identification of extreme allele frequency differences between populations. In this article, we present a new maximum likelihood method for finding regions under positive selection. It is based on a Gaussian approximation to allele frequency changes and it incorporates admixture between populations. The method can analyze multiple populations simultaneously and retains power to detect selection signatures specific to ancestry components that are not representative of any extant populations. Using simulated data, we compare our method to related approaches, and show that it is orders of magnitude faster than the state-of-the-art, while retaining similar or higher power for most simulation scenarios. We also apply it to human genomic data and identify loci with extreme genetic differentiation between major geographic groups. Many of the genes identified are previously known selected loci relating to hair pigmentation and morphology, skin, and eye pigmentation. We also identify new candidate regions, including various selected loci in the Native American component of admixed Mexican-Americans. These involve diverse biological functions, such as immunity, fat distribution, food intake, vision, and hair development.
Affiliation(s)
- Jade Yu Cheng
- Lundbeck GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Aaron J Stern
- Graduate Group in Computational Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Fernando Racimo
- Lundbeck GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Rasmus Nielsen
- Lundbeck GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Statistics, University of California, Berkeley, Berkeley, CA, USA
| |
|
45
|
Hejase HA, Mo Z, Campagna L, Siepel A. A Deep-Learning Approach for Inference of Selective Sweeps from the Ancestral Recombination Graph. Mol Biol Evol 2022; 39:msab332. [PMID: 34888675 PMCID: PMC8789311 DOI: 10.1093/molbev/msab332] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Indexed: 01/11/2023] Open
Abstract
Detecting signals of selection from genomic data is a central problem in population genetics. Coupling the rich information in the ancestral recombination graph (ARG) with a powerful and scalable deep-learning framework, we developed a novel method to detect and quantify positive selection: Selection Inference using the Ancestral recombination graph (SIA). Built on a Long Short-Term Memory (LSTM) architecture, a particular type of Recurrent Neural Network (RNN), SIA can be trained to explicitly infer a full range of selection coefficients, as well as the allele frequency trajectory and time of selection onset. We benchmarked SIA extensively on simulations under a European human demographic model, and found that it performs as well as or better than some of the best available methods, including state-of-the-art machine-learning and ARG-based methods. In addition, we used SIA to estimate selection coefficients at several loci associated with human phenotypes of interest. SIA detected novel signals of selection particular to the European (CEU) population at the MC1R and ABCC11 loci. In addition, it recapitulated signals of selection at the LCT locus and several pigmentation-related genes. Finally, we reanalyzed polymorphism data of a collection of recently radiated southern capuchino seedeater taxa in the genus Sporophila to quantify the strength of selection and improved the power of our previous methods to detect partial soft sweeps. Overall, SIA uses deep learning to leverage the ARG and thereby provides new insight into how selective sweeps shape genomic diversity.
Affiliation(s)
- Hussein A Hejase
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Ziyi Mo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- School of Biological Sciences, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Leonardo Campagna
- Fuller Evolutionary Biology Program, Cornell Lab of Ornithology, Ithaca, NY, USA
- Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, USA
| | - Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| |
|
46
|
Qiu J, Zhou Q, Ye W, Chen Q, Bao YJ. SweepCluster: A SNP clustering tool for detecting gene-specific sweeps in prokaryotes. BMC Bioinformatics 2022; 23:19. [PMID: 34991447 PMCID: PMC8734265 DOI: 10.1186/s12859-021-04533-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2021] [Accepted: 12/13/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The gene-specific sweep is a selection process where an advantageous mutation along with the nearby neutral sites in a gene region increases in frequency in the population. It has been demonstrated to play important roles in ecological differentiation or phenotypic divergence in microbial populations. Therefore, identifying gene-specific sweeps in microorganisms will not only provide insights into the evolutionary mechanisms, but also unravel potential genetic markers associated with biological phenotypes. However, current methods were mainly developed for detecting selective sweeps in eukaryotic data of sparse genotypes and are not readily applicable to prokaryotic data. Furthermore, some challenges have not been sufficiently addressed by the methods, such as the low spatial resolution of sweep regions and lack of consideration of the spatial distribution of mutations. RESULTS: We proposed a novel gene-centric and spatial-aware approach for identifying gene-specific sweeps in prokaryotes and implemented it in a Python tool, SweepCluster. Our method searches for gene regions with a high level of spatial clustering of pre-selected polymorphisms in genotype datasets assuming a null distribution model of neutral selection. The pre-selection of polymorphisms is based on their genetic signatures, such as elevated population subdivision, excessive linkage disequilibrium, or significant phenotype association. Performance evaluation using simulation data showed that the sensitivity and specificity of the clustering algorithm in SweepCluster are above 90%. The application of SweepCluster in two real datasets from the bacteria Streptococcus pyogenes and Streptococcus suis showed that the impact of pre-selection was dramatic and significantly reduced the uninformative signals. We validated our method using the genotype data from Vibrio cyclitrophicus, the only available dataset of gene-specific sweeps in bacteria, and obtained a concordance rate of 78%. We noted that the concordance rate could be underestimated due to distinct reference genomes and clustering strategies. The application to the human genotype datasets showed that SweepCluster is also applicable to eukaryotic data and is able to recover 80% of a catalog of known sweep regions. CONCLUSION: SweepCluster is applicable to a broad category of datasets. It will be valuable for detecting gene-specific sweeps in diverse genotypic data and provide novel insights on adaptive evolution.
Affiliation(s)
- Junhui Qiu
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei Collaborative Innovation Center for Green Transformation of Bio-Resources, Hubei Key Laboratory of Industrial Biotechnology, School of Life Sciences, Hubei University, Wuhan, 430062, China
| | - Qi Zhou
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei Collaborative Innovation Center for Green Transformation of Bio-Resources, Hubei Key Laboratory of Industrial Biotechnology, School of Life Sciences, Hubei University, Wuhan, 430062, China
| | - Weicai Ye
- School of Computer Science and Engineering, Guangdong Province Key Laboratory of Computational Science, and National Engineering Laboratory for Big Data Analysis and Application, Sun Yat-Sen University, Guangzhou, 510275, China
| | - Qianjun Chen
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei Collaborative Innovation Center for Green Transformation of Bio-Resources, Hubei Key Laboratory of Industrial Biotechnology, School of Life Sciences, Hubei University, Wuhan, 430062, China.
| | - Yun-Juan Bao
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei Collaborative Innovation Center for Green Transformation of Bio-Resources, Hubei Key Laboratory of Industrial Biotechnology, School of Life Sciences, Hubei University, Wuhan, 430062, China.
| |
|
47
|
Charlesworth B, Jensen JD. Effects of Selection at Linked Sites on Patterns of Genetic Variability. Annu Rev Ecol Evol Syst 2021; 52:177-197. [PMID: 37089401 PMCID: PMC10120885 DOI: 10.1146/annurev-ecolsys-010621-044528] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Indexed: 01/25/2023]
Abstract
Patterns of variation and evolution at a given site in a genome can be strongly influenced by the effects of selection at genetically linked sites. In particular, the recombination rates of genomic regions correlate with their amount of within-population genetic variability, the degree to which the frequency distributions of DNA sequence variants differ from their neutral expectations, and the levels of adaptation of their functional components. We review the major population genetic processes that are thought to lead to these patterns, focusing on their effects on patterns of variability: selective sweeps, background selection, associative overdominance, and Hill–Robertson interference among deleterious mutations. We emphasize the difficulties in distinguishing among the footprints of these processes and disentangling them from the effects of purely demographic factors such as population size changes. We also discuss how interactions between selective and demographic processes can significantly affect patterns of variability within genomes.
Affiliation(s)
- Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3FL, United Kingdom
- Jeffrey D. Jensen
- School of Life Sciences, Arizona State University, Tempe, Arizona 85281, USA

48
Kidner J, Theodorou P, Engler JO, Taubert M, Husemann M. A brief history and popularity of methods and tools used to estimate micro-evolutionary forces. Ecol Evol 2021; 11:13723-13743. [PMID: 34707813 PMCID: PMC8525119 DOI: 10.1002/ece3.8076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2020] [Revised: 07/12/2021] [Accepted: 08/12/2021] [Indexed: 11/30/2022] Open
Abstract
Population genetics is a field of research that predates the current generations of sequencing technology. Those approaches, that were established before massively parallel sequencing methods, have been adapted to these new marker systems (in some cases involving the development of new methods) that allow genome-wide estimates of the four major micro-evolutionary forces: mutation, gene flow, genetic drift, and selection. Nevertheless, classic population genetic markers are still commonly used and a plethora of analysis methods and programs is available for these and high-throughput sequencing (HTS) data. These methods employ various and diverse theoretical and statistical frameworks, to varying degrees of success, to estimate similar evolutionary parameters making it difficult to get a concise overview across the available approaches. Presently, reviews on this topic generally focus on a particular class of methods to estimate one or two evolutionary parameters. Here, we provide a brief history of methods and a comprehensive list of available programs for estimating micro-evolutionary forces. We furthermore analyzed their usage within the research community based on popularity (citation bias) and discuss the implications of this bias for the software community. We found that a few programs received the majority of citations, with program success being independent of both the parameters estimated and the computing platform. The only deviation from a model of exponential growth in the number of citations was found for the presence of a graphical user interface (GUI). Interestingly, no relationship was found for the impact factor of the journals, when the tools were published, suggesting accessibility might be more important than visibility.
Affiliation(s)
- Jonathan Kidner
- General Zoology, Institute for Biology, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Panagiotis Theodorou
- General Zoology, Institute for Biology, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Jan O Engler
- Terrestrial Ecology Unit, Department of Biology, Ghent University, Ghent, Belgium
- Martin Taubert
- Aquatic Geomicrobiology, Institute for Biodiversity, Friedrich Schiller University Jena, Jena, Germany
- Martin Husemann
- General Zoology, Institute for Biology, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Centrum für Naturkunde, University of Hamburg, Hamburg, Germany

49
Passamonti MM, Somenzi E, Barbato M, Chillemi G, Colli L, Joost S, Milanesi M, Negrini R, Santini M, Vajana E, Williams JL, Ajmone-Marsan P. The Quest for Genes Involved in Adaptation to Climate Change in Ruminant Livestock. Animals (Basel) 2021; 11:2833. [PMID: 34679854 PMCID: PMC8532622 DOI: 10.3390/ani11102833] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 08/19/2021] [Revised: 09/21/2021] [Accepted: 09/23/2021] [Indexed: 12/14/2022] Open
Abstract
Livestock radiated out from domestication centres to most regions of the world, gradually adapting to diverse environments, from very hot to sub-zero temperatures and from wet and humid conditions to deserts. The climate is changing; generally global temperature is increasing, although there are also more extreme cold periods, storms, and higher solar radiation. These changes impact livestock welfare and productivity. This review describes advances in the methodology for studying livestock genomes and the impact of the environment on animal production, giving examples of discoveries made. Sequencing livestock genomes has facilitated genome-wide association studies to localize genes controlling many traits, and population genetics has identified genomic regions under selection or introgressed from one breed into another to improve production or facilitate adaptation. Landscape genomics, which combines global positioning and genomics, has identified genomic features that enable animals to adapt to local environments. Combining the advances in genomics and methods for predicting changes in climate is generating an explosion of data which calls for innovations in the way big data sets are treated. Artificial intelligence and machine learning are now being used to study the interactions between the genome and the environment to identify historic effects on the genome and to model future scenarios.
Affiliation(s)
- Matilde Maria Passamonti
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy
- Elisa Somenzi
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy
- Mario Barbato
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy
- Giovanni Chillemi
- Department for Innovation in Biological, Agro-Food and Forest Systems–DIBAF, Università Della Tuscia, Via S. Camillo de Lellis snc, 01100 Viterbo, Italy
- Licia Colli
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy
- Research Center on Biodiversity and Ancient DNA—BioDNA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy
- Stéphane Joost
- Laboratory of Geographic Information Systems (LASIG), School of Architecture, Civil and Environmental Engineering (ENAC), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Marco Milanesi
- Department for Innovation in Biological, Agro-Food and Forest Systems–DIBAF, Università Della Tuscia, Via S. Camillo de Lellis snc, 01100 Viterbo, Italy
- Riccardo Negrini
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy
- Monia Santini
- Impacts on Agriculture, Forests and Ecosystem Services (IAFES) Division, Fondazione Centro Euro-Mediterraneo Sui Cambiamenti Climatici (CMCC), Viale Trieste 127, 01100 Viterbo, Italy
- Elia Vajana
- Laboratory of Geographic Information Systems (LASIG), School of Architecture, Civil and Environmental Engineering (ENAC), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- John Lewis Williams
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy
- Paolo Ajmone-Marsan
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy
- Nutrigenomics and Proteomics Research Center—PRONUTRIGEN, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy

50
Zhang G, Mostad JD, Andersen EC. Natural variation in fecundity is correlated with species-wide levels of divergence in Caenorhabditis elegans. G3 (Bethesda, Md.) 2021; 11:jkab168. [PMID: 33983439 PMCID: PMC8496234 DOI: 10.1093/g3journal/jkab168] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/16/2021] [Accepted: 05/03/2021] [Indexed: 01/08/2023]
Abstract
Life history traits underlie the fitness of organisms and are under strong natural selection. A new mutation that positively impacts a life history trait will likely increase in frequency and become fixed in a population (e.g., a selective sweep). The identification of the beneficial alleles that underlie selective sweeps provides insights into the mechanisms that occurred during the evolution of a species. In the global population of Caenorhabditis elegans, we previously identified selective sweeps that have drastically reduced chromosomal-scale genetic diversity in the species. Here, we measured the fecundity of 121 wild C. elegans strains, including many recently isolated divergent strains from the Hawaiian islands and found that strains with larger swept genomic regions have significantly higher fecundity than strains without evidence of the recent selective sweeps. We used genome-wide association (GWA) mapping to identify three quantitative trait loci (QTL) underlying the fecundity variation. In addition, we mapped previous fecundity data from wild C. elegans strains and C. elegans recombinant inbred advanced intercross lines that were grown in various conditions and detected eight QTL using GWA and linkage mappings. These QTL show the genetic complexity of fecundity across this species. Moreover, the haplotype structure in each GWA QTL region revealed correlations with recent selective sweeps in the C. elegans population. North American and European strains had significantly higher fecundity than most strains from Hawaii, a hypothesized origin of the C. elegans species, suggesting that beneficial alleles that caused increased fecundity could underlie the selective sweeps during the worldwide expansion of C. elegans.
Affiliation(s)
- Gaotian Zhang
- Department of Molecular Biosciences, Northwestern University, Evanston, IL 60208, USA
- Jake D Mostad
- Department of Molecular Biosciences, Northwestern University, Evanston, IL 60208, USA
- Erik C Andersen
- Department of Molecular Biosciences, Northwestern University, Evanston, IL 60208, USA