1
Arnab SP, Campelo dos Santos AL, Fumagalli M, DeGiorgio M. Efficient Detection and Characterization of Targets of Natural Selection Using Transfer Learning. Mol Biol Evol 2025; 42:msaf094. [PMID: 40341942 PMCID: PMC12062966 DOI: 10.1093/molbev/msaf094] [Received: 11/04/2024] [Revised: 04/16/2025] [Accepted: 04/17/2025] [Indexed: 05/11/2025]
Abstract
Natural selection leaves detectable patterns of altered spatial diversity within genomes, and identifying affected regions is crucial for understanding species evolution. Recently, machine learning approaches applied to raw population genomic data have been developed to uncover these adaptive signatures. Convolutional neural networks (CNNs) are particularly effective for this task, as they handle large data arrays while maintaining element correlations. However, shallow CNNs may miss complex patterns due to their limited capacity, while deep CNNs can capture these patterns but require extensive data and computational power. Transfer learning addresses these challenges by utilizing a deep CNN pretrained on a large dataset as a feature extraction tool for downstream classification and evolutionary parameter prediction. This approach reduces the need to generate extensive training data and lowers computational demands while maintaining high performance. In this study, we developed TrIdent, a tool that uses transfer learning to enhance detection of adaptive genomic regions from image representations of multilocus variation. We evaluated TrIdent across various genetic, demographic, and adaptive settings, as well as on unphased data and under other confounding factors. TrIdent demonstrated improved detection of adaptive regions compared to recent methods using similar data representations. We further explored model interpretability through class activation maps and adapted TrIdent to infer selection parameters for identified adaptive candidates. Using whole-genome haplotype data from European and African populations, TrIdent effectively recapitulated known sweep candidates and identified novel cancer-associated and other disease-associated genes as potential sweeps.
Affiliation(s)
- Sandipan Paul Arnab
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA
- Matteo Fumagalli
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
- The Alan Turing Institute, London, UK
- Michael DeGiorgio
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA
2
Ferraretti G, Rill A, Abondio P, Smith K, Ojeda-Granados C, De Fanti S, Alberti M, Izzi M, Sherpa PT, Cocco P, Tiriticco M, Di Marcello M, Dezi A, Gnecchi-Ruscone GA, Natali L, Corcelli A, Marinelli G, Garagnani P, Peluzzi D, Luiselli D, Pettener D, Sarno S, Sazzini M. Convergent evolution of complex adaptive traits modulates angiogenesis in high-altitude Andean and Himalayan human populations. Commun Biol 2025; 8:377. [PMID: 40050470 PMCID: PMC11885840 DOI: 10.1038/s42003-025-07813-6] [Received: 03/11/2024] [Accepted: 02/25/2025] [Indexed: 03/09/2025]
Abstract
Convergent adaptations represent paradigmatic examples of the capacity of natural selection to influence organisms' biology. However, the possibility of investigating the genetic determinants underpinning convergent complex adaptive traits has been offered only recently by methods for inferring polygenic adaptations from genomic data. Relying on this approach, we demonstrate how high-altitude Andean human groups experienced pervasive selective events at angiogenic pathways, which resemble those previously attested for Himalayan populations even though only partial convergence was observed at the single-gene level. This provides additional evidence for the drivers of convergent evolution of enhanced blood perfusion in populations exposed to hypobaric hypoxia for thousands of years.
Affiliation(s)
- Giulia Ferraretti
- Laboratory of Molecular Anthropology & Centre for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Aina Rill
- Laboratory of Molecular Anthropology & Centre for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Josep Carreras Leukaemia Research Institute, PhD Programme in Biomedicine, University of Barcelona, Barcelona, Spain
- Paolo Abondio
- Department of Cultural Heritage, Ravenna Campus, University of Bologna, Ravenna, Italy
- Department of Biology, University of Rome Tor Vergata, Rome, Italy
- Kyra Smith
- Laboratory of Molecular Anthropology & Centre for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Claudia Ojeda-Granados
- Laboratory of Molecular Anthropology & Centre for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Department of Medical and Surgical Sciences and Advanced Technologies "GF Ingrassia", University of Catania, Catania, Italy
- Sara De Fanti
- IRCCS Istituto delle Scienze Neurologiche di Bologna, Bologna, Italy
- Marta Alberti
- Laboratory of Molecular Anthropology & Centre for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Massimo Izzi
- Complex Operative Unit of Endocrinology and Diabetes Care, Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy
- Paolo Cocco
- Explora Nunaat International, Montorio al Vomano, Teramo, Italy
- Agnese Dezi
- Department of Precision and Regenerative Medicine and Ionian Area, University of Bari Aldo Moro, Bari, Italy
- Guido Alberto Gnecchi-Ruscone
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Archaeo- and Palaeogenetics, Institute for Archaeological Sciences, Department of Geosciences & Senckenberg Centre for Human Evolution and Palaeoenvironment, University of Tübingen, Tübingen, Germany
- Luca Natali
- Explora Nunaat International, Montorio al Vomano, Teramo, Italy
- Italian Institute of Human Paleontology, Rome, Italy
- Angela Corcelli
- Department of Translational Biomedicine and Neuroscience, University of Bari Aldo Moro, Bari, Italy
- Paolo Garagnani
- IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy
- Davide Peluzzi
- Explora Nunaat International, Montorio al Vomano, Teramo, Italy
- Donata Luiselli
- Department of Cultural Heritage, Ravenna Campus, University of Bologna, Ravenna, Italy
- Davide Pettener
- Laboratory of Molecular Anthropology & Centre for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Stefania Sarno
- Laboratory of Molecular Anthropology & Centre for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Marco Sazzini
- Laboratory of Molecular Anthropology & Centre for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Interdepartmental Centre Alma Mater Research Institute on Global Changes and Climate Change, University of Bologna, Bologna, Italy
3
Arnab SP, Dos Santos ALC, Fumagalli M, DeGiorgio M. Efficient detection and characterization of targets of natural selection using transfer learning. bioRxiv 2025:2025.03.05.641710. [PMID: 40093065 PMCID: PMC11908262 DOI: 10.1101/2025.03.05.641710] [Indexed: 03/19/2025]
Abstract
Natural selection leaves detectable patterns of altered spatial diversity within genomes, and identifying affected regions is crucial for understanding species evolution. Recently, machine learning approaches applied to raw population genomic data have been developed to uncover these adaptive signatures. Convolutional neural networks (CNNs) are particularly effective for this task, as they handle large data arrays while maintaining element correlations. However, shallow CNNs may miss complex patterns due to their limited capacity, while deep CNNs can capture these patterns but require extensive data and computational power. Transfer learning addresses these challenges by utilizing a deep CNN pre-trained on a large dataset as a feature extraction tool for downstream classification and evolutionary parameter prediction. This approach reduces the need to generate extensive training data and lowers computational demands while maintaining high performance. In this study, we developed TrIdent, a tool that uses transfer learning to enhance detection of adaptive genomic regions from image representations of multilocus variation. We evaluated TrIdent across various genetic, demographic, and adaptive settings, as well as on unphased data and under other confounding factors. TrIdent demonstrated improved detection of adaptive regions compared to recent methods using similar data representations. We further explored model interpretability through class activation maps and adapted TrIdent to infer selection parameters for identified adaptive candidates. Using whole-genome haplotype data from European and African populations, TrIdent effectively recapitulated known sweep candidates and identified novel cancer-associated and other disease-associated genes as potential sweeps.
Affiliation(s)
- Sandipan Paul Arnab
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA
- Matteo Fumagalli
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
- The Alan Turing Institute, London, UK
- Michael DeGiorgio
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA
4
Amin MR, Hasan M, DeGiorgio M. Digital Image Processing to Detect Adaptive Evolution. Mol Biol Evol 2024; 41:msae242. [PMID: 39565932 PMCID: PMC11631197 DOI: 10.1093/molbev/msae242] [Received: 03/20/2024] [Revised: 10/28/2024] [Accepted: 11/13/2024] [Indexed: 11/22/2024]
Abstract
In recent years, advances in image processing and machine learning have fueled a paradigm shift in detecting genomic regions under natural selection. Early machine learning techniques employed population-genetic summary statistics as features, which focus on specific genomic patterns expected under adaptive and neutral processes. Though such engineered features are important when training data are limited, the ease with which simulated data can now be generated has led to the recent development of approaches that take in image representations of haplotype alignments and automatically extract important features using convolutional neural networks. Digital image processing methods termed α-molecules are a class of techniques for multiscale representation of objects that can extract a diverse set of features from images. One such α-molecule method, termed wavelet decomposition, lends greater control over high-frequency components of images. Another α-molecule method, termed curvelet decomposition, is an extension of the wavelet concept that considers events occurring along curves within images. We show that applying these α-molecule techniques to extract features from image representations of haplotype alignments yields a high true positive rate and accuracy for detecting hard and soft selective sweep signatures from genomic data with both linear and nonlinear machine learning classifiers. Moreover, we find that such models are easy to visualize and interpret, with performance rivaling that of contemporary deep learning approaches for detecting sweeps.
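The abstract describes wavelet decomposition as an α-molecule method that separates an image into coarse structure and high-frequency detail. As an illustration of that general idea only, here is a minimal pure-Python sketch of one level of an unnormalized 2D Haar transform (the simplest wavelet) applied to a toy haplotype alignment, assuming the alignment is encoded as a 0/1 matrix with haplotypes as rows and SNPs as columns. The function names `haar2d` and `inverse_haar2d` are hypothetical; this is not the authors' pipeline, which the abstract does not specify, and it does not cover curvelets.

```python
# Hypothetical illustration of a single-level 2D Haar wavelet decomposition;
# not the method or code of Amin et al. (2024).
from typing import List, Tuple

Matrix = List[List[float]]


def haar2d(img: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Split an image into one low-pass and three high-pass sub-bands.

    Each output has half the rows and half the columns of the input,
    which must have an even number of rows and columns.
    """
    n_rows, n_cols = len(img), len(img[0])
    if n_rows % 2 or n_cols % 2:
        raise ValueError("image dimensions must be even")
    approx, row_diff, col_diff, diag_diff = [], [], [], []
    for i in range(0, n_rows, 2):
        a_row, r_row, c_row, d_row = [], [], [], []
        for j in range(0, n_cols, 2):
            p, q = img[i][j], img[i][j + 1]          # top-left, top-right
            r, s = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            a_row.append((p + q + r + s) / 4)  # low-pass: local mean
            r_row.append((p + q - r - s) / 4)  # high-pass: top vs bottom row
            c_row.append((p - q + r - s) / 4)  # high-pass: left vs right column
            d_row.append((p - q - r + s) / 4)  # high-pass: diagonal contrast
        approx.append(a_row)
        row_diff.append(r_row)
        col_diff.append(c_row)
        diag_diff.append(d_row)
    return approx, row_diff, col_diff, diag_diff


def inverse_haar2d(approx: Matrix, row_diff: Matrix,
                   col_diff: Matrix, diag_diff: Matrix) -> Matrix:
    """Exactly reconstruct the input of haar2d from its four sub-bands."""
    out: Matrix = []
    for a_row, r_row, c_row, d_row in zip(approx, row_diff, col_diff, diag_diff):
        top, bottom = [], []
        for a, h, v, d in zip(a_row, r_row, c_row, d_row):
            top += [a + h + v + d, a + h - v - d]
            bottom += [a - h + v - d, a - h - v + d]
        out += [top, bottom]
    return out


# Toy alignment: rows are haplotypes, columns are SNPs (1 = derived allele).
hap = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
approx, row_diff, col_diff, diag_diff = haar2d(hap)
print(approx)     # [[1.0, 0.5], [1.0, 0.5]]
print(diag_diff)  # [[0.0, -0.5], [0.0, 0.0]]
assert inverse_haar2d(approx, row_diff, col_diff, diag_diff) == hap
```

Because the transform is exactly invertible, no information is discarded; a downstream classifier would take the sub-band coefficients, or summaries of them, as features. Dividing by 4 rather than 2 (the orthonormal scaling) keeps the approximation band on the same 0 to 1 scale as the input; both choices are invertible.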
Affiliation(s)
- Md Ruhul Amin
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Mahmudul Hasan
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Michael DeGiorgio
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
5
Mackintosh A, Vila R, Martin SH, Setter D, Lohse K. Do chromosome rearrangements fix by genetic drift or natural selection? Insights from Brenthis butterflies. Mol Ecol 2024; 33:e17146. [PMID: 37807966 PMCID: PMC11628658 DOI: 10.1111/mec.17146] [Received: 06/12/2023] [Revised: 09/08/2023] [Accepted: 09/14/2023] [Indexed: 10/10/2023]
Abstract
Large-scale chromosome rearrangements, such as fissions and fusions, are a common feature of eukaryote evolution. They can have considerable influence on the evolution of populations, yet it remains unclear exactly how rearrangements become established and eventually fix. Rearrangements could fix by genetic drift if they are weakly deleterious or neutral, or they may instead be favoured by positive natural selection. Here, we compare genome assemblies of three closely related Brenthis butterfly species and characterize a complex history of fission and fusion rearrangements. An inferred demographic history of these species suggests that rearrangements became fixed in populations with large long-term effective size (Ne), consistent with rearrangements being selectively neutral or only very weakly underdominant. Using a recently developed analytic framework for characterizing hard selective sweeps, we find that chromosome fusions are not enriched for evidence of past sweeps compared to other regions of the genome. Nonetheless, we do infer a strong and recent selective sweep around one chromosome fusion in the B. daphne genome. Our results suggest that rearrangements in these species likely have weak absolute fitness effects and fix by genetic drift. However, one putative selective sweep raises the possibility that natural selection may sometimes play a role in the fixation of chromosome fusions.
Affiliation(s)
- Roger Vila
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Spain
- Simon H. Martin
- Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, UK
- Derek Setter
- Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, UK
- Konrad Lohse
- Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, UK
6
Tiwari M, Gujar G, Shashank CG, Ponsuksili S. Selection signatures for high altitude adaptation in livestock: A review. Gene 2024; 927:148757. [PMID: 38986751 DOI: 10.1016/j.gene.2024.148757] [Received: 03/24/2024] [Revised: 07/01/2024] [Accepted: 07/05/2024] [Indexed: 07/12/2024]
Abstract
High-altitude-adapted livestock species (cattle, yak, goat, sheep, and horse) play a critical role in the human socioeconomic sphere and are a good source of animal products, including milk, meat, and leather. These species sustain production and reproduction even in harsh environments owing to adaptations resulting from the continued evolution of beneficial traits. Selection pressure leads to various adaptive strategies in livestock, whose footprints are evident at different genomic sites as "selection signatures". Scrutiny of these signatures provides crucial insight into the evolutionary process and domestication of livestock adapted to diverse climatic conditions. These signatures have the potential to change the sphere of animal breeding and to steer selection programmes in the right direction. The technological revolution and recent strides in genomic studies have opened routes for the identification of selection signatures. Numerous statistical approaches and bioinformatics tools have been developed to detect them. Consequently, studies over the years have identified candidate genes in regions under selection that are associated with numerous traits relevant to adaptation to high-altitude environments. This makes it pertinent to better understand selection signatures, how to identify them, and how to use them for the betterment of livestock populations as well as farmers. This review takes a closer look at the general concept, various methodologies, and bioinformatics tools commonly employed in selection signature studies and summarizes the results of recent selection signature studies related to high-altitude adaptation in various livestock species. This review will serve as an informative and useful resource for researchers and students in the field of animal breeding and evolutionary biology.
Affiliation(s)
- Manish Tiwari
- ICAR-National Dairy Research Institute, Karnal, India; U.P. Pt. Deen Dayal Upadhyaya Veterinary Science University and Cattle Research Institute, Mathura, India.
- C G Shashank
- ICAR-National Dairy Research Institute, Karnal, India
7
Ferraretti G, Abondio P, Alberti M, Dezi A, Sherpa PT, Cocco P, Tiriticco M, Di Marcello M, Gnecchi-Ruscone GA, Natali L, Corcelli A, Marinelli G, Peluzzi D, Sarno S, Sazzini M. Archaic introgression contributed to shape the adaptive modulation of angiogenesis and cardiovascular traits in human high-altitude populations from the Himalayas. eLife 2024; 12:RP89815. [PMID: 39513938 PMCID: PMC11548878 DOI: 10.7554/elife.89815] [Indexed: 11/16/2024]
Abstract
It is well established that several Homo sapiens populations experienced admixture with extinct human species during their evolutionary history. Sometimes, such gene flow may have played a role in modulating their capability to cope with a variety of selective pressures, thus resulting in archaic adaptive introgression events. A paradigmatic example of this evolutionary mechanism is offered by the EPAS1 gene, whose most frequent haplotype in Himalayan highlanders was shown to reduce their susceptibility to chronic mountain sickness and to have been introduced into the gene pool of their ancestors by admixture with Denisovans. In this study, we aimed to further expand the investigation of the impact of archaic introgression on more complex adaptive responses to hypobaric hypoxia evolved by populations of Tibetan/Sherpa ancestry, which have plausibly been mediated by soft selective sweeps and/or polygenic adaptations rather than by hard selective sweeps. For this purpose, we used a combination of composite-likelihood and gene network-based methods to detect adaptive loci in introgressed chromosomal segments from Tibetan WGS data and to shortlist those presenting Denisovan-like derived alleles that participate in the same functional pathways and are absent in populations of African ancestry, which are assumed not to have experienced Denisovan admixture. Using this approach, we identified multiple genes that were putatively involved in archaic introgression events and that have plausibly contributed to shaping the adaptive modulation of angiogenesis and of certain cardiovascular traits in high-altitude Himalayan peoples, especially TBC1D1, RASGRF2, PRKAG2, and KRAS.
These findings provide unprecedented evidence about the complexity of the adaptive phenotype evolved by these human groups to cope with challenges imposed by hypobaric hypoxia, offering new insights into the tangled interplay of genetic determinants that mediates the physiological adjustments crucial for human adaptation to the high-altitude environment.
Affiliation(s)
- Giulia Ferraretti
- Laboratory of Molecular Anthropology and Centre for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Paolo Abondio
- Department of Cultural Heritage, Ravenna Campus, University of Bologna, Bologna, Italy
- Marta Alberti
- Laboratory of Molecular Anthropology and Centre for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Agnese Dezi
- Department of Emergency and Organ Transplantation, University of Bari Aldo Moro, Bari, Italy
- Paolo Cocco
- Explora Nunaat International, Montorio al Vomano, Teramo, Italy
- Luca Natali
- Explora Nunaat International, Montorio al Vomano, Teramo, Italy
- Italian Institute of Human Paleontology, Rome, Italy
- Angela Corcelli
- Department of Basic Medical Science, Neuroscience and Sense Organs, University of Bari Aldo Moro, Bari, Italy
- Davide Peluzzi
- Explora Nunaat International, Montorio al Vomano, Teramo, Italy
- Stefania Sarno
- Laboratory of Molecular Anthropology and Centre for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Marco Sazzini
- Laboratory of Molecular Anthropology and Centre for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Interdepartmental Centre Alma Mater Research Institute on Global Changes and Climate Change, University of Bologna, Bologna, Italy
8
Cheng X, Steinrücken M. Population Genomic Scans for Natural Selection and Demography. Annu Rev Genet 2024; 58:319-339. [PMID: 39227130 DOI: 10.1146/annurev-genet-111523-102651] [Indexed: 09/05/2024]
Abstract
Uncovering the fundamental processes that shape genomic variation in natural populations is a primary objective of population genetics. These processes include demographic effects such as past changes in effective population size or gene flow between structured populations. Furthermore, genomic variation is affected by selection on nonneutral genetic variants, for example, through the adaptation of beneficial alleles or balancing selection that maintains genetic variation. In this article, we discuss the characterization of these processes using population genetic models, and we review methods developed on the basis of these models to unravel the underlying processes from modern population genomic data sets. We briefly discuss the conditions in which these approaches can be used to infer demography or identify specific nonneutral genetic variants and cases in which caution is warranted. Moreover, we summarize the challenges of jointly inferring demography and selective processes that affect neutral variation genome-wide.
Affiliation(s)
- Xiaoheng Cheng
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA
- Matthias Steinrücken
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA
9
Zhao S, Chi L, Fu M, Chen H. HaploSweep: Detecting and Distinguishing Recent Soft and Hard Selective Sweeps through Haplotype Structure. Mol Biol Evol 2024; 41:msae192. [PMID: 39288167 PMCID: PMC11452351 DOI: 10.1093/molbev/msae192] [Received: 03/28/2024] [Revised: 07/29/2024] [Accepted: 09/03/2024] [Indexed: 09/19/2024]
Abstract
Identifying soft selective sweeps using genomic data is a challenging yet crucial task in population genetics. In this study, we present HaploSweep, a novel method for detecting and categorizing soft and hard selective sweeps based on haplotype structure. Through simulations spanning a broad range of selection intensities, softness levels, and demographic histories, we demonstrate that HaploSweep outperforms iHS, nSL, and H12 in detecting soft sweeps. HaploSweep achieves high classification accuracy (0.9247 for CHB, 0.9484 for CEU, and 0.9829 for YRI) when applied to simulations in line with the human Out-of-Africa demographic model. We also observe that the classification accuracy remains consistently robust across different demographic models. Additionally, we introduce a refined method to accurately distinguish soft shoulders adjacent to hard sweeps from soft sweeps. Application of HaploSweep to genomic data of the CHB, CEU, and YRI populations from the 1000 Genomes Project has led to the discovery of several new genes that bear strong evidence of population-specific soft sweeps (e.g., HRNR, AMBRA1, CBFA2T2, DYNC2H1, and RANBP2), with prevalent associations with immune functions and metabolic processes. The validated performance of HaploSweep, demonstrated through both simulated and real data, underscores its potential as a valuable tool for detecting and comprehending the role of soft sweeps in adaptive evolution.
Affiliation(s)
- Shilei Zhao
- Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China
- Lianjiang Chi
- Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China
- Mincong Fu
- Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China
- Hua Chen
- Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China
- CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
10
Igoshin AV, Romashov GA, Yurchenko AA, Yudin NS, Larkin DM. Scans for Signatures of Selection in Genomes of Wagyu and Buryat Cattle Breeds Reveal Candidate Genes and Genetic Variants for Adaptive Phenotypes and Production Traits. Animals (Basel) 2024; 14:2059. [PMID: 39061521 PMCID: PMC11274160 DOI: 10.3390/ani14142059] [Received: 06/14/2024] [Revised: 07/10/2024] [Accepted: 07/11/2024] [Indexed: 07/28/2024]
Abstract
Past and ongoing selection shapes the genomes of livestock breeds. Identifying such signatures of selection allows for uncovering the genetic bases of affected phenotypes, including economically important traits and environmental adaptations, for the further improvement of breed genetics to respond to climate and economic challenges. Turano-Mongolian cattle are a group of taurine breeds known for their adaptation to extreme environmental conditions and outstanding production performance. Buryat Turano-Mongolian cattle are among the few breeds adapted to cold climates and poor forage. Wagyu, on the other hand, is famous for high productivity and unique top-quality marbled meat. We used hapFLK, the de-correlated composite of multiple signals (DCMS), PBS, and FST methods to search for signatures of selection in their genomes. The scans revealed signals in genes related to cold adaptation (e.g., STAT3, DOCK5, GSTM3, and CXCL8) and food digestibility (SI) in the Buryat breed, and growth and development traits (e.g., RBFOX2 and SHOX2) and marbling (e.g., DGAT1, IQGAP2, RSRC1, and DIP2B) in Wagyu. Several putatively selected genes associated with reproduction, immunity, and resistance to pathogens were found in both breed genomes. The results of our work could be used for creating new productive adapted breeds or improving the extant breeds.
Affiliation(s)
- Alexander V. Igoshin
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences (ICG SB RAS), Novosibirsk 630090, Russia
- Grigorii A. Romashov
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences (ICG SB RAS), Novosibirsk 630090, Russia
- Andrey A. Yurchenko
- INSERM U981, Gustave Roussy Cancer Campus, Université Paris Saclay, 94800 Villejuif, France
- Nikolay S. Yudin
- The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences (ICG SB RAS), Novosibirsk 630090, Russia
- Denis M. Larkin
- Royal Veterinary College, University of London, London NW1 0TU, UK
11
Wang Y, Duchen P, Chávez A, Sree KS, Appenroth KJ, Zhao H, Höfer M, Huber M, Xu S. Population genomics and epigenomics of Spirodela polyrhiza provide insights into the evolution of facultative asexuality. Commun Biol 2024; 7:581. [PMID: 38755313 PMCID: PMC11099151 DOI: 10.1038/s42003-024-06266-7] [Received: 08/01/2023] [Accepted: 04/30/2024] [Indexed: 05/18/2024]
Abstract
Many plants are facultatively asexual, balancing short-term benefits with long-term costs of asexuality. During range expansion, natural selection likely influences the genetic controls of asexuality in these organisms. However, evidence of natural selection driving asexuality is limited, and the evolutionary consequences of asexuality for genomic and epigenomic diversity remain controversial. We analyzed population genomes and epigenomes of Spirodela polyrhiza (L.) Schleid., a facultatively asexual plant that flowers rarely, revealing remarkably low genomic diversity and DNA methylation levels. Within the species, demographic history and the frequency of asexual reproduction jointly determined intraspecific variation in genomic diversity and DNA methylation levels. Genome-wide scans revealed that genes associated with stress adaptation, flowering, and embryogenesis were under positive selection. These data are consistent with the hypothesis that natural selection can shape the evolution of asexuality during habitat expansions, which in turn alters genomic and epigenomic diversity levels.
Affiliation(s)
- Yangzi Wang
- Institute of Organismic and Molecular Evolution, University of Mainz, 55128, Mainz, Germany
- Institute for Evolution and Biodiversity, University of Münster, 48161, Münster, Germany
- Pablo Duchen
- Institute of Organismic and Molecular Evolution, University of Mainz, 55128, Mainz, Germany
- Institute for Evolution and Biodiversity, University of Münster, 48161, Münster, Germany
- Alexandra Chávez
- Institute of Organismic and Molecular Evolution, University of Mainz, 55128, Mainz, Germany
- Institute for Evolution and Biodiversity, University of Münster, 48161, Münster, Germany
- Institute of Plant Biology and Biotechnology, University of Münster, 48161, Münster, Germany
- K Sowjanya Sree
- Department of Environmental Science, Central University of Kerala, Periya, 671320, India
- Klaus J Appenroth
- Matthias Schleiden Institute - Plant Physiology, Friedrich Schiller University of Jena, 07743, Jena, Germany
- Hai Zhao
- Chengdu Institute of Biology, Chinese Academy of Sciences, 6100641, Chengdu, China
- Martin Höfer
- Institute of Organismic and Molecular Evolution, University of Mainz, 55128, Mainz, Germany
- Institute for Evolution and Biodiversity, University of Münster, 48161, Münster, Germany
- Meret Huber
- Institute of Organismic and Molecular Evolution, University of Mainz, 55128, Mainz, Germany
- Institute of Plant Biology and Biotechnology, University of Münster, 48161, Münster, Germany
- Shuqing Xu
- Institute of Organismic and Molecular Evolution, University of Mainz, 55128, Mainz, Germany
- Institute for Evolution and Biodiversity, University of Münster, 48161, Münster, Germany
- Institute for Quantitative and Computational Biosciences, University of Mainz, 55128, Mainz, Germany

12
Russo CAM, Eyre-Walker A, Katz LA, Gaut BS. Forty Years of Inferential Methods in the Journals of the Society for Molecular Biology and Evolution. Mol Biol Evol 2024; 41:msad264. [PMID: 38197288 PMCID: PMC10763999 DOI: 10.1093/molbev/msad264] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/08/2023] [Accepted: 11/27/2023] [Indexed: 01/11/2024]
Abstract
We are launching a series to celebrate the 40th anniversary of the first issue of Molecular Biology and Evolution. In 2024, we will publish virtual issues containing selected papers published in the Society for Molecular Biology and Evolution journals, Molecular Biology and Evolution and Genome Biology and Evolution. Each virtual issue will be accompanied by a perspective that highlights the historic and contemporary contributions of our journals to a specific topic in molecular evolution. This perspective, the first in the series, presents an account of the broad array of methods that have been published in the Society for Molecular Biology and Evolution journals, including methods to infer phylogenies, to test hypotheses in a phylogenetic framework, and to infer population genetic processes. We also mention many of the software implementations that make methods tractable for empiricists. In short, the Society for Molecular Biology and Evolution community has much to celebrate after four decades of publishing high-quality science including numerous important inferential methods.
Affiliation(s)
- Claudia A M Russo
- Departamento de Genética, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Laura A Katz
- Department of Biological Sciences, Smith College, Northampton, MA, USA
- Brandon S Gaut
- School of Biological Sciences, University of California, Irvine, CA, USA

13
Szpiech ZA. selscan 2.0: scanning for sweeps in unphased data. Bioinformatics 2024; 40:btae006. [PMID: 38180866 PMCID: PMC10789311 DOI: 10.1093/bioinformatics/btae006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Revised: 12/26/2023] [Accepted: 01/03/2024] [Indexed: 01/07/2024]
Abstract
SUMMARY: Several popular haplotype-based statistics for identifying recent or ongoing positive selection in genomes require knowledge of haplotype phase. Here, we provide an update to selscan, which implements a redefinition of these statistics for use in unphased data.
AVAILABILITY AND IMPLEMENTATION: Source code and binaries are freely available at https://github.com/szpiech/selscan, implemented in C/C++, and supported on Linux, Windows, and MacOS.
Affiliation(s)
- Zachary A Szpiech
- Department of Biology, Penn State University, University Park, PA 16802, United States
- Institute for Computational and Data Sciences, Penn State University, University Park, PA 16802, United States

14
Schrider DR. Allelic gene conversion softens selective sweeps. bioRxiv 2023:2023.12.05.570141. [PMID: 38106127 PMCID: PMC10723294 DOI: 10.1101/2023.12.05.570141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
The prominence of positive selection, in which beneficial mutations are favored by natural selection and rapidly increase in frequency, is a subject of intense debate. Positive selection can result in selective sweeps, in which the haplotype(s) bearing the adaptive allele "sweep" through the population, thereby removing much of the genetic diversity from the region surrounding the target of selection. Two models of selective sweeps have been proposed: classical sweeps, or "hard sweeps", in which a single copy of the adaptive allele sweeps to fixation, and "soft sweeps", in which multiple distinct copies of the adaptive allele leave descendants after the sweep. Soft sweeps can be the outcome of recurrent mutation to the adaptive allele, or the presence of standing genetic variation consisting of multiple copies of the adaptive allele prior to the onset of selection. Importantly, soft sweeps will be common when populations can rapidly adapt to novel selective pressures, either because of a high mutation rate or because adaptive alleles are already present. The prevalence of soft sweeps is especially controversial, and it has been noted that selection on standing variation or recurrent mutations may not always produce soft sweeps. Here, we show that the inverse is true: selection on single-origin de novo mutations may often result in an outcome that is indistinguishable from a soft sweep. This is made possible by allelic gene conversion, which "softens" hard sweeps by copying the adaptive allele onto multiple genetic backgrounds, a process we refer to as a "pseudo-soft" sweep. We carried out a simulation study examining the impact of gene conversion on sweeps from a single de novo variant in models of human, Drosophila, and Arabidopsis populations. The fraction of simulations in which gene conversion had produced multiple haplotypes with the adaptive allele upon fixation was appreciable. 
Indeed, under realistic demographic histories and gene conversion rates, even if selection always acts on a single-origin mutation, sweeps involving multiple haplotypes are more likely than hard sweeps in large populations, especially when selection is not extremely strong. Thus, even when the mutation rate is low or there is no standing variation, hard sweeps are expected to be the exception rather than the rule in large populations. These results also imply that the presence of signatures of soft sweeps does not necessarily mean that adaptation has been especially rapid or is not mutation limited.
Affiliation(s)
- Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599

15
Panigrahi M, Rajawat D, Nayak SS, Ghildiyal K, Sharma A, Jain K, Lei C, Bhushan B, Mishra BP, Dutt T. Landmarks in the history of selective sweeps. Anim Genet 2023; 54:667-688. [PMID: 37710403 DOI: 10.1111/age.13355] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/12/2023] [Accepted: 08/28/2023] [Indexed: 09/16/2023]
Abstract
Half a century ago, a seminal article on the hitchhiking effect by Maynard Smith and Haigh inaugurated the concept of the selection signature. Selective sweeps are characterised by the rapid spread of an advantageous genetic variant through a population and hence play an important role both in shaping evolution and in research on genetic diversity. The process by which a beneficial allele arises and becomes fixed in a population, leading to an increase in the frequency of other linked alleles, is known as genetic hitchhiking or genetic draft. Kimura's neutral theory and hitchhiking theory are complementary, with Kimura's neutral evolution as the 'null model' and positive selection as the 'signal'. Both are widely accepted in evolutionary biology, especially now that genomics enables precise measurements. Significant advances in genomic technologies, such as next-generation sequencing, high-density SNP arrays and powerful bioinformatics tools, have made it possible to systematically investigate selection signatures in a variety of species. Although the history of selection signatures is relatively recent, considerable progress has been made in the last two decades, owing to the increasing availability of large-scale genomic data and the development of computational methods. In this review, we embark on a journey through the history of research on selective sweeps, ranging from early theoretical work to recent empirical studies that utilise genomic data.
Affiliation(s)
- Manjit Panigrahi
- Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Divya Rajawat
- Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Kanika Ghildiyal
- Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Anurodh Sharma
- Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Karan Jain
- Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Chuzhao Lei
- College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China
- Bharat Bhushan
- Division of Animal Genetics, Indian Veterinary Research Institute, Bareilly, India
- Bishnu Prasad Mishra
- Division of Animal Biotechnology, ICAR-National Bureau of Animal Genetic Resources, Karnal, India
- Triveni Dutt
- Livestock Production and Management Section, Indian Veterinary Research Institute, Bareilly, India

16
Jin M, Peng Y, Peng J, Zhang H, Shan Y, Liu K, Xiao Y. Transcriptional regulation and overexpression of GST cluster enhances pesticide resistance in the cotton bollworm, Helicoverpa armigera (Lepidoptera: Noctuidae). Commun Biol 2023; 6:1064. [PMID: 37857697 PMCID: PMC10587110 DOI: 10.1038/s42003-023-05447-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 05/17/2023] [Accepted: 10/11/2023] [Indexed: 10/21/2023]
Abstract
The rapid evolution of resistance in agricultural pests poses a serious threat to global food security. However, the mechanisms of resistance through metabolic regulation are largely unknown. Here, we found that a GST gene cluster was strongly selected in the North China (NTC) population, and it was significantly genetically linked to lambda-cyhalothrin resistance. Knockout of the GST cluster using CRISPR/Cas9 significantly increased the sensitivity of the knockout strain to lambda-cyhalothrin. Haplotype analysis revealed no non-synonymous mutations or structural variations in the GST cluster, whereas GST_119 and GST_121 were significantly overexpressed in the NTC population. Silencing of GST_119 or co-silencing of GST_119 and GST_121 with RNAi significantly increased larval sensitivity to lambda-cyhalothrin. We also identified additional GATAe transcription factor binding sites in the promoter of NTC_GST_119. Transient expression of GATAe in Hi5 cells activated NTC_GST_119 and Xinjiang (XJ)_GST_119 transcription, but the transcriptional activity of NTC_GST_119 was significantly higher than that of XJ_GST_119. These results demonstrate that variations in the regulatory region result in complex expression changes in the GST cluster, which enhance lambda-cyhalothrin resistance in field populations. This study deepens our knowledge of the evolutionary mechanism of pest adaptation under environmental stress and provides potential targets for monitoring pest resistance and integrated management.
Affiliation(s)
- Minghui Jin
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Yan Peng
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
- Jie Peng
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
- Huihui Zhang
- Institute of Entomology, School of Life Sciences, Central China Normal University, Wuhan, China
- Yinxue Shan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Kaiyu Liu
- Institute of Entomology, School of Life Sciences, Central China Normal University, Wuhan, China
- Yutao Xiao
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China

17
Amin MR, Hasan M, Arnab SP, DeGiorgio M. Tensor Decomposition-based Feature Extraction and Classification to Detect Natural Selection from Genomic Data. Mol Biol Evol 2023; 40:msad216. [PMID: 37772983 PMCID: PMC10581699 DOI: 10.1093/molbev/msad216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 08/10/2023] [Accepted: 09/14/2023] [Indexed: 09/30/2023]
Abstract
Inferences of adaptive events are important for learning about traits, such as human digestion of lactose after infancy and the rapid spread of viral variants. Early efforts toward identifying footprints of natural selection from genomic data involved development of summary statistic and likelihood methods. However, such techniques are grounded in simple patterns or theoretical models that limit the complexity of settings they can explore. Due to the renaissance in artificial intelligence, machine learning methods have taken center stage in recent efforts to detect natural selection, with strategies such as convolutional neural networks applied to images of haplotypes. Yet, limitations of such techniques include estimation of large numbers of model parameters under nonconvex settings and feature identification without regard to location within an image. An alternative approach is to use tensor decomposition to extract features from multidimensional data while preserving the latent structure of the data, and to feed these features to machine learning models. Here, we adopt this framework and present a novel approach termed T-REx, which extracts features from images of haplotypes across sampled individuals using tensor decomposition, and then makes predictions from these features using classical machine learning methods. As a proof of concept, we explore the performance of T-REx on simulated neutral and selective sweep scenarios and find that it has high power and accuracy to discriminate sweeps from neutrality, robustness to common technical hurdles, and easy visualization of feature importance. Therefore, T-REx is a powerful addition to the toolkit for detecting adaptive processes from genomic data.
Affiliation(s)
- Md Ruhul Amin
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Mahmudul Hasan
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Sandipan Paul Arnab
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Michael DeGiorgio
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA

18
Whitehouse LS, Schrider DR. Timesweeper: accurately identifying selective sweeps using population genomic time series. Genetics 2023; 224:iyad084. [PMID: 37157914 PMCID: PMC10324941 DOI: 10.1093/genetics/iyad084] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/25/2022] [Revised: 07/25/2022] [Accepted: 04/25/2023] [Indexed: 05/10/2023]
Abstract
Despite decades of research, identifying selective sweeps, the genomic footprints of positive selection, remains a core problem in population genetics. Of the myriad methods that have been developed to tackle this task, few are designed to leverage the potential of genomic time-series data. This is because in most population genetic studies of natural populations, only a single period of time can be sampled. Recent advancements in sequencing technology, including improvements in extracting and sequencing ancient DNA, have made repeated samplings of a population possible, allowing for more direct analysis of recent evolutionary dynamics. Serial sampling of organisms with shorter generation times has also become more feasible due to improvements in the cost and throughput of sequencing. With these advances in mind, here we present Timesweeper, a fast and accurate convolutional neural network-based tool for identifying selective sweeps in data consisting of multiple genomic samplings of a population over time. Timesweeper analyzes population genomic time-series data by first simulating training data under a demographic model appropriate for the data of interest, training a one-dimensional convolutional neural network on said simulations, and inferring which polymorphisms in this serialized data set were the direct target of a completed or ongoing selective sweep. We show that Timesweeper is accurate under multiple simulated demographic and sampling scenarios, identifies selected variants with high resolution, and estimates selection coefficients more accurately than existing methods. 
In sum, we show that more accurate inferences about natural selection are possible when genomic time-series data are available; such data will continue to proliferate in coming years due to both the sequencing of ancient samples and repeated samplings of extant populations with faster generation times, as well as experimentally evolved populations where time-series data are often generated. Methodological advances such as Timesweeper thus have the potential to help resolve the controversy over the role of positive selection in the genome. We provide Timesweeper as a Python package for use by the community.
Affiliation(s)
- Logan S Whitehouse
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27514, USA
- Daniel R Schrider
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27514, USA

19
Arnab SP, Amin MR, DeGiorgio M. Uncovering Footprints of Natural Selection Through Spectral Analysis of Genomic Summary Statistics. Mol Biol Evol 2023; 40:msad157. [PMID: 37433019 PMCID: PMC10365025 DOI: 10.1093/molbev/msad157] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/07/2022] [Revised: 06/28/2023] [Accepted: 07/06/2023] [Indexed: 07/13/2023]
Abstract
Natural selection leaves a spatial pattern along the genome, with a haplotype distribution distortion near the selected locus that fades with distance. Evaluating the spatial signal of a population-genetic summary statistic across the genome allows for patterns of natural selection to be distinguished from neutrality. Considering the genomic spatial distribution of multiple summary statistics is expected to aid in uncovering subtle signatures of selection. In recent years, numerous methods have been devised that consider genomic spatial distributions across summary statistics, utilizing both classical machine learning and deep learning architectures. However, better predictions may be attainable by improving the way in which features are extracted from these summary statistics. We apply wavelet transform, multitaper spectral analysis, and S-transform to summary statistic arrays to achieve this goal. Each analysis method converts one-dimensional summary statistic arrays to two-dimensional images of spectral analysis, allowing simultaneous temporal and spectral assessment. We feed these images into convolutional neural networks and consider combining models using ensemble stacking. Our modeling framework achieves high accuracy and power across a diverse set of evolutionary settings, including population size changes and test sets of varying sweep strength, softness, and timing. A scan of central European whole-genome sequences recapitulated well-established sweep candidates and predicted novel cancer-associated genes as sweeps with high support. Given that this modeling framework is also robust to missing genomic segments, we believe that it will represent a welcome addition to the population-genomic toolkit for learning about adaptive processes from genomic data.
Affiliation(s)
- Sandipan Paul Arnab
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Md Ruhul Amin
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Michael DeGiorgio
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA

20
Amin MR, Hasan M, Arnab SP, DeGiorgio M. Tensor decomposition based feature extraction and classification to detect natural selection from genomic data. bioRxiv 2023:2023.03.27.527731. [PMID: 37034767 PMCID: PMC10081272 DOI: 10.1101/2023.03.27.527731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Inferences of adaptive events are important for learning about traits, such as human digestion of lactose after infancy and the rapid spread of viral variants. Early efforts toward identifying footprints of natural selection from genomic data involved development of summary statistic and likelihood methods. However, such techniques are grounded in simple patterns or theoretical models that limit the complexity of settings they can explore. Due to the renaissance in artificial intelligence, machine learning methods have taken center stage in recent efforts to detect natural selection, with strategies such as convolutional neural networks applied to images of haplotypes. Yet, limitations of such techniques include estimation of large numbers of model parameters under non-convex settings and feature identification without regard to location within an image. An alternative approach is to use tensor decomposition to extract features from multidimensional data while preserving the latent structure of the data, and to feed these features to machine learning models. Here, we adopt this framework and present a novel approach termed T-REx, which extracts features from images of haplotypes across sampled individuals using tensor decomposition, and then makes predictions from these features using classical machine learning methods. As a proof of concept, we explore the performance of T-REx on simulated neutral and selective sweep scenarios and find that it has high power and accuracy to discriminate sweeps from neutrality, robustness to common technical hurdles, and easy visualization of feature importance. Therefore, T-REx is a powerful addition to the toolkit for detecting adaptive processes from genomic data.

21
Zhong L, Zhu Y, Olsen KM. Hard versus soft selective sweeps during domestication and improvement in soybean. Mol Ecol 2022; 31:3137-3153. [PMID: 35366022 DOI: 10.1111/mec.16454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2022] [Revised: 03/16/2022] [Accepted: 03/28/2022] [Indexed: 11/28/2022]
Abstract
Genome scans for selection can provide an efficient way to dissect the genetic basis of domestication traits and understand mechanisms of adaptation during crop evolution. Selection involving soft sweeps (simultaneous selection for multiple alleles) is probably common in plant genomes but is understudied, and few if any studies have systematically scanned for soft sweeps in the context of crop domestication. Using genome resequencing data from 302 wild and domesticated soybean accessions, we conducted selection scans using five widely employed statistics to identify selection candidates under classical (hard) and soft sweeps. Across the genome, inferred hard sweeps are predominant in domesticated soybean landraces and improved varieties, whereas soft sweeps are more prevalent in a representative subpopulation of the wild ancestor. Six domestication-related genes, representing both hard and soft sweeps and different stages of domestication, were used as positive controls to assess the detectability of domestication-associated sweeps. Performance of various test statistics suggests that differentiation-based (FST) methods are robust for detecting complete hard sweeps, and that LD-based strategies perform well for identifying recent/ongoing sweeps; however, none of the test statistics detected a known soft sweep we previously documented at the domestication gene Dt1. Genome scans yielded a set of 66 candidate loci that were identified by both differentiation-based and LD-based (iHH) methods; notably, this shared set overlaps with many previously identified QTLs for soybean domestication/improvement traits. Collectively, our results will help to advance genetic characterizations of soybean domestication traits and shed light on selection modes involved in adaptation in domesticated plant species.
Affiliation(s)
- Limei Zhong
- Key Laboratory of Molecular Biology and Gene Engineering in Jiangxi, School of Life Sciences, Nanchang University, Nanchang, China
- Department of Biology, Washington University in St. Louis, St. Louis, Missouri, USA
- Youlin Zhu
- Key Laboratory of Molecular Biology and Gene Engineering in Jiangxi, School of Life Sciences, Nanchang University, Nanchang, China
- Kenneth M Olsen
- Department of Biology, Washington University in St. Louis, St. Louis, Missouri, USA

22
DeGiorgio M, Szpiech ZA. A spatially aware likelihood test to detect sweeps from haplotype distributions. PLoS Genet 2022; 18:e1010134. [PMID: 35404934 PMCID: PMC9022890 DOI: 10.1371/journal.pgen.1010134] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/18/2021] [Revised: 04/21/2022] [Accepted: 03/04/2022] [Indexed: 01/13/2023]
Abstract
The inference of positive selection in genomes is a problem of great interest in evolutionary genomics. By identifying putative regions of the genome that contain adaptive mutations, we are able to learn about the biology of organisms and their evolutionary history. Here we introduce a composite likelihood method that identifies recently completed or ongoing positive selection by searching for extreme distortions in the spatial distribution of the haplotype frequency spectrum along the genome relative to the genome-wide expectation taken as neutrality. Furthermore, the method simultaneously infers two parameters of the sweep: the number of sweeping haplotypes and the "width" of the sweep, which is related to the strength and timing of selection. We demonstrate that this method outperforms the leading haplotype-based selection statistics, though strong signals in low-recombination regions merit extra scrutiny. As a positive control, we apply it to two well-studied human populations from the 1000 Genomes Project and examine haplotype frequency spectrum patterns at the LCT and MHC loci. We also apply it to a data set of brown rats sampled in NYC and identify genes related to olfactory perception. To facilitate use of this method, we have implemented it in user-friendly open source software.
Affiliation(s)
- Michael DeGiorgio
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida, United States of America
- Zachary A. Szpiech
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Institute for Computational and Data Sciences, Pennsylvania State University, University Park, Pennsylvania, United States of America

23
Klassmann A, Gautier M. Detecting selection using extended haplotype homozygosity (EHH)-based statistics in unphased or unpolarized data. PLoS One 2022; 17:e0262024. [PMID: 35041674 PMCID: PMC8765611 DOI: 10.1371/journal.pone.0262024] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/28/2021] [Accepted: 12/15/2021] [Indexed: 12/19/2022]
Abstract
Analysis of population genetic data often includes a search for genomic regions with signs of recent positive selection. One such approach involves the concept of extended haplotype homozygosity (EHH) and its associated statistics. These statistics typically require phased haplotypes, and some of them necessitate polarized variants. Here, we unify and extend previously proposed modifications to loosen these requirements. We compare the modified versions with the original ones by measuring the false discovery rate in simulated whole-genome scans and by quantifying the overlap of inferred candidate regions in empirical data. We find that phasing information is indispensable for accurate estimation of within-population statistics (for all but very large samples) and of cross-population statistics for small samples. Ancestry information, in contrast, is of lesser importance for both types of statistics. Our publicly available R package rehh incorporates the modified statistics presented here.
Affiliation(s)
- Mathieu Gautier
- CBGP, Univ Montpellier, CIRAD, INRAE, IRD, Institut Agro, Montpellier, France

24
Qiu J, Zhou Q, Ye W, Chen Q, Bao YJ. SweepCluster: A SNP clustering tool for detecting gene-specific sweeps in prokaryotes. BMC Bioinformatics 2022; 23:19. [PMID: 34991447 PMCID: PMC8734265 DOI: 10.1186/s12859-021-04533-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2021] [Accepted: 12/13/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND The gene-specific sweep is a selection process in which an advantageous mutation, along with the nearby neutral sites in a gene region, rises in frequency in the population. It has been demonstrated to play important roles in ecological differentiation or phenotypic divergence in microbial populations. Therefore, identifying gene-specific sweeps in microorganisms will not only provide insights into the evolutionary mechanisms, but also unravel potential genetic markers associated with biological phenotypes. However, current methods were mainly developed for detecting selective sweeps in eukaryotic data of sparse genotypes and are not readily applicable to prokaryotic data. Furthermore, some challenges have not been sufficiently addressed by these methods, such as the low spatial resolution of sweep regions and the lack of consideration of the spatial distribution of mutations. RESULTS We proposed a novel gene-centric and spatially aware approach for identifying gene-specific sweeps in prokaryotes and implemented it in a Python tool, SweepCluster. Our method searches for gene regions with a high level of spatial clustering of pre-selected polymorphisms in genotype datasets, assuming a null distribution model of neutral selection. The pre-selection of polymorphisms is based on their genetic signatures, such as elevated population subdivision, excessive linkage disequilibrium, or significant phenotype association. Performance evaluation using simulation data showed that the sensitivity and specificity of the clustering algorithm in SweepCluster are above 90%. The application of SweepCluster to two real datasets from the bacteria Streptococcus pyogenes and Streptococcus suis showed that the impact of pre-selection was dramatic and significantly reduced the uninformative signals. We validated our method using the genotype data from Vibrio cyclitrophicus, the only available dataset of gene-specific sweeps in bacteria, and obtained a concordance rate of 78%. 
We noted that the concordance rate could be underestimated due to distinct reference genomes and clustering strategies. The application to human genotype datasets showed that SweepCluster is also applicable to eukaryotic data and is able to recover 80% of a catalog of known sweep regions. CONCLUSION SweepCluster is applicable to a broad category of datasets. It will be valuable for detecting gene-specific sweeps in diverse genotypic data and for providing novel insights into adaptive evolution.
Affiliation(s)
- Junhui Qiu
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei Collaborative Innovation Center for Green Transformation of Bio-Resources, Hubei Key Laboratory of Industrial Biotechnology, School of Life Sciences, Hubei University, Wuhan, 430062, China
- Qi Zhou
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei Collaborative Innovation Center for Green Transformation of Bio-Resources, Hubei Key Laboratory of Industrial Biotechnology, School of Life Sciences, Hubei University, Wuhan, 430062, China
- Weicai Ye
- School of Computer Science and Engineering, Guangdong Province Key Laboratory of Computational Science, and National Engineering Laboratory for Big Data Analysis and Application, Sun Yat-Sen University, Guangzhou, 510275, China
- Qianjun Chen
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei Collaborative Innovation Center for Green Transformation of Bio-Resources, Hubei Key Laboratory of Industrial Biotechnology, School of Life Sciences, Hubei University, Wuhan, 430062, China
- Yun-Juan Bao
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei Collaborative Innovation Center for Green Transformation of Bio-Resources, Hubei Key Laboratory of Industrial Biotechnology, School of Life Sciences, Hubei University, Wuhan, 430062, China

25
Bemmels JB, Mikkelsen EK, Haddrath O, Colbourne RM, Robertson HA, Weir JT. Demographic decline and lineage-specific adaptations characterize New Zealand kiwi. Proc Biol Sci 2021; 288:20212362. [PMID: 34905706 PMCID: PMC8670953 DOI: 10.1098/rspb.2021.2362] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/27/2021] [Accepted: 11/19/2021] [Indexed: 12/24/2022]
Abstract
Small and fragmented populations may become rapidly differentiated due to genetic drift, making it difficult to distinguish whether neutral genetic structure is a signature of recent demographic events, or of long-term evolutionary processes that could have allowed populations to adaptively diverge. We sequenced 52 whole genomes to examine Holocene demographic history and patterns of adaptation in kiwi (Apteryx), and recovered 11 strongly differentiated genetic clusters corresponding to previously recognized lineages. Demographic models suggest that all 11 lineages experienced dramatic population crashes relative to early- or mid-Holocene levels. Small population size is associated with low genetic diversity and elevated genetic differentiation (FST), suggesting that population declines have strengthened genetic structure and led to the loss of genetic diversity. However, population size is not correlated with inbreeding rates. Eight lineages show signatures of lineage-specific selective sweeps (284 sweeps total) that are unlikely to have been caused by demographic stochasticity. Overall, these results suggest that despite strong genetic drift associated with recent bottlenecks, most kiwi lineages possess unique adaptations and should be recognized as separate adaptive units in conservation contexts. Our work highlights how whole-genome datasets can address longstanding uncertainty about the evolutionary and conservation significance of small and fragmented populations of threatened species.
Affiliation(s)
- Jordan B. Bemmels
- Department of Biological Sciences, University of Toronto Scarborough, Toronto, Canada ON M1C 1A4
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada ON M5S 3B2
- Else K. Mikkelsen
- Department of Biological Sciences, University of Toronto Scarborough, Toronto, Canada ON M1C 1A4
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada ON M5S 3B2
- Oliver Haddrath
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada ON M5S 3B2
- Department of Natural History, Royal Ontario Museum, Toronto, Canada ON M5S 2C6
- Jason T. Weir
- Department of Biological Sciences, University of Toronto Scarborough, Toronto, Canada ON M1C 1A4
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada ON M5S 3B2
- Department of Natural History, Royal Ontario Museum, Toronto, Canada ON M5S 2C6

26
Mughal MR, Koch H, Huang J, Chiaromonte F, DeGiorgio M. Learning the properties of adaptive regions with functional data analysis. PLoS Genet 2020; 16:e1008896. [PMID: 32853200 PMCID: PMC7480868 DOI: 10.1371/journal.pgen.1008896] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/13/2019] [Revised: 09/09/2020] [Accepted: 05/29/2020] [Indexed: 12/12/2022]
Abstract
Identifying regions of positive selection in genomic data remains a challenge in population genetics. Most current approaches rely on comparing values of summary statistics calculated in windows. We present an approach termed SURFDAWave, which translates measures of genetic diversity calculated in genomic windows to functional data. By transforming our discrete data points to be outputs of continuous functions defined over genomic space, we are able to learn the features of these functions that signify selection. This enables us to confidently identify complex modes of natural selection, including adaptive introgression. We are also able to predict important selection parameters that are responsible for shaping the inferred selection events. By applying our model to human population-genomic data, we recapitulate previously identified regions of selective sweeps, such as OCA2 in Europeans, and predict that its beneficial mutation reached a frequency of 0.02 before it swept 1,802 generations ago, a time when humans were relatively new to Europe. In addition, we identify BNC2 in Europeans as a target of adaptive introgression, and predict that it harbors a beneficial mutation that arose in an archaic human population that split from modern humans within the hypothesized modern human-Neanderthal divergence range.
Affiliation(s)
- Mehreen R. Mughal
- Bioinformatics and Genomics at the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Hillary Koch
- Department of Statistics, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Jinguo Huang
- Bioinformatics and Genomics at the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Francesca Chiaromonte
- Department of Statistics, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida, United States of America

27
Harris AM, DeGiorgio M. Identifying and Classifying Shared Selective Sweeps from Multilocus Data. Genetics 2020; 215:143-171. [PMID: 32152048 PMCID: PMC7198270 DOI: 10.1534/genetics.120.303137] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/22/2019] [Accepted: 02/29/2020] [Indexed: 11/18/2022]
Abstract
Positive selection causes beneficial alleles to rise to high frequency, resulting in a selective sweep of the diversity surrounding the selected sites. Accordingly, the signature of a selective sweep in an ancestral population may still remain in its descendants. Identifying signatures of selection in the ancestor that are shared among its descendants is important to contextualize the timing of a sweep, but few methods exist for this purpose. We introduce the statistic SS-H12, which can identify genomic regions under shared positive selection across populations and is based on the theory of the expected haplotype homozygosity statistic H12, which detects recent hard and soft sweeps from the presence of high-frequency haplotypes. SS-H12 is distinct from comparable statistics because it requires a minimum of only two populations, and properly identifies and differentiates between independent convergent sweeps and true ancestral sweeps, with high power and robustness to a variety of demographic models. Furthermore, we can apply SS-H12 in conjunction with the ratio of statistics we term [Formula: see text] and [Formula: see text] to further classify identified shared sweeps as hard or soft. Finally, we identified both previously reported and novel shared sweep candidates from human whole-genome sequences. Previously reported candidates include the well-characterized ancestral sweeps at LCT and SLC24A5 in Indo-Europeans, as well as GPHN worldwide. Novel candidates include an ancestral sweep at RGS18 in sub-Saharan Africans involved in regulating the platelet response and implicated in sudden cardiac death, and a convergent sweep at C2CD5 between European and East Asian populations that may explain their different insulin responses.
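SS-H12 is built on the single-population statistic H12. As an illustration under stated assumptions, and not the authors' code, the sketch below computes H1, H12 and the H2/H1 ratio as introduced by Garud et al. (2015) for one window of phased haplotypes. H1 is the sum of squared haplotype frequencies; H12 pools the two most frequent haplotypes before squaring, H12 = (p1 + p2)^2 + sum over i >= 3 of p_i^2; and H2 = H1 - p1^2, so that H2/H1 tends to be larger for soft sweeps than for hard ones. SS-H12 itself and the ratio statistics elided from the abstract above are not reconstructed here, and the function name `haplotype_homozygosity` is hypothetical.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Return (H1, H12, H2/H1) for one window of phased haplotypes.

    Hypothetical helper: each haplotype is a sequence of alleles over
    the same window; identical sequences count as the same haplotype.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    # Haplotype frequencies, most frequent first.
    freqs = sorted((c / n for c in Counter(map(tuple, haplotypes)).values()),
                   reverse=True)
    h1 = sum(p * p for p in freqs)
    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0
    # Pool the two most common haplotypes into one class before squaring.
    h12 = (p1 + p2) ** 2 + sum(p * p for p in freqs[2:])
    # H2 drops the most common haplotype's contribution from H1.
    h2 = h1 - p1 * p1
    return h1, h12, h2 / h1


haps = [[0, 0, 0]] * 4 + [[1, 0, 0]] * 2 + [[0, 1, 0]] + [[0, 0, 1]]
h1, h12, ratio = haplotype_homozygosity(haps)
print(h12)  # 0.59375
```

Pooling the top two haplotypes is what lets H12 respond to soft sweeps, where more than one haplotype rises to high frequency, as well as to hard sweeps.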
Affiliation(s)
- Alexandre M Harris
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802
- Molecular, Cellular, and Integrative Biosciences at the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802
- Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida 33431