1
Shou W, Zhang C, Wang Y, Wang H, Guo L, Li L, Zhang T, Huang W, Shi J. Contrastive sequence signatures between the both sides of a recombination spot reveal an adaptation at PPARD locus from standing variation for pleiotropy since out-of-Africa dispersal. BMC Genomics 2025; 26:427. [PMID: 40307732 PMCID: PMC12042533 DOI: 10.1186/s12864-025-11620-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2025] [Accepted: 04/21/2025] [Indexed: 05/02/2025]
Abstract
BACKGROUND: Drug metabolism and transporter genes are a specialized class of genes involved in absorption, distribution, metabolism and excretion. They readily show distinct population differentiation and are susceptible to natural selection. RESULTS: We initiated a study using a special panel of informative genetic markers in such genes and dissected the genetic structure in representative Chinese and worldwide populations. A distinctive sub-population stratification was discovered across Eurasian populations and resulted from divergence at the PPARD locus. The contrastive sequence signatures between the two sides of a recombination spot provide evidence of a selective sweep at this locus through a genetic hitchhiking effect. A genealogy-based framework demonstrates that positive selection acting on standing variation exerted a moderate pressure in Eurasians and drove the adaptive allele up to a high frequency. The timing and tempo estimates for the genetic adaptation indicate that its onset coincided with the early out-of-Africa migration of modern humans and that it lasted over a prolonged evolutionary history. A phenome-wide association analysis reveals an extended cis-regulation of local gene expression and pleiotropy implicated in a variety of complex traits. The colocalization analyses between the genetic associations from cis-acting gene expression and complex traits point to physical capacity, energy metabolism, and immune-related involvement as the most likely selective pressures, and provide prioritization for the effective genes and causal variants. CONCLUSIONS: This work lays a foundation for subsequent efforts to fully understand the biological mechanisms underlying the genetic adaptation.
Affiliation(s)
- Weihua Shou
  - Yunnan Key Laboratory of Children's Major Disease Research, Yunnan Institute of Pediatrics, Kunming Children's Hospital, 288 Qianxing Road, Kunming, Yunnan, 650228, P.R. China
  - Shanghai-MOST Key Laboratory of Health and Disease Genomics, Shanghai Institute for Biomedical and Pharmaceutical Technologies (SIBPT), 2140 Xietu Road, Shanghai, 200032, P.R. China
- Chenhui Zhang
  - Shanghai-MOST Key Laboratory of Health and Disease Genomics, Shanghai Institute for Biomedical and Pharmaceutical Technologies (SIBPT), 2140 Xietu Road, Shanghai, 200032, P.R. China
- Ying Wang
  - Shanghai-MOST Key Laboratory of Health and Disease Genomics, Shanghai Institute for Biomedical and Pharmaceutical Technologies (SIBPT), 2140 Xietu Road, Shanghai, 200032, P.R. China
- Haifeng Wang
  - Shanghai-MOST Key Laboratory of Health and Disease Genomics, Shanghai Institute for Biomedical and Pharmaceutical Technologies (SIBPT), 2140 Xietu Road, Shanghai, 200032, P.R. China
- Lei Guo
  - Yunnan Key Laboratory of Children's Major Disease Research, Yunnan Institute of Pediatrics, Kunming Children's Hospital, 288 Qianxing Road, Kunming, Yunnan, 650228, P.R. China
- Li Li
  - Yunnan Key Laboratory of Children's Major Disease Research, Yunnan Institute of Pediatrics, Kunming Children's Hospital, 288 Qianxing Road, Kunming, Yunnan, 650228, P.R. China
- Tiesong Zhang
  - Yunnan Key Laboratory of Children's Major Disease Research, Yunnan Institute of Pediatrics, Kunming Children's Hospital, 288 Qianxing Road, Kunming, Yunnan, 650228, P.R. China
- Wei Huang
  - Shanghai-MOST Key Laboratory of Health and Disease Genomics, Shanghai Institute for Biomedical and Pharmaceutical Technologies (SIBPT), 2140 Xietu Road, Shanghai, 200032, P.R. China
- Jinxiu Shi
  - Shanghai-MOST Key Laboratory of Health and Disease Genomics, Shanghai Institute for Biomedical and Pharmaceutical Technologies (SIBPT), 2140 Xietu Road, Shanghai, 200032, P.R. China
2
Ewers C, Brandis D, da Silva N, Hayer S, Immel A, Moesges Z, Susat J, Torres-Oliva M, Krause-Kyora B. Museomics of an extinct European flat oyster population. Sci Rep 2025; 15:13906. [PMID: 40263463 PMCID: PMC12015263 DOI: 10.1038/s41598-025-96743-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 03/31/2025] [Indexed: 04/24/2025]
Abstract
Understanding the factors that predispose species and populations to decline and extinction is a major challenge of biodiversity research. In the present study, we investigated the historical population genomics of an extinct population of the European oyster (Ostrea edulis L.) from the Wadden Sea collected between 1868 and 1888, and compared it to French and English populations sampled at the same time. Our museomic results indicate that the now-extinct population was genetically isolated from the French and English populations and showed signs of local adaptation in the form of Fst outlier loci between the Wadden Sea and the other two populations. Thus the Wadden Sea oysters may have been predisposed for extinction because they were not naturally replenished from other populations. A comparison of population-wide genomic diversity may hint towards a sudden population contraction of the Wadden Sea population, possibly being the result of stronger - or earlier - population decline in this population than in the others. In summary, our historical population genomic exploration hints at some potential causes of population decline in flat oysters from the Wadden Sea, which might have led to their extinction.
Affiliation(s)
- Christine Ewers
  - Zoological Museum, Kiel University, Hegewischstraße 3, 24105, Kiel, Germany
- Dirk Brandis
  - Zoological Museum, Kiel University, Hegewischstraße 3, 24105, Kiel, Germany
- Nicolas da Silva
  - Institute of Clinical Molecular Biology, Kiel University, Rosalind-Franklin-Straße 12, 24105, Kiel, Germany
- Sarah Hayer
  - Zoological Museum, Kiel University, Hegewischstraße 3, 24105, Kiel, Germany
- Alex Immel
  - Institute of Clinical Molecular Biology, Kiel University, Rosalind-Franklin-Straße 12, 24105, Kiel, Germany
- Zoe Moesges
  - Zoological Museum, Kiel University, Hegewischstraße 3, 24105, Kiel, Germany
- Julian Susat
  - Institute of Clinical Molecular Biology, Kiel University, Rosalind-Franklin-Straße 12, 24105, Kiel, Germany
- Montserrat Torres-Oliva
  - Institute of Clinical Molecular Biology, Kiel University, Rosalind-Franklin-Straße 12, 24105, Kiel, Germany
- Ben Krause-Kyora
  - Institute of Clinical Molecular Biology, Kiel University, Rosalind-Franklin-Straße 12, 24105, Kiel, Germany
3
Silva‐Arias GA, Gagnon E, Hembrom S, Fastner A, Khan MR, Stam R, Tellier A. Patterns of presence-absence variation of NLRs across populations of Solanum chilense are clade-dependent and mainly shaped by past demographic history. New Phytol 2025; 245:1718-1732. [PMID: 39582196 PMCID: PMC11754929 DOI: 10.1111/nph.20293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Accepted: 10/31/2024] [Indexed: 11/26/2024]
Abstract
Understanding the evolution of pathogen resistance genes (nucleotide-binding site-leucine-rich repeats, NLRs) within a species requires a comprehensive examination of factors that affect gene loss and gain. We present a new reference genome of Solanum chilense, which leads to an increased number and more accurate annotation of NLRs. Using a target capture approach, we quantify the presence-absence variation (PAV) of NLR loci across 20 populations from different habitats. We build a rigorous pipeline to validate the identification of PAV of NLRs and then show that PAV is larger within populations than between populations, suggesting that maintenance of NLR diversity is linked to population dynamics. The amount of PAV appears not to be correlated with the NLR presence in gene clusters in the genome, but rather with the past demographic history of the species, with loss of NLRs in diverging (smaller) populations at the distribution edges. Finally, using a redundancy analysis, we find limited evidence of PAV being linked to environmental gradients. Our results suggest that random processes (genetic drift and demography) and weak positive selection for local adaptation shape the evolution of NLRs at the single nucleotide polymorphism and PAV levels in an outcrossing plant with high nucleotide diversity.
Affiliation(s)
- Gustavo A. Silva‐Arias
  - Professorship for Population Genetics, TUM School of Life Sciences, Technical University of Munich, Liesel‐Beckmann Strasse 2, Freising 85354, Germany
  - Facultad de Ciencias, Instituto de Ciencias Naturales, Universidad Nacional de Colombia ‐ Sede Bogotá, Ciudad Universitaria, Bogotá 111321, Colombia
- Edeline Gagnon
  - Department of Integrative Biology, College of Biological Science, University of Guelph, 50 Stone Road East, Guelph, ON N1G 2W1, Canada
  - Chair of Phytopathology, TUM School of Life Sciences, Technical University of Munich, Emil‐Ramman‐St. 2, Freising 85354, Germany
  - Faculty of Agricultural and Nutritional Sciences, Department of Phytopathology and Crop Protection, Institute of Phytopathology, Christian Albrechts University, Hermann Rodewald Str 9, Kiel 24118, Germany
- Surya Hembrom
  - Professorship for Population Genetics, TUM School of Life Sciences, Technical University of Munich, Liesel‐Beckmann Strasse 2, Freising 85354, Germany
- Alexander Fastner
  - Faculty of Agricultural and Nutritional Sciences, Department of Phytopathology and Crop Protection, Institute of Phytopathology, Christian Albrechts University, Hermann Rodewald Str 9, Kiel 24118, Germany
- Muhammad Ramzan Khan
  - National Institute for Genomics and Advanced Biotechnology, National Agricultural Research Centre, Park Rd, Islamabad Capital Territory, Islamabad, Pakistan
  - PARC Institute for Advanced Studies in Agriculture, NARC, Park Rd, Islamabad Capital Territory, Islamabad, Pakistan
- Remco Stam
  - Chair of Phytopathology, TUM School of Life Sciences, Technical University of Munich, Emil‐Ramman‐St. 2, Freising 85354, Germany
  - Faculty of Agricultural and Nutritional Sciences, Department of Phytopathology and Crop Protection, Institute of Phytopathology, Christian Albrechts University, Hermann Rodewald Str 9, Kiel 24118, Germany
- Aurélien Tellier
  - Professorship for Population Genetics, TUM School of Life Sciences, Technical University of Munich, Liesel‐Beckmann Strasse 2, Freising 85354, Germany
4
Carvajal-Rodríguez A. iHDSel software: The price equation and the population stability index to detect genomic patterns compatible with selective sweeps. An example with SARS-CoV-2. Biol Methods Protoc 2024; 9:bpae089. [PMID: 39679303 PMCID: PMC11646571 DOI: 10.1093/biomethods/bpae089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2024] [Revised: 11/19/2024] [Accepted: 11/25/2024] [Indexed: 12/17/2024]
Abstract
A large number of methods have been developed and continue to evolve for detecting the signatures of selective sweeps in genomes. Significant advances have been made, including the combination of different statistical strategies and the incorporation of artificial intelligence (machine learning) methods. Despite these advances, several common problems persist, such as the unknown null distribution of the statistics used, necessitating simulations and resampling to assign significance to the statistics. Additionally, it is not always clear how deviations from the specific assumptions of each method might affect the results. In this work, allelic classes of haplotypes are used along with the informational interpretation of the Price equation to design a statistic with a known distribution that can detect genomic patterns caused by selective sweeps. The statistic consists of Jeffreys divergence, also known as the population stability index, applied to the distribution of allelic classes of haplotypes in two samples. Results with simulated data show optimal performance of the statistic in detecting divergent selection. Analysis of real severe acute respiratory syndrome coronavirus 2 genome data also shows that some of the sites playing key roles in the virus's fitness and immune escape capability are detected by the method. The new statistic, called JHAC, is incorporated into the iHDSel (informed HacDivSel) software available at https://acraaj.webs.uvigo.es/iHDSel.html.
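The Jeffreys divergence named in this abstract is the symmetrized Kullback-Leibler divergence, J(P, Q) = Σ (p_i − q_i) ln(p_i / q_i), which has the same form as the population stability index. The following is a minimal Python sketch, assuming the two samples have already been summarized as frequency vectors over the same allelic classes. The function name, the `eps` guard, and the example frequencies are illustrative assumptions; this is not the JHAC statistic itself nor the iHDSel implementation.

```python
import math


def jeffreys_divergence(p, q, eps=1e-12):
    """Jeffreys divergence between two discrete distributions.

    p, q: sequences of class frequencies over the same classes
          (hypothetical inputs; each is expected to sum to 1).
    eps:  small floor that avoids log(0) for empty classes
          (an assumption of this sketch, not taken from the source).
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same classes")
    total = 0.0
    for pi, qi in zip(p, q):
        pi = max(pi, eps)
        qi = max(qi, eps)
        # Each term is non-negative: (pi - qi) and log(pi / qi) share a sign.
        total += (pi - qi) * math.log(pi / qi)
    return total


# Hypothetical frequencies of three allelic classes in two samples.
sample_a = [0.70, 0.20, 0.10]
sample_b = [0.30, 0.40, 0.30]
divergence = jeffreys_divergence(sample_a, sample_b)
```

The divergence is zero when the two distributions are identical and grows as they separate; because it is symmetric, the order of the two samples does not matter.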
Affiliation(s)
- Antonio Carvajal-Rodríguez
  - Centro de Investigación Mariña (CIM), Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, Vigo, 36310 Spain
5
Soni V, Terbot JW, Jensen JD. Population genetic considerations regarding the interpretation of within-patient SARS-CoV-2 polymorphism data. Nat Commun 2024; 15:3240. [PMID: 38627371 PMCID: PMC11021480 DOI: 10.1038/s41467-024-46261-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 01/29/2024] [Indexed: 04/19/2024]
Affiliation(s)
- Vivak Soni
  - Center for Evolution & Medicine, Arizona State University, School of Life Sciences, Tempe, AZ, USA
- John W Terbot
  - Center for Evolution & Medicine, Arizona State University, School of Life Sciences, Tempe, AZ, USA
  - Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Jeffrey D Jensen
  - Center for Evolution & Medicine, Arizona State University, School of Life Sciences, Tempe, AZ, USA
6
Vellnow N, Gossmann TI, Waxman D. The pseudoentropy of allele frequency trajectories, the persistence of variation, and the effective population size. Biosystems 2024; 238:105176. [PMID: 38479654 DOI: 10.1016/j.biosystems.2024.105176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 03/01/2024] [Accepted: 03/01/2024] [Indexed: 03/24/2024]
Abstract
To concisely describe how genetic variation, at individual loci or across whole genomes, changes over time, and to follow transitory allelic changes, we introduce a quantity related to entropy, that we term pseudoentropy. This quantity emerges in a diffusion analysis of the mean time a mutation segregates in a population. For a neutral locus with an arbitrary number of alleles, the mean time of segregation is generally proportional to the pseudoentropy of initial allele frequencies. After the initial time point, pseudoentropy generally decreases, but other behaviours are possible, depending on the genetic diversity and selective forces present. For a biallelic locus, pseudoentropy and entropy coincide, but they are distinct quantities with more than two alleles. Thus for populations with multiple biallelic loci, the language of entropy suffices. Then entropy, combined across loci, serves as a concise description of genetic variation. We used individual based simulations to explore how this entropy behaves under different evolutionary scenarios. In agreement with predictions, the entropy associated with unlinked neutral loci decreases over time. However, deviations from free recombination and neutrality have clear and informative effects on the entropy's behaviour over time. Analysis of publicly available data of a natural D. melanogaster population, that had been sampled over seven years, using a sliding-window approach, yielded considerable variation in entropy trajectories of different genomic regions. These mostly follow a pattern that suggests a substantial effective population size and a limited effect of positive selection on genome-wide diversity over short time scales.
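For a biallelic locus with allele frequency x, the entropy this abstract refers to can be written as H(x) = −x ln x − (1 − x) ln(1 − x). The following is a minimal Python sketch, assuming natural logarithms and assuming that "combined across loci" means a plain sum, since the abstract does not state the combination rule. The function names and example frequencies are hypothetical and do not come from the paper.

```python
import math


def biallelic_entropy(x):
    """Entropy of a biallelic locus with allele frequency x (natural log)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    if x == 0.0 or x == 1.0:
        # A fixed or lost allele carries no variation.
        return 0.0
    return -(x * math.log(x) + (1.0 - x) * math.log(1.0 - x))


def combined_entropy(frequencies):
    """Sum of per-locus entropies (summation is an assumption of this sketch)."""
    return sum(biallelic_entropy(x) for x in frequencies)


# Hypothetical allele frequencies at four loci, at two time points.
early = [0.50, 0.40, 0.30, 0.20]
late = [0.60, 0.25, 0.10, 0.00]
change = combined_entropy(late) - combined_entropy(early)
```

With these made-up frequencies the combined entropy falls between the two time points, which is the qualitative behaviour the abstract describes for unlinked neutral loci.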
Affiliation(s)
- Nikolas Vellnow
  - TU Dortmund University, Computational Systems Biology, Faculty of Biochemical and Chemical Engineering, Emil-Figge-Str. 66, 44227 Dortmund, Germany
- Toni I Gossmann
  - TU Dortmund University, Computational Systems Biology, Faculty of Biochemical and Chemical Engineering, Emil-Figge-Str. 66, 44227 Dortmund, Germany
- David Waxman
  - Fudan University, Centre for Computational Systems Biology, ISTBI, 220 Handan Road, Shanghai 200433, People's Republic of China
7
Schrider DR. Allelic gene conversion softens selective sweeps. bioRxiv 2023:2023.12.05.570141. [PMID: 38106127 PMCID: PMC10723294 DOI: 10.1101/2023.12.05.570141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
The prominence of positive selection, in which beneficial mutations are favored by natural selection and rapidly increase in frequency, is a subject of intense debate. Positive selection can result in selective sweeps, in which the haplotype(s) bearing the adaptive allele "sweep" through the population, thereby removing much of the genetic diversity from the region surrounding the target of selection. Two models of selective sweeps have been proposed: classical sweeps, or "hard sweeps", in which a single copy of the adaptive allele sweeps to fixation, and "soft sweeps", in which multiple distinct copies of the adaptive allele leave descendants after the sweep. Soft sweeps can be the outcome of recurrent mutation to the adaptive allele, or the presence of standing genetic variation consisting of multiple copies of the adaptive allele prior to the onset of selection. Importantly, soft sweeps will be common when populations can rapidly adapt to novel selective pressures, either because of a high mutation rate or because adaptive alleles are already present. The prevalence of soft sweeps is especially controversial, and it has been noted that selection on standing variation or recurrent mutations may not always produce soft sweeps. Here, we show that the inverse is true: selection on single-origin de novo mutations may often result in an outcome that is indistinguishable from a soft sweep. This is made possible by allelic gene conversion, which "softens" hard sweeps by copying the adaptive allele onto multiple genetic backgrounds, a process we refer to as a "pseudo-soft" sweep. We carried out a simulation study examining the impact of gene conversion on sweeps from a single de novo variant in models of human, Drosophila, and Arabidopsis populations. The fraction of simulations in which gene conversion had produced multiple haplotypes with the adaptive allele upon fixation was appreciable. 
Indeed, under realistic demographic histories and gene conversion rates, even if selection always acts on a single-origin mutation, sweeps involving multiple haplotypes are more likely than hard sweeps in large populations, especially when selection is not extremely strong. Thus, even when the mutation rate is low or there is no standing variation, hard sweeps are expected to be the exception rather than the rule in large populations. These results also imply that the presence of signatures of soft sweeps does not necessarily mean that adaptation has been especially rapid or is not mutation limited.
Affiliation(s)
- Daniel R Schrider
  - Department of Genetics, University of North Carolina, Chapel Hill, NC 27599
8
Terbot JW, Cooper BS, Good JM, Jensen JD. A Simulation Framework for Modeling the Within-Patient Evolutionary Dynamics of SARS-CoV-2. Genome Biol Evol 2023; 15:evad204. [PMID: 37950882 PMCID: PMC10664409 DOI: 10.1093/gbe/evad204] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/14/2023] [Revised: 10/31/2023] [Accepted: 11/07/2023] [Indexed: 11/13/2023]
Abstract
The global impact of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to considerable interest in detecting novel beneficial mutations and other genomic changes that may signal the development of variants of concern (VOCs). The ability to accurately detect these changes within individual patient samples is important in enabling early detection of VOCs. Such genomic scans for rarely acting positive selection are best performed via comparison of empirical data with simulated data wherein commonly acting evolutionary factors, including mutation and recombination, reproductive and infection dynamics, and purifying and background selection, can be carefully accounted for and parameterized. Although there has been work to quantify these factors in SARS-CoV-2, they have yet to be integrated into a baseline model describing intrahost evolutionary dynamics. To construct such a baseline model, we develop a simulation framework that enables one to establish expectations for underlying levels and patterns of patient-level variation. By varying eight key parameters, we evaluated 12,096 different model-parameter combinations and compared them with existing empirical data. Of these, 592 models (∼5%) were plausible based on the resulting mean expected number of segregating variants. These plausible models shared several commonalities shedding light on intrahost SARS-CoV-2 evolutionary dynamics: severe infection bottlenecks, low levels of reproductive skew, and a distribution of fitness effects skewed toward strongly deleterious mutations. We also describe important areas of model uncertainty and highlight additional sequence data that may help to further refine a baseline model. This study lays the groundwork for the improved analysis of existing and future SARS-CoV-2 within-patient data.
Affiliation(s)
- John W Terbot
  - School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, Arizona, USA
  - Division of Biological Sciences, University of Montana, Missoula, Montana, USA
- Brandon S Cooper
  - Division of Biological Sciences, University of Montana, Missoula, Montana, USA
- Jeffrey M Good
  - Division of Biological Sciences, University of Montana, Missoula, Montana, USA
- Jeffrey D Jensen
  - School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, Arizona, USA
9
Soni V, Johri P, Jensen JD. Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models. Evolution 2023; 77:2113-2127. [PMID: 37395482 PMCID: PMC10547124 DOI: 10.1093/evolut/qpad120] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 03/20/2023] [Revised: 06/15/2023] [Accepted: 06/30/2023] [Indexed: 07/04/2023]
Abstract
The detection of selective sweeps from population genomic data often relies on the premise that the beneficial mutations in question have fixed very near the sampling time. As it has been previously shown that the power to detect a selective sweep is strongly dependent on the time since fixation as well as the strength of selection, it is naturally the case that strong, recent sweeps leave the strongest signatures. However, the biological reality is that beneficial mutations enter populations at a rate, one that partially determines the mean wait time between sweep events and hence their age distribution. An important question thus remains about the power to detect recurrent selective sweeps when they are modeled by a realistic mutation rate and as part of a realistic distribution of fitness effects, as opposed to a single, recent, isolated event on a purely neutral background as is more commonly modeled. Here we use forward-in-time simulations to study the performance of commonly used sweep statistics, within the context of more realistic evolutionary baseline models incorporating purifying and background selection, population size change, and mutation and recombination rate heterogeneity. Results demonstrate the important interplay of these processes, necessitating caution when interpreting selection scans; specifically, false-positive rates are in excess of true-positive across much of the evaluated parameter space, and selective sweeps are often undetectable unless the strength of selection is exceptionally strong.
Affiliation(s)
- Vivak Soni
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Parul Johri
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Jeffrey D Jensen
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
10
Terbot JW, Cooper BS, Good JM, Jensen JD. A simulation framework for modeling the within-patient evolutionary dynamics of SARS-CoV-2. bioRxiv 2023:2023.07.13.548462. [PMID: 37503016 PMCID: PMC10370031 DOI: 10.1101/2023.07.13.548462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
The global impact of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has led to considerable interest in detecting novel beneficial mutations and other genomic changes that may signal the development of variants of concern (VOCs). The ability to accurately detect these changes within individual patient samples is important in enabling early detection of VOCs. Such genomic scans for positive selection are best performed via comparison of empirical data to simulated data wherein evolutionary factors, including mutation and recombination rates, reproductive and infection dynamics, and purifying and background selection, can be carefully accounted for and parameterized. While there has been work to quantify these factors in SARS-CoV-2, they have yet to be integrated into a baseline model describing intra-host evolutionary dynamics. To construct such a baseline model, we develop a simulation framework that enables one to establish expectations for underlying levels and patterns of patient-level variation. By varying eight key parameters, we evaluated 12,096 different model-parameter combinations and compared them to existing empirical data. Of these, 592 models (~5%) were plausible based on the resulting mean expected number of segregating variants. These plausible models shared several commonalities shedding light on intra-host SARS-CoV-2 evolutionary dynamics: severe infection bottlenecks, low levels of reproductive skew, and a distribution of fitness effects skewed towards strongly deleterious mutations. We also describe important areas of model uncertainty and highlight additional sequence data that may help to further refine a baseline model. This study lays the groundwork for the improved analysis of existing and future SARS-CoV-2 within-patient data.
Affiliation(s)
- John W Terbot
  - Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
  - University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
- Brandon S. Cooper
  - University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
- Jeffrey M. Good
  - University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
- Jeffrey D. Jensen
  - Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
11
Soni V, Johri P, Jensen JD. Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models. bioRxiv 2023:2023.06.15.545166. [PMID: 37398347 PMCID: PMC10312679 DOI: 10.1101/2023.06.15.545166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
The detection of selective sweeps from population genomic data often relies on the premise that the beneficial mutations in question have fixed very near the sampling time. As it has been previously shown that the power to detect a selective sweep is strongly dependent on the time since fixation as well as the strength of selection, it is naturally the case that strong, recent sweeps leave the strongest signatures. However, the biological reality is that beneficial mutations enter populations at a rate, one that partially determines the mean wait time between sweep events and hence their age distribution. An important question thus remains about the power to detect recurrent selective sweeps when they are modelled by a realistic mutation rate and as part of a realistic distribution of fitness effects (DFE), as opposed to a single, recent, isolated event on a purely neutral background as is more commonly modelled. Here we use forward-in-time simulations to study the performance of commonly used sweep statistics, within the context of more realistic evolutionary baseline models incorporating purifying and background selection, population size change, and mutation and recombination rate heterogeneity. Results demonstrate the important interplay of these processes, necessitating caution when interpreting selection scans; specifically, false positive rates are in excess of true positive across much of the evaluated parameter space, and selective sweeps are often undetectable unless the strength of selection is exceptionally strong.
Teaser Text: Outlier-based genomic scans have proven a popular approach for identifying loci that have potentially experienced recent positive selection. However, it has previously been shown that an evolutionarily appropriate baseline model that incorporates non-equilibrium population histories, purifying and background selection, and variation in mutation and recombination rates is necessary to reduce often extreme false positive rates when performing genomic scans. Here we evaluate the power to detect recurrent selective sweeps using common SFS-based and haplotype-based methods under these increasingly realistic models. We find that while these appropriate evolutionary baselines are essential to reduce false positive rates, the power to accurately detect recurrent selective sweeps is generally low across much of the biologically relevant parameter space.
Affiliation(s)
- Vivak Soni
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Parul Johri
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
  - Present address: Department of Biology, Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
12
Johri P, Pfeifer SP, Jensen JD. Developing an evolutionary baseline model for humans: jointly inferring purifying selection with population history. bioRxiv 2023:2023.04.11.536488. [PMID: 37090533 PMCID: PMC10120674 DOI: 10.1101/2023.04.11.536488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2023]
Abstract
Building evolutionarily appropriate baseline models for natural populations is not only important for answering fundamental questions in population genetics (including quantifying the relative contributions of adaptive vs. non-adaptive processes), but it is also essential for identifying candidate loci experiencing relatively rare and episodic forms of selection (e.g., positive or balancing selection). Here, a baseline model was developed for a human population of West African ancestry, the Yoruba, comprising processes constantly operating on the genome (i.e., purifying and background selection, population size changes, recombination rate heterogeneity, and gene conversion). Specifically, to perform joint inference of selective effects with demography, an approximate Bayesian approach was employed that utilizes the decay of background selection effects around functional elements, taking into account genomic architecture. This approach inferred a recent 6-fold population growth together with a distribution of fitness effects that is skewed towards effectively neutral mutations. Importantly, these results further suggest that, while strong and/or frequent recurrent positive selection is inconsistent with observed data, weak to moderate positive selection is consistent but unidentifiable if rare.
|
13
|
Terbot JW, Johri P, Liphardt SW, Soni V, Pfeifer SP, Cooper BS, Good JM, Jensen JD. Developing an appropriate evolutionary baseline model for the study of SARS-CoV-2 patient samples. PLoS Pathog 2023; 19:e1011265. [PMID: 37018331 PMCID: PMC10075409 DOI: 10.1371/journal.ppat.1011265] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 04/06/2023] Open
Abstract
Over the past 3 years, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has spread through human populations in several waves, resulting in a global health crisis. In response, genomic surveillance efforts have proliferated in the hopes of tracking and anticipating the evolution of this virus, resulting in millions of patient isolates now being available in public databases. Yet, while there is a tremendous focus on identifying newly emerging adaptive viral variants, this quantification is far from trivial. Specifically, multiple co-occurring and interacting evolutionary processes are constantly in operation and must be jointly considered and modeled in order to perform accurate inference. We here outline critical individual components of such an evolutionary baseline model (mutation rates, recombination rates, the distribution of fitness effects, infection dynamics, and compartmentalization) and describe the current state of knowledge pertaining to the related parameters of each in SARS-CoV-2. We close with a series of recommendations for future clinical sampling, model construction, and statistical analysis.
Affiliation(s)
- John W Terbot
  - University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
  - Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
- Parul Johri
  - Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
- Schuyler W Liphardt
  - University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
- Vivak Soni
  - Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
- Susanne P Pfeifer
  - Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
- Brandon S Cooper
  - University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
- Jeffrey M Good
  - University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
- Jeffrey D Jensen
  - Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
|
14
|
Jensen JD. Population genetic concerns related to the interpretation of empirical outliers and the neglect of common evolutionary processes. Heredity (Edinb) 2023; 130:109-110. [PMID: 36829044 PMCID: PMC9981695 DOI: 10.1038/s41437-022-00575-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/19/2022] [Revised: 10/27/2022] [Accepted: 10/28/2022] [Indexed: 02/26/2023] Open
Affiliation(s)
- Jeffrey D Jensen
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
|
15
|
Souilmi Y, Tobler R, Johar A, Williams M, Grey ST, Schmidt J, Teixeira JC, Rohrlach A, Tuke J, Johnson O, Gower G, Turney C, Cox M, Cooper A, Huber CD. Admixture has obscured signals of historical hard sweeps in humans. Nat Ecol Evol 2022; 6:2003-2015. [PMID: 36316412 PMCID: PMC9715430 DOI: 10.1038/s41559-022-01914-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/21/2021] [Accepted: 09/16/2022] [Indexed: 11/06/2022]
Abstract
The role of natural selection in shaping biological diversity is an area of intense interest in modern biology. To date, studies of positive selection have primarily relied on genomic datasets from contemporary populations, which are susceptible to confounding factors associated with complex and often unknown aspects of population history. In particular, admixture between diverged populations can distort or hide prior selection events in modern genomes, though this process is not explicitly accounted for in most selection studies despite its apparent ubiquity in humans and other species. Through analyses of ancient and modern human genomes, we show that previously reported Holocene-era admixture has masked more than 50 historic hard sweeps in modern European genomes. Our results imply that this canonical mode of selection has probably been underappreciated in the evolutionary history of humans and suggest that our current understanding of the tempo and mode of selection in natural populations may be inaccurate.
Affiliation(s)
- Yassine Souilmi
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
- Raymond Tobler
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
  - Evolution of Cultural Diversity Initiative, Australian National University, Canberra, Australian Capital Territory, Australia
- Angad Johar
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
  - Department of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
- Matthew Williams
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
- Shane T Grey
  - Transplantation Immunology Group, Immunology Division, Garvan Institute of Medical Research, Darlinghurst, New South Wales, Australia
  - St Vincent's Clinical School, Faculty of Medicine, UNSW, Darlinghurst, New South Wales, Australia
- Joshua Schmidt
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
- João C Teixeira
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
- Adam Rohrlach
  - ARC Centre of Excellence for Mathematical and Statistical Frontiers, The University of Adelaide, Adelaide, South Australia, Australia
  - Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, Germany
- Jonathan Tuke
  - ARC Centre of Excellence for Mathematical and Statistical Frontiers, The University of Adelaide, Adelaide, South Australia, Australia
  - School of Mathematical Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- Olivia Johnson
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
- Graham Gower
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
- Chris Turney
  - Chronos 14Carbon-Cycle Facility and Earth and Sustainability Science Research Centre, University of New South Wales, Sydney, New South Wales, Australia
- Murray Cox
  - Statistics and Bioinformatics Group, School of Fundamental Sciences, Massey University, Palmerston North, New Zealand
- Alan Cooper
  - South Australian Museum, Adelaide, South Australia, Australia
  - BlueSky Genetics, Ashton, South Australia, Australia
- Christian D Huber
  - Australian Centre for Ancient DNA, The University of Adelaide, Adelaide, South Australia, Australia
  - Department of Biology, Penn State University, University Park, PA, USA
|
16
|
Abstract
We discuss the genetic, demographic, and selective forces that are likely to be at play in restricting observed levels of DNA sequence variation in natural populations to a much smaller range of values than would be expected from the distribution of census population sizes alone (Lewontin's Paradox). While several processes that have previously been strongly emphasized must be involved, including the effects of direct selection and genetic hitchhiking, it seems unlikely that they are sufficient to explain this observation without contributions from other factors. We highlight a potentially important role for the less-appreciated contribution of population size change; specifically, the likelihood that many species and populations may be quite far from reaching the relatively high equilibrium diversity values that would be expected given their current census sizes.
Affiliation(s)
- Brian Charlesworth
  - Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Jeffrey D Jensen
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
|
17
|
Johri P, Eyre-Walker A, Gutenkunst RN, Lohmueller KE, Jensen JD. On the prospect of achieving accurate joint estimation of selection with population history. Genome Biol Evol 2022; 14:evac088. [PMID: 35675379 PMCID: PMC9254643 DOI: 10.1093/gbe/evac088] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Accepted: 06/02/2022] [Indexed: 11/15/2022] Open
Abstract
As both natural selection and population history can affect genome-wide patterns of variation, disentangling the contributions of each has remained a major challenge in population genetics. We here discuss historical and recent progress towards this goal, highlighting theoretical and computational challenges that remain to be addressed as well as inherent difficulties in dealing with model complexity and model violations, and offer thoughts on potentially fruitful next steps.
Affiliation(s)
- Parul Johri
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Ryan N Gutenkunst
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, USA
- Kirk E Lohmueller
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, USA
  - Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jeffrey D Jensen
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
|
18
|
Johri P, Aquadro CF, Beaumont M, Charlesworth B, Excoffier L, Eyre-Walker A, Keightley PD, Lynch M, McVean G, Payseur BA, Pfeifer SP, Stephan W, Jensen JD. Recommendations for improving statistical inference in population genomics. PLoS Biol 2022; 20:e3001669. [PMID: 35639797 PMCID: PMC9154105 DOI: 10.1371/journal.pbio.3001669] [Citation(s) in RCA: 72] [Impact Index Per Article: 24.0] [Indexed: 01/29/2023] Open
Abstract
The field of population genomics has grown rapidly in response to the recent advent of affordable, large-scale sequencing technologies. As opposed to the situation during the majority of the 20th century, in which the development of theoretical and statistical population genetic insights outpaced the generation of data to which they could be applied, genomic data are now being produced at a far greater rate than they can be meaningfully analyzed and interpreted. With this wealth of data has come a tendency to focus on fitting specific (and often rather idiosyncratic) models to data, at the expense of a careful exploration of the range of possible underlying evolutionary processes. For example, the approach of directly investigating models of adaptive evolution in each newly sequenced population or species often neglects the fact that a thorough characterization of ubiquitous nonadaptive processes is a prerequisite for accurate inference. We here describe the perils of these tendencies, present our consensus views on current best practices in population genomic data analysis, and highlight areas of statistical inference and theory that are in need of further attention. Thereby, we argue for the importance of defining a biologically relevant baseline model tuned to the details of each new analysis, of skepticism and scrutiny in interpreting model fitting results, and of carefully defining addressable hypotheses and underlying uncertainties.
Affiliation(s)
- Parul Johri
  - School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Charles F. Aquadro
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America
- Mark Beaumont
  - School of Biological Sciences, University of Bristol, Bristol, United Kingdom
- Brian Charlesworth
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Laurent Excoffier
  - Institute of Ecology and Evolution, University of Berne, Berne, Switzerland
- Adam Eyre-Walker
  - School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Peter D. Keightley
  - Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Michael Lynch
  - School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Gil McVean
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Bret A. Payseur
  - Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Susanne P. Pfeifer
  - School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Jeffrey D. Jensen
  - School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
|