1
Soni V, Terbot JW, Versoza CJ, Pfeifer SP, Jensen JD. A whole-genome scan for evidence of recent positive and balancing selection in aye-ayes (Daubentonia madagascariensis) utilizing a well-fit evolutionary baseline model. bioRxiv: the preprint server for biology 2024:2024.11.08.622667. [PMID: 39605496 PMCID: PMC11601216 DOI: 10.1101/2024.11.08.622667] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 11/29/2024]
Abstract
The aye-aye (Daubentonia madagascariensis) is one of the 25 most endangered primate species in the world, maintaining amongst the lowest genetic diversity of any primate measured to date. Characterizing patterns of genetic variation within aye-aye populations, and the relative influences of neutral and selective processes in shaping that variation, is thus important for future conservation efforts. In this study, we performed the first whole-genome scans for recent positive and balancing selection in the species, utilizing high-coverage population genomic data from newly sequenced individuals. We generated null thresholds for our genomic scans by creating an evolutionarily appropriate baseline model that incorporates the demographic history of this aye-aye population, and identified a small number of candidate genes. Most notably, a suite of genes involved in olfaction - a key trait in these nocturnal primates - were identified as experiencing long-term balancing selection. We also conducted analyses to quantify the expected statistical power to detect positive and balancing selection in this population using site frequency spectrum-based inference methods, once accounting for the potentially confounding contributions of population history, recombination and mutation rate variation, and purifying and background selection. This work, presenting the first high-quality, genome-wide polymorphism data across the functional regions of the aye-aye genome, thus provides important insights into the landscape of episodic selective forces in this highly endangered species.
Affiliation(s)
- Vivak Soni
  - Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
- John W. Terbot
  - Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Cyril J. Versoza
  - Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Susanne P. Pfeifer
  - Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Jeffrey D. Jensen
  - Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
2
Marsh JI, Johri P. Biases in ARG-Based Inference of Historical Population Size in Populations Experiencing Selection. Mol Biol Evol 2024; 41:msae118. [PMID: 38874402 PMCID: PMC11245712 DOI: 10.1093/molbev/msae118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 06/05/2024] [Accepted: 06/11/2024] [Indexed: 06/15/2024] Open
Abstract
Inferring the demographic history of populations provides fundamental insights into species dynamics and is essential for developing a null model to accurately study selective processes. However, background selection and selective sweeps can produce genomic signatures at linked sites that mimic or mask signals associated with historical population size change. While the theoretical biases introduced by the linked effects of selection have been well established, it is unclear whether ancestral recombination graph (ARG)-based approaches to demographic inference in typical empirical analyses are susceptible to misinference due to these effects. To address this, we developed highly realistic forward simulations of human and Drosophila melanogaster populations, including empirically estimated variability of gene density, mutation rates, recombination rates, purifying, and positive selection, across different historical demographic scenarios, to broadly assess the impact of selection on demographic inference using a genealogy-based approach. Our results indicate that the linked effects of selection minimally impact demographic inference for human populations, although it could cause misinference in populations with similar genome architecture and population parameters experiencing more frequent recurrent sweeps. We found that accurate demographic inference of D. melanogaster populations by ARG-based methods is compromised by the presence of pervasive background selection alone, leading to spurious inferences of recent population expansion, which may be further worsened by recurrent sweeps, depending on the proportion and strength of beneficial mutations. Caution and additional testing with species-specific simulations are needed when inferring population history with non-human populations using ARG-based approaches to avoid misinference due to the linked effects of selection.
Affiliation(s)
- Jacob I Marsh
  - Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA
- Parul Johri
  - Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA
  - Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
  - Integrative Program for Biological and Genome Sciences, University of North Carolina, Chapel Hill, NC 27599, USA
3
Soni V, Jensen JD. Temporal challenges in detecting balancing selection from population genomic data. G3 (Bethesda, Md.) 2024; 14:jkae069. [PMID: 38551137 DOI: 10.1093/g3journal/jkae069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 12/21/2023] [Accepted: 03/19/2024] [Indexed: 04/28/2024]
Abstract
The role of balancing selection in maintaining genetic variation remains an open question in population genetics. Recent years have seen numerous studies identifying candidate loci potentially experiencing balancing selection, most predominantly in human populations. There are however numerous alternative evolutionary processes that may leave similar patterns of variation, thereby potentially confounding inference, and the expected signatures of balancing selection additionally change in a temporal fashion. Here we use forward-in-time simulations to quantify expected statistical power to detect balancing selection using both site frequency spectrum- and linkage disequilibrium-based methods under a variety of evolutionarily realistic null models. We find that whilst site frequency spectrum-based methods have little power immediately after a balanced mutation begins segregating, power increases with time since the introduction of the balanced allele. Conversely, linkage disequilibrium-based methods have considerable power whilst the allele is young, and power dissipates rapidly as the time since introduction increases. Taken together, this suggests that site frequency spectrum-based methods are most effective at detecting long-term balancing selection (>25N generations since the introduction of the balanced allele) whilst linkage disequilibrium-based methods are effective over much shorter timescales (<1N generations), thereby leaving a large time frame over which current methods have little power to detect the action of balancing selection. Finally, we investigate the extent to which alternative evolutionary processes may mimic these patterns, and demonstrate the need for caution in attempting to distinguish the signatures of balancing selection from those of both neutral processes (e.g. population structure and admixture) as well as of alternative selective processes (e.g. partial selective sweeps).
Affiliation(s)
- Vivak Soni
  - School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ 85281, USA
- Jeffrey D Jensen
  - School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ 85281, USA
4
Song H, Chu J, Li W, Li X, Fang L, Han J, Zhao S, Ma Y. A Novel Approach Utilizing Domain Adversarial Neural Networks for the Detection and Classification of Selective Sweeps. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2304842. [PMID: 38308186 PMCID: PMC11005742 DOI: 10.1002/advs.202304842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 01/10/2024] [Indexed: 02/04/2024]
Abstract
The identification and classification of selective sweeps are of great significance for improving the understanding of biological evolution and exploring opportunities for precision medicine and genetic improvement. Here, a domain adaptation sweep detection and classification (DASDC) method is presented to balance the alignment of two domains and the classification performance through a domain-adversarial neural network and its adversarial learning modules. DASDC effectively addresses the issue of mismatch between training data and real genomic data in deep learning models, leading to a significant improvement in its generalization capability, prediction robustness, and accuracy. The DASDC method demonstrates improved identification performance compared to existing methods and excels in classification performance, particularly in scenarios where there is a mismatch between application data and training data. The successful implementation of DASDC in real data of three distinct species highlights its potential as a useful tool for identifying crucial functional genes and investigating adaptive evolutionary mechanisms, particularly with the increasing availability of genomic data.
Affiliation(s)
- Hui Song
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Jinyu Chu
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Wangjiao Li
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
- Xinyun Li
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Wuhan 430070, China
- Lingzhao Fang
  - Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus 8000, Denmark
- Jianlin Han
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
  - CAAS-ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China
  - Livestock Genetics Program, International Livestock Research Institute (ILRI), Nairobi 00100, Kenya
- Shuhong Zhao
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Wuhan 430070, China
  - Lingnan Modern Agricultural Science and Technology Guangdong Laboratory, Guangzhou 510642, China
- Yunlong Ma
  - Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of the Ministry of Education & Key Laboratory of Swine Genetics and Breeding of the Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Wuhan 430070, China
  - Lingnan Modern Agricultural Science and Technology Guangdong Laboratory, Guangzhou 510642, China
5
Soni V, Johri P, Jensen JD. Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models. Evolution 2023; 77:2113-2127. [PMID: 37395482 PMCID: PMC10547124 DOI: 10.1093/evolut/qpad120] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 03/20/2023] [Revised: 06/15/2023] [Accepted: 06/30/2023] [Indexed: 07/04/2023]
Abstract
The detection of selective sweeps from population genomic data often relies on the premise that the beneficial mutations in question have fixed very near the sampling time. As it has been previously shown that the power to detect a selective sweep is strongly dependent on the time since fixation as well as the strength of selection, it is naturally the case that strong, recent sweeps leave the strongest signatures. However, the biological reality is that beneficial mutations enter populations at a rate, one that partially determines the mean wait time between sweep events and hence their age distribution. An important question thus remains about the power to detect recurrent selective sweeps when they are modeled by a realistic mutation rate and as part of a realistic distribution of fitness effects, as opposed to a single, recent, isolated event on a purely neutral background as is more commonly modeled. Here we use forward-in-time simulations to study the performance of commonly used sweep statistics, within the context of more realistic evolutionary baseline models incorporating purifying and background selection, population size change, and mutation and recombination rate heterogeneity. Results demonstrate the important interplay of these processes, necessitating caution when interpreting selection scans; specifically, false-positive rates are in excess of true-positive across much of the evaluated parameter space, and selective sweeps are often undetectable unless the strength of selection is exceptionally strong.
Affiliation(s)
- Vivak Soni
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Parul Johri
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Jeffrey D Jensen
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
6
Soni V, Johri P, Jensen JD. Evaluating power to detect recurrent selective sweeps under increasingly realistic evolutionary null models. bioRxiv: the preprint server for biology 2023:2023.06.15.545166. [PMID: 37398347 PMCID: PMC10312679 DOI: 10.1101/2023.06.15.545166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
The detection of selective sweeps from population genomic data often relies on the premise that the beneficial mutations in question have fixed very near the sampling time. As it has been previously shown that the power to detect a selective sweep is strongly dependent on the time since fixation as well as the strength of selection, it is naturally the case that strong, recent sweeps leave the strongest signatures. However, the biological reality is that beneficial mutations enter populations at a rate, one that partially determines the mean wait time between sweep events and hence their age distribution. An important question thus remains about the power to detect recurrent selective sweeps when they are modelled by a realistic mutation rate and as part of a realistic distribution of fitness effects (DFE), as opposed to a single, recent, isolated event on a purely neutral background as is more commonly modelled. Here we use forward-in-time simulations to study the performance of commonly used sweep statistics, within the context of more realistic evolutionary baseline models incorporating purifying and background selection, population size change, and mutation and recombination rate heterogeneity. Results demonstrate the important interplay of these processes, necessitating caution when interpreting selection scans; specifically, false positive rates are in excess of true positive across much of the evaluated parameter space, and selective sweeps are often undetectable unless the strength of selection is exceptionally strong.
Teaser Text: Outlier-based genomic scans have proven a popular approach for identifying loci that have potentially experienced recent positive selection. However, it has previously been shown that an evolutionarily appropriate baseline model that incorporates non-equilibrium population histories, purifying and background selection, and variation in mutation and recombination rates is necessary to reduce often extreme false positive rates when performing genomic scans. Here we evaluate the power to detect recurrent selective sweeps using common SFS-based and haplotype-based methods under these increasingly realistic models. We find that while these appropriate evolutionary baselines are essential to reduce false positive rates, the power to accurately detect recurrent selective sweeps is generally low across much of the biologically relevant parameter space.
Affiliation(s)
- Vivak Soni
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Parul Johri
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
  - Present address: Department of Biology, Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
7
Gompert Z, Feder JL, Nosil P. The short-term, genome-wide effects of indirect selection deserve study: A response to Charlesworth and Jensen (2022). Mol Ecol 2022; 31:4444-4450. [PMID: 35909250 DOI: 10.1111/mec.16614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Revised: 06/21/2022] [Accepted: 07/01/2022] [Indexed: 11/30/2022]
Abstract
We recently published a paper quantifying the genome-wide consequences of natural selection, including the effects of indirect selection due to the correlation of genetic regions (neutral or selected) with directly selected regions (Gompert et al., 2022). In their critique of our paper, Charlesworth and Jensen (2022) make two main points: (i) indirect selection is equivalent to hitchhiking and thus well documented (i.e., our results are not novel) and (ii) that we do not demonstrate the source of linkage disequilibrium (LD) between SNPs and the Mel-Stripe locus in the Timema cristinae experiment we analyse. As we discuss in detail below, neither of these are substantial criticisms of our work.
Affiliation(s)
- Zachariah Gompert
  - Department of Biology, Utah State University, Logan, Utah, USA
  - Ecology Center, Utah State University, Logan, Utah, USA
- Jeffrey L Feder
  - Department of Biological Sciences, University of Notre Dame, Notre Dame, Indiana, USA
- Patrik Nosil
  - CEFE, University Montpellier, CNRS, EPHE, IRD, University Paul Valéry Montpellier 3, Montpellier, France
8
Abstract
We discuss the genetic, demographic, and selective forces that are likely to be at play in restricting observed levels of DNA sequence variation in natural populations to a much smaller range of values than would be expected from the distribution of census population sizes alone (Lewontin's Paradox). While several processes that have previously been strongly emphasized must be involved, including the effects of direct selection and genetic hitchhiking, it seems unlikely that they are sufficient to explain this observation without contributions from other factors. We highlight a potentially important role for the less-appreciated contribution of population size change; specifically, the likelihood that many species and populations may be quite far from reaching the relatively high equilibrium diversity values that would be expected given their current census sizes.
Affiliation(s)
- Brian Charlesworth
  - Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Jeffrey D Jensen
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
9
Laval G, Patin E, Boutillier P, Quintana-Murci L. Sporadic occurrence of recent selective sweeps from standing variation in humans as revealed by an approximate Bayesian computation approach. Genetics 2021; 219:6377789. [PMID: 34849862 DOI: 10.1093/genetics/iyab161] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/20/2021] [Accepted: 09/01/2021] [Indexed: 12/14/2022] Open
Abstract
During their dispersals over the last 100,000 years, modern humans have been exposed to a large variety of environments, resulting in genetic adaptation. While genome-wide scans for the footprints of positive Darwinian selection have increased knowledge of genes and functions potentially involved in human local adaptation, they have globally produced evidence of a limited contribution of selective sweeps in humans. Conversely, studies based on machine learning algorithms suggest that recent sweeps from standing variation are widespread in humans, an observation that has been recently questioned. Here, we sought to formally quantify the number of recent selective sweeps in humans, by leveraging approximate Bayesian computation and whole-genome sequence data. Our computer simulations revealed suitable ABC estimations, regardless of the frequency of the selected alleles at the onset of selection and the completion of sweeps. Under a model of recent selection from standing variation, we inferred that an average of 68 (from 56 to 79) and 140 (from 94 to 198) sweeps occurred over the last 100,000 years of human history, in African and Eurasian populations, respectively. The former estimation is compatible with human adaptation rates estimated since divergence with chimps, and reveals numbers of sweeps per generation per site in the range of values estimated in Drosophila. Our results confirm the rarity of selective sweeps in humans and show a low contribution of sweeps from standing variation to recent human adaptation.
Affiliation(s)
- Guillaume Laval
  - Human Evolutionary Genetics Unit, Institut Pasteur, UMR 2000, CNRS, Paris 75015, France
- Etienne Patin
  - Human Evolutionary Genetics Unit, Institut Pasteur, UMR 2000, CNRS, Paris 75015, France
- Pierre Boutillier
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Lluis Quintana-Murci
  - Human Evolutionary Genetics Unit, Institut Pasteur, UMR 2000, CNRS, Paris 75015, France
  - Human Genomics and Evolution, Collège de France, 75005 Paris, France
10
Johri P, Charlesworth B, Howell EK, Lynch M, Jensen JD. Revisiting the notion of deleterious sweeps. Genetics 2021; 219:iyab094. [PMID: 34125884 PMCID: PMC9101445 DOI: 10.1093/genetics/iyab094] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/29/2021] [Accepted: 06/08/2021] [Indexed: 11/14/2022] Open
Abstract
It has previously been shown that, conditional on its fixation, the time to fixation of a semi-dominant deleterious autosomal mutation in a randomly mating population is the same as that of an advantageous mutation. This result implies that deleterious mutations could generate selective sweep-like effects. Although their fixation probabilities greatly differ, the much larger input of deleterious relative to beneficial mutations suggests that this phenomenon could be important. We here examine how the fixation of mildly deleterious mutations affects levels and patterns of polymorphism at linked sites, both in the presence and absence of interference amongst deleterious mutations, and how this class of sites may contribute to divergence between populations and species. We find that, while deleterious fixations are unlikely to represent a significant proportion of outliers in polymorphism-based genomic scans within populations, minor shifts in the frequencies of deleterious mutations can influence the proportions of private variants and the value of FST after a recent population split. As sites subject to deleterious mutations are necessarily found in functional genomic regions, interpretations in terms of recurrent positive selection may require reconsideration.
Affiliation(s)
- Parul Johri
  - School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
- Brian Charlesworth
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3FL, UK
- Emma K Howell
  - School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
- Michael Lynch
  - School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
  - Center for Mechanisms of Evolution, The Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA
- Jeffrey D Jensen
  - School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
11
Gompert Z, Feder JL, Nosil P. Natural selection drives genome-wide evolution via chance genetic associations. Mol Ecol 2021; 31:467-481. [PMID: 34704650 DOI: 10.1111/mec.16247] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/07/2021] [Revised: 10/13/2021] [Accepted: 10/15/2021] [Indexed: 11/29/2022]
Abstract
Understanding selection's impact on the genome is a major theme in biology. Functionally neutral genetic regions can be affected indirectly by natural selection, via their statistical association with genes under direct selection. The genomic extent of such indirect selection, particularly across loci not physically linked to those under direct selection, remains poorly understood, as does the time scale at which indirect selection occurs. Here, we use field experiments and genomic data in stick insects, deer mice and stickleback fish to show that widespread statistical associations with genes known to affect fitness cause many genetic loci across the genome to be impacted indirectly by selection. This includes regions physically distant from those directly under selection. Then, focusing on the stick insect system, we show that statistical associations between SNPs and other unknown, causal variants result in additional indirect selection in general and specifically within genomic regions of physically linked loci. This widespread indirect selection necessarily makes aspects of evolution more predictable. Thus, natural selection combines with chance genetic associations to affect genome-wide evolution across linked and unlinked loci and even in modest-sized populations. This process has implications for the application of evolutionary principles in basic and applied science.
Affiliation(s)
- Zachariah Gompert
  - Department of Biology, Utah State University, Logan, Utah, USA
  - Ecology Center, Utah State University, Logan, Utah, USA
- Jeffrey L Feder
  - Department of Biological Sciences, University of Notre Dame, Notre Dame, Indiana, USA
- Patrik Nosil
  - CEFE, Univ Montpellier, CNRS, EPHE, IRD, Univ Paul Valéry Montpellier 3, Montpellier, France
12
Buffalo V. Quantifying the relationship between genetic diversity and population size suggests natural selection cannot explain Lewontin's Paradox. eLife 2021; 10:e67509. [PMID: 34409937 PMCID: PMC8486380 DOI: 10.7554/elife.67509] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 02/12/2021] [Accepted: 08/16/2021] [Indexed: 12/21/2022] Open
Abstract
Neutral theory predicts that genetic diversity increases with population size, yet observed levels of diversity across metazoans vary only two orders of magnitude while population sizes vary over several. This unexpectedly narrow range of diversity is known as Lewontin's Paradox of Variation (1974). While some have suggested selection constrains diversity, tests of this hypothesis seem to fall short. Here, I revisit Lewontin's Paradox to assess whether current models of linked selection are capable of reducing diversity to this extent. To quantify the discrepancy between pairwise diversity and census population sizes across species, I combine previously-published estimates of pairwise diversity from 172 metazoan taxa with newly derived estimates of census sizes. Using phylogenetic comparative methods, I show this relationship is significant accounting for phylogeny, but with high phylogenetic signal and evidence that some lineages experience shifts in the evolutionary rate of diversity deep in the past. Additionally, I find a negative relationship between recombination map length and census size, suggesting abundant species have less recombination and experience greater reductions in diversity due to linked selection. However, I show that even assuming strong and abundant selection, models of linked selection are unlikely to explain the observed relationship between diversity and census sizes across species.
Affiliation(s)
- Vince Buffalo
  - Institute for Ecology and Evolution, University of Oregon, Eugene, United States
13
Abstract
Drosophila melanogaster, a small dipteran of African origin, represents one of the best-studied model organisms. Early work in this system has uniquely shed light on the basic principles of genetics and resulted in a versatile collection of genetic tools that allow researchers to uncover mechanistic links between genotype and phenotype. Moreover, given its worldwide distribution in diverse habitats and its moderate genome size, Drosophila has proven very powerful for population genetics inference and was one of the first eukaryotes whose genome was fully sequenced. In this book chapter, we provide a brief historical overview of research in Drosophila and then focus on recent advances during the genomic era. After describing different types and sources of genomic data, we discuss mechanisms of neutral evolution including the demographic history of Drosophila and the effects of recombination and biased gene conversion. Then, we review recent advances in detecting genome-wide signals of selection, such as soft and hard selective sweeps. We further provide a brief introduction to background selection, selection of noncoding DNA and codon usage and focus on the role of structural variants, such as transposable elements and chromosomal inversions, during the adaptive process. Finally, we discuss how genomic data help to dissect neutral and adaptive evolutionary mechanisms that shape genetic and phenotypic variation in natural populations along environmental gradients. In summary, this book chapter serves as a starting point for Drosophila population genomics and provides an introduction to the system and an overview of data sources, important population genetic concepts and recent advances in the field.
Collapse
|
14
|
Schneider K, White TJ, Mitchell S, Adams CE, Reeve R, Elmer KR. The pitfalls and virtues of population genetic summary statistics: Detecting selective sweeps in recent divergences. J Evol Biol 2020; 34:893-909. [DOI: 10.1111/jeb.13738] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2020] [Revised: 10/22/2020] [Accepted: 10/24/2020] [Indexed: 12/12/2022]
Affiliation(s)
- Kevin Schneider
- Institute of Biodiversity, Animal Health & Comparative Medicine, College of Medical, Veterinary & Life Sciences, University of Glasgow, Glasgow, UK
- Tom J. White
- Institute of Biodiversity, Animal Health & Comparative Medicine, College of Medical, Veterinary & Life Sciences, University of Glasgow, Glasgow, UK
- Sonia Mitchell
- Institute of Biodiversity, Animal Health & Comparative Medicine, College of Medical, Veterinary & Life Sciences, University of Glasgow, Glasgow, UK
- Colin E. Adams
- Institute of Biodiversity, Animal Health & Comparative Medicine, College of Medical, Veterinary & Life Sciences, University of Glasgow, Glasgow, UK
- Scottish Centre for Ecology and the Natural Environment, Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary & Life Sciences, University of Glasgow, Glasgow, UK
- Richard Reeve
- Institute of Biodiversity, Animal Health & Comparative Medicine, College of Medical, Veterinary & Life Sciences, University of Glasgow, Glasgow, UK
- Kathryn R. Elmer
- Institute of Biodiversity, Animal Health & Comparative Medicine, College of Medical, Veterinary & Life Sciences, University of Glasgow, Glasgow, UK
|
15
|
Dapper AL, Wade MJ. Relaxed Selection and the Rapid Evolution of Reproductive Genes. Trends Genet 2020; 36:640-649. [PMID: 32713599 DOI: 10.1016/j.tig.2020.06.014] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2020] [Revised: 06/22/2020] [Accepted: 06/23/2020] [Indexed: 10/23/2022]
Abstract
Evolutionary genomic studies find that reproductive protein genes, those directly involved in reproductive processes, diversify more rapidly than most other gene categories. Strong postcopulatory sexual selection acting within species is the predominant hypothesis proposed to account for the observed pattern. Recently, relaxed selection due to sex-specific gene expression has also been put forward to explain the relatively rapid diversification. We contend that relaxed selection due to sex-limited gene expression is the correct null model for tests of molecular evolution of reproductive genes and argue that it may play a more significant role in the evolutionary diversification of reproductive genes than previously recognized. We advocate for a re-evaluation of adaptive explanations for the rapid diversification of reproductive genes.
Affiliation(s)
- Amy L Dapper
- Department of Biological Sciences, Mississippi State University, Mississippi State, MS 39762, USA; Department of Biology, Indiana University, Bloomington, IN 47401, USA.
- Michael J Wade
- Department of Biology, Indiana University, Bloomington, IN 47401, USA
|
16
|
Thornton KR. Polygenic Adaptation to an Environmental Shift: Temporal Dynamics of Variation Under Gaussian Stabilizing Selection and Additive Effects on a Single Trait. Genetics 2019; 213:1513-1530. [PMID: 31653678 PMCID: PMC6893385 DOI: 10.1534/genetics.119.302662] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2019] [Accepted: 10/21/2019] [Indexed: 11/26/2022] Open
Abstract
Predictions about the effect of natural selection on patterns of linked neutral variation are largely based on models involving the rapid fixation of unconditionally beneficial mutations. However, when phenotypes adapt to a new optimum trait value, the strength of selection on individual mutations decreases as the population adapts. Here, I use explicit forward simulations of a single trait with additive-effect mutations adapting to an "optimum shift." Detectable "hitchhiking" patterns are only apparent if (i) the optimum shifts are large with respect to equilibrium variation for the trait, (ii) mutation rates to large-effect mutations are low, and (iii) large-effect mutations rapidly increase in frequency and eventually reach fixation, which typically occurs after the population reaches the new optimum. For the parameters simulated here, partial sweeps do not appreciably affect patterns of linked variation, even when the mutations are strongly selected. The contribution of new mutations vs. standing variation to fixation depends on the mutation rate affecting trait values. Given the fixation of a strongly selected variant, patterns of hitchhiking are similar on average for the two classes of sweeps because sweeps from standing variation involving large-effect mutations are rare when the optimum shifts. The distribution of effect sizes of new mutations has little effect on the time to reach the new optimum, but reducing the mutational variance increases the magnitude of hitchhiking patterns. In general, populations reach the new optimum prior to the completion of any sweeps, and the times to fixation are longer for this model than for standard models of directional selection. The long fixation times are due to a combination of declining selection pressures during adaptation and the possibility of interference among weakly selected sites for traits with high mutation rates.
Affiliation(s)
- Kevin R Thornton
- Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697
|
17
|
Adaptation in structured populations and fuzzy boundaries between hard and soft sweeps. PLoS Comput Biol 2019; 15:e1007426. [PMID: 31710623 PMCID: PMC6872172 DOI: 10.1371/journal.pcbi.1007426] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2019] [Revised: 11/21/2019] [Accepted: 09/20/2019] [Indexed: 11/19/2022] Open
Abstract
Selective sweeps, the genetic footprint of positive selection, have been extensively studied in the past decades, with dozens of methods developed to identify swept regions. However, these methods suffer from both false positive and false negative reports, and the candidates identified with different methods are often inconsistent with each other. We propose that a biological cause of this problem can be population subdivision, and a technical cause can be incomplete, or inaccurate, modeling of the dynamic process associated with sweeps. Here we used simulations to show how these effects interact and potentially cause bias. In particular, we show that sweeps may be misclassified as either hard or soft when the true time stage of a sweep and that implied, or pre-supposed, by the model do not match. We call this "temporal misclassification". Similarly, "spatial misclassification (softening)" can occur when hard sweeps, which are imported by migration into a new subpopulation, are falsely identified as soft. This can easily happen in case of local adaptation, i.e. when the sweeping allele is not under positive selection in the new subpopulation, and the underlying model assumes panmixis instead of substructure. The claim that most sweeps in the evolutionary history of humans were soft may have to be reconsidered in the light of these findings.
|
18
|
A Deep Learning Approach for Detecting Copy Number Variation in Next-Generation Sequencing Data. G3-GENES GENOMES GENETICS 2019; 9:3575-3582. [PMID: 31455677 PMCID: PMC6829143 DOI: 10.1534/g3.119.400596] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
Abstract
Copy number variants (CNV) are associated with phenotypic variation in several species. However, properly detecting changes in copy numbers of sequences remains a difficult problem, especially in lower quality or lower coverage next-generation sequencing data. Here, inspired by recent applications of machine learning in genomics, we describe a method to detect duplications and deletions in short-read sequencing data. In low coverage data, machine learning appears to be more powerful in the detection of CNVs than the gold-standard methods of coverage estimation alone, and of equal power in high coverage data. We also demonstrate how replicating training sets allows a more precise detection of CNVs, even identifying novel CNVs in two genomes previously surveyed thoroughly for CNVs using long read data.
|
19
|
Exploiting selection at linked sites to infer the rate and strength of adaptation. Nat Ecol Evol 2019; 3:977-984. [PMID: 31061475 PMCID: PMC6693860 DOI: 10.1038/s41559-019-0890-6] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2018] [Accepted: 03/28/2019] [Indexed: 12/18/2022]
Abstract
Genomic data encodes past evolutionary events and has the potential to reveal the strength, rate, and biological drivers of adaptation. However, jointly estimating adaptation rate (α) and adaptation strength remains challenging because evolutionary processes such as demography, linkage, and non-neutral polymorphism can confound inference. Here, we exploit the influence of background selection to reduce the fixation rate of weakly-beneficial alleles to jointly infer the strength and rate of adaptation. We develop an MK-based method (ABC-MK) to infer adaptation rate and strength, and estimate α = 0.135 in human protein-coding sequences, 72% of which is contributed by weakly-adaptive variants. We show that in this adaptation regime α is reduced ≈ 25% by linkage genome-wide. Moreover, we show that virus-interacting proteins (VIPs) undergo adaptation that is both stronger and nearly twice as frequent as the genome average (α = 0.224, 56% due to strongly-beneficial alleles). Our results suggest that while most adaptation in human proteins is weakly-beneficial, adaptation to viruses is often strongly-beneficial. Our method provides a robust framework for estimating adaptation rate and strength across species.
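For readers unfamiliar with the quantity α, the sketch below shows the classic McDonald-Kreitman point estimator, α = 1 − (Ds·Pn)/(Dn·Ps). This is the simple textbook estimator on which MK-based approaches build, not the ABC-MK method described in the abstract. The function name and the counts are hypothetical.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Classic McDonald-Kreitman estimate of the proportion of adaptive substitutions.

    dn, ds: nonsynonymous and synonymous fixed differences (divergence)
    pn, ps: nonsynonymous and synonymous polymorphisms
    alpha = 1 - (ds * pn) / (dn * ps)
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero for alpha to be defined")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, chosen for illustration only.
alpha = mk_alpha(dn=120, ds=300, pn=80, ps=250)
print(f"alpha = {alpha:.3f}")
```

This point estimator is known to be biased by weakly deleterious and weakly beneficial polymorphism, which is one reason model-based extensions exist.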
|
20
|
Abstract
For almost 20 years, many inference methods have been developed to detect selective sweeps and localize the targets of directional selection in the genome. These methods are based on population genetic models that describe the effect of a beneficial allele (e.g., a new mutation) on linked neutral variation (driven by directional selection from a single copy to fixation). Here, I discuss these models, ranging from selective sweeps in a panmictic population of constant size to evolutionary traffic when simultaneous sweeps at multiple loci interfere, and emphasize the important role of demography and population structure in data analysis. In the past 10 years, soft sweeps that may arise after an environmental change from directional selection on standing variation have become a focus of population genetic research. In contrast to selective sweeps, they are caused by beneficial alleles that were neutrally segregating in a population before the environmental change or were present at a mutation-selection balance in appreciable frequency.
|
21
|
Lange JD, Pool JE. Impacts of Recurrent Hitchhiking on Divergence and Demographic Inference in Drosophila. Genome Biol Evol 2018; 10:1882-1891. [PMID: 30010915 PMCID: PMC6075209 DOI: 10.1093/gbe/evy142] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/11/2018] [Indexed: 12/14/2022] Open
Abstract
In species with large population sizes such as Drosophila, natural selection may have substantial effects on genetic diversity and divergence. However, the implications of this widespread nonneutrality for standard population genetic assumptions and practices remain poorly resolved. Here, we assess the consequences of recurrent hitchhiking (RHH), in which selective sweeps occur at a given rate randomly across the genome. We use forward simulations to examine two published RHH models for D. melanogaster, reflecting relatively common/weak and rare/strong selection. We find that unlike the rare/strong RHH model, the common/weak model entails a slight degree of Hill-Robertson interference in high recombination regions. We also find that the common/weak RHH model is more consistent with our genome-wide estimate of the proportion of substitutions fixed by natural selection between D. melanogaster and D. simulans (19%). Finally, we examine how these models of RHH might bias demographic inference. We find that these RHH scenarios can bias demographic parameter estimation, but such biases are weaker for parameters relating recently diverged populations, and for the common/weak RHH model in general. Thus, even for species with important genome-wide impacts of selective sweeps, neutralist demographic inference can have some utility in understanding the histories of recently diverged populations.
Affiliation(s)
- Jeremy D Lange
- Laboratory of Genetics, University of Wisconsin–Madison, Madison
- John E Pool
- Laboratory of Genetics, University of Wisconsin–Madison, Madison
|
22
|
Comeron JM. Background selection as null hypothesis in population genomics: insights and challenges from Drosophila studies. Philos Trans R Soc Lond B Biol Sci 2018; 372:rstb.2016.0471. [PMID: 29109230 PMCID: PMC5698629 DOI: 10.1098/rstb.2016.0471] [Citation(s) in RCA: 73] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/04/2017] [Indexed: 12/11/2022] Open
Abstract
The consequences of selection at linked sites are multiple and widespread across the genomes of most species. Here, I first review the main concepts behind models of selection and linkage in recombining genomes, present the difficulty in parametrizing these models simply as a reduction in effective population size (Ne) and discuss the predicted impact of recombination rates on levels of diversity across genomes. Arguments are then put forward in favour of using a model of selection and linkage with neutral and deleterious mutations (i.e. the background selection model, BGS) as a sensible null hypothesis for investigating the presence of other forms of selection, such as balancing or positive. I also describe and compare two studies that have generated high-resolution landscapes of the predicted consequences of selection at linked sites in Drosophila melanogaster. Both studies show that BGS can explain a very large fraction of the observed variation in diversity across the whole genome, thus supporting its use as null model. Finally, I identify and discuss a number of caveats and challenges in studies of genetic hitchhiking that have been often overlooked, with several of them sharing a potential bias towards overestimating the evidence supporting recent selective sweeps to the detriment of a BGS explanation. One potential source of bias is the analysis of non-equilibrium populations: it is precisely because models of selection and linkage predict variation in Ne across chromosomes that demographic dynamics are not expected to be equivalent chromosome- or genome-wide. Other challenges include the use of incomplete genome annotations, the assumption of temporally stable recombination landscapes, the presence of genes under balancing selection and the consequences of ignoring non-crossover (gene conversion) recombination events. This article is part of the themed issue ‘Evolutionary causes and consequences of recombination rate variation in sexual organisms’.
Affiliation(s)
- Josep M Comeron
- Department of Biology, University of Iowa, Iowa City, IA 52242, USA; Interdisciplinary Program in Genetics, University of Iowa, Iowa City, IA 52242, USA
|
23
|
Evolutionary Toxicology as a Tool to Assess the Ecotoxicological Risk in Freshwater Ecosystems. WATER 2018. [DOI: 10.3390/w10040490] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
|
24
|
Abstract
The degree to which adaptation in recent human evolution shapes genetic variation remains controversial. This is in part due to the limited evidence in humans for classic "hard selective sweeps", wherein a novel beneficial mutation rapidly sweeps through a population to fixation. However, positive selection may often proceed via "soft sweeps" acting on mutations already present within a population. Here, we examine recent positive selection across six human populations using a powerful machine learning approach that is sensitive to both hard and soft sweeps. We found evidence that soft sweeps are widespread and account for the vast majority of recent human adaptation. Surprisingly, our results also suggest that linked positive selection affects patterns of variation across much of the genome, and may increase the frequencies of deleterious mutations. Our results also reveal insights into the role of sexual selection, cancer risk, and central nervous system development in recent human evolution.
Affiliation(s)
- Daniel R. Schrider
- Department of Genetics, Rutgers University, Piscataway, NJ
- Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ
- Andrew D. Kern
- Department of Genetics, Rutgers University, Piscataway, NJ
- Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ
|
25
|
Abstract
Molecular population genetics aims to explain genetic variation and molecular evolution from population genetics principles. The field was born 50 years ago with the first measures of genetic variation in allozyme loci, continued with the nucleotide sequencing era, and is currently in the era of population genomics. During this period, molecular population genetics has been revolutionized by progress in data acquisition and theoretical developments. The conceptual elegance of the neutral theory of molecular evolution or the footprint carved by natural selection on the patterns of genetic variation are two examples of the vast number of inspiring findings of population genetics research. Since the inception of the field, Drosophila has been the prominent model species: molecular variation in populations was first described in Drosophila and most of the population genetics hypotheses were tested in Drosophila species. In this review, we describe the main concepts, methods, and landmarks of molecular population genetics, using the Drosophila model as a reference. We describe the different genetic data sets made available by advances in molecular technologies, and the theoretical developments fostered by these data. Finally, we review the results and new insights provided by the population genomics approach, and conclude by enumerating challenges and new lines of inquiry posed by increasingly large population scale sequence data.
|
26
|
Schrider DR, Shanku AG, Kern AD. Effects of Linked Selective Sweeps on Demographic Inference and Model Selection. Genetics 2016; 204:1207-1223. [PMID: 27605051 PMCID: PMC5105852 DOI: 10.1534/genetics.116.190223] [Citation(s) in RCA: 103] [Impact Index Per Article: 11.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2016] [Accepted: 09/02/2016] [Indexed: 01/06/2023] Open
Abstract
The availability of large-scale population genomic sequence data has resulted in an explosion in efforts to infer the demographic histories of natural populations across a broad range of organisms. As demographic events alter coalescent genealogies, they leave detectable signatures in patterns of genetic variation within and between populations. Accordingly, a variety of approaches have been designed to leverage population genetic data to uncover the footprints of demographic change in the genome. The vast majority of these methods make the simplifying assumption that the measures of genetic variation used as their input are unaffected by natural selection. However, natural selection can dramatically skew patterns of variation not only at selected sites, but at linked, neutral loci as well. Here we assess the impact of recent positive selection on demographic inference by characterizing the performance of three popular methods through extensive simulation of data sets with varying numbers of linked selective sweeps. In particular, we examined three different demographic models relevant to a number of species, finding that positive selection can bias parameter estimates of each of these models, often severely. We find that selection can lead to incorrect inferences of population size changes when none have occurred. Moreover, we show that linked selection can lead to incorrect demographic model selection when multiple demographic scenarios are compared. We argue that natural populations may experience the amount of recent positive selection required to skew inferences. These results suggest that demographic studies conducted in many species to date may have exaggerated the extent and frequency of population size changes.
Affiliation(s)
- Daniel R Schrider
- Department of Genetics, Rutgers University, Piscataway, New Jersey 08854
- Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey 08554
- Alexander G Shanku
- Department of Genetics, Rutgers University, Piscataway, New Jersey 08854
- Institute for Quantitative Biomedicine, Rutgers University, Piscataway, New Jersey 08554
- Andrew D Kern
- Department of Genetics, Rutgers University, Piscataway, New Jersey 08854
- Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey 08554
|
27
|
Matuszewski S, Hildebrandt ME, Ghenu AH, Jensen JD, Bank C. A Statistical Guide to the Design of Deep Mutational Scanning Experiments. Genetics 2016; 204:77-87. [PMID: 27412710 PMCID: PMC5012406 DOI: 10.1534/genetics.116.190462] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2016] [Accepted: 06/29/2016] [Indexed: 12/21/2022] Open
Abstract
The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates.
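As a rough illustration of how a selection coefficient can be read off time-sampled bulk competition data, the sketch below fits a least-squares line to the log ratio of mutant to reference read counts over time and takes the slope as the estimate. This is a generic log-linear regression estimator under simplifying assumptions (no sampling noise, constant selection), not the authors' analytical results. The function name, time points and counts are hypothetical.

```python
import math


def estimate_selection_coefficient(times, mutant_counts, reference_counts):
    """Least-squares slope of ln(mutant / reference) against time.

    Under constant selection, the log ratio changes linearly with time and
    the slope estimates the selection coefficient per unit of time.
    """
    log_ratios = [math.log(m / r) for m, r in zip(mutant_counts, reference_counts)]
    t_mean = sum(times) / len(times)
    y_mean = sum(log_ratios) / len(log_ratios)
    s_xy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, log_ratios))
    s_xx = sum((t - t_mean) ** 2 for t in times)
    return s_xy / s_xx


# Hypothetical, noise-free data generated with a known selection coefficient.
true_s = -0.05
times = [0, 2, 4, 6, 8]
reference_counts = [10000.0] * len(times)
mutant_counts = [1000.0 * math.exp(true_s * t) for t in times]

s_hat = estimate_selection_coefficient(times, mutant_counts, reference_counts)
print(f"estimated s = {s_hat:.4f}")
```

In ordinary least squares, the variance of the slope is inversely proportional to the sum of squared deviations of the time points from their mean. This gives some intuition for why a longer experiment and samples placed near its start and end can help precision.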
Affiliation(s)
- Sebastian Matuszewski
- School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marcel E Hildebrandt
- School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; School of Basic Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Jeffrey D Jensen
- School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Claudia Bank
- School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland; Instituto Gulbenkian de Ciência, Oeiras, Portugal
|
28
|
Moura de Sousa JA, Alpedrinha J, Campos PRA, Gordo I. Competition and fixation of cohorts of adaptive mutations under Fisher geometrical model. PeerJ 2016; 4:e2256. [PMID: 27547562 PMCID: PMC4975028 DOI: 10.7717/peerj.2256] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2016] [Accepted: 06/23/2016] [Indexed: 11/24/2022] Open
Abstract
One of the simplest models of adaptation to a new environment is Fisher’s Geometric Model (FGM), in which populations move on a multidimensional landscape defined by the traits under selection. The predictions of this model have been found to be consistent with current observations of patterns of fitness increase in experimentally evolved populations. Recent studies investigated the dynamics of allele frequency change along adaptation of microbes to simple laboratory conditions and unveiled a dramatic pattern of competition between cohorts of mutations, i.e., multiple mutations simultaneously segregating and ultimately reaching fixation. Here, using simulations, we study the dynamics of phenotypic and genetic change as asexual populations under clonal interference climb a Fisherian landscape, and ask about the conditions under which FGM can display the simultaneous increase and fixation of multiple mutations—mutation cohorts—along the adaptive walk. We find that FGM under clonal interference, and with varying levels of pleiotropy, can reproduce the experimentally observed competition between different cohorts of mutations, some of which have a high probability of fixation along the adaptive walk. Overall, our results show that the surprising dynamics of mutation cohorts recently observed during experimental adaptation of microbial populations can be expected under one of the oldest and simplest theoretical models of adaptation—FGM.
Affiliation(s)
- Paulo R A Campos
- Departamento de Fisica, Cidade Universitária, Universidade Federal de Pernambuco, Recife, Pernambuco, Brazil
- Isabel Gordo
- Instituto Gulbenkian de Ciência, Oeiras, Portugal
|
29
|
Ortega-Del Vecchyo D, Marsden CD, Lohmueller KE. PReFerSim: fast simulation of demography and selection under the Poisson Random Field model. Bioinformatics 2016; 32:3516-3518. [PMID: 27436562 DOI: 10.1093/bioinformatics/btw478] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2016] [Revised: 06/10/2016] [Accepted: 07/03/2016] [Indexed: 01/06/2023] Open
Abstract
The Poisson Random Field (PRF) model has become an important tool in population genetics to study weakly deleterious genetic variation under complicated demographic scenarios. Currently, there are no freely available software applications that allow simulation of genetic variation data under this model. Here we present PReFerSim, an ANSI C program that performs forward simulations under the PRF model. PReFerSim models changes in population size, arbitrary amounts of inbreeding, dominance and distributions of selective effects. Users can track summaries of genetic variation over time and output trajectories of selected alleles. Availability and implementation: PReFerSim is freely available at https://github.com/LohmuellerLab/PReFerSim. Contact: klohmueller@ucla.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Diego Ortega-Del Vecchyo
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA
- Clare D Marsden
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, USA
- Kirk E Lohmueller
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA; Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
|
30
|
Kousathanas A, Leuenberger C, Helfer J, Quinodoz M, Foll M, Wegmann D. Likelihood-Free Inference in High-Dimensional Models. Genetics 2016; 203:893-904. [PMID: 27052569 PMCID: PMC4896201 DOI: 10.1534/genetics.116.187567] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2016] [Accepted: 04/04/2016] [Indexed: 11/18/2022] Open
Abstract
Methods that bypass analytical evaluations of the likelihood function have become an indispensable tool for statistical inference in many fields of science. These so-called likelihood-free methods rely on accepting and rejecting simulations based on summary statistics, which limits them to low-dimensional models for which the value of the likelihood is large enough to result in manageable acceptance rates. To get around these issues, we introduce a novel, likelihood-free Markov chain Monte Carlo (MCMC) method combining two key innovations: updating only one parameter per iteration and accepting or rejecting this update based on subsets of statistics approximately sufficient for this parameter. This increases acceptance rates dramatically, rendering this approach suitable even for models of very high dimensionality. We further derive that for linear models, a one-dimensional combination of statistics per parameter is sufficient and can be found empirically with simulations. Finally, we demonstrate that our method readily scales to models of very high dimensionality, using toy models as well as by jointly inferring the effective population size, the distribution of fitness effects (DFE) of segregating mutations, and selection coefficients for each locus from data of a recent experiment on the evolution of drug resistance in influenza.
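The accept/reject idea that likelihood-free methods rely on can be sketched with plain rejection sampling. The example below is the basic rejection algorithm for a single parameter, not the MCMC scheme introduced in the abstract. The toy model (a normal distribution with unknown mean), the uniform prior, the tolerance and all names are assumptions made for illustration.

```python
import random
import statistics


def simulate_summary(theta: float, n_obs: int, rng: random.Random) -> float:
    """Simulate n_obs draws from Normal(theta, 1) and return their mean as the summary statistic."""
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n_obs))


def abc_rejection(observed_stat, n_obs, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lies within `tolerance` of the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw from a flat prior (an assumption)
        simulated_stat = simulate_summary(theta, n_obs, rng)
        if abs(simulated_stat - observed_stat) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(42)       # fixed seed so the run is reproducible
observed_stat = 1.5           # hypothetical observed summary statistic
n_draws = 10000

posterior_sample = abc_rejection(observed_stat, n_obs=20, n_draws=n_draws,
                                 tolerance=0.1, rng=rng)
posterior_mean = statistics.fmean(posterior_sample)
acceptance_rate = len(posterior_sample) / n_draws

print(f"accepted {len(posterior_sample)} of {n_draws} draws")
print(f"approximate posterior mean: {posterior_mean:.2f}")
```

Even in this one-parameter toy case only a small fraction of draws is accepted. With many parameters and many summary statistics the acceptance rate shrinks further, which is the scaling problem the abstract describes.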
Affiliation(s)
- Athanasios Kousathanas
- Department of Biology and Biochemistry, University of Fribourg, 1700 Fribourg, Switzerland; Swiss Institute of Bioinformatics, 1700 Fribourg, Switzerland
- Jonas Helfer
- Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
- Mathieu Quinodoz
- Department of Computational Biology, University of Lausanne, 1200 Lausanne, Switzerland
- Matthieu Foll
- International Agency for Research on Cancer, 69372 Lyon, France
- Daniel Wegmann
- Department of Biology and Biochemistry, University of Fribourg, 1700 Fribourg, Switzerland; Swiss Institute of Bioinformatics, 1700 Fribourg, Switzerland
|
31
|
Recombination hotspots: Models and tools for detection. DNA Repair (Amst) 2016; 40:47-56. [PMID: 26991854 DOI: 10.1016/j.dnarep.2016.02.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 02/05/2016] [Accepted: 02/09/2016] [Indexed: 11/22/2022]
Abstract
Recombination hotspots are regions within the genome where the rate and frequency of recombination are optimum, with a size varying from 1 to 2 kb. The recombination event is mediated by double-strand break formation, guided by the combined enzymatic action of DNA topoisomerase and Spo11 endonuclease. These regions are distributed non-uniformly throughout the human genome and cause distortions in the genetic map. Numerous lines of evidence suggest that the number of hotspots known in humans has increased manifold in recent years. A few facts about hotspot evolution have also been put forward, indicating differences in hotspot position between chimpanzees and humans. In mice, recombination hotspots were found to be clustered within the major histocompatibility complex (MHC) region. Several models that help explain meiotic recombination have been proposed. Moreover, scientists have developed computational tools to locate hotspot positions and estimate their recombination rates in humans, which is of great interest to population and medical geneticists. Here we review the molecular mechanisms, models and in silico prediction techniques of hotspot residues.
|
32
|
Sheehan S, Song YS. Deep Learning for Population Genetic Inference. PLoS Comput Biol 2016; 12:e1004845. [PMID: 27018908 PMCID: PMC4809617 DOI: 10.1371/journal.pcbi.1004845] [Citation(s) in RCA: 156] [Impact Index Per Article: 17.3] [Received: 10/02/2015] [Accepted: 03/02/2016] [Indexed: 02/05/2023] Open
Abstract
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.
Affiliation(s)
- Sara Sheehan
  - Department of Computer Science, Smith College, Northampton, Massachusetts, United States of America
  - Computer Science Division, UC Berkeley, Berkeley, California, United States of America
- Yun S. Song
  - Computer Science Division, UC Berkeley, Berkeley, California, United States of America
  - Department of Statistics, UC Berkeley, Berkeley, California, United States of America
  - Department of Integrative Biology, UC Berkeley, Berkeley, California, United States of America
  - Department of Mathematics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
|
33
|
Jones MR, Good JM. Targeted capture in evolutionary and ecological genomics. Mol Ecol 2016; 25:185-202. [PMID: 26137993 PMCID: PMC4823023 DOI: 10.1111/mec.13304] [Citation(s) in RCA: 203] [Impact Index Per Article: 22.6] [Received: 05/11/2015] [Revised: 06/19/2015] [Accepted: 06/24/2015] [Indexed: 12/17/2022]
Abstract
The rapid expansion of next-generation sequencing has yielded a powerful array of tools to address fundamental biological questions at a scale that was inconceivable just a few years ago. Various genome-partitioning strategies to sequence select subsets of the genome have emerged as powerful alternatives to whole-genome sequencing in ecological and evolutionary genomic studies. High-throughput targeted capture is one such strategy that involves the parallel enrichment of preselected genomic regions of interest. The growing use of targeted capture demonstrates its potential power to address a range of research questions, yet these approaches have yet to expand broadly across laboratories focused on evolutionary and ecological genomics. In part, the use of targeted capture has been hindered by the logistics of capture design and implementation in species without established reference genomes. Here we aim to (i) increase the accessibility of targeted capture to researchers working in nonmodel taxa by discussing capture methods that circumvent the need of a reference genome, (ii) highlight the evolutionary and ecological applications where this approach is emerging as a powerful sequencing strategy and (iii) discuss the future of targeted capture and other genome-partitioning approaches in the light of the increasing accessibility of whole-genome sequencing. Given the practical advantages and increasing feasibility of high-throughput targeted capture, we anticipate an ongoing expansion of capture-based approaches in evolutionary and ecological research, synergistic with an expansion of whole-genome sequencing.
Affiliation(s)
- Matthew R. Jones
  - University of Montana, Division of Biological Sciences, 32 Campus Dr. HS104, Missoula, MT 59812, USA
- Jeffrey M. Good
  - University of Montana, Division of Biological Sciences, 32 Campus Dr. HS104, Missoula, MT 59812, USA
|
34
|
Stephan W. Signatures of positive selection: from selective sweeps at individual loci to subtle allele frequency changes in polygenic adaptation. Mol Ecol 2015; 25:79-88. [PMID: 26108992 DOI: 10.1111/mec.13288] [Citation(s) in RCA: 101] [Impact Index Per Article: 10.1] [Received: 04/30/2015] [Revised: 06/19/2015] [Accepted: 06/19/2015] [Indexed: 12/16/2022]
Abstract
In the past 15 years, numerous methods have been developed to detect selective sweeps underlying adaptations. These methods are based on relatively simple population genetic models, including one or two loci at which positive directional selection occurs, and one or two marker loci at which the impact of selection on linked neutral variation is quantified. Information about the phenotype under selection is not included in these models (except for fitness). In contrast, in the quantitative genetic models of adaptation, selection acts on one or more phenotypic traits, such that a genotype-phenotype map is required to bridge the gap to population genetics theory. Here I describe the range of population genetic models from selective sweeps in a panmictic population of constant size to evolutionary traffic when simultaneous sweeps at multiple loci interfere, and I also consider the case of polygenic selection characterized by subtle allele frequency shifts at many loci. Furthermore, I present an overview of the statistical tests that have been proposed based on these population genetics models to detect evidence for positive selection in the genome.
Affiliation(s)
- Wolfgang Stephan
  - Biocenter, Department of Biology, Ludwig-Maximilian University Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany; Museum für Naturkunde, Berlin, Germany
|
35
|
Rogers RL, Cridland JM, Shao L, Hu TT, Andolfatto P, Thornton KR. Tandem Duplications and the Limits of Natural Selection in Drosophila yakuba and Drosophila simulans. PLoS One 2015; 10:e0132184. [PMID: 26176952 PMCID: PMC4503668 DOI: 10.1371/journal.pone.0132184] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 03/11/2015] [Accepted: 06/10/2015] [Indexed: 11/30/2022] Open
Abstract
Tandem duplications are an essential source of genetic novelty, and their variation in natural populations is expected to influence adaptive walks. Here, we describe evolutionary impacts of recently-derived, segregating tandem duplications in Drosophila yakuba and Drosophila simulans. We observe an excess of duplicated genes involved in defense against pathogens, insecticide resistance, chorion development, cuticular peptides, and lipases or endopeptidases associated with the accessory glands across both species. The observed agreement is greater than expectations on chance alone, suggesting large amounts of convergence across functional categories. We document evidence of widespread selection on the D. simulans X, suggesting adaptation through duplication is common on the X. Despite the evidence for positive selection, duplicates display an excess of low frequency variants consistent with largely detrimental impacts, limiting the variation that can effectively facilitate adaptation. Standing variation for tandem duplications spans less than 25% of the genome in D. yakuba and D. simulans, indicating that evolution will be strictly limited by mutation, even in organisms with large population sizes. Effective whole gene duplication rates are low at 1.17 × 10⁻⁹ per gene per generation in D. yakuba and 6.03 × 10⁻¹⁰ per gene per generation in D. simulans, suggesting long wait times for new mutations on the order of thousands of years for the establishment of sweeps. Hence, in cases where adaptation depends on individual tandem duplications, evolution will be severely limited by mutation. We observe low levels of parallel recruitment of the same duplicated gene in different species, suggesting that the span of standing variation will define evolutionary outcomes in spite of convergence across gene ontologies consistent with rapidly evolving phenotypes.
Affiliation(s)
- Rebekah L. Rogers
  - Ecology and Evolutionary Biology, University of California, Berkeley, California, United States of America
- Julie M. Cridland
  - Ecology and Evolutionary Biology, University of California, Davis, Davis, California, United States of America
- Ling Shao
  - Ecology and Evolutionary Biology, University of California, Irvine, Irvine, California, United States of America
- Tina T. Hu
  - Ecology and Evolutionary Biology and the Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Peter Andolfatto
  - Ecology and Evolutionary Biology and the Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Kevin R. Thornton
  - Ecology and Evolutionary Biology, University of California, Irvine, Irvine, California, United States of America
|
36
|
Savage AE, Becker CG, Zamudio KR. Linking genetic and environmental factors in amphibian disease risk. Evol Appl 2015; 8:560-72. [PMID: 26136822 PMCID: PMC4479512 DOI: 10.1111/eva.12264] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 12/15/2014] [Accepted: 04/07/2015] [Indexed: 01/13/2023] Open
Abstract
A central question in evolutionary biology is how interactions between organisms and the environment shape genetic differentiation. The pathogen Batrachochytrium dendrobatidis (Bd) has caused variable population declines in the lowland leopard frog (Lithobates yavapaiensis); thus, disease has potentially shaped, or been shaped by, host genetic diversity. Environmental factors can also influence both amphibian immunity and Bd virulence, confounding our ability to assess the genetic effects on disease dynamics. Here, we used genetics, pathogen dynamics, and environmental data to characterize L. yavapaiensis populations, estimate migration, and determine relative contributions of genetic and environmental factors in predicting Bd dynamics. We found that the two uninfected populations belonged to a single genetic deme, whereas each infected population was genetically unique. We detected an outlier locus that deviated from neutral expectations and was significantly correlated with mortality within populations. Across populations, only environmental variables predicted infection intensity, whereas environment and genetics predicted infection prevalence, and genetic diversity alone predicted mortality. At one locality with geothermally elevated water temperatures, migration estimates revealed source-sink dynamics that have likely prevented local adaptation. We conclude that integrating genetic and environmental variation among populations provides a better understanding of Bd spatial epidemiology, generating more effective conservation management strategies for mitigating amphibian declines.
Affiliation(s)
- Anna E Savage
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, USA; Department of Biology, University of Central Florida, 4110 Libra Drive, Orlando, FL 32816, USA
- Carlos G Becker
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, USA; Department of Zoology, State University of Sao Paulo, Av. 24A No. 1515, Rio Claro, SP 13506-900, Brazil
- Kelly R Zamudio
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, USA
|
37
|
Roux C, Pannell JR. Inferring the mode of origin of polyploid species from next-generation sequence data. Mol Ecol 2015; 24:1047-59. [DOI: 10.1111/mec.13078] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 07/28/2014] [Revised: 01/06/2015] [Accepted: 01/08/2015] [Indexed: 11/29/2022]
Affiliation(s)
- Camille Roux
  - Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
- John R. Pannell
  - Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
|
38
|
On the unfounded enthusiasm for soft selective sweeps. Nat Commun 2014; 5:5281. [DOI: 10.1038/ncomms6281] [Citation(s) in RCA: 111] [Impact Index Per Article: 10.1] [Received: 03/24/2014] [Accepted: 09/17/2014] [Indexed: 11/09/2022] Open
|
39
|
Besnier F, Kent M, Skern-Mauritzen R, Lien S, Malde K, Edvardsen RB, Taylor S, Ljungfeldt LER, Nilsen F, Glover KA. Human-induced evolution caught in action: SNP-array reveals rapid amphi-atlantic spread of pesticide resistance in the salmon ecotoparasite Lepeophtheirus salmonis. BMC Genomics 2014; 15:937. [PMID: 25344698 PMCID: PMC4223847 DOI: 10.1186/1471-2164-15-937] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 06/10/2014] [Accepted: 10/16/2014] [Indexed: 12/23/2022] Open
Abstract
Background: The salmon louse, Lepeophtheirus salmonis, is an ectoparasite of salmonids that causes huge economic losses in salmon farming, and has also been causatively linked with declines of wild salmonid populations. Lice control on farms is reliant upon a few groups of pesticides that have all shown time-limited efficiency due to resistance development. However, to date, this example of human-induced evolution is poorly documented at the population level due to the lack of molecular tools. As such, important evolutionary and management questions, linked to the development and dispersal of pesticide resistance in this parasite, remain unanswered. Here, we introduce the first Single Nucleotide Polymorphism (SNP) array for the salmon louse, which includes 6000 markers, and present a population genomic scan using this array on 576 lice from twelve farms distributed across the North Atlantic. Results: Our results support the hypothesis of a single panmictic population of lice in the Atlantic, and importantly, revealed very strong selective sweeps on linkage groups 1 and 5. These sweeps included candidate genes potentially connected to pesticide resistance. After genotyping a further 576 lice from 12 full sibling families, a genome-wide association analysis established a highly significant association between the major sweep on linkage group 5 and resistance to emamectin benzoate, the most widely used pesticide in salmonid aquaculture for more than a decade. Conclusions: The analysis of conserved haplotypes across samples from the Atlantic strongly suggests that emamectin benzoate resistance developed at a single source, and rapidly spread across the Atlantic within the period from 1999, when the chemical was first introduced, to 2010, when samples for the present study were obtained.
These results provide unique insights into the development and spread of pesticide resistance in the marine environment, and identify a small genomic region strongly linked to emamectin benzoate resistance. Finally, these results have highly significant implications for the way pesticide resistance is considered and managed within the aquaculture industry.
|
40
|
Background selection as baseline for nucleotide variation across the Drosophila genome. PLoS Genet 2014; 10:e1004434. [PMID: 24968283 PMCID: PMC4072542 DOI: 10.1371/journal.pgen.1004434] [Citation(s) in RCA: 92] [Impact Index Per Article: 8.4] [Received: 10/01/2013] [Accepted: 04/28/2014] [Indexed: 11/21/2022] Open
Abstract
The constant removal of deleterious mutations by natural selection causes a reduction in neutral diversity and efficacy of selection at genetically linked sites (a process called Background Selection, BGS). Population genetic studies, however, often ignore BGS effects when investigating demographic events or the presence of other types of selection. To obtain a more realistic evolutionary expectation that incorporates the unavoidable consequences of deleterious mutations, we generated high-resolution landscapes of variation across the Drosophila melanogaster genome under a BGS scenario independent of polymorphism data. We find that BGS plays a significant role in shaping levels of variation across the entire genome, including long introns and intergenic regions distant from annotated genes. We also find that a very large percentage of the observed variation in diversity across autosomes can be explained by BGS alone, up to 70% across individual chromosome arms at 100-kb scale, thus indicating that BGS predictions can be used as baseline to infer additional types of selection and demographic events. This approach allows detecting several outlier regions with signal of recent adaptive events and selective sweeps. The use of a BGS baseline, however, is particularly appropriate to investigate the presence of balancing selection and our study exposes numerous genomic regions with the predicted signature of higher polymorphism than expected when a BGS context is taken into account. Importantly, we show that these conclusions are robust to the mutation and selection parameters of the BGS model. Finally, analyses of protein evolution together with previous comparisons of genetic maps between Drosophila species, suggest temporally variable recombination landscapes and, thus, local BGS effects that may differ between extant and past phases. 
Because genome-wide BGS and temporal changes in linkage effects can skew approaches to estimate demographic and selective events, future analyses should incorporate BGS predictions and capture local recombination variation across genomes and along lineages. The removal of deleterious mutations from natural populations has potential consequences on patterns of variation across genomes. Population genetic analyses, however, often assume that such effects are negligible across recombining regions of species like Drosophila. We use simple models of purifying selection and current knowledge of recombination rates and gene distribution across the genome to obtain a baseline of variation predicted by the constant input and removal of deleterious mutations. We find that purifying selection alone can explain a major fraction of the observed variance in nucleotide diversity across the genome. The use of a baseline of variation predicted by linkage to deleterious mutations as null expectation exposes genomic regions under other selective regimes, including more regions showing the signature of balancing selection than would be evident when using traditional approaches. Our study also indicates that most, if not all, nucleotides across the D. melanogaster genome are significantly influenced by the removal of deleterious mutations, even when located in the middle of highly recombining regions and distant from genes. Additionally, the study of rates of protein evolution confirms previous analyses suggesting that the recombination landscape across the genome has changed in the recent history of D. melanogaster. All these reported factors can skew current analyses designed to capture demographic events or estimate the strength and frequency of adaptive mutations, and illustrate the need for new and more realistic theoretical and modeling approaches to study naturally occurring genetic variation.
|
41
|
Schumer M, Cui R, Powell DL, Dresner R, Rosenthal GG, Andolfatto P. High-resolution mapping reveals hundreds of genetic incompatibilities in hybridizing fish species. eLife 2014; 3. [PMID: 24898754 PMCID: PMC4080447 DOI: 10.7554/elife.02535] [Citation(s) in RCA: 83] [Impact Index Per Article: 7.5] [Received: 02/14/2014] [Accepted: 06/02/2014] [Indexed: 12/18/2022] Open
Abstract
Hybridization is increasingly being recognized as a common process in both animal and plant species. Negative epistatic interactions between genes from different parental genomes decrease the fitness of hybrids and can limit gene flow between species. However, little is known about the number and genome-wide distribution of genetic incompatibilities separating species. To detect interacting genes, we perform a high-resolution genome scan for linkage disequilibrium between unlinked genomic regions in naturally occurring hybrid populations of swordtail fish. We estimate that hundreds of pairs of genomic regions contribute to reproductive isolation between these species, despite them being recently diverged. Many of these incompatibilities are likely the result of natural or sexual selection on hybrids, since intrinsic isolation is known to be weak. Patterns of genomic divergence at these regions imply that genetic incompatibilities play a significant role in limiting gene flow even in young species.
Affiliation(s)
- Molly Schumer
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, United States
- Rongfeng Cui
  - Department of Biology, Texas A&M University, College Station, United States
- Daniel L Powell
  - Department of Biology, Texas A&M University, College Station, United States
- Rebecca Dresner
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, United States
- Gil G Rosenthal
  - Department of Biology, Texas A&M University, College Station, United States
- Peter Andolfatto
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, United States
|
42
|
Sandoval-Castellanos E, Palkopoulou E, Dalén L. Back to BaySICS: a user-friendly program for Bayesian Statistical Inference from Coalescent Simulations. PLoS One 2014; 9:e98011. [PMID: 24865457 PMCID: PMC4035278 DOI: 10.1371/journal.pone.0098011] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 11/12/2013] [Accepted: 04/28/2014] [Indexed: 12/02/2022] Open
Abstract
Inference of population demographic history has vastly improved in recent years due to a number of technological and theoretical advances including the use of ancient DNA. Approximate Bayesian computation (ABC) stands among the most promising methods due to its simple theoretical fundament and exceptional flexibility. However, limited availability of user-friendly programs that perform ABC analysis renders it difficult to implement, and hence programming skills are frequently required. In addition, there is limited availability of programs able to deal with heterochronous data. Here we present the software BaySICS: Bayesian Statistical Inference of Coalescent Simulations. BaySICS provides an integrated and user-friendly platform that performs ABC analyses by means of coalescent simulations from DNA sequence data. It estimates historical demographic population parameters and performs hypothesis testing by means of Bayes factors obtained from model comparisons. Although providing specific features that improve inference from datasets with heterochronous data, BaySICS also has several capabilities making it a suitable tool for analysing contemporary genetic datasets. Those capabilities include joint analysis of independent tables, a graphical interface and the implementation of Markov-chain Monte Carlo without likelihoods.
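For readers unfamiliar with approximate Bayesian computation, the general idea named in this abstract can be illustrated with a minimal rejection sampler. This is a toy sketch only, not BaySICS's implementation: the simulator, the uniform prior, the summary statistic (a count of derived alleles), and every function and parameter name below are assumptions made for illustration.

```python
import random


def simulate(theta, n, rng):
    """Toy simulator (assumed): count of derived alleles among n sampled
    chromosomes when the population allele frequency is theta."""
    return sum(rng.random() < theta for _ in range(n))


def abc_rejection(observed, n, n_sims, tolerance, rng):
    """Basic rejection ABC: draw theta from the prior, simulate data,
    and keep theta when the simulated summary is close to the observed one."""
    accepted = []
    for _ in range(n_sims):
        theta = rng.random()  # Uniform(0, 1) prior, an assumption of this sketch
        if abs(simulate(theta, n, rng) - observed) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(1)  # fixed seed so the sketch is reproducible
posterior = abc_rejection(observed=30, n=100, n_sims=20000, tolerance=2, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
print(len(posterior), round(posterior_mean, 3))
```

The accepted values approximate the posterior of theta given the observed summary; tightening `tolerance` improves the approximation at the cost of a lower acceptance rate.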
Affiliation(s)
- Edson Sandoval-Castellanos
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden; Department of Zoology, Stockholm University, Stockholm, Sweden
- Eleftheria Palkopoulou
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden; Department of Zoology, Stockholm University, Stockholm, Sweden
- Love Dalén
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
|
43
|
Abstract
Evolutionary forces shape patterns of genetic diversity within populations and contribute to phenotypic variation. In particular, recurrent positive selection has attracted significant interest in both theoretical and empirical studies. However, most existing theoretical models of recurrent positive selection cannot easily incorporate realistic confounding effects such as interference between selected sites, arbitrary selection schemes, and complicated demographic processes. It is possible to quantify the effects of arbitrarily complex evolutionary models by performing forward population genetic simulations, but forward simulations can be computationally prohibitive for large population sizes (>10⁵). A common approach for overcoming these computational limitations is rescaling of the most computationally expensive parameters, especially population size. Here, we show that ad hoc approaches to parameter rescaling under the recurrent hitchhiking model do not always provide sufficiently accurate dynamics, potentially skewing patterns of diversity in simulated DNA sequences. We derive an extension of the recurrent hitchhiking model that is appropriate for strong selection in small population sizes and use it to develop a method for parameter rescaling that provides the best possible computational performance for a given error tolerance. We perform a detailed theoretical analysis of the robustness of rescaling across the parameter space. Finally, we apply our rescaling algorithms to parameters that were previously inferred for Drosophila and discuss practical considerations such as interference between selected sites.
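The ad hoc rescaling that this abstract cautions about is commonly done by dividing the population size by a factor Q and multiplying the per-generation mutation rate, recombination rate, and selection coefficient by Q, so that the products Nμ, Nr, and Ns are unchanged. The sketch below shows only that naive convention, not the method derived in the paper; the class, field, and function names and the example values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Hypothetical container for forward-simulation parameters."""
    pop_size: int
    mutation_rate: float       # per site per generation
    recombination_rate: float  # per site per generation
    selection_coeff: float


def naive_rescale(p, q):
    """Shrink N by q and scale rates up by q, keeping N*mu, N*r, N*s fixed."""
    scaled = SimParams(
        pop_size=round(p.pop_size / q),
        mutation_rate=p.mutation_rate * q,
        recombination_rate=p.recombination_rate * q,
        selection_coeff=p.selection_coeff * q,
    )
    # Strong selection in a small rescaled population is where the naive
    # approach is expected to break down, so flag the obvious failure case.
    if scaled.selection_coeff >= 1.0:
        raise ValueError("rescaled selection coefficient is >= 1; reduce q")
    return scaled


original = SimParams(1_000_000, 1e-9, 1e-8, 1e-4)  # illustrative values only
scaled = naive_rescale(original, q=100)
print(scaled)
```

Because the rescaled selection coefficient grows with Q, large scaling factors push the simulation into a strong-selection regime that the original population never experienced, which is one reason such shortcuts may distort patterns of diversity.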
|
44
|
Campos JL, Halligan DL, Haddrill PR, Charlesworth B. The relation between recombination rate and patterns of molecular evolution and variation in Drosophila melanogaster. Mol Biol Evol 2014; 31:1010-28. [PMID: 24489114 PMCID: PMC3969569 DOI: 10.1093/molbev/msu056] [Citation(s) in RCA: 100] [Impact Index Per Article: 9.1] [Indexed: 12/23/2022] Open
Abstract
Genetic recombination associated with sexual reproduction increases the efficiency of natural selection by reducing the strength of Hill–Robertson interference. Such interference can be caused either by selective sweeps of positively selected alleles or by background selection (BGS) against deleterious mutations. Its consequences can be studied by comparing patterns of molecular evolution and variation in genomic regions with different rates of crossing over. We carried out a comprehensive study of the benefits of recombination in Drosophila melanogaster, both by contrasting five independent genomic regions that lack crossing over with the rest of the genome and by comparing regions with different rates of crossing over, using data on DNA sequence polymorphisms from an African population that is geographically close to the putatively ancestral population for the species, and on sequence divergence from a related species. We observed reductions in sequence diversity in noncrossover (NC) regions that are inconsistent with the effects of hard selective sweeps in the absence of recombination. Overall, the observed patterns suggest that the recombination rate experienced by a gene is positively related to an increase in the efficiency of both positive and purifying selection. The results are consistent with a BGS model with interference among selected sites in NC regions, and joint effects of BGS, selective sweeps, and a past population expansion on variability in regions of the genome that experience crossing over. In such crossover regions, the X chromosome exhibits a higher rate of adaptive protein sequence evolution than the autosomes, implying a Faster-X effect.
Affiliation(s)
- José L Campos
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
|
45
|
Moura de Sousa JA, Campos PRA, Gordo I. An ABC method for estimating the rate and distribution of effects of beneficial mutations. Genome Biol Evol 2013; 5:794-806. [PMID: 23542207 PMCID: PMC3673657 DOI: 10.1093/gbe/evt045] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
Determining the distribution of adaptive mutations available to natural selection is a difficult task. These are rare events and most of them are lost by chance. Some theoretical works propose that the distribution of newly arising beneficial mutations should be close to exponential. Empirical data are scarce and do not always support an exponential distribution. Analysis of the dynamics of adaptation in asexual populations of microorganisms has revealed that these can be summarized by two effective parameters, the effective mutation rate, U_e, and the effective selection coefficient of a beneficial mutation, S_e. Here, we show that these effective parameters will not always reflect the rate and mean effect of beneficial mutations, especially when the distribution of arising mutations has high variance, and the mutation rate is high. We propose a method to estimate the distribution of arising beneficial mutations, which is motivated by a common experimental setup. The method, which we call One Biallelic Marker Approximate Bayesian Computation, makes use of experimental data consisting of periodic measures of neutral marker frequencies and mean population fitness. Using simulations, we find that this method allows the discrimination of the shape of the distribution of arising mutations and that it provides reasonable estimates of their rates and mean effects in ranges of the parameter space that may be of biological relevance.
|
46
|
Hietpas RT, Bank C, Jensen JD, Bolon DNA. Shifting fitness landscapes in response to altered environments. Evolution 2013; 67:3512-22. [PMID: 24299404 PMCID: PMC3855258 DOI: 10.1111/evo.12207] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.2] [Received: 03/28/2013] [Accepted: 06/25/2013] [Indexed: 12/14/2022]
Abstract
The role of adaptation in molecular evolution has been contentious for decades. Here, we shed light on the adaptive potential in Saccharomyces cerevisiae by presenting systematic fitness measurements for all possible point mutations in a region of Hsp90 under four environmental conditions. Under elevated salinity, we observe numerous beneficial mutations with growth advantages up to 7% relative to the wild type. All of these beneficial mutations were observed to be associated with high costs of adaptation. We thus demonstrate that an essential protein can harbor adaptive potential upon an environmental challenge, and report a remarkable fit of the data to a version of Fisher's geometric model that focuses on the fitness trade-offs between mutations in different environments.
Affiliation(s)
- Ryan T. Hietpas
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Claudia Bank
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB)
- Jeffrey D. Jensen
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB)
- Daniel N. A. Bolon
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA
|
47
|
Joost S, Vuilleumier S, Jensen JD, Schoville S, Leempoel K, Stucki S, Widmer I, Melodelima C, Rolland J, Manel S. Uncovering the genetic basis of adaptive change: on the intersection of landscape genomics and theoretical population genetics. Mol Ecol 2013; 22:3659-65. [DOI: 10.1111/mec.12352] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Indexed: 12/20/2022]
Affiliation(s)
- Stéphane Joost
  - Laboratory of Geographic Information Systems (LASIG); School of Civil and Environmental Engineering (ENAC); École Polytechnique Fédérale de Lausanne (EPFL); Bâtiment GC Station 18 1015 Lausanne Switzerland
- Séverine Vuilleumier
  - Department of Ecology and Evolution; University of Lausanne; Biophore Building 1015 Lausanne Switzerland
- Jeffrey D. Jensen
  - Institute of Bioengineering, School of Life Sciences; École Polytechnique Fédérale de Lausanne (EPFL); Lausanne Switzerland
  - Swiss Institute of Bioinformatics; 1015 Lausanne Switzerland
- Sean Schoville
  - CNRS, TIMC-IMAG UMR 5525; Université Joseph Fourier; 38041 Grenoble France
- Kevin Leempoel
  - Laboratory of Geographic Information Systems (LASIG); School of Civil and Environmental Engineering (ENAC); École Polytechnique Fédérale de Lausanne (EPFL); Bâtiment GC Station 18 1015 Lausanne Switzerland
- Sylvie Stucki
  - Laboratory of Geographic Information Systems (LASIG); School of Civil and Environmental Engineering (ENAC); École Polytechnique Fédérale de Lausanne (EPFL); Bâtiment GC Station 18 1015 Lausanne Switzerland
- Ivo Widmer
  - Laboratory of Geographic Information Systems (LASIG); School of Civil and Environmental Engineering (ENAC); École Polytechnique Fédérale de Lausanne (EPFL); Bâtiment GC Station 18 1015 Lausanne Switzerland
- Christelle Melodelima
  - Laboratoire d'Ecologie Alpine; UMR-CNRS 5553; Université Joseph Fourier; 38041 Grenoble France
- Jonathan Rolland
  - Centre de mathématiques appliquées; Ecole Polytechnique; 91128 Palaiseau Cedex France
- Stéphanie Manel
  - Laboratoire Population Environnement Développement; UMR 151 UP/IRD; Université Aix Marseille; 3 place Victor Hugo 13331 Marseille Cedex 03 France
  - UMR BotAnique et BioinforMatique de l'Architecture des Plantes (AMAP); TA A51/PS2 34398 Montpellier Cedex 5 France
|
48
|
Ilinsky Y. Coevolution of Drosophila melanogaster mtDNA and Wolbachia genotypes. PLoS One 2013; 8:e54373. [PMID: 23349865 PMCID: PMC3547870 DOI: 10.1371/journal.pone.0054373] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Received: 08/10/2012] [Accepted: 12/11/2012] [Indexed: 11/20/2022]
Abstract
Maternally inherited microorganisms can influence the mtDNA pattern of variation in hosts. This influence is driven by selection among symbionts and can cause the frequency of mitochondrial variants in the population to eventually increase or decrease. Wolbachia infection is common and widespread in Drosophila melanogaster populations. We compared genetic variability of D. melanogaster mitotypes with Wolbachia genotypes among isofemale lines associated with different geographic locations and time intervals to study coevolution of the mtDNA and Wolbachia. Phylogenetic analysis of D. melanogaster mtDNA revealed two clades diverged in Africa, each associated with one of the two Wolbachia genotype groups. No evidence of horizontal transmission of Wolbachia between maternal lineages has been found. All the mtDNA variants that occur in infected isofemale lines are found in uninfected isofemale lines and vice versa, which is indicative of a recent loss of infection from some maternal fly lineages and confirms a significant role of Wolbachia in the D. melanogaster mtDNA pattern of variation. Finally, we present a comparative analysis of biogeographic distribution of D. melanogaster mitotypes all over the world.
Affiliation(s)
- Yury Ilinsky
  - Laboratory of Populations Genetics, Institute of Cytology and Genetics of Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia.
|
49
|
Duchen P, Zivkovic D, Hutter S, Stephan W, Laurent S. Demographic inference reveals African and European admixture in the North American Drosophila melanogaster population. Genetics 2013; 193:291-301. [PMID: 23150605 PMCID: PMC3527251 DOI: 10.1534/genetics.112.145912] [Citation(s) in RCA: 97] [Impact Index Per Article: 8.1] [Received: 09/14/2012] [Accepted: 10/30/2012] [Indexed: 11/18/2022]
Abstract
Drosophila melanogaster spread from sub-Saharan Africa to the rest of the world, colonizing new environments. Here, we modeled the joint demography of African (Zimbabwe), European (The Netherlands), and North American (North Carolina) populations using an approximate Bayesian computation (ABC) approach. By testing different models (including scenarios with continuous migration), we found that admixture between Africa and Europe most likely generated the North American population, with an estimated proportion of African ancestry of 15%. We also revisited the demography of the ancestral population (Africa) and found, in contrast to previous work, that a bottleneck fits the history of the population of Zimbabwe better than expansion. Finally, we compared the site-frequency spectrum of the ancestral population to analytical predictions under the estimated bottleneck model.
Affiliation(s)
- Pablo Duchen
  - Evolutionary Biology, University of Munich, 82152 Planegg-Martinsried, Germany.
|
50
|
Chan AH, Jenkins PA, Song YS. Genome-wide fine-scale recombination rate variation in Drosophila melanogaster. PLoS Genet 2012; 8:e1003090. [PMID: 23284288 PMCID: PMC3527307 DOI: 10.1371/journal.pgen.1003090] [Citation(s) in RCA: 191] [Impact Index Per Article: 14.7] [Received: 05/04/2012] [Accepted: 09/29/2012] [Indexed: 01/18/2023]
Abstract
Estimating fine-scale recombination maps of Drosophila from population genomic data is a challenging problem, in particular because of the high background recombination rate. In this paper, a new computational method is developed to address this challenge. Through an extensive simulation study, it is demonstrated that the method allows more accurate inference, and exhibits greater robustness to the effects of natural selection and noise, compared to a well-used previous method developed for studying fine-scale recombination rate variation in the human genome. As an application, a genome-wide analysis of genetic variation data is performed for two Drosophila melanogaster populations, one from North America (Raleigh, USA) and the other from Africa (Gikongoro, Rwanda). It is shown that fine-scale recombination rate variation is widespread throughout the D. melanogaster genome, across all chromosomes and in both populations. At the fine-scale, a conservative, systematic search for evidence of recombination hotspots suggests the existence of a handful of putative hotspots each with at least a tenfold increase in intensity over the background rate. A wavelet analysis is carried out to compare the estimated recombination maps in the two populations and to quantify the extent to which recombination rates are conserved. In general, similarity is observed at very broad scales, but substantial differences are seen at fine scales. The average recombination rate of the X chromosome appears to be higher than that of the autosomes in both populations, and this pattern is much more pronounced in the African population than the North American population. The correlation between various genomic features—including recombination rates, diversity, divergence, GC content, gene content, and sequence quality—is examined using the wavelet analysis, and it is shown that the most notable difference between D. melanogaster and humans is in the correlation between recombination and diversity. 
Recombination is a process by which chromosomes exchange genetic material during meiosis. It is important in evolution because it provides offspring with new combinations of genes, and so estimating the rate of recombination is of fundamental importance in various population genomic inference problems. In this paper, we develop a new statistical method to enable robust estimation of fine-scale recombination maps of Drosophila, a genus of common fruit flies, in which the background recombination rate is high and natural selection has been prevalent. We apply our method to produce fine-scale recombination maps for a North American population and an African population of D. melanogaster. For both populations, we find extensive fine-scale variation in recombination rate throughout the genome. We provide a quantitative characterization of the similarities and differences between the recombination maps of the two populations; our study reveals high correlation at broad scales and low correlation at fine scales, as has been documented among human populations. We also examine the correlation between various genomic features. Furthermore, using a conservative approach, we find a handful of putative recombination “hotspot” regions with solid statistical support for a local elevation of at least 10 times the background recombination rate.
Affiliation(s)
- Andrew H. Chan
  - Computer Science Division, University of California Berkeley, Berkeley, California, United States of America
- Paul A. Jenkins
  - Computer Science Division, University of California Berkeley, Berkeley, California, United States of America
- Yun S. Song
  - Computer Science Division, University of California Berkeley, Berkeley, California, United States of America
  - Department of Statistics, University of California Berkeley, Berkeley, California, United States of America
|