1
McDonough Y, Ruzicka F, Connallon T. Reconciling theories of dominance with the relative rates of adaptive substitution on sex chromosomes and autosomes. Proc Natl Acad Sci U S A 2024; 121:e2406335121. [PMID: 39436652 PMCID: PMC11536091 DOI: 10.1073/pnas.2406335121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Accepted: 09/16/2024] [Indexed: 10/23/2024]
Abstract
The dominance of beneficial mutations is a key evolutionary parameter affecting the rate and genetic basis of adaptation, yet it is notoriously difficult to estimate. A leading method to infer it is to compare the relative rates of adaptive substitution for X-linked and autosomal genes, which, according to a classic model by Charlesworth et al. (1987), is a simple function of the dominance of new beneficial mutations. Recent evidence that rates of adaptive substitution are faster for X-linked genes implies, accordingly, that beneficial mutations are usually recessive. However, this conclusion is incompatible with leading theories of dominance, which predict that beneficial mutations tend to be dominant or overdominant with respect to fitness. To address this incompatibility, we use Fisher's geometric model to predict the distribution of fitness effects of new mutations and the relative rates of positively selected substitution on the X and autosomes. Previous predictions of faster-X theory emerge as a special case of our model in which the phenotypic effects of mutations are small relative to the distance to the phenotypic optimum. But as mutational effects become large relative to the optimum, we observe an elevated tempo of positively selected substitutions on the X relative to the autosomes across a broader range of dominance conditions, including those predicted by theories of dominance. Our results imply that, contrary to previous models, dominant and overdominant beneficial mutations can plausibly generate patterns of faster-X adaptation. We discuss resulting implications for genomic studies of adaptation and inferences of dominance.
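The "simple function of dominance" mentioned in this abstract can be illustrated with a short sketch. It uses the commonly quoted form of the classic faster-X result, R = (2h + 1) / (4h), which is not stated in the abstract itself and holds only under assumptions supplied here: substitutions arise from new mutations, selection is weak and equal in both sexes, mutation rates are equal on the X and autosomes, and the sex ratio is even. The function name is hypothetical.

```python
def faster_x_ratio(h: float) -> float:
    """Ratio of X-linked to autosomal adaptive substitution rates.

    Assumed classic form R = (2h + 1) / (4h), where h is the dominance
    coefficient of new beneficial mutations (h = 0.5 is additive).
    """
    if h <= 0:
        raise ValueError("h must be positive; the ratio diverges as h approaches 0")
    return (2.0 * h + 1.0) / (4.0 * h)


# R > 1 (faster-X) only when h < 0.5 under these assumptions.
for h in (0.1, 0.25, 0.5, 0.75, 1.0):
    print(f"h = {h:.2f}: R = {faster_x_ratio(h):.3f}")
```

Under this special case, faster-X adaptation requires partially recessive beneficial mutations, which is the inference the abstract argues need not hold once mutational effects are large relative to the distance to the optimum.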
Affiliation(s)
- Yasmine McDonough
- School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia
- Filip Ruzicka
- School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia
- Institute of Science and Technology Austria, Klosterneuburg 3400, Austria
- Tim Connallon
- School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia
2
Kyriazis CC, Lohmueller KE. Constraining models of dominance for nonsynonymous mutations in the human genome. PLoS Genet 2024; 20:e1011198. [PMID: 39302992 PMCID: PMC11446423 DOI: 10.1371/journal.pgen.1011198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2024] [Revised: 10/02/2024] [Accepted: 09/04/2024] [Indexed: 09/22/2024]
Abstract
Dominance is a fundamental parameter in genetics, determining the dynamics of natural selection on deleterious and beneficial mutations, the patterns of genetic variation in natural populations, and the severity of inbreeding depression in a population. Despite this importance, dominance parameters remain poorly known, particularly in humans or other non-model organisms. A key reason for this lack of information about dominance is that it is extremely challenging to disentangle the selection coefficient (s) of a mutation from its dominance coefficient (h). Here, we explore dominance and selection parameters in humans by fitting models to the site frequency spectrum (SFS) for nonsynonymous mutations. When assuming a single dominance coefficient for all nonsynonymous mutations, we find that numerous h values can fit the data, so long as h is greater than ~0.15. Moreover, we also observe that theoretically-predicted models with a negative relationship between h and s can also fit the data well, including models with h = 0.05 for strongly deleterious mutations. Finally, we use our estimated dominance and selection parameters to inform simulations revisiting the question of whether the out-of-Africa bottleneck has led to differences in genetic load between African and non-African human populations. These simulations suggest that the relative burden of genetic load in non-African populations depends on the dominance model assumed, with slight increases for more weakly recessive models and slight decreases shown for more strongly recessive models. Moreover, these results also demonstrate that models of partially recessive nonsynonymous mutations can explain the observed severity of inbreeding depression in humans, bridging the gap between molecular population genetics and direct measures of fitness in humans. 
Our work represents a comprehensive assessment of dominance and deleterious variation in humans, with implications for parameterizing models of deleterious variation in humans and other mammalian species.
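The difficulty of separating s from h described in this abstract can be seen in a minimal deterministic sketch. It assumes genotype fitnesses of 1, 1 + hs and 1 + s under random mating, a standard textbook parametrization rather than the authors' SFS-fitting method, and the function name and numbers are hypothetical. When the mutant allele is rare, the change in its frequency depends almost entirely on the product hs.

```python
def next_freq(q: float, s: float, h: float) -> float:
    """One generation of selection with genotype fitnesses 1, 1 + h*s, 1 + s.

    q is the mutant allele frequency; random mating is assumed.
    """
    p = 1.0 - q
    w_bar = p * p + 2.0 * p * q * (1.0 + h * s) + q * q * (1.0 + s)
    return (q * q * (1.0 + s) + p * q * (1.0 + h * s)) / w_bar


q0 = 0.001  # a rare deleterious allele (hypothetical value)
delta_additive = next_freq(q0, s=-0.02, h=0.5) - q0
delta_recessive = next_freq(q0, s=-0.04, h=0.25) - q0

# Both parameter pairs share h*s = -0.01, so the two changes are nearly equal.
print(delta_additive, delta_recessive)
```

Because rare variants dominate the spectrum, very different (h, s) pairs can produce similar frequency trajectories, which is consistent with the abstract's finding that many h values fit the data.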
Affiliation(s)
- Christopher C. Kyriazis
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, United States of America
- Kirk E. Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, United States of America
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
3
Di C, Lohmueller KE. Revisiting Dominance in Population Genetics. Genome Biol Evol 2024; 16:evae147. [PMID: 39114967 PMCID: PMC11306932 DOI: 10.1093/gbe/evae147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/24/2024] [Indexed: 08/11/2024]
Abstract
Dominance refers to the effect of a heterozygous genotype relative to that of the two homozygous genotypes. The degree of dominance of mutations for fitness can have a profound impact on how deleterious and beneficial mutations change in frequency over time as well as on the patterns of linked neutral genetic variation surrounding such selected alleles. Since dominance is such a fundamental concept, it has received immense attention throughout the history of population genetics. Early work from Fisher, Wright, and Haldane focused on understanding the conceptual basis for why dominance exists. More recent work has attempted to test these theories and conceptual models by estimating dominance effects of mutations. However, estimating dominance coefficients has been notoriously challenging and has only been done in a few species in a limited number of studies. In this review, we first describe some of the early theoretical and conceptual models for understanding the mechanisms for the existence of dominance. Second, we discuss several approaches used to estimate dominance coefficients and summarize estimates of dominance coefficients. We note trends that have been observed across species, types of mutations, and functional categories of genes. By comparing estimates of dominance coefficients for different types of genes, we test several hypotheses for the existence of dominance. Lastly, we discuss how dominance influences the dynamics of beneficial and deleterious mutations in populations and how the degree of dominance of deleterious mutations influences the impact of inbreeding on fitness.
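The definition in the first sentence of this abstract can be written as a one-line calculation. The sketch below assumes the usual convention in which h places the heterozygote on the scale running from the wild-type homozygote (h = 0) to the mutant homozygote (h = 1); the function name and fitness values are hypothetical.

```python
def dominance_coefficient(w_wildtype: float, w_het: float, w_mutant: float) -> float:
    """Position of the heterozygote between the two homozygotes.

    h = 0: mutant allele fully recessive; h = 0.5: additive; h = 1: fully dominant.
    Values outside [0, 1] indicate under- or overdominance.
    """
    if w_mutant == w_wildtype:
        raise ValueError("homozygote fitnesses are equal, so h is undefined")
    return (w_het - w_wildtype) / (w_mutant - w_wildtype)


# Hypothetical fitnesses: the homozygous mutant loses 4%, the heterozygote 1%.
h = dominance_coefficient(1.00, 0.99, 0.96)
print(f"h = {h:.2f}")
```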
Affiliation(s)
- Chenlu Di
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, USA
- Kirk E Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, USA
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, Los Angeles, CA, USA
4
Jiang J, Xu YC, Zhang ZQ, Chen JF, Niu XM, Hou XH, Li XT, Wang L, Zhang YE, Ge S, Guo YL. Forces driving transposable element load variation during Arabidopsis range expansion. Plant Cell 2024; 36:840-862. [PMID: 38036296 PMCID: PMC10980350 DOI: 10.1093/plcell/koad296] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 08/17/2023] [Revised: 10/25/2023] [Accepted: 11/06/2023] [Indexed: 12/02/2023]
Abstract
Genetic load refers to the accumulated and potentially life-threatening deleterious mutations in populations. Understanding the mechanisms underlying genetic load variation of transposable element (TE) insertion, a major large-effect mutation, during range expansion is an intriguing question in biology. Here, we used 1,115 global natural accessions of Arabidopsis (Arabidopsis thaliana) to study the driving forces of TE load variation during its range expansion. TE load increased with range expansion, especially in the recently established Yangtze River basin population. Effective population size, which explains 62.0% of the variance in TE load, high transposition rate, and selective sweeps contributed to TE accumulation in the expanded populations. We genetically mapped and identified multiple candidate causal genes and TEs, and revealed the genetic architecture of TE load variation. Overall, this study reveals the variation in TE genetic load during Arabidopsis expansion and highlights the causes of TE load variation from the perspectives of both population genetics and quantitative genetics.
Affiliation(s)
- Juan Jiang
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yong-Chao Xu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- Zhi-Qin Zhang
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Jia-Fu Chen
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Xiao-Min Niu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- Xing-Hui Hou
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- Xin-Tong Li
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Li Wang
- Agricultural Synthetic Biology Center, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518000, China
- Yong E Zhang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- State Key Laboratory of Integrated Management of Pest Insects and Rodents & Key Laboratory of the Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Song Ge
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Ya-Long Guo
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- China National Botanical Garden, Beijing 100093, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
5
Kyriazis CC, Lohmueller KE. Constraining models of dominance for nonsynonymous mutations in the human genome. bioRxiv 2024:2024.02.25.582010. [PMID: 38463985 PMCID: PMC10925099 DOI: 10.1101/2024.02.25.582010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Dominance is a fundamental parameter in genetics, determining the dynamics of natural selection on deleterious and beneficial mutations, the patterns of genetic variation in natural populations, and the severity of inbreeding depression in a population. Despite this importance, dominance parameters remain poorly known, particularly in humans or other non-model organisms. A key reason for this lack of information about dominance is that it is extremely challenging to disentangle the selection coefficient (s) of a mutation from its dominance coefficient (h). Here, we explore dominance and selection parameters in humans by fitting models to the site frequency spectrum (SFS) for nonsynonymous mutations. When assuming a single dominance coefficient for all nonsynonymous mutations, we find that numerous h values can fit the data, so long as h is greater than ~0.15. Moreover, we also observe that theoretically-predicted models with a negative relationship between h and s can also fit the data well, including models with h=0.05 for strongly deleterious mutations. Finally, we use our estimated dominance and selection parameters to inform simulations revisiting the question of whether the out-of-Africa bottleneck has led to differences in genetic load between African and non-African human populations. These simulations suggest that the relative burden of genetic load in non-African populations depends on the dominance model assumed, with slight increases for more weakly recessive models and slight decreases shown for more strongly recessive models. Moreover, these results also demonstrate that models of partially recessive nonsynonymous mutations can explain the observed severity of inbreeding depression in humans, bridging the gap between molecular population genetics and direct measures of fitness in humans. 
Our work represents a comprehensive assessment of dominance and deleterious variation in humans, with implications for parameterizing models of deleterious variation in humans and other mammalian species.
Affiliation(s)
- Kirk E. Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, USA
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, USA
- Department of Human Genetics, David Geffen School of Medicine, Los Angeles, USA
6
Jain K, Kaushik S. Joint effect of changing selection and demography on the site frequency spectrum. Theor Popul Biol 2022; 146:46-60. [PMID: 35809866 DOI: 10.1016/j.tpb.2022.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 06/14/2022] [Accepted: 07/03/2022] [Indexed: 10/17/2022]
Abstract
The site frequency spectrum (SFS) is an important statistic that summarizes the molecular variation in a population, and is used to estimate population-genetic parameters and detect natural selection. Here, we study the SFS in a randomly mating, diploid population in which both the population size and selection coefficient vary periodically with time using a diffusion theory approach, and derive simple analytical expressions for the time-averaged SFS in slowly and rapidly changing environments. We show that for strong selection and in slowly changing environments where the population experiences both positive and negative cycles of the selection coefficient, the time-averaged SFS differs significantly from the equilibrium SFS in a constant environment. The deviation is found to depend on the time spent by the population in the deleterious part of the selection cycle and the phase difference between the selection coefficient and population size, and can be captured by an effective population size.
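For readers unfamiliar with the statistic, the following sketch shows how an unfolded SFS is tabulated from a sample. It covers only the summary statistic itself, not the diffusion analysis in the paper, and the function name and input data are hypothetical.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, sample_size):
    """Unfolded SFS: entry i-1 is the number of sites with i derived alleles.

    derived_counts holds one derived-allele count per site; sites that are
    monomorphic in the sample (count 0 or sample_size) are excluded.
    """
    for c in derived_counts:
        if c < 0 or c > sample_size:
            raise ValueError("each count must lie between 0 and sample_size")
    tally = Counter(derived_counts)
    return [tally.get(i, 0) for i in range(1, sample_size)]


# Six hypothetical sites scored in a sample of six chromosomes.
sfs = site_frequency_spectrum([1, 1, 2, 5, 0, 6], sample_size=6)
print(sfs)
```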
Affiliation(s)
- Kavita Jain
- Theoretical Sciences Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore 560064, India
- Sachin Kaushik
- Theoretical Sciences Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore 560064, India
7
Balick DJ, Jordan DM, Sunyaev S, Do R. Overcoming constraints on the detection of recessive selection in human genes from population frequency data. Am J Hum Genet 2022; 109:33-49. [PMID: 34951958 DOI: 10.1016/j.ajhg.2021.12.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/22/2021] [Accepted: 11/30/2021] [Indexed: 11/01/2022]
Abstract
The identification of genes that evolve under recessive natural selection is a long-standing goal of population genetics research that has important applications to the discovery of genes associated with disease. We found that commonly used methods to evaluate selective constraint at the gene level are highly sensitive to genes under heterozygous selection but ubiquitously fail to detect recessively evolving genes. Additionally, more sophisticated likelihood-based methods designed to detect recessivity similarly lack power for a human gene of realistic length from current population sample sizes. However, extensive simulations suggested that recessive genes may be detectable in aggregate. Here, we offer a method informed by population genetics simulations designed to detect recessive purifying selection in gene sets. Applying this to empirical gene sets produced significant enrichments for strong recessive selection in genes previously inferred to be under recessive selection in a consanguineous cohort and in genes involved in autosomal recessive monogenic disorders.
8
Amei A, Xu J. Inference of genetic forces using a Poisson random field model with non-constant population size. J Stat Plan Inference 2019. [DOI: 10.1016/j.jspi.2019.02.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
9
Amei A, Zhou S. Inferring the distribution of selective effects from a time inhomogeneous model. PLoS One 2019; 14:e0194709. [PMID: 30657757 PMCID: PMC6338356 DOI: 10.1371/journal.pone.0194709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2017] [Accepted: 03/08/2018] [Indexed: 11/18/2022]
Abstract
We have developed a Poisson random field model for estimating the distribution of selective effects of newly arisen nonsynonymous mutations that could be observed as polymorphism or divergence in samples of two related species under the assumption that the two species populations are not at mutation-selection-drift equilibrium. The model is applied to 91 Drosophila genes by comparing levels of polymorphism in an African population of D. melanogaster with divergence to a reference strain of D. simulans. Based on the difference of gene expression level between testes and ovaries, the 91 genes were classified as 33 male-biased, 28 female-biased, and 30 sex-unbiased genes. Under a Bayesian framework, Markov chain Monte Carlo simulations are implemented to the model in which the distribution of selective effects is assumed to be Gaussian with a mean that may differ from one gene to the other to sample key parameters. Based on our estimates, the majority of newly-arisen nonsynonymous mutations that could contribute to polymorphism or divergence in Drosophila species are mildly deleterious with a mean scaled selection coefficient of -2.81, while almost 86% of the fixed differences between species are driven by positive selection. There are only 16.6% of the nonsynonymous mutations observed in sex-unbiased genes that are under positive selection in comparison to 30% of male-biased and 46% of female-biased genes that are beneficial. We also estimated that D. melanogaster and D. simulans may have diverged 1.72 million years ago.
Affiliation(s)
- Amei Amei
- Department of Mathematical Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Shilei Zhou
- 54 Crescent Ave, Apt G, Dorchester, Massachusetts, United States of America
10
Huber CD, Durvasula A, Hancock AM, Lohmueller KE. Gene expression drives the evolution of dominance. Nat Commun 2018; 9:2750. [PMID: 30013096 PMCID: PMC6048131 DOI: 10.1038/s41467-018-05281-7] [Citation(s) in RCA: 66] [Impact Index Per Article: 9.4] [Received: 11/09/2017] [Accepted: 06/08/2018] [Indexed: 12/30/2022]
Abstract
Dominance is a fundamental concept in molecular genetics and has implications for understanding patterns of genetic variation, evolution, and complex traits. However, despite its importance, the degree of dominance in natural populations is poorly quantified. Here, we leverage multiple mating systems in natural populations of Arabidopsis to co-estimate the distribution of fitness effects and dominance coefficients of new amino acid changing mutations. We find that more deleterious mutations are more likely to be recessive than less deleterious mutations. Further, this pattern holds across gene categories, but varies with the connectivity and expression patterns of genes. Our work argues that dominance arises as a consequence of the functional importance of genes and their optimal expression levels. Dominance is difficult to measure in natural populations as it is confounded with fitness. Here, Huber et al. developed a new approach to co-estimate dominance and selection coefficients, and found that the observed relationship is best fit by a new model of dominance based on gene expression level.
Affiliation(s)
- Christian D Huber
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, 90095, USA
- Arun Durvasula
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
- Angela M Hancock
- Department of Plant Developmental Biology, Max Planck Institute for Plant Breeding Research, 50829, Cologne, Germany
- Kirk E Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, CA, 90095, USA
11
Accelerating Wright-Fisher Forward Simulations on the Graphics Processing Unit. G3: Genes, Genomes, Genetics 2017; 7:3229-3236. [PMID: 28768689 PMCID: PMC5592947 DOI: 10.1534/g3.117.300103] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 01/09/2023]
Abstract
Forward Wright–Fisher simulations are powerful in their ability to model complex demography and selection scenarios, but suffer from slow execution on the Central Processor Unit (CPU), thus limiting their usefulness. However, the single-locus Wright–Fisher forward algorithm is exceedingly parallelizable, with many steps that are so-called “embarrassingly parallel,” consisting of a vast number of individual computations that are all independent of each other and thus capable of being performed concurrently. The rise of modern Graphics Processing Units (GPUs) and programming languages designed to leverage the inherent parallel nature of these processors have allowed researchers to dramatically speed up many programs that have such high arithmetic intensity and intrinsic concurrency. The presented GPU Optimized Wright–Fisher simulation, or “GO Fish” for short, can be used to simulate arbitrary selection and demographic scenarios while running over 250-fold faster than its serial counterpart on the CPU. Even modest GPU hardware can achieve an impressive speedup of over two orders of magnitude. With simulations so accelerated, one can not only do quick parametric bootstrapping of previously estimated parameters, but also use simulated results to calculate the likelihoods and summary statistics of demographic and selection models against real polymorphism data, all without restricting the demographic and selection scenarios that can be modeled or requiring approximations to the single-locus forward algorithm for efficiency. Further, as many of the parallel programming techniques used in this simulation can be applied to other computationally intensive algorithms important in population genetics, GO Fish serves as an exciting template for future research into accelerating computation in evolution. GO Fish is part of the Parallel PopGen Package available at: http://dl42.github.io/ParallelPopGen/.
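The independence that makes the single-locus algorithm "embarrassingly parallel" can be sketched in plain Python. This is a serial CPU illustration of the general Wright-Fisher forward step, not the GO Fish implementation or its API; all function names, the fitness parametrization (1, 1 + hs, 1 + s), and the parameter values are assumptions made for the example.

```python
import random


def wf_generation(count, two_n, s, h, rng):
    """Advance one biallelic site by one generation: selection, then drift.

    count is the number of mutant copies among two_n chromosomes.
    """
    q = count / two_n
    p = 1.0 - q
    w_bar = p * p + 2.0 * p * q * (1.0 + h * s) + q * q * (1.0 + s)
    q_sel = (q * q * (1.0 + s) + p * q * (1.0 + h * s)) / w_bar
    # Binomial sampling of two_n chromosomes from the post-selection frequency.
    return sum(rng.random() < q_sel for _ in range(two_n))


def simulate_sites(num_sites, two_n, generations, s, h, seed=0):
    """Track many unlinked sites, each starting as a single new mutation."""
    rng = random.Random(seed)
    counts = [1] * num_sites
    for _ in range(generations):
        # No site depends on any other site within a generation, so this loop
        # is the part a GPU could distribute across threads.
        counts = [
            wf_generation(c, two_n, s, h, rng) if 0 < c < two_n else c
            for c in counts
        ]
    return counts


final_counts = simulate_sites(num_sites=50, two_n=40, generations=20, s=-0.05, h=0.5, seed=1)
print(sum(1 for c in final_counts if 0 < c < 40), "sites still segregating")
```

The per-site update has no shared state apart from the random number stream, which is why replacing the list comprehension with one thread per site is a natural fit for the hardware described in the abstract.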
12
Estimating the selective effects of heterozygous protein-truncating variants from human exome data. Nat Genet 2017; 49:806-810. [PMID: 28369035 PMCID: PMC5618255 DOI: 10.1038/ng.3831] [Citation(s) in RCA: 112] [Impact Index Per Article: 14.0] [Received: 09/13/2016] [Accepted: 03/07/2017] [Indexed: 12/14/2022]
Abstract
The dispensability of individual genes for viability has interested generations of geneticists. For some genes it is essential to maintain two functional chromosomal copies, while others may tolerate the loss of one or both copies. Exome sequence data from 60,706 individuals provide sufficient observations of rare protein truncating variants (PTVs) to make genome-wide estimates of selection against heterozygous loss of gene function. The cumulative frequency of rare deleterious PTVs is primarily determined by the balance between incoming mutations and purifying selection rather than genetic drift. This enables the estimation of the genome-wide distribution of selection coefficients for heterozygous PTVs and corresponding Bayesian estimates for individual genes. The strength of selection can discriminate the severity, age of onset, and mode of inheritance in Mendelian exome sequencing cases. We find that genes under the strongest selection are enriched in embryonic lethal mouse knockouts, putatively cell-essential genes, Mendelian disease genes, and regulators of transcription. Screening by essentiality, we find a large set of genes under strong selection that likely have critical function but have not yet been extensively annotated in published literature.
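The balance between incoming mutations and purifying selection described here suggests a simple back-of-envelope calculation. The sketch below uses the deterministic mutation-selection balance approximation, cumulative PTV frequency q ≈ U / s_het, and inverts it for a point estimate. This is a simplification chosen for illustration, not the Bayesian procedure the abstract refers to; the function name, the per-gene mutation rate, and the allele count are hypothetical, while the sample size of 60,706 individuals comes from the abstract.

```python
def s_het_point_estimate(ptv_allele_count, num_individuals, ptv_mutation_rate):
    """Point estimate of heterozygous selection from mutation-selection balance.

    Assumes q = U / s_het, where q is the cumulative PTV allele frequency in a
    gene and U is the per-gene, per-generation PTV mutation rate.
    """
    if ptv_allele_count <= 0:
        raise ValueError("no PTV alleles observed; this estimator is undefined")
    q = ptv_allele_count / (2 * num_individuals)
    return min(1.0, ptv_mutation_rate / q)  # selection coefficients cannot exceed 1


# Hypothetical gene: 12 PTV alleles observed, assumed U = 1e-6 per generation.
estimate = s_het_point_estimate(12, 60706, 1e-6)
print(f"s_het is roughly {estimate:.4f}")
```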
13
A flexible method for estimating the fraction of fitness influencing mutations from large sequencing data sets. Genome Res 2016; 26:834-43. [PMID: 27197222 PMCID: PMC4889975 DOI: 10.1101/gr.203059.115] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 12/07/2015] [Accepted: 04/14/2016] [Indexed: 01/07/2023]
Abstract
A continuing challenge in the analysis of massively large sequencing data sets is quantifying and interpreting non-neutrally evolving mutations. Here, we describe a flexible and robust approach based on the site frequency spectrum to estimate the fraction of deleterious and adaptive variants from large-scale sequencing data sets. We applied our method to approximately 1 million single nucleotide variants (SNVs) identified in high-coverage exome sequences of 6515 individuals. We estimate that the fraction of deleterious nonsynonymous SNVs is higher than previously reported; quantify the effects of genomic context, codon bias, chromatin accessibility, and number of protein-protein interactions on deleterious protein-coding SNVs; and identify pathways and networks that have likely been influenced by positive selection. Furthermore, we show that the fraction of deleterious nonsynonymous SNVs is significantly higher for Mendelian versus complex disease loci and in exons harboring dominant versus recessive Mendelian mutations. In summary, as genome-scale sequencing data accumulate in progressively larger sample sizes, our method will enable increasingly high-resolution inferences into the characteristics and determinants of non-neutral variation.
14
Computation of the Likelihood of Joint Site Frequency Spectra Using Orthogonal Polynomials. Computation 2016. [DOI: 10.3390/computation4010006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
15
Vogl C, Bergman J. Inference of directional selection and mutation parameters assuming equilibrium. Theor Popul Biol 2015; 106:71-82. [PMID: 26597774 DOI: 10.1016/j.tpb.2015.10.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 07/10/2015] [Revised: 09/30/2015] [Accepted: 10/07/2015] [Indexed: 01/15/2023]
Abstract
In a classical study, Wright (1931) proposed a model for the evolution of a biallelic locus under the influence of mutation, directional selection and drift. He derived the equilibrium distribution of the allelic proportion conditional on the scaled mutation rate, the mutation bias and the scaled strength of directional selection. The equilibrium distribution can be used for inference of these parameters with genome-wide datasets of "site frequency spectra" (SFS). Assuming that the scaled mutation rate is low, Wright's model can be approximated by a boundary-mutation model, where mutations are introduced into the population exclusively from sites fixed for the preferred or unpreferred allelic states. With the boundary-mutation model, inference can be partitioned: (i) the shape of the SFS distribution within the polymorphic region is determined by random drift and directional selection, but not by the mutation parameters, such that inference of the selection parameter relies exclusively on the polymorphic sites in the SFS; (ii) the mutation parameters can be inferred from the amount of polymorphic and monomorphic preferred and unpreferred alleles, conditional on the selection parameter. Herein, we derive maximum likelihood estimators for the mutation and selection parameters in equilibrium and apply the method to simulated SFS data as well as empirical data from a Madagascar population of Drosophila simulans.
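Point (i) of this abstract, that the shape of the polymorphic part of the SFS depends on drift and selection but not on the mutation parameters, can be illustrated numerically. The sketch assumes a generic low-mutation form in which the density of the preferred allele's frequency x among polymorphic sites is proportional to exp(γx) / (x(1 − x)), with γ a scaled selection coefficient. The scaling convention, the function name, and the use of a midpoint rule are assumptions for this example, and the paper's own parametrization and estimators may differ.

```python
import math


def polymorphic_sfs(sample_size, gamma, grid=2000):
    """Relative probabilities of observing y = 1..sample_size-1 preferred alleles.

    Integrates binomial sampling against exp(gamma*x) / (x*(1-x)) on (0, 1)
    with a midpoint rule; no mutation parameter enters the calculation.
    """
    dx = 1.0 / grid
    weights = []
    for y in range(1, sample_size):
        total = 0.0
        for k in range(grid):
            x = (k + 0.5) * dx
            # x**y * (1-x)**(M-y) divided by x*(1-x), times the selection term.
            total += x ** (y - 1) * (1.0 - x) ** (sample_size - y - 1) * math.exp(gamma * x)
        weights.append(math.comb(sample_size, y) * total * dx)
    norm = sum(weights)
    return [w / norm for w in weights]


neutral = polymorphic_sfs(6, gamma=0.0)
selected = polymorphic_sfs(6, gamma=2.0)
print([round(v, 3) for v in neutral])
print([round(v, 3) for v in selected])
```

With γ = 0 the spectrum is symmetric, and with γ > 0 mass shifts toward high counts of the preferred allele, which is the signal an estimator of the selection parameter would exploit.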
Affiliation(s)
- Claus Vogl
  - Institute of Animal Breeding and Genetics, Veterinärmedizinische Universität Wien, Veterinärplatz 1, A-1210 Vienna, Austria.
- Juraj Bergman
  - Institute of Population Genetics, Veterinärmedizinische Universität Wien, Veterinärplatz 1, A-1210 Vienna, Austria; Vienna Graduate School of Population Genetics, Veterinärmedizinische Universität Wien, Veterinärplatz 1, A-1210 Vienna, Austria.
|
16
|
Veeramah KR, Gutenkunst RN, Woerner AE, Watkins JC, Hammer MF. Evidence for increased levels of positive and negative selection on the X chromosome versus autosomes in humans. Mol Biol Evol 2014; 31:2267-82. [PMID: 24830675 DOI: 10.1093/molbev/msu166] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Indexed: 12/23/2022] Open
Abstract
Partially recessive variants under positive selection are expected to go to fixation more quickly on the X chromosome as a result of hemizygosity, an effect known as faster-X. Conversely, purifying selection is expected to reduce substitution rates more effectively on the X chromosome. Previous work in humans contrasted divergence on the autosomes and X chromosome, with results tending to support the faster-X effect. However, no study has yet incorporated both divergence and polymorphism to quantify the effects of both purifying and positive selection, which are opposing forces with respect to divergence. In this study, we develop a framework that integrates previously developed theory addressing differential rates of X and autosomal evolution with methods that jointly estimate the level of purifying and positive selection via modeling of the distribution of fitness effects (DFE). We then utilize this framework to estimate the proportion of nonsynonymous substitutions fixed by positive selection (α) using exome sequence data from a West African population. We find that varying the female to male breeding ratio (β) has minimal impact on the DFE for the X chromosome, especially when compared with the effect of varying the dominance coefficient of deleterious alleles (h). Estimates of α range from 46% to 51% and from 4% to 24% for the X chromosome and autosomes, respectively. While dependent on h, the magnitude of the difference between α values estimated for these two systems is highly statistically significant over a range of biologically realistic parameter values, suggesting faster-X has been operating in humans.
Affiliation(s)
- Krishna R Veeramah
  - Arizona Research Laboratories Division of Biotechnology, University of Arizona
  - Department of Ecology and Evolution, Stony Brook University
- August E Woerner
  - Arizona Research Laboratories Division of Biotechnology, University of Arizona
- Michael F Hammer
  - Arizona Research Laboratories Division of Biotechnology, University of Arizona
|
17
|
Amei A, Smith BT. Robust estimates of divergence times and selection with a Poisson random field model: a case study of comparative phylogeographic data. Genetics 2014; 196:225-33. [PMID: 24142896 PMCID: PMC3872187 DOI: 10.1534/genetics.113.157776] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 09/20/2013] [Accepted: 10/11/2013] [Indexed: 11/18/2022] Open
Abstract
Mutation frequencies can be modeled as a Poisson random field (PRF) to estimate speciation times and the degree of selection on newly arisen mutations. This approach provides a quantitative theory for comparing intraspecific polymorphism with interspecific divergence in the presence of selection and can be used to estimate population genetic parameters. Although the original PRF model has been extended to more general biological settings to make statistical inference about selection and divergence among model organisms, it has not been incorporated into phylogeographic studies that focus on estimating population genetic parameters for nonmodel organisms. Here, we modified a recently developed time-dependent PRF model to independently estimate genetic parameters from a nuclear and mitochondrial DNA data set of 22 sister pairs of birds that have diverged across a biogeographic barrier. We found that species that inhabit humid habitats had more recent divergence times and larger effective population sizes than those that inhabit drier habitats, and divergence times estimated from the PRF model were similar to estimates from a coalescent species-tree approach. Selection coefficients were higher in sister pairs that inhabited drier habitats than in those in humid habitats, but overall the mitochondrial DNA was under weak selection. Our study indicates that PRF models are useful for estimating various population genetic parameters and serve as a framework for incorporating estimates of selection into comparative phylogeographic studies.
Affiliation(s)
- Amei Amei
  - Department of Mathematical Sciences, University of Nevada, Las Vegas, Nevada 89154
- Brian Tilston Smith
  - Museum of Natural Science, Louisiana State University, Baton Rouge, Louisiana 70803
|
18
|
Eilertson KE, Booth JG, Bustamante CD. SnIPRE: selection inference using a Poisson random effects model. PLoS Comput Biol 2012; 8:e1002806. [PMID: 23236270 PMCID: PMC3516574 DOI: 10.1371/journal.pcbi.1002806] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.8] [Received: 02/02/2012] [Accepted: 10/17/2012] [Indexed: 12/28/2022] Open
Abstract
We present an approach for identifying genes under natural selection using polymorphism and divergence data from synonymous and non-synonymous sites within genes. A generalized linear mixed model is used to model the genome-wide variability among categories of mutations and estimate its functional consequence. We demonstrate how the model's estimated fixed and random effects can be used to identify genes under selection. The parameter estimates from our generalized linear model can be transformed to yield population genetic parameter estimates for quantities including the average selection coefficient for new mutations at a locus, the synonymous and non-synonymous mutation rates, and species divergence times. Furthermore, our approach incorporates stochastic variation due to the evolutionary process and can be fit using standard statistical software. The model is fit in both the empirical Bayes and Bayesian settings using the lme4 package in R, and Markov chain Monte Carlo methods in WinBUGS. Using simulated data we compare our method to existing approaches for detecting genes under selection: the McDonald-Kreitman test, and two versions of the Poisson random field based method MKprf. Overall, we find our method universally outperforms existing methods for detecting genes subject to selection using polymorphism and divergence data.
Affiliation(s)
- Kirsten E Eilertson
  - Bioinformatics Core, J David Gladstone Institutes, San Francisco, California, United States of America.
|
19
|
Hedrick PW. What is the evidence for heterozygote advantage selection? Trends Ecol Evol 2012; 27:698-704. [PMID: 22975220 DOI: 10.1016/j.tree.2012.08.012] [Citation(s) in RCA: 126] [Impact Index Per Article: 9.7] [Received: 07/16/2012] [Revised: 08/09/2012] [Accepted: 08/10/2012] [Indexed: 10/27/2022]
Abstract
Recent genomic data have found that many genes show the signal of selection. How many of these genes are undergoing heterozygote advantage selection is only beginning to be known. Initial genomic surveys have suggested that only a small proportion of loci have polymorphisms maintained by heterozygote advantage and this is consistent with the few examples generated from other approaches within given species. Unless further studies provide large numbers of loci with heterozygote advantage, it appears that loci with heterozygote advantage must be considered only a small minority of all loci in a species. This is not to say that some heterozygote advantage loci do not have important adaptive functions, but that their role in overall evolutionary change might be more of an unusual phenomenon than a major player in adaptation.
|
20
|
Amei A, Sawyer S. A time-dependent Poisson random field model for polymorphism within and between two related biological species. ANN APPL PROBAB 2010. [DOI: 10.1214/09-aap668] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]
|
21
|
Elyashiv E, Bullaughey K, Sattath S, Rinott Y, Przeworski M, Sella G. Shifts in the intensity of purifying selection: an analysis of genome-wide polymorphism data from two closely related yeast species. Genome Res 2010; 20:1558-73. [PMID: 20817943 DOI: 10.1101/gr.108993.110] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.5] [Indexed: 11/25/2022]
Abstract
How much does the intensity of purifying selection vary among populations and species? How uniform are the shifts in selective pressures across the genome? To address these questions, we took advantage of a recent, whole-genome polymorphism data set from two closely related species of yeast, Saccharomyces cerevisiae and S. paradoxus, paying close attention to the population structure within these species. We found that the average intensity of purifying selection on amino acid sites varies markedly among populations and between species. As expected in the presence of extensive weakly deleterious mutations, the effect of purifying selection is substantially weaker on single nucleotide polymorphisms (SNPs) segregating within populations than on SNPs fixed between population samples. Also in accordance with a Nearly Neutral model, the variation in the intensity of purifying selection across populations corresponds almost perfectly to simple measures of their effective size. As a first step toward understanding the processes generating these patterns, we sought to tease apart the relative importance of systematic, genome-wide changes in the efficacy of selection, such as those expected from demographic processes and of gene-specific changes, which may be expected after a shift in selective pressures. For that purpose, we developed a new model for the evolution of purifying selection between populations and inferred its parameters from the genome-wide data using a likelihood approach. We found that most, but not all changes seem to be explained by systematic shifts in the efficacy of selection. One population, the sake-derived strains of S. cerevisiae, however, also shows extensive gene-specific changes, plausibly associated with domestication. These findings have important implications for our understanding of purifying selection as well as for estimates of the rate of molecular adaptation in yeast and in other species.
Affiliation(s)
- Eyal Elyashiv
  - Department of Evolution, Systematics, and Ecology, Hebrew University of Jerusalem, Jerusalem 91905, Israel
|
22
|
RoyChoudhury A, Wakeley J. Sufficiency of the number of segregating sites in the limit under finite-sites mutation. Theor Popul Biol 2010; 78:118-22. [DOI: 10.1016/j.tpb.2010.05.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 01/16/2010] [Revised: 05/17/2010] [Accepted: 05/19/2010] [Indexed: 10/19/2022]
|
23
|
Kern AD, Haussler D. A population genetic hidden Markov model for detecting genomic regions under selection. Mol Biol Evol 2010; 27:1673-85. [PMID: 20185453 DOI: 10.1093/molbev/msq053] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022] Open
Abstract
Recently, hidden Markov models have been applied to numerous problems in genomics. Here, we introduce an explicit population genetics hidden Markov model (popGenHMM) that uses single nucleotide polymorphism (SNP) frequency data to identify genomic regions that have experienced recent selection. Our popGenHMM assumes that SNP frequencies are emitted independently following diffusion approximation expectations but that neighboring SNP frequencies are partially correlated by selective state. We give results from the training and application of our popGenHMM to a set of early release data from the Drosophila Population Genomics Project (dpgp.org) that consists of approximately 7.8 Mb of resequencing from 32 North American Drosophila melanogaster lines. These results demonstrate the potential utility of our model, making predictions based on the site frequency spectrum (SFS) for regions of the genome that represent selected elements.
Affiliation(s)
- Andrew D Kern
  - Department of Biological Sciences, Dartmouth College, Hanover, NH, USA.
|
24
|
Andrés AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, Clark AG, Nielsen R. Targets of balancing selection in the human genome. Mol Biol Evol 2009; 26:2755-64. [PMID: 19713326 PMCID: PMC2782326 DOI: 10.1093/molbev/msp190] [Citation(s) in RCA: 190] [Impact Index Per Article: 11.9] [Accepted: 08/20/2009] [Indexed: 12/29/2022] Open
Abstract
Balancing selection is potentially an important biological force for maintaining advantageous genetic diversity in populations, including variation that is responsible for long-term adaptation to the environment. By serving as a means to maintain genetic variation, it may be particularly relevant to maintaining phenotypic variation in natural populations. Nevertheless, its prevalence and specific targets in the human genome remain largely unknown. We have analyzed the patterns of diversity and divergence of 13,400 genes in two human populations using an unbiased single-nucleotide polymorphism data set, a genome-wide approach, and a method that incorporates demography in neutrality tests. We identified an unbiased catalog of genes with signatures of long-term balancing selection, which includes immunity genes as well as genes encoding keratins and membrane channels; the catalog also shows enrichment in functional categories involved in cellular structure. Patterns are mostly concordant in the two populations, with a small fraction of genes showing population-specific signatures of selection. Power considerations indicate that our findings represent a subset of all targets in the genome, suggesting that although balancing selection may not have an obvious impact on a large proportion of human genes, it is a key force affecting the evolution of a number of genes in humans.
Affiliation(s)
- Aida M Andrés
  - Department of Molecular Biology and Genetics, Cornell University, USA.
|
25
|
Hollister JD, Gaut BS. Epigenetic silencing of transposable elements: a trade-off between reduced transposition and deleterious effects on neighboring gene expression. Genome Res 2009; 19:1419-28. [PMID: 19478138 PMCID: PMC2720190 DOI: 10.1101/gr.091678.109] [Citation(s) in RCA: 457] [Impact Index Per Article: 28.6] [Received: 01/26/2009] [Accepted: 05/20/2009] [Indexed: 12/25/2022]
Abstract
Transposable elements (TEs) are ubiquitous genomic parasites. The deleterious consequences of the presence and activity of TEs have fueled debate about the evolutionary forces countering their expansion. Purifying selection is thought to purge TE insertions from the genome, and TE sequences are targeted by hosts for epigenetic silencing. However, the interplay between epigenetic and evolutionary forces countering TE expansion remains unexplored. Here we analyze genomic, epigenetic, and population genetic data from Arabidopsis thaliana to yield three observations. First, gene expression is negatively correlated with the density of methylated TEs. Second, the signature of purifying selection is detectable for methylated TEs near genes but not for unmethylated TEs or for TEs far from genes. Third, TE insertions are distributed by age and methylation status, such that older, methylated TEs are farther from genes. Based on these observations, we present a model in which host silencing of TEs near genes has deleterious effects on neighboring gene expression, resulting in the preferential loss of methylated TEs from gene-rich chromosomal regions. This mechanism implies an evolutionary tradeoff in which the benefit of TE silencing imposes a fitness cost via deleterious effects on the expression of nearby genes.
Affiliation(s)
- Jesse D. Hollister
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, California 92697-2525, USA
- Brandon S. Gaut
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, California 92697-2525, USA
|
26
|
Thornton KR. Automating approximate Bayesian computation by local linear regression. BMC Genet 2009; 10:35. [PMID: 19583871 PMCID: PMC2712468 DOI: 10.1186/1471-2156-10-35] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Received: 02/02/2009] [Accepted: 07/07/2009] [Indexed: 11/17/2022] Open
Abstract
Background: In several biological contexts, parameter inference often relies on computationally intensive techniques. "Approximate Bayesian Computation", or ABC, methods based on summary statistics have become increasingly popular. A particular flavor of ABC based on using a linear regression to approximate the posterior distribution of the parameters, conditional on the summary statistics, is computationally appealing, yet no standalone tool exists to automate the procedure. Here, I describe a program to implement the method. Results: The software package ABCreg implements the local linear-regression approach to ABC. The advantages are: 1. The code is standalone and fully documented. 2. The program will automatically process multiple data sets and create unique output files for each (which may be processed immediately in R), facilitating the testing of inference procedures on simulated data, or the analysis of multiple data sets. 3. The program implements two different transformation methods for the regression step. 4. Analysis options are controlled on the command line by the user, and the program is designed to output warnings for cases where the regression fails. 5. The program does not depend on any particular simulation machinery (coalescent, forward-time, etc.), and therefore is a general tool for processing the results from any simulation. 6. The code is open-source and modular. Examples of applying the software to empirical data from Drosophila melanogaster, and testing the procedure on simulated data, are shown. Conclusion: In practice, ABCreg simplifies implementing ABC based on local linear regression.
Affiliation(s)
- Kevin R Thornton
  - Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA, USA.
|
27
|
Nielsen R, Hubisz MJ, Hellmann I, Torgerson D, Andrés AM, Albrechtsen A, Gutenkunst R, Adams MD, Cargill M, Boyko A, Indap A, Bustamante CD, Clark AG. Darwinian and demographic forces affecting human protein coding genes. Genome Res 2009; 19:838-49. [PMID: 19279335 PMCID: PMC2675972 DOI: 10.1101/gr.088336.108] [Citation(s) in RCA: 115] [Impact Index Per Article: 7.2] [Received: 10/22/2008] [Accepted: 02/23/2009] [Indexed: 11/24/2022]
Abstract
Past demographic changes can produce distortions in patterns of genetic variation that can mimic the appearance of natural selection unless the demographic effects are explicitly removed. Here we fit a detailed model of human demography that incorporates divergence, migration, admixture, and changes in population size to directly sequenced data from 13,400 protein coding genes from 20 European-American and 19 African-American individuals. Based on this demographic model, we use several new and established statistical methods for identifying genes with extreme patterns of polymorphism likely to be caused by Darwinian selection, providing the first genome-wide analysis of allele frequency distributions in humans based on directly sequenced data. The tests are based on observations of excesses of high frequency-derived alleles, excesses of low frequency-derived alleles, and excesses of differences in allele frequencies between populations. We detect numerous new genes with strong evidence of selection, including a number of genes related to psychiatric and other diseases. We also show that microRNA controlled genes evolve under extremely high constraints and are more likely to undergo negative selection than other genes. Furthermore, we show that genes involved in muscle development have been subject to positive selection during recent human history. In accordance with previous studies, we find evidence for negative selection against mutations in genes associated with Mendelian disease and positive selection acting on genes associated with several complex diseases.
Affiliation(s)
- Rasmus Nielsen
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark.
|
28
|
Abstract
Human genes responsible for inherited diseases are important for the understanding of human disease. We investigated the degree of polymorphism and divergence in the human disease genes to elucidate the effect of natural selection on human disease genes. In particular, the effect of disease dominance was incorporated into the analysis. Both dominant disease genes (DDG) and recessive disease genes (RDG) had a higher mutation rate per site and encoded longer proteins than the nondisease genes, which exposed the disease genes to a faster flux of new mutations. Using an unbiased polymorphism dataset, we found that, proportionally, RDG harbor more nonsynonymous polymorphisms compared with DDG. We estimated the selection intensity on the disease genes using polymorphism and divergence data and determined whether the different patterns of polymorphism and divergence between DDG and RDG could be explained by the difference in only dominance. Even after the dominance effect was considered, the selection intensity on RDG was significantly different from DDG, suggesting that the deleterious effects of the dominant and recessive disease mutations are fundamentally different.
|
29
|
Abstract
The distribution of genetic polymorphisms in a population contains information about evolutionary processes. The Poisson random field (PRF) model uses the polymorphism frequency spectrum to infer the mutation rate and the strength of directional selection. The PRF model relies on an infinite-sites approximation that is reasonable for most eukaryotic populations, but that becomes problematic when θ is large (θ ≳ 0.05). Here, we show that at large mutation rates characteristic of microbes and viruses the infinite-sites approximation of the PRF model induces systematic biases that lead it to underestimate negative selection pressures and mutation rates and erroneously infer positive selection. We introduce two new methods that extend our ability to infer selection pressures and mutation rates at large θ: a finite-site modification of the PRF model and a new technique based on diffusion theory. Our methods can be used to infer not only a "weighted average" of selection pressures acting on a gene sequence, but also the distribution of selection pressures across sites. We evaluate the accuracy of our methods, as well as that of the original PRF approach, by comparison with Wright-Fisher simulations.
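Editorial illustration (not taken from the cited paper): the infinite-sites PRF calculation that this abstract refers to can be sketched in a few lines of Python. The sketch assumes the standard Sawyer-Hartl form of the expected frequency density, with θ scaled so that the neutral expectation for the class with i derived copies is θ/i and γ denoting the scaled selection coefficient. The function names `prf_density` and `expected_sfs`, the scaling convention, and the midpoint-rule integration are all assumptions made for illustration; published implementations may differ on each point.

```python
import math


def prf_density(x, gamma, theta=1.0):
    """Expected density of sites whose derived allele is at frequency x.

    Assumed convention: theta is the scaled mutation rate, gamma the scaled
    selection coefficient; gamma = 0 reduces to the neutral density theta / x.
    """
    if abs(gamma) < 1e-8:
        return theta / x
    num = -math.expm1(-2.0 * gamma * (1.0 - x))  # 1 - exp(-2*gamma*(1 - x))
    den = -math.expm1(-2.0 * gamma)              # 1 - exp(-2*gamma)
    return theta * num / (den * x * (1.0 - x))


def expected_sfs(n, gamma, theta=1.0, steps=20000):
    """Expected counts of sites with i = 1..n-1 derived copies in a sample of n.

    Each class integrates binomial sampling against prf_density using a
    midpoint rule, which never evaluates the endpoints x = 0 or x = 1.
    """
    h = 1.0 / steps
    sfs = []
    for i in range(1, n):
        coeff = math.comb(n, i)
        total = 0.0
        for k in range(steps):
            x = (k + 0.5) * h
            total += coeff * x**i * (1.0 - x) ** (n - i) * prf_density(x, gamma, theta)
        sfs.append(total * h)
    return sfs
```

Under these assumptions, calling `expected_sfs(10, 0.0)` recovers the neutral θ/i spectrum, positive γ shifts weight toward high-frequency classes, and negative γ shifts it toward rare variants. The sketch covers only the infinite-sites case; the finite-site and diffusion-based extensions described in the abstract are not reproduced here.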
|
30
|
Divergence and Polymorphism Under the Nearly Neutral Theory of Molecular Evolution. J Mol Evol 2008; 67:418-26. [DOI: 10.1007/s00239-008-9146-9] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Received: 01/11/2008] [Revised: 05/26/2008] [Accepted: 07/14/2008] [Indexed: 11/26/2022]
|
31
|
Blekhman R, Man O, Herrmann L, Boyko AR, Indap A, Kosiol C, Bustamante CD, Teshima KM, Przeworski M. Natural selection on genes that underlie human disease susceptibility. Curr Biol 2008; 18:883-9. [PMID: 18571414 DOI: 10.1016/j.cub.2008.04.074] [Citation(s) in RCA: 174] [Impact Index Per Article: 10.2] [Received: 10/09/2007] [Revised: 04/27/2008] [Accepted: 04/30/2008] [Indexed: 11/16/2022]
Abstract
What evolutionary forces shape genes that contribute to the risk of human disease? Do similar selective pressures act on alleles that underlie simple versus complex disorders [1-3]? Answers to these questions will shed light onto the origin of human disorders (e.g., [4]) and help to predict the population frequencies of alleles that contribute to disease risk, with important implications for the efficient design of mapping studies [5-7]. As a first step toward addressing these questions, we created a hand-curated version of the Mendelian Inheritance in Man database (OMIM). We then examined selective pressures on Mendelian-disease genes, genes that contribute to complex-disease risk, and genes known to be essential in mouse by analyzing patterns of human polymorphism and of divergence between human and rhesus macaque. We found that Mendelian-disease genes appear to be under widespread purifying selection, especially when the disease mutations are dominant (rather than recessive). In contrast, the class of genes that influence complex-disease risk shows little signs of evolutionary conservation, possibly because this category includes targets of both purifying and positive selection.
Affiliation(s)
- Ran Blekhman
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
|
32
|
Sethupathy P, Hannenhalli S. A tutorial of the Poisson random field model in population genetics. Adv Bioinformatics 2008; 2008:257864. [PMID: 19920987 PMCID: PMC2775679 DOI: 10.1155/2008/257864] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 02/06/2008] [Accepted: 05/15/2008] [Indexed: 11/18/2022] Open
Abstract
Population genetics is the study of allele frequency changes driven by various evolutionary forces such as mutation, natural selection, and random genetic drift. Although natural selection is widely recognized as a bona-fide phenomenon, the extent to which it drives evolution continues to remain unclear and controversial. Various qualitative techniques, or so-called "tests of neutrality", have been introduced to detect signatures of natural selection. A decade and a half ago, Stanley Sawyer and Daniel Hartl provided a mathematical framework, referred to as the Poisson random field (PRF), with which to determine quantitatively the intensity of selection on a particular gene or genomic region. The recent availability of large-scale genetic polymorphism data has sparked widespread interest in genome-wide investigations of natural selection. To that end, the original PRF model is of particular interest for geneticists and evolutionary genomicists. In this article, we will provide a tutorial of the mathematical derivation of the original Sawyer and Hartl PRF model.
Affiliation(s)
- Praveen Sethupathy
  - Department of Genetics, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Sridhar Hannenhalli
  - Department of Genetics, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Computer and Information Sciences, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, PA 19104, USA
|
33
|
Boyko AR, Williamson SH, Indap AR, Degenhardt JD, Hernandez RD, Lohmueller KE, Adams MD, Schmidt S, Sninsky JJ, Sunyaev SR, White TJ, Nielsen R, Clark AG, Bustamante CD. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet 2008; 4:e1000083. [PMID: 18516229 PMCID: PMC2377339 DOI: 10.1371/journal.pgen.1000083] [Citation(s) in RCA: 471] [Impact Index Per Article: 27.7] [Received: 07/04/2007] [Accepted: 04/29/2008] [Indexed: 11/19/2022] Open
Abstract
Quantifying the distribution of fitness effects among newly arising mutations in the human genome is key to resolving important debates in medical and evolutionary genetics. Here, we present a method for inferring this distribution using Single Nucleotide Polymorphism (SNP) data from a population with non-stationary demographic history (such as that of modern humans). Application of our method to 47,576 coding SNPs found by direct resequencing of 11,404 protein coding-genes in 35 individuals (20 European Americans and 15 African Americans) allows us to assess the relative contribution of demographic and selective effects to patterning amino acid variation in the human genome. We find evidence of an ancient population expansion in the sample with African ancestry and a relatively recent bottleneck in the sample with European ancestry. After accounting for these demographic effects, we find strong evidence for great variability in the selective effects of new amino acid replacing mutations. In both populations, the patterns of variation are consistent with a leptokurtic distribution of selection coefficients (e.g., gamma or log-normal) peaked near neutrality. Specifically, we predict 27-29% of amino acid changing (nonsynonymous) mutations are neutral or nearly neutral (|s|<0.01%), 30-42% are moderately deleterious (0.01%<|s|<1%), and nearly all the remainder are highly deleterious or lethal (|s|>1%). Our results are consistent with 10-20% of amino acid differences between humans and chimpanzees having been fixed by positive selection with the remainder of differences being neutral or nearly neutral. Our analysis also predicts that many of the alleles identified via whole-genome association mapping may be selectively neutral or (formerly) positively selected, implying that deleterious genetic variation affecting disease phenotype may be missed by this widely used approach for mapping genes underlying complex traits.
Affiliation(s)
- Adam R. Boyko
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America
- Scott H. Williamson
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Amit R. Indap
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Jeremiah D. Degenhardt
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Ryan D. Hernandez
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Kirk E. Lohmueller
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America
- Mark D. Adams
  - Department of Genetics, BRB-624, Case Western Reserve University, Cleveland, Ohio, United States of America
- Steffen Schmidt
  - Division of Genetics, Department of Medicine, Brigham & Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- John J. Sninsky
  - Celera Diagnostics, Alameda, California, United States of America
- Shamil R. Sunyaev
  - Division of Genetics, Department of Medicine, Brigham & Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Thomas J. White
  - Celera Diagnostics, Alameda, California, United States of America
- Rasmus Nielsen
  - Center for Comparative Genomics, University of Copenhagen, Copenhagen, Denmark
- Andrew G. Clark
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America
- Carlos D. Bustamante
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
|
34
|
Huerta-Sanchez E, Durrett R, Bustamante CD. Population genetics of polymorphism and divergence under fluctuating selection. Genetics 2008; 178:325-37. [PMID: 17947441 PMCID: PMC2206081 DOI: 10.1534/genetics.107.073361] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Received: 03/14/2007] [Accepted: 10/02/2007] [Indexed: 11/18/2022] Open
Abstract
Current methods for detecting fluctuating selection require time series data on genotype frequencies. Here, we propose an alternative approach that makes use of DNA polymorphism data from a sample of individuals collected at a single point in time. Our method uses classical diffusion approximations to model temporal fluctuations in the selection coefficients to find the expected distribution of mutation frequencies in the population. Using the Poisson random-field setting we derive the site-frequency spectrum (SFS) for three different models of fluctuating selection. We find that the general effect of fluctuating selection is to produce a more "U"-shaped site-frequency spectrum with an excess of high-frequency derived mutations at the expense of middle-frequency variants. We present likelihood-ratio tests, comparing the fluctuating selection models to the neutral model using SFS data, and use Monte Carlo simulations to assess their power. We find that we have sufficient power to reject a neutral hypothesis using samples on the order of a few hundred SNPs and a sample size of approximately 20 and power to distinguish between selection that varies in time and constant selection for a sample of size 20. We also find that fluctuating selection increases the probability of fixation of selected sites even if, on average, there is no difference in selection among a pair of alleles segregating at the locus. Fluctuating selection will, therefore, lead to an increase in the ratio of divergence to polymorphism similar to that observed under positive directional selection.
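The neutral baseline used in SFS-based likelihood-ratio tests of this kind can be sketched in a few lines of Python. Under the standard neutral model the expected number of sites with i derived copies in a sample of n is θ/i, and in the Poisson random-field setting the counts are treated as independent Poisson variables. The function names are hypothetical, and the alternative spectrum passed to `lrt_vs_neutral` is a placeholder: the fluctuating-selection spectra derived in the paper are not reproduced here.

```python
import math

def neutral_sfs(theta, n):
    """Expected site counts for i = 1..n-1 derived copies under neutrality (theta / i)."""
    return [theta / i for i in range(1, n)]

def poisson_loglik(observed, expected):
    """Log-likelihood of independent Poisson counts (Poisson random-field assumption)."""
    return sum(x * math.log(m) - m - math.lgamma(x + 1)
               for x, m in zip(observed, expected))

def lrt_vs_neutral(observed, alt_expected):
    """2 * (logL_alt - logL_neutral), with the neutral theta set to its MLE.

    alt_expected is any alternative expected spectrum supplied by the caller.
    """
    n = len(observed) + 1
    a_n = sum(1.0 / i for i in range(1, n))
    theta_hat = sum(observed) / a_n  # maximizes the Poisson likelihood under theta / i
    return 2.0 * (poisson_loglik(observed, alt_expected)
                  - poisson_loglik(observed, neutral_sfs(theta_hat, n)))

# A "U"-shaped toy spectrum for n = 5 (illustrative numbers, not data from the paper),
# compared against the saturated model in which expected counts equal observed counts.
observed = [40, 12, 9, 15]
stat = lrt_vs_neutral(observed, [float(x) for x in observed])
print(round(stat, 3))
```

An excess in the highest-frequency class relative to θ/i, as in the toy spectrum, is the kind of departure the abstract attributes to fluctuating selection.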
Affiliation(s)
- Emilia Huerta-Sanchez
- Center for Applied Mathematics, Department of Mathematics, Cornell University, Ithaca, New York 14853, USA
|
35
|
Caicedo AL, Williamson SH, Hernandez RD, Boyko A, Fledel-Alon A, York TL, Polato NR, Olsen KM, Nielsen R, McCouch SR, Bustamante CD, Purugganan MD. Genome-wide patterns of nucleotide polymorphism in domesticated rice. PLoS Genet 2007; 3:1745-56. [PMID: 17907810 PMCID: PMC1994709 DOI: 10.1371/journal.pgen.0030163] [Citation(s) in RCA: 294] [Impact Index Per Article: 16.3] [Received: 02/20/2007] [Accepted: 08/06/2007] [Indexed: 11/18/2022] Open
Abstract
Domesticated Asian rice (Oryza sativa) is one of the oldest domesticated crop species in the world, having fed more people than any other plant in human history. We report the patterns of DNA sequence variation in rice and its wild ancestor, O. rufipogon, across 111 randomly chosen gene fragments, and use these to infer the evolutionary dynamics that led to the origins of rice. There is a genome-wide excess of high-frequency derived single nucleotide polymorphisms (SNPs) in O. sativa varieties, a pattern that has not been reported for other crop species. We developed several alternative models to explain contemporary patterns of polymorphisms in rice, including a (i) selectively neutral population bottleneck model, (ii) bottleneck plus migration model, (iii) multiple selective sweeps model, and (iv) bottleneck plus selective sweeps model. We find that a simple bottleneck model, which has been the dominant demographic model for domesticated species, cannot explain the derived nucleotide polymorphism site frequency spectrum in rice. Instead, a bottleneck model that incorporates selective sweeps, or a more complex demographic model that includes subdivision and gene flow, are more plausible explanations for patterns of variation in domesticated rice varieties. If selective sweeps are indeed the explanation for the observed nucleotide data of domesticated rice, it suggests that strong selection can leave its imprint on genome-wide polymorphism patterns, contrary to expectations that selection results only in a local signature of variation.
Affiliation(s)
- Ana L Caicedo
- Department of Genetics, North Carolina State University, Raleigh, North Carolina, USA
|
36
|
Chen CTL, Wang JC, Cohen BA. The strength of selection on ultraconserved elements in the human genome. Am J Hum Genet 2007; 80:692-704. [PMID: 17357075 PMCID: PMC1852725 DOI: 10.1086/513149] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.0] [Received: 09/26/2006] [Accepted: 01/25/2007] [Indexed: 11/03/2022] Open
Abstract
Ultraconserved elements are stretches of consecutive nucleotides that are perfectly conserved in multiple mammalian genomes. Although these sequences are identical in the reference human, mouse, and rat genomes, we identified numerous polymorphisms within these regions in the human population. To determine whether polymorphisms in ultraconserved elements affect fitness, we genotyped unrelated human DNA samples at loci within these sequences. For all single-nucleotide polymorphisms tested in ultraconserved regions, individuals homozygous for derived alleles (alleles that differ from the rodent reference genomes) were present, viable, and healthy. The distribution of allele frequencies in these samples argues against strong, ongoing selection as the force maintaining the conservation of these sequences. We then used two methods to determine the minimum level of selection required to generate these sequences. Despite the lack of fixed differences in these sequences between humans and rodents, the average level of selection on ultraconserved elements is less than that on essential genes. The strength of selection associated with ultraconserved elements suggests that mutations in these regions may have subtle phenotypic consequences that are not easily detected in the laboratory.
Affiliation(s)
- Christina T L Chen
- Department of Genetics, Center for Genome Sciences, Washington University School of Medicine, 4444 Forest Park Parkway, St. Louis, MO 63108, USA
|
37
|
|
38
|
Johnson PLF, Slatkin M. Inference of population genetic parameters in metagenomics: a clean look at messy data. Genome Res 2006; 16:1320-7. [PMID: 16954540 PMCID: PMC1581441 DOI: 10.1101/gr.5431206] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.4] [Received: 04/24/2006] [Accepted: 07/17/2006] [Indexed: 12/21/2022]
Abstract
Metagenomic projects generate short, overlapping fragments of DNA sequence, each deriving from a different individual. We report a new method for inferring the scaled mutation rate, $\theta = 2N_e u$, and the scaled exponential growth rate, $R = N_e r$, from the site-frequency spectrum of these data while accounting for sequencing error via Phred quality scores. After obtaining maximum likelihood parameter estimates for $\theta$ and $R$, we calculate empirical Bayes quality scores reflecting the posterior probability that each apparently polymorphic site is truly polymorphic; these scores can then be used for other applications such as SNP discovery. For realistic parameter ranges, analytic and simulation results show our estimates to be essentially unbiased with tight confidence intervals. In contrast, choosing an arbitrary quality score cutoff (e.g., trimming reads) and ignoring further quality information during inference yields biased estimates with greater variance. We illustrate the use of our technique on a new project analyzing activated sludge from a lab-scale bioreactor seeded by a wastewater treatment plant.
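For readers unfamiliar with Phred scores, the conversion they encode is simple: a quality score Q corresponds to an error probability of 10^(-Q/10). The short Python sketch below shows that conversion and the expected number of erroneous base calls in a read. It illustrates only the standard Phred definition, not the authors' likelihood model, and the function names are hypothetical.

```python
def phred_to_error_prob(q):
    """Standard Phred definition: P(base call is wrong) = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

def expected_errors(quals):
    """Expected number of miscalled bases in a read, given its per-base Phred scores."""
    return sum(phred_to_error_prob(q) for q in quals)

# Illustrative per-base quality scores for a short read (made-up values).
read_quals = [30, 30, 20, 10, 40]
print(expected_errors(read_quals))
```

Keeping the per-base probabilities, rather than discarding bases below a cutoff, is what lets a likelihood method weight each observation by its reliability.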
Affiliation(s)
- Philip L F Johnson
- Biophysics Graduate Group, University of California, Berkeley, California 94720, USA.
|
39
|
Abstract
Our understanding of balancing selection is being greatly clarified by new sequence data from genes in which polymorphisms are known to be maintained by selection. The data can be interpreted in conjunction with results from population genetics models that include recombination between selected sites and nearby neutral marker variants. This understanding is making possible tests for balancing selection using molecular evolutionary approaches. Such tests do not necessarily require knowledge of the functional types of the different alleles at a locus, but such information, as well as information about the geographic distribution of alleles and markers near the genes, can potentially help towards understanding what form of balancing selection is acting, and how long alleles have been maintained.
Affiliation(s)
- Deborah Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom.
|
40
|
Evans SN, Shvets Y, Slatkin M. Non-equilibrium theory of the allele frequency spectrum. Theor Popul Biol 2006; 71:109-19. [PMID: 16887160 DOI: 10.1016/j.tpb.2006.06.005] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.7] [Received: 04/13/2006] [Revised: 06/05/2006] [Accepted: 06/07/2006] [Indexed: 10/24/2022]
Abstract
A forward diffusion equation describing the evolution of the allele frequency spectrum is presented. The influx of mutations is accounted for by imposing a suitable boundary condition. For a Wright-Fisher diffusion with or without selection and varying population size, the boundary condition is $\lim_{x \downarrow 0} x f(x,t) = \theta \rho(t)$, where $f(\cdot,t)$ is the frequency spectrum of derived alleles at independent loci at time $t$ and $\rho(t)$ is the relative population size at time $t$. When population size and selection intensity are independent of time, the forward equation is equivalent to the backwards diffusion usually used to derive the frequency spectrum, but this approach allows computation of the time dependence of the spectrum both before an equilibrium is attained and when population size and selection intensity vary with time. From the diffusion equation, a set of ordinary differential equations for the moments of $f(\cdot,t)$ is derived and the expected spectrum of a finite sample is expressed in terms of those moments. The use of the forward equation is illustrated by considering neutral and selected alleles in a highly simplified model of human history. For example, it is shown that approximately 30% of the expected total heterozygosity of neutral loci is attributable to mutations that arose since the onset of population growth in roughly the last 150,000 years.
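As a quick consistency check on the boundary condition, consider the simplest case. This sketch assumes neutrality, constant population size (so the relative size is 1), stationarity, and that the abstract's scaling of θ matches the familiar equilibrium neutral spectrum θ/x; none of these assumptions is spelled out in the abstract itself.

```latex
% Assumptions: neutrality, constant size (\rho(t) = 1), stationary spectrum f(x) = \theta / x.
\[
  f(x) = \frac{\theta}{x}, \quad 0 < x < 1
  \qquad\Longrightarrow\qquad
  \lim_{x \downarrow 0} x f(x)
  = \lim_{x \downarrow 0} x \cdot \frac{\theta}{x}
  = \theta
  = \theta \rho(t) \quad \text{with } \rho(t) = 1 .
\]
```

Under these assumptions the boundary condition simply pins the density of very rare derived alleles to the mutational influx, which then scales with the relative population size when the size varies.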
Affiliation(s)
- Steven N Evans
- Department of Statistics #3860, University of California at Berkeley, 367 Evans Hall, Berkeley, CA 94720-3860, USA.
|
41
|
Comeron JM. Weak selection and recent mutational changes influence polymorphic synonymous mutations in humans. Proc Natl Acad Sci U S A 2006; 103:6940-5. [PMID: 16632609 PMCID: PMC1458998 DOI: 10.1073/pnas.0510638103] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022] Open
Abstract
Recent large-scale genomic and evolutionary studies have revealed the small but detectable signature of weak selection on synonymous mutations during mammalian evolution, likely acting at the level of translational efficacy (i.e., translational selection). To investigate whether weak selection, and translational selection in particular, plays any role in shaping the fate of synonymous mutations that are present today in human populations, we studied genetic variation at the polymorphic level and patterns of evolution in the human lineage after human-chimpanzee separation. We find evidence that neutral mechanisms are influencing the frequency of polymorphic mutations in humans. Our results suggest a recent increase in mutational tendencies toward AT, observed in all isochores, that is responsible for AT mutations segregating at lower frequencies than GC mutations. In all, however, changes in mutational tendencies and other neutral scenarios are not sufficient to explain a difference between synonymous and noncoding mutations or a difference between synonymous mutations potentially advantageous or deleterious under a translational selection model. Furthermore, several estimates of selection intensity on synonymous mutations all suggest a detectable influence of weak selection acting at the level of translational selection. Thus, random genetic drift, recent changes in mutational tendencies, and weak selection influence the fate of synonymous mutations that are present today as polymorphisms. All of these features, neutral and selective, should be taken into account in evolutionary analyses that often assume constancy of mutational tendencies and complete neutrality of synonymous mutations.
Affiliation(s)
- Josep M Comeron
- Department of Biological Sciences, University of Iowa, 212 Biology Building, Iowa City, IA 52242, USA.
|
42
|
Nishino J, Tajima F. Effect of population structure on the amount of polymorphism and the fixation probability under overdominant selection. Genes Genet Syst 2006; 80:287-95. [PMID: 16284422 DOI: 10.1266/ggs.80.287] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022] Open
Abstract
Under overdominant selection, mutants contribute substantially to increasing the amount of polymorphism. It is also known that, under neutrality, as the migration rates among demes decrease in a subdivided population, the amount of polymorphism increases along with the effective population size, $N_e$. In this study, the effect of population subdivision on the amount of polymorphism under overdominant selection was investigated using the diffusion approximation and the low migration approximation. It was shown that if selection is medium or strong (e.g., $N_T s > 1$, where $N_T$ is the population size and $s$ is the selective advantage of heterozygotes), the nucleotide diversity, $\pi$, decreases as $Nm$ decreases, despite the increase in $N_e$, where $N$ is the size of demes and $m$ is the migration rate per deme. In addition, the ratio of the nucleotide diversity to the evolutionary rate also decreases as $Nm$ decreases. In some cases the ratio becomes smaller than that expected under neutrality as $Nm$ decreases.
Affiliation(s)
- Jo Nishino
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Japan
|
43
|
Abstract
The distribution of mutational effects on fitness is of fundamental importance for many aspects of evolution. We develop two methods for characterizing the fitness effects of deleterious, nonsynonymous mutations, using polymorphism data from two related species. These methods also provide estimates of the proportion of amino acid substitutions that are selectively favorable, when combined with data on between-species sequence divergence. The methods are applicable to species with different effective population sizes, but that share the same distribution of mutational effects. The first, simpler, method assumes that diversity for all nonneutral mutations is given by the value under mutation-selection balance, while the second method allows for stronger effects of genetic drift and yields estimates of the parameters of the probability distribution of mutational effects. We apply these methods to data on populations of Drosophila miranda and D. pseudoobscura and find evidence for the presence of deleterious nonsynonymous mutations, mostly with small heterozygous selection coefficients (a mean of the order of $10^{-5}$ for segregating variants). A leptokurtic gamma distribution of mutational effects with a shape parameter between 0.1 and 1 can explain observed diversities, in the absence of a separate class of completely neutral nonsynonymous mutations. We also describe a simple approximate method for estimating the harmonic mean selection coefficient from diversity data on a single species.
Affiliation(s)
- Laurence Loewe
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom.
|
44
|
Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, Civello D, Adams MD, Cargill M, Clark AG. Natural selection on protein-coding genes in the human genome. Nature 2005; 437:1153-7. [PMID: 16237444 DOI: 10.1038/nature04240] [Citation(s) in RCA: 578] [Impact Index Per Article: 28.9] [Received: 04/24/2005] [Accepted: 09/14/2005] [Indexed: 11/09/2022]
Abstract
Comparisons of DNA polymorphism within species to divergence between species enable the discovery of molecular adaptation in evolutionarily constrained genes as well as the differentiation of weak from strong purifying selection. The extent to which weak negative and positive darwinian selection have driven the molecular evolution of different species varies greatly, with some species, such as Drosophila melanogaster, showing strong evidence of pervasive positive selection, and others, such as the selfing weed Arabidopsis thaliana, showing an excess of deleterious variation within local populations. Here we contrast patterns of coding sequence polymorphism identified by direct sequencing of 39 humans for over 11,000 genes to divergence between humans and chimpanzees, and find strong evidence that natural selection has shaped the recent molecular evolution of our species. Our analysis discovered 304 (9.0%) out of 3,377 potentially informative loci showing evidence of rapid amino acid evolution. Furthermore, 813 (13.5%) out of 6,033 potentially informative loci show a paucity of amino acid differences between humans and chimpanzees, indicating weak negative selection and/or balancing selection operating on mutations at these loci. We find that the distribution of negatively and positively selected genes varies greatly among biological processes and molecular functions, and that some classes, such as transcription factors, show an excess of rapidly evolving genes, whereas others, such as cytoskeletal proteins, show an excess of genes with extensive amino acid polymorphism within humans and yet little amino acid divergence between humans and chimpanzees.
Affiliation(s)
- Carlos D Bustamante
- Department of Biological Statistics and Computational Biology, 101 Biotechnology Building, Cornell University, Ithaca, New York 14853, USA.
|
45
|
Zhu L, Bustamante CD. A composite-likelihood approach for detecting directional selection from DNA sequence data. Genetics 2005; 170:1411-21. [PMID: 15879513 PMCID: PMC1451173 DOI: 10.1534/genetics.104.035097] [Citation(s) in RCA: 77] [Impact Index Per Article: 3.9] [Received: 08/17/2004] [Accepted: 03/30/2005] [Indexed: 11/18/2022] Open
Abstract
We present a novel composite-likelihood-ratio test (CLRT) for detecting genes and genomic regions that are subject to recurrent natural selection (either positive or negative). The method uses the likelihood functions of Hartl et al. (1994) for inference in a Wright-Fisher genic selection model and corrects for nonindependence among sites by application of coalescent simulations with recombination. Here, we (1) characterize the distribution of the CLRT statistic ($\Lambda$) as a function of the population recombination rate ($R = 4N_e r$); (2) explore the effects of bias in estimation of $R$ on the size (type I error) of the CLRT; (3) explore the robustness of the model to population growth, bottlenecks, and migration; (4) explore the power of the CLRT under varying levels of mutation, selection, and recombination; (5) explore the discriminatory power of the test in distinguishing negative selection from population growth; and (6) evaluate the performance of maximum composite-likelihood estimation (MCLE) of the selection coefficient. We find that the test has excellent power to detect weak negative selection and moderate power to detect positive selection. Moreover, the test is quite robust to bias in the estimate of local recombination rate, but not to certain demographic scenarios such as population growth or a recent bottleneck. Last, we demonstrate that the MCLE of the selection parameter has little bias for weak negative selection and has downward bias for positively selected mutations.
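Likelihoods for a genic selection model of this kind are typically built on the Poisson random-field expectation of the site-frequency spectrum. The Python sketch below evaluates that expectation numerically, assuming the commonly used Sawyer-Hartl form of the stationary frequency density. The scaling of θ and γ differs between papers, so the convention here (neutral limit θ/i, γ a scaled selection coefficient) is an assumption, as is the function name `prf_expected_sfs`. It is not the authors' implementation.

```python
import math

def prf_expected_sfs(theta, gamma, n, steps=20000):
    """Expected number of sites with i = 1..n-1 derived copies in a sample of n.

    Assumes the stationary density
        theta * (1 - exp(-2*gamma*(1-x))) / ((1 - exp(-2*gamma)) * x * (1-x)),
    integrated against binomial sampling with a midpoint rule.
    """
    spectrum = []
    for i in range(1, n):
        binom = math.comb(n, i)
        total = 0.0
        for k in range(steps):
            x = (k + 0.5) / steps  # midpoints avoid the endpoints 0 and 1
            if abs(gamma) < 1e-8:
                weight = 1.0 / x   # neutral limit of the density
            else:
                weight = ((1.0 - math.exp(-2.0 * gamma * (1.0 - x)))
                          / ((1.0 - math.exp(-2.0 * gamma)) * x * (1.0 - x)))
            total += binom * x**i * (1.0 - x)**(n - i) * weight
        spectrum.append(theta * total / steps)
    return spectrum

neutral = prf_expected_sfs(theta=1.0, gamma=0.0, n=6)
negative = prf_expected_sfs(theta=1.0, gamma=-5.0, n=6)
print([round(v, 4) for v in neutral])
print([round(v, 4) for v in negative])
```

Negative γ shifts the expected spectrum toward rare variants relative to the neutral θ/i baseline. Population growth produces a similar skew, which is why the abstract flags demography as a threat to the test's robustness.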
Affiliation(s)
- Carlos D. Bustamante
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853
|
46
|
Williamson SH, Hernandez R, Fledel-Alon A, Zhu L, Nielsen R, Bustamante CD. Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proc Natl Acad Sci U S A 2005; 102:7882-7. [PMID: 15905331 PMCID: PMC1142382 DOI: 10.1073/pnas.0502300102] [Citation(s) in RCA: 249] [Impact Index Per Article: 12.5] [Received: 11/25/2004] [Indexed: 11/18/2022] Open
Abstract
Natural selection and demographic forces can have similar effects on patterns of DNA polymorphism. Therefore, to infer selection from samples of DNA sequences, one must simultaneously account for demographic effects. Here we take a model-based approach to this problem by developing predictions for patterns of polymorphism in the presence of both population size change and natural selection. If data are available from different functional classes of variation, and a priori information suggests that mutations in one of those classes are selectively neutral, then the putatively neutral class can be used to infer demographic parameters, and inferences regarding selection on other classes can be performed given demographic parameter estimates. This procedure is more robust to assumptions regarding the true underlying demography than previous approaches to detecting and analyzing selection. We apply this method to a large polymorphism data set from 301 human genes and find (i) widespread negative selection acting on standing nonsynonymous variation, (ii) that the fitness effects of nonsynonymous mutations are well predicted by several measures of amino acid exchangeability, especially site-specific methods, and (iii) strong evidence for very recent population growth.
Affiliation(s)
- Scott H Williamson
- Department of Biological Statistics and Computational Biology, 101 Biotechnology Building, Cornell University, Ithaca, NY 14853, USA
|