1
The fitness consequences of genetic divergence between polymorphic gene arrangements. Genetics 2024; 226:iyad218. [PMID: 38147527 PMCID: PMC11090464 DOI: 10.1093/genetics/iyad218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Revised: 12/15/2023] [Accepted: 12/20/2023] [Indexed: 12/28/2023]
Abstract
Inversions restrict recombination when heterozygous with standard arrangements, but often have few noticeable phenotypic effects. Nevertheless, there are several examples of inversions that can be maintained polymorphic by strong selection under laboratory conditions. A long-standing model for the source of such selection is divergence between arrangements with respect to recessive or partially recessive deleterious mutations, resulting in a selective advantage to heterokaryotypic individuals over homokaryotypes. This paper uses a combination of analytical and numerical methods to investigate this model, for the simple case of an autosomal inversion with multiple independent nucleotide sites subject to mildly deleterious mutations. A complete lack of recombination in heterokaryotypes is assumed, as well as constancy of the frequency of the inversion over space and time. It is shown that a significantly higher mutational load will develop for the less frequent arrangement. A selective advantage to heterokaryotypes is only expected when the two alternative arrangements are nearly equal in frequency, so that their mutational loads are very similar in size. The effects of some Drosophila pseudoobscura polymorphic inversions on fitness traits seem to be too large to be explained by this process, although it may contribute to some of the observed effects. Several population genomic statistics can provide evidence for signatures of a reduced efficacy of selection associated with the rarer of two arrangements, but there is currently little published data that are relevant to the theoretical predictions.
2
Effects of Selection at Linked Sites on Patterns of Genetic Variability. Annual Review of Ecology, Evolution, and Systematics 2021; 52:177-197. [PMID: 37089401 PMCID: PMC10120885 DOI: 10.1146/annurev-ecolsys-010621-044528] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Indexed: 01/25/2023]
Abstract
Patterns of variation and evolution at a given site in a genome can be strongly influenced by the effects of selection at genetically linked sites. In particular, the recombination rates of genomic regions correlate with their amount of within-population genetic variability, the degree to which the frequency distributions of DNA sequence variants differ from their neutral expectations, and the levels of adaptation of their functional components. We review the major population genetic processes that are thought to lead to these patterns, focusing on their effects on patterns of variability: selective sweeps, background selection, associative overdominance, and Hill–Robertson interference among deleterious mutations. We emphasize the difficulties in distinguishing among the footprints of these processes and disentangling them from the effects of purely demographic factors such as population size changes. We also discuss how interactions between selective and demographic processes can significantly affect patterns of variability within genomes.
3
Impact of Genetic Variation in Gene Regulatory Sequences: A Population Genomics Perspective. Front Genet 2021; 12:660899. [PMID: 34276769 PMCID: PMC8282999 DOI: 10.3389/fgene.2021.660899] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/29/2021] [Accepted: 05/31/2021] [Indexed: 01/06/2023]
Abstract
The unprecedented rise of high-throughput sequencing and assay technologies has provided a detailed insight into the non-coding sequences and their potential role as gene expression regulators. These regulatory non-coding sequences are also referred to as cis-regulatory elements (CREs). Genetic variants occurring within CREs have been shown to be associated with altered gene expression and phenotypic changes. Such variants are known to occur spontaneously and ultimately get fixed, due to selection and genetic drift, in natural populations and, in some cases, pave the way for speciation. Hence, the study of genetic variation at CREs has improved our overall understanding of the processes of local adaptation and evolution. Recent advances in high-throughput sequencing and better annotations of CREs have enabled the evaluation of the impact of such variation on gene expression, phenotypic alteration and fitness. Here, we review recent research on the evolution of CREs and concentrate on studies that have investigated genetic variation occurring in these regulatory sequences within the context of population genetics.
4
Abstract
Drosophila melanogaster, a small dipteran of African origin, represents one of the best-studied model organisms. Early work in this system has uniquely shed light on the basic principles of genetics and resulted in a versatile collection of genetic tools that allow researchers to uncover mechanistic links between genotype and phenotype. Moreover, given its worldwide distribution in diverse habitats and its moderate genome size, Drosophila has proven very powerful for population genetics inference and was one of the first eukaryotes whose genome was fully sequenced. In this book chapter, we provide a brief historical overview of research in Drosophila and then focus on recent advances during the genomic era. After describing different types and sources of genomic data, we discuss mechanisms of neutral evolution including the demographic history of Drosophila and the effects of recombination and biased gene conversion. Then, we review recent advances in detecting genome-wide signals of selection, such as soft and hard selective sweeps. We further provide a brief introduction to background selection, selection on noncoding DNA and codon usage, and focus on the role of structural variants, such as transposable elements and chromosomal inversions, during the adaptive process. Finally, we discuss how genomic data help to dissect neutral and adaptive evolutionary mechanisms that shape genetic and phenotypic variation in natural populations along environmental gradients. In summary, this book chapter serves as a starting point for Drosophila population genomics and provides an introduction to the system and an overview of data sources, important population genetic concepts and recent advances in the field.
5
Abstract
New species arise as the genomes of populations diverge. The developmental 'alarm clock' of speciation sounds off when sufficient divergence in genetic control of development leads hybrid individuals to infertility or inviability, the world awoken to the dawn of new species with intrinsic post-zygotic reproductive isolation. Some developmental stages will be more prone to hybrid dysfunction due to how molecular evolution interacts with the ontogenetic timing of gene expression. Considering the ontogeny of hybrid incompatibilities provides a profitable connection between 'evo-devo' and speciation genetics to better link macroevolutionary pattern, microevolutionary process, and molecular mechanisms. Here, we explore speciation alongside development, emphasizing their mutual dependence on genetic network features, fitness landscapes, and developmental system drift. We assess models for how ontogenetic timing of reproductive isolation can be predictable. Experiments and theory within this synthetic perspective can help identify new rules of speciation as well as rules in the molecular evolution of development.
6
Prioritizing sequence variants in conserved non-coding elements in the chicken genome using chCADD. PLoS Genet 2020; 16:e1009027. [PMID: 32966296 PMCID: PMC7535126 DOI: 10.1371/journal.pgen.1009027] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/18/2020] [Revised: 10/05/2020] [Accepted: 08/05/2020] [Indexed: 11/30/2022]
Abstract
The availability of genomes for many species has advanced our understanding of the non-protein-coding fraction of the genome. Comparative genomics has proven itself to be an invaluable approach for the systematic, genome-wide identification of conserved non-protein-coding elements (CNEs). However, for many non-mammalian model species, including chicken, our capability to interpret the functional importance of variants overlapping CNEs has been limited by current genomic annotations, which rely on a single information type (e.g. conservation). Here we studied CNEs in chicken using a combination of population genomics and comparative genomics. To investigate the functional importance of variants found in CNEs, we developed a ch(icken) Combined Annotation-Dependent Depletion (chCADD) model, a variant effect prediction tool first introduced for humans and later on for mouse and pig. We show that 73 Mb of the chicken genome has been conserved across more than 280 million years of vertebrate evolution. The vast majority of the conserved elements are in non-protein-coding regions, which display SNP densities and allele frequency distributions characteristic of genomic regions constrained by purifying selection. By annotating SNPs with the chCADD score, we are able to pinpoint specific subregions of the CNEs that are of higher functional importance, as supported by the finding that SNPs in these subregions are associated with known disease genes in humans, mice, and rats. Taken together, our findings indicate that CNEs harbor variants of functional significance that should be the object of further investigation along with protein-coding mutations. We therefore anticipate chCADD to be of great use to the scientific community and breeding companies in future functional studies in chicken.
7
iMKT: the integrative McDonald and Kreitman test. Nucleic Acids Res 2020; 47:W283-W288. [PMID: 31081014 PMCID: PMC6602517 DOI: 10.1093/nar/gkz372] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/06/2019] [Revised: 04/18/2019] [Accepted: 05/03/2019] [Indexed: 01/07/2023]
Abstract
The McDonald and Kreitman test (MKT) is one of the most powerful and widely used methods to detect and quantify recurrent natural selection using DNA sequence data. Here we present iMKT (acronym for integrative McDonald and Kreitman test), a novel web-based service performing four distinct MKT types. It allows the detection and estimation of four different selection regimes (adaptive, neutral, strongly deleterious and weakly deleterious) acting on any genomic sequence. iMKT can analyze both users' own population genomic data and pre-loaded Drosophila melanogaster and human sequences of protein-coding genes obtained from the largest population genomic datasets to date. Advanced options in the website allow testing complex hypotheses such as the application example shown here: do genes located in high recombination regions undergo higher rates of adaptation? We aim for iMKT to become a reference tool for the study of evolutionary adaptation in massive population genomics datasets, especially in Drosophila and humans. iMKT is a free resource online at https://imkt.uab.cat.
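To make the quantity behind the MKT concrete, the following is a minimal sketch of the standard (uncorrected) estimator only, not of any of the four iMKT variants or of the iMKT service itself. It assumes the usual 2x2 table of nonsynonymous and synonymous polymorphism (Pn, Ps) and divergence (Dn, Ds) counts; the class name `MKTable`, the function names, and the example counts are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MKTable:
    """Counts for a standard McDonald-Kreitman 2x2 table (illustrative container)."""
    pn: int  # nonsynonymous polymorphisms within the species
    ps: int  # synonymous polymorphisms within the species
    dn: int  # nonsynonymous fixed differences between species
    ds: int  # synonymous fixed differences between species


def neutrality_index(t: MKTable) -> float:
    """Neutrality index NI = (Pn/Ps) / (Dn/Ds); NI = 1 under strict neutrality."""
    if t.ps == 0 or t.dn == 0 or t.ds == 0:
        raise ValueError("Ps, Dn and Ds must be non-zero for this estimator")
    return (t.pn / t.ps) / (t.dn / t.ds)


def alpha(t: MKTable) -> float:
    """Estimated proportion of adaptive substitutions: alpha = 1 - NI.

    This is the uncorrected estimator; it is biased downward when weakly
    deleterious polymorphisms inflate Pn.
    """
    return 1.0 - neutrality_index(t)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    table = MKTable(pn=2, ps=42, dn=7, ds=17)
    print(f"NI = {neutrality_index(table):.3f}, alpha = {alpha(table):.3f}")
```

A positive alpha suggests an excess of nonsynonymous divergence relative to polymorphism, which is the pattern expected under recurrent adaptive substitution; a negative value suggests segregating deleterious variants.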
8
Toward an Evolutionarily Appropriate Null Model: Jointly Inferring Demography and Purifying Selection. Genetics 2020; 215:173-192. [PMID: 32152045 PMCID: PMC7198275 DOI: 10.1534/genetics.119.303002] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 12/18/2019] [Accepted: 03/05/2020] [Indexed: 01/27/2023]
Abstract
The question of the relative evolutionary roles of adaptive and nonadaptive processes has been a central debate in population genetics for nearly a century. While advances have been made in the theoretical development of the underlying models, and statistical methods for estimating their parameters from large-scale genomic data, a framework for an appropriate null model remains elusive. A model incorporating evolutionary processes known to be in constant operation, genetic drift (as modulated by the demographic history of the population) and purifying selection, is lacking. Without such a null model, the role of adaptive processes in shaping within- and between-population variation may not be accurately assessed. Here, we investigate how population size changes and the strength of purifying selection affect patterns of variation at "neutral" sites near functional genomic components. We propose a novel statistical framework for jointly inferring the contribution of the relevant selective and demographic parameters. By means of extensive performance analyses, we quantify the utility of the approach, identify the most important statistics for parameter estimation, and compare the results with existing methods. Finally, we reanalyze genome-wide population-level data from a Zambian population of Drosophila melanogaster, and find that it has experienced a much slower rate of population growth than was inferred when the effects of purifying selection were neglected. Our approach represents an appropriate null model, against which the effects of positive selection can be assessed.
9
Abstract
Conservation genomics aims to preserve the viability of populations and the biodiversity of living organisms. Invertebrate organisms represent 95% of animal biodiversity; however, few genomic resources currently exist for the group. The subset of marine invertebrates includes the most ancient metazoan lineages and possesses codes for unique gene products and possible keys to adaptation. The benefits of supporting invertebrate conservation genomics research (e.g., likely discovery of novel genes, protein regulatory mechanisms, genomic innovations, and transposable elements) outweigh the various hurdles (rare, small, or polymorphic starting materials). Here we review best conservation genomics practices in the laboratory and in silico when applied to marine invertebrates and also showcase unique features in several case studies of acroporid corals, crown-of-thorns starfish, apple snails, and abalone. Marine conservation genomics should also address how diversity can lead to unique marine innovations, the impact of deleterious variation, and how genomic monitoring and profiling could positively affect broader conservation goals (e.g., value of baseline data for in situ/ex situ genomic stocks).
10
Conserved Noncoding Elements Influence the Transposable Element Landscape in Drosophila. Genome Biol Evol 2018; 10:1533-1545. [PMID: 29850787 PMCID: PMC6007792 DOI: 10.1093/gbe/evy104] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Accepted: 05/22/2018] [Indexed: 12/15/2022]
Abstract
Highly conserved noncoding elements (CNEs) constitute a significant proportion of the genomes of multicellular eukaryotes. The function of most CNEs remains elusive, but growing evidence indicates they are under some form of purifying selection. Noncoding regions in many species also harbor large numbers of transposable element (TE) insertions, which are typically lineage specific and depleted in exons because of their deleterious effects on gene function or expression. However, it is currently unknown whether the landscape of TE insertions in noncoding regions is random or influenced by purifying selection on CNEs. Here, we combine comparative and population genomic data in Drosophila melanogaster to show that the abundance of TE insertions in intronic and intergenic CNEs is reduced relative to random expectation, supporting the idea that selective constraints on CNEs eliminate a proportion of TE insertions in noncoding regions. However, we find no evidence for differences in the allele frequency spectra for polymorphic TE insertions in CNEs versus those in unconstrained spacer regions, suggesting that the distribution of fitness effects acting on observable TE insertions is similar across different functional compartments in noncoding DNA. Our results provide evidence that selective constraints on CNEs contribute to shaping the landscape of TE insertion in eukaryotic genomes, and provide further evidence that CNEs are indeed functionally constrained and not simply mutational cold spots.
11
Conserved non-coding elements: developmental gene regulation meets genome organization. Nucleic Acids Res 2018; 45:12611-12624. [PMID: 29121339 PMCID: PMC5728398 DOI: 10.1093/nar/gkx1074] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 07/26/2017] [Accepted: 10/24/2017] [Indexed: 12/20/2022]
Abstract
Comparative genomics has revealed a class of non-protein-coding genomic sequences that display an extraordinary degree of conservation between two or more organisms, regularly exceeding that found within protein-coding exons. These elements, collectively referred to as conserved non-coding elements (CNEs), are non-randomly distributed across chromosomes and tend to cluster in the vicinity of genes with regulatory roles in multicellular development and differentiation. CNEs are organized into functional ensembles called genomic regulatory blocks: dense clusters of elements that collectively coordinate the expression of shared target genes, and whose span in many cases coincides with topologically associated domains. CNEs display sequence properties that set them apart from other sequences under constraint, and have recently been proposed as useful markers for the reconstruction of the evolutionary history of organisms. Disruption of several of these elements is known to contribute to diseases linked with development, and cancer. The emergence, evolutionary dynamics and functions of CNEs still remain poorly understood, and new approaches are required to enable comprehensive CNE identification and characterization. Here, we review current knowledge and identify challenges that need to be tackled to resolve the impasse in understanding extreme non-coding conservation.
12
Abstract
Molecular population genetics aims to explain genetic variation and molecular evolution from population genetics principles. The field was born 50 years ago with the first measures of genetic variation in allozyme loci, continued with the nucleotide sequencing era, and is currently in the era of population genomics. During this period, molecular population genetics has been revolutionized by progress in data acquisition and theoretical developments. The conceptual elegance of the neutral theory of molecular evolution or the footprint carved by natural selection on the patterns of genetic variation are two examples of the vast number of inspiring findings of population genetics research. Since the inception of the field, Drosophila has been the prominent model species: molecular variation in populations was first described in Drosophila and most of the population genetics hypotheses were tested in Drosophila species. In this review, we describe the main concepts, methods, and landmarks of molecular population genetics, using the Drosophila model as a reference. We describe the different genetic data sets made available by advances in molecular technologies, and the theoretical developments fostered by these data. Finally, we review the results and new insights provided by the population genomics approach, and conclude by enumerating challenges and new lines of inquiry posed by increasingly large population scale sequence data.
13
GC Content Heterogeneity Transition of Conserved Noncoding Sequences Occurred at the Emergence of Vertebrates. Genome Biol Evol 2016; 8:3377-3392. [PMID: 28040773 PMCID: PMC5203776 DOI: 10.1093/gbe/evw231] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 12/13/2022]
Abstract
Conserved non-coding sequences (CNSs) of Eukaryotes are known to be significantly enriched in regulatory sequences. CNSs of diverse lineages follow different patterns in abundance, sequence composition, and location. Here, we report a thorough analysis of CNSs in diverse groups of Eukaryotes with respect to GC content heterogeneity. We examined 24 fungi, 19 invertebrates, and 12 non-mammalian vertebrates so as to find lineage-specific features of CNSs. We found that fungal and invertebrate CNSs are predominantly GC rich, as we previously observed in plants, whereas vertebrate CNSs are GC poor. This result suggests that the CNS GC content transition occurred from the ancestral GC rich state of Eukaryotes to GC poor in the vertebrate lineage due to the enrollment of GC poor transcription factor binding sites that are lineage specific. CNS GC content is closely linked with the nucleosome occupancy that determines the location and structural architecture of DNA.
14
Abstract
Natural selection at one site shapes patterns of genetic variation at linked sites. Quantifying the effects of "linked selection" on levels of genetic diversity is key to making reliable inference about demography, building a null model in scans for targets of adaptation, and learning about the dynamics of natural selection. Here, we introduce the first method that jointly infers parameters of distinct modes of linked selection, notably background selection and selective sweeps, from genome-wide diversity data, functional annotations and genetic maps. The central idea is to calculate the probability that a neutral site is polymorphic given local annotations, substitution patterns, and recombination rates. Information is then combined across sites and samples using composite likelihood in order to estimate genome-wide parameters of distinct modes of selection. In addition to parameter estimation, this approach yields a map of the expected neutral diversity levels along the genome. To illustrate the utility of our approach, we apply it to genome-wide resequencing data from 125 lines in Drosophila melanogaster and reliably predict diversity levels at the 1 Mb scale. Our results corroborate estimates of a high fraction of beneficial substitutions in proteins and untranslated regions (UTRs). They allow us to distinguish between the contribution of sweeps and other modes of selection around amino acid substitutions and to uncover evidence for pervasive sweeps in UTRs. Our inference further suggests a substantial effect of other modes of linked selection and of adaptation in particular. More generally, we demonstrate that linked selection has had a larger effect in reducing diversity levels and increasing their variance in D. melanogaster than previously appreciated.
15
Changes in selective pressures associated with human population expansion may explain metabolic and immune related pathways enriched for signatures of positive selection. BMC Genomics 2016; 17:504. [PMID: 27444955 PMCID: PMC4955149 DOI: 10.1186/s12864-016-2783-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 03/03/2016] [Accepted: 05/26/2016] [Indexed: 12/14/2022]
Abstract
Background The study of local adaptation processes is a very important research topic in the field of population genomics. There is a particular interest in the study of human populations because they underwent a process of rapid spatial expansion and faced important environmental changes that translated into changes in selective pressures. New mutations may have been selected for in the new environment and previously existing genetic variants may have become detrimental. Immune related genes may have been released from the selective pressure exerted by pathogens in the ancestral environment and new variants may have been positively selected due to pathogens present in the newly colonized habitat. Also, variants that had a selective advantage in past environments may have become deleterious in the modern world due to external stimuli including climatic, dietary and behavioral changes, which could explain the high prevalence of some polygenic diseases such as diabetes and obesity. Results We performed an enrichment analysis to identify gene sets enriched for signals of positive selection in humans. We used two genome scan methods, XPCLR and iHS to detect selection using a dense coverage of SNP markers combined with two gene set enrichment approaches. We identified immune related gene sets that could be involved in the protection against pathogens especially in the African population. We also identified the glycolysis & gluconeogenesis gene set, related to metabolism, which supports the thrifty genotype hypothesis invoked to explain the current high prevalence of diseases such as diabetes and obesity. Extending our analysis to the gene level, we found signals for 23 candidate genes linked to metabolic syndrome, 13 of which are new candidates for positive selection. 
Conclusions Our study provides a list of genes and gene sets associated with immunity and metabolic syndrome that are enriched for signals of positive selection in three human populations (Europeans, Africans and Asians). Our results highlight differences in the relative importance of pathogens as drivers of local adaptation in different continents and provide new insights into the evolution and high incidence of metabolic syndrome in modern human populations.
16
Abstract
Eukaryotes contain short (∼80-200 bp) regions that have few or no substitutions among species that represent hundreds of millions of years of evolutionary divergence. These ultraconserved elements (UCEs) are candidates for containing essential functions, but their biological roles remain largely unknown. Here, we report the discovery and characterization of UCEs from 12 sequenced Drosophila species. We identified 98 elements ≥80 bp long with very high conservation across the Drosophila phylogeny. Population genetic analyses reveal that these UCEs are not present in mutational cold spots. Instead we infer that they experience a level of selective constraint almost 10-fold higher compared with missense mutations in protein-coding sequences, which is substantially higher than that observed previously for human UCEs. About one-half of these Drosophila UCEs overlap the transcribed portion of genes, with many of those that are within coding sequences likely to correspond to sites of ADAR-dependent RNA editing. For the remaining UCEs that are in nongenic regions, we find that many are potentially capable of forming RNA secondary structures. Among ten chosen for further analysis, we discovered that the majority are transcribed in multiple tissues of Drosophila melanogaster. We conclude that Drosophila species are rich with UCEs and that many of them may correspond to novel noncoding RNAs.
17
Causes of natural variation in fitness: evidence from studies of Drosophila populations. Proc Natl Acad Sci U S A 2015; 112:1662-9. [PMID: 25572964 DOI: 10.1073/pnas.1423275112] [Citation(s) in RCA: 133] [Impact Index Per Article: 14.8] [Indexed: 11/18/2022]
Abstract
DNA sequencing has revealed high levels of variability within most species. Statistical methods based on population genetics theory have been applied to the resulting data and suggest that most mutations affecting functionally important sequences are deleterious but subject to very weak selection. Quantitative genetic studies have provided information on the extent of genetic variation within populations in traits related to fitness and the rate at which variability in these traits arises by mutation. This paper attempts to combine the available information from applications of the two approaches to populations of the fruitfly Drosophila in order to estimate some important parameters of genetic variation, using a simple population genetics model of mutational effects on fitness components. Analyses based on this model suggest the existence of a class of mutations with much larger fitness effects than those inferred from sequence variability and that contribute most of the standing variation in fitness within a population caused by the input of mildly deleterious mutations. However, deleterious mutations explain only part of this standing variation, and other processes such as balancing selection appear to make a large contribution to genetic variation in fitness components in Drosophila.
18
A new genome-wide method to track horizontally transferred sequences: application to Drosophila. Genome Biol Evol 2015; 6:416-32. [PMID: 24497602 PMCID: PMC3942030 DOI: 10.1093/gbe/evu026] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 12/19/2022]
Abstract
Because of methodological breakthroughs and the availability of an increasing amount of whole-genome sequence data, horizontal transfers (HTs) in eukaryotes have received much attention recently. Contrary to similar analyses in prokaryotes, most studies in eukaryotes usually investigate particular sequences corresponding to transposable elements (TEs), neglecting the other components of the genome. We present a new methodological framework for the genome-wide detection of all putative horizontally transferred sequences between two species that requires no prior knowledge of the transferred sequences. This method provides a broader picture of HTs in eukaryotes by fully exploiting complete-genome sequence data. In contrast to previous genome-wide approaches, we used a well-defined statistical framework to control for the number of false positives in the results, and we propose two new validation procedures to control for confounding factors. The first validation procedure relies on a comparative analysis with other species of the phylogeny to validate HTs for the nonrepeated sequences detected, whereas the second one builds upon the study of the dynamics of the detected TEs. We applied our method to two closely related Drosophila species, Drosophila melanogaster and D. simulans, in which we discovered 10 new HTs in addition to all the HTs previously detected in different studies, which underscores our method’s high sensitivity and specificity. Our results favor the hypothesis of multiple independent HTs of TEs while unraveling a small portion of the network of HTs in the Drosophila phylogeny.
|
19
|
De novo prediction of cis-regulatory elements and modules through integrative analysis of a large number of ChIP datasets. BMC Genomics 2014; 15:1047. [PMID: 25442502 PMCID: PMC4265420 DOI: 10.1186/1471-2164-15-1047] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 06/26/2014] [Accepted: 11/19/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: In eukaryotes, transcriptional regulation is usually mediated by interactions of multiple transcription factors (TFs) with their respective specific cis-regulatory elements (CREs) in the so-called cis-regulatory modules (CRMs) in DNA. Although knowledge of the CREs and CRMs in a genome is crucial for elucidating gene regulatory networks and understanding many important biological phenomena, little is known about the CREs and CRMs in most eukaryotic genomes owing to the difficulty of characterizing them by either computational or traditional experimental methods. However, the exponentially increasing number of TF binding location data produced by the recent wide adoption of chromatin immunoprecipitation coupled with microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) technologies has provided an unprecedented opportunity to identify CRMs and CREs in genomes. Nonetheless, how to effectively mine these large volumes of ChIP data to identify CREs and CRMs at nucleotide resolution is a highly challenging task. RESULTS: We have developed a novel graph-theoretic algorithm, DePCRM, for genome-wide de novo prediction of CREs and CRMs using a large number of ChIP datasets. DePCRM predicts CREs and CRMs by identifying overrepresented combinatorial CRE motif patterns in multiple ChIP datasets. When applied to 168 ChIP datasets of 56 TFs from D. melanogaster, DePCRM identified 184 and 746 overrepresented CRE motifs and their combinatorial patterns, respectively, and predicted a total of 115,932 CRMs in the genome. The predictions recover 77.9% of known CRMs in the datasets and 89.3% of known CRMs containing at least one predicted CRE. We found that the putative CRMs as well as CREs as a whole in a CRM are more conserved than randomly selected sequences. CONCLUSION: Our results suggest that the CRMs predicted by DePCRM are highly likely to be functional. Our algorithm is the first of its kind for de novo genome-wide prediction of CREs and CRMs using a large number of transcription factor ChIP datasets. The algorithm and predictions will hopefully facilitate the elucidation of gene regulatory networks in eukaryotes. All the predicted CREs, CRMs, and their target genes are available at http://bioinfo.uncc.edu/mniu/pcrms/www/.
|
20
|
The effects of purifying selection on patterns of genetic differentiation between Drosophila melanogaster populations. Heredity (Edinb) 2014; 114:163-74. [PMID: 25227256 PMCID: PMC4270736 DOI: 10.1038/hdy.2014.80] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 02/21/2014] [Revised: 06/16/2014] [Accepted: 07/22/2014] [Indexed: 01/21/2023] Open
Abstract
Using the data provided by the Drosophila Population Genomics Project, we investigate factors that affect the genetic differentiation between Rwandan and French populations of D. melanogaster. By examining within-population polymorphisms, we show that sites in long introns (especially those >2000 bp) have significantly lower π (nucleotide diversity) and more low-frequency variants (as measured by Tajima's D, minor allele frequencies, and prevalence of variants that are private to one of the two populations) than short introns, suggesting a positive relationship between intron length and selective constraint. A similar analysis of protein-coding polymorphisms shows that 0-fold (degenerate) sites in more conserved genes are under stronger purifying selection than those in less conserved genes. There is limited evidence that selection on codon bias has an effect on differentiation (as measured by FST) at 4-fold (degenerate) sites, and 4-fold sites and sites in 8–30 bp of short introns ⩽65 bp have comparable FST values. Consistent with the expected effect of purifying selection, sites in long introns and 0-fold sites in conserved genes are less differentiated than those in short introns and less conserved genes, respectively. Genes in non-crossover regions (for example, the fourth chromosome) have very high FST values at both 0-fold and 4-fold degenerate sites, which is probably because of the large reduction in within-population diversity caused by tight linkage between many selected sites. Our analyses also reveal subtle statistical properties of FST, which arise when information from multiple single nucleotide polymorphisms is combined and can lead to the masking of important signals of selection.
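The summary statistics named in this abstract (π, Tajima's D, FST) can be illustrated with a minimal sketch. The function names and the toy sequences below are hypothetical, and the FST estimator shown (one minus the ratio of within-population to between-population diversity) is an assumption chosen for brevity; the abstract does not say which estimator the study used.

```python
from itertools import combinations, product
from math import sqrt


def mean_pairwise_diff(seqs):
    """Mean number of differences between all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)


def nucleotide_diversity(seqs):
    """pi: mean pairwise differences per site."""
    return mean_pairwise_diff(seqs) / len(seqs[0])


def tajimas_d(seqs):
    """Tajima's D from an alignment of n equal-length sequences."""
    n = len(seqs)
    # S = number of segregating (polymorphic) alignment columns
    S = sum(len(set(col)) > 1 for col in zip(*seqs))
    if S == 0:
        return 0.0
    k = mean_pairwise_diff(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (k - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


def fst_within_between(pop1, pop2):
    """FST as 1 - (mean within-population diversity / between-population diversity)."""
    within = (mean_pairwise_diff(pop1) + mean_pairwise_diff(pop2)) / 2
    cross = list(product(pop1, pop2))
    between = sum(sum(x != y for x, y in zip(a, b)) for a, b in cross) / len(cross)
    return 1 - within / between


# Toy sample in which every variant is a singleton (an excess of rare variants).
sample = ["AAAA", "AAAT", "AACA", "AGAA", "TAAA"]
print(nucleotide_diversity(sample), tajimas_d(sample))
```

An excess of low-frequency variants, as reported for long introns, pushes Tajima's D negative, which is what the all-singleton toy sample shows.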
|
21
|
Lineage-specific conserved noncoding sequences of plant genomes: their possible role in nucleosome positioning. Genome Biol Evol 2014; 6:2527-42. [PMID: 25364802 PMCID: PMC4202324 DOI: 10.1093/gbe/evu188] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Accepted: 08/26/2014] [Indexed: 01/01/2023] Open
Abstract
Many studies on conserved noncoding sequences (CNSs) have found that CNSs are enriched significantly in regulatory sequence elements. We conducted whole-genome analysis on plant CNSs to identify lineage-specific CNSs in eudicots, monocots, angiosperms, and vascular plants based on the premise that lineage-specific CNSs define lineage-specific characters and functions in groups of organisms. We identified 27 eudicot, 204 monocot, 6,536 grass, 19 angiosperm, and 2 vascular plant lineage-specific CNSs (lengths range from 16 to 1,517 bp) that presumably originated in their respective common ancestors. A stronger constraint on the CNSs located in the untranslated regions was observed. The CNSs were often flanked by genes involved in transcription regulation. A drop of A+T content near the border of CNSs was observed and CNS regions showed a higher nucleosome occupancy probability. These CNSs are candidate regulatory elements, which are expected to define lineage-specific features of various plant groups.
|
22
|
Nucleosomes shape DNA polymorphism and divergence. PLoS Genet 2014; 10:e1004457. [PMID: 24991813 PMCID: PMC4081404 DOI: 10.1371/journal.pgen.1004457] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 07/19/2013] [Accepted: 05/12/2014] [Indexed: 11/30/2022] Open
Abstract
An estimated 80% of genomic DNA in eukaryotes is packaged as nucleosomes, which, together with the remaining interstitial linker regions, generate higher order chromatin structures [1]. Nucleosome sequences isolated from diverse organisms exhibit ∼10 bp periodic variations in AA, TT and GC dinucleotide frequencies. These sequence elements generate intrinsically curved DNA and help establish the histone-DNA interface. We investigated an important unanswered question concerning the interplay between chromatin organization and genome evolution: do the DNA sequence preferences inherent to the highly conserved histone core exert detectable natural selection on genomic divergence and polymorphism? To address this hypothesis, we isolated nucleosomal DNA sequences from Drosophila melanogaster embryos and examined the underlying genomic variation within and between species. We found that divergence along the D. melanogaster lineage is periodic across nucleosome regions with base changes following preferred nucleotides, providing new evidence for systematic evolutionary forces in the generation and maintenance of nucleosome-associated dinucleotide periodicities. Further, Single Nucleotide Polymorphism (SNP) frequency spectra show striking periodicities across nucleosomal regions, paralleling divergence patterns. Preferred alleles occur at higher frequencies in natural populations, consistent with a central role for natural selection. These patterns are stronger for nucleosomes in introns than in intergenic regions, suggesting selection is stronger in transcribed regions where nucleosomes undergo more displacement, remodeling and functional modification. In addition, we observe a large-scale (∼180 bp) periodic enrichment of AA/TT dinucleotides associated with nucleosome occupancy, while GC dinucleotide frequency peaks in linker regions. Divergence and polymorphism data also support a role for natural selection in the generation and maintenance of these super-nucleosomal patterns. Our results demonstrate that nucleosome-associated sequence periodicities are under selective pressure, implying that structural interactions between nucleosomes and DNA sequence shape sequence evolution, particularly in introns.
In eukaryotic cells, the majority of DNA is packaged in nucleosomes comprised of ∼147 bp of DNA wound tightly around the highly conserved histone octamer. Nucleosomal DNA from diverse organisms shows an anti-correlated ∼10 bp periodicity of AT-rich and GC-rich dinucleotides. These sequence features influence DNA bending and shape, facilitating structural interactions. We asked whether natural selection mediated through the periodic sequence preferences of nucleosomes shapes the evolution of non-protein-coding regions of D. melanogaster by examining the inter- and intra-species genomic variation relative to these fundamental chromatin building blocks. The sequence changes across nucleosome-bound regions on the melanogaster lineage mirror the observed nucleosome dinucleotide periodicities. Importantly, we show that the frequencies of polymorphisms in natural populations vary across these regions, paralleling divergence, with higher frequencies of preferred alleles. These patterns are most evident for intronic regions and indicate that non-protein coding regions are evolving toward sequences that facilitate the canonical association with the histone core. This result is consistent with the hypothesis that interactions between DNA and the core have systematic impacts on function that are subject to natural selection and are not solely due to mutational bias. These ubiquitous interactions with the histone core partially account for the evolutionary constraint observed in unannotated genomic regions, and may drive broad changes in base composition.
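The ∼10 bp dinucleotide periodicity described here can be made concrete with a small lag-count sketch. This is only an illustration on a synthetic sequence, not the analysis used in the study; the function name and the toy repeat unit are hypothetical.

```python
def dinucleotide_lag_counts(seq, dinucs=("AA", "TT"), max_lag=30):
    """For each lag, count pairs of dinucleotide occurrences separated by that lag."""
    # Start positions of every occurrence of the chosen dinucleotides
    hits = [i for i in range(len(seq) - 1) if seq[i:i + 2] in dinucs]
    hit_set = set(hits)
    return {lag: sum((i + lag) in hit_set for i in hits)
            for lag in range(1, max_lag + 1)}


# Synthetic sequence with an AA dinucleotide placed every 10 bp.
toy = ("AA" + "GCGCGCGC") * 20
counts = dinucleotide_lag_counts(toy)
best_lag = max(counts, key=counts.get)
print(best_lag, counts[best_lag])
```

On real nucleosomal fragments the signal would be far noisier, and peaks at multiples of the period would have to be compared against a suitable background.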
|
23
|
Background selection as baseline for nucleotide variation across the Drosophila genome. PLoS Genet 2014; 10:e1004434. [PMID: 24968283 PMCID: PMC4072542 DOI: 10.1371/journal.pgen.1004434] [Citation(s) in RCA: 108] [Impact Index Per Article: 10.8] [Received: 10/01/2013] [Accepted: 04/28/2014] [Indexed: 11/21/2022] Open
Abstract
The constant removal of deleterious mutations by natural selection causes a reduction in neutral diversity and efficacy of selection at genetically linked sites (a process called Background Selection, BGS). Population genetic studies, however, often ignore BGS effects when investigating demographic events or the presence of other types of selection. To obtain a more realistic evolutionary expectation that incorporates the unavoidable consequences of deleterious mutations, we generated high-resolution landscapes of variation across the Drosophila melanogaster genome under a BGS scenario independent of polymorphism data. We find that BGS plays a significant role in shaping levels of variation across the entire genome, including long introns and intergenic regions distant from annotated genes. We also find that a very large percentage of the observed variation in diversity across autosomes can be explained by BGS alone, up to 70% across individual chromosome arms at 100-kb scale, thus indicating that BGS predictions can be used as baseline to infer additional types of selection and demographic events. This approach allows detecting several outlier regions with signal of recent adaptive events and selective sweeps. The use of a BGS baseline, however, is particularly appropriate to investigate the presence of balancing selection and our study exposes numerous genomic regions with the predicted signature of higher polymorphism than expected when a BGS context is taken into account. Importantly, we show that these conclusions are robust to the mutation and selection parameters of the BGS model. Finally, analyses of protein evolution together with previous comparisons of genetic maps between Drosophila species, suggest temporally variable recombination landscapes and, thus, local BGS effects that may differ between extant and past phases. Because genome-wide BGS and temporal changes in linkage effects can skew approaches to estimate demographic and selective events, future analyses should incorporate BGS predictions and capture local recombination variation across genomes and along lineages.
The removal of deleterious mutations from natural populations has potential consequences on patterns of variation across genomes. Population genetic analyses, however, often assume that such effects are negligible across recombining regions of species like Drosophila. We use simple models of purifying selection and current knowledge of recombination rates and gene distribution across the genome to obtain a baseline of variation predicted by the constant input and removal of deleterious mutations. We find that purifying selection alone can explain a major fraction of the observed variance in nucleotide diversity across the genome. The use of a baseline of variation predicted by linkage to deleterious mutations as null expectation exposes genomic regions under other selective regimes, including more regions showing the signature of balancing selection than would be evident when using traditional approaches. Our study also indicates that most, if not all, nucleotides across the D. melanogaster genome are significantly influenced by the removal of deleterious mutations, even when located in the middle of highly recombining regions and distant from genes. Additionally, the study of rates of protein evolution confirms previous analyses suggesting that the recombination landscape across the genome has changed in the recent history of D. melanogaster. All these reported factors can skew current analyses designed to capture demographic events or estimate the strength and frequency of adaptive mutations, and illustrate the need for new and more realistic theoretical and modeling approaches to study naturally occurring genetic variation.
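To show how a BGS baseline of this kind can be computed, the sketch below uses a widely used approximation in which the diversity at a focal neutral site is reduced by a factor B = exp(-Σ u_i t / (t + r_i (1 - t))^2), where u_i is the deleterious mutation rate at selected site i, t is the heterozygous selection coefficient, and r_i is the recombination frequency between site i and the focal site. The abstract does not state the formula, so this choice, the function name, and all parameter values are assumptions made for illustration.

```python
from math import exp


def bgs_reduction(focal_pos, selected_sites, t, r_per_bp):
    """Expected diversity relative to neutrality (B) at focal_pos.

    selected_sites: iterable of (position, deleterious mutation rate u) pairs
    t: selection coefficient against heterozygous carriers
    r_per_bp: recombination frequency per base pair (assumed uniform here)
    """
    total = 0.0
    for pos, u in selected_sites:
        r = r_per_bp * abs(pos - focal_pos)  # recombination to the focal site
        total += u * t / (t + r * (1 - t)) ** 2
    return exp(-total)


# Hypothetical region: 1,000 selected sites spaced 100 bp apart.
sites = [(i * 100, 1e-6) for i in range(1000)]
b_no_rec = bgs_reduction(50_000, sites, t=0.01, r_per_bp=0.0)
b_with_rec = bgs_reduction(50_000, sites, t=0.01, r_per_bp=1e-8)
print(b_no_rec, b_with_rec)
```

With no recombination the exponent reduces to the total mutation rate divided by t, so diversity is depressed most strongly; adding recombination brings B back toward 1, which is the pattern a recombination-aware baseline is meant to capture.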
|
24
|
Abstract
Evolutionary conservation has been an accurate predictor of functional elements across the first decade of metazoan genomics. More recently, there has been a move to define functional elements instead from biochemical annotations. Evolutionary methods are, however, more comprehensive than biochemical approaches can be and can assess quantitatively, especially for subtle effects, how biologically important (that is, how injurious after mutation) different types of elements are. Evolutionary methods are thus critical for understanding the large fraction (up to 10%) of the human genome that does not encode proteins and yet might convey function. These methods can also capture the ephemeral nature of much noncoding functional sequence, with large numbers of functional elements having been gained and lost rapidly along each mammalian lineage. Here, we review how different strengths of purifying selection have impacted on protein-coding and non-protein-coding loci and on transcription factor binding sites in mammalian and fruit fly genomes.
|
25
|
Abstract
Regions of several dozen to several hundred base pairs of extreme conservation have been found in non-coding regions in all metazoan genomes. The distribution of these elements within and across genomes has suggested that many have roles as transcriptional regulatory elements in multi-cellular organization, differentiation and development. Currently, there is no known mechanism or function that would account for this level of conservation at the observed evolutionary distances. Previous studies have found that, while these regions are under strong purifying selection, and not mutational coldspots, deletion of entire regions in mice does not necessarily lead to identifiable changes in phenotype during development. These opposing findings lead to several questions regarding their functional importance and why they are under strong selection in the first place. In this perspective, we discuss the methods and techniques used in identifying and dissecting these regions, their observed patterns of conservation, and review the current hypotheses on their functional significance.
|
26
|
Strong mutational bias toward deletions in the Drosophila melanogaster genome is compensated by selection. Genome Biol Evol 2013; 5:514-24. [PMID: 23395983 PMCID: PMC3622295 DOI: 10.1093/gbe/evt021] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 12/22/2022] Open
Abstract
Insertions and deletions (collectively indels) obviously have a major impact on genome evolution. However, before large-scale data on indel polymorphism became available, it was difficult to estimate the strength of selection acting on indel mutations. Here, we analyze indel polymorphism and divergence in different compartments of the Drosophila melanogaster genome: exons, introns of different lengths, and intergenic regions. Data on low-frequency polymorphisms indicate that 0.036–0.039 short (1–30 nt) insertion mutations and 0.085–0.092 short deletion mutations, with mean lengths 3.23 and 4.78, respectively, occur per single-nucleotide substitution. The excess of short deletion over short insertion mutations implies that indel mutations of these lengths should lead to a loss of approximately 0.30 nt per single-nucleotide replacement. However, polymorphism and divergence data show that this deletion bias is almost completely compensated by selection: Negative selection is stronger against deletions, whereas insertions are more likely to be favored by positive selection. Among the inframe low-frequency polymorphic mutations in exons, long introns, and intergenic regions, selection prevents a larger fraction of deletions (80–87%, depending on the type of the compartment) than of insertions (70–82%) or single-nucleotide substitutions (49–73%), from reaching high frequencies. The corresponding fractions were the lowest in short introns: 66%, 47%, and 15%, respectively, consistent with the weakest selective constraint in them. The McDonald–Kreitman test shows that 32–46% of the deletions and 60–73% of the insertions that were fixed in the recent evolution of D. melanogaster are adaptive, whereas this fraction is only 0–29% for single-nucleotide substitutions.
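The figure of roughly 0.30 nt lost per single-nucleotide replacement follows from the rates and mean lengths quoted in the abstract. The short check below multiplies each rate by its mean length and takes the difference; evaluating every combination of the endpoint values is an assumption made here, since the abstract does not say how the ranges pair up.

```python
from itertools import product

# Values quoted in the abstract (events per single-nucleotide substitution).
insertion_rates = (0.036, 0.039)
deletion_rates = (0.085, 0.092)
mean_insertion_len = 3.23  # nt
mean_deletion_len = 4.78   # nt


def net_loss(ins_rate, del_rate):
    """Net nucleotides lost per single-nucleotide substitution."""
    return del_rate * mean_deletion_len - ins_rate * mean_insertion_len


# Evaluate every pairing of the endpoint values.
losses = [net_loss(i, d) for i, d in product(insertion_rates, deletion_rates)]
print(min(losses), max(losses))
```

All four pairings give a net loss between about 0.28 and 0.32 nt, consistent with the approximate value of 0.30 nt stated in the abstract.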
|
27
|
An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat Genet 2013; 45:891-8. [PMID: 23817568 DOI: 10.1038/ng.2684] [Citation(s) in RCA: 211] [Impact Index Per Article: 19.2] [Received: 10/13/2012] [Accepted: 06/04/2013] [Indexed: 12/17/2022]
Abstract
Despite the central importance of noncoding DNA to gene regulation and evolution, understanding of the extent of selection on plant noncoding DNA remains limited compared to that of other organisms. Here we report sequencing of genomes from three Brassicaceae species (Leavenworthia alabamica, Sisymbrium irio and Aethionema arabicum) and their joint analysis with six previously sequenced crucifer genomes. Conservation across orthologous bases suggests that at least 17% of the Arabidopsis thaliana genome is under selection, with nearly one-quarter of the sequence under selection lying outside of coding regions. Much of this sequence can be localized to approximately 90,000 conserved noncoding sequences (CNSs) that show evidence of transcriptional and post-transcriptional regulation. Population genomics analyses of two crucifer species, A. thaliana and Capsella grandiflora, confirm that most of the identified CNSs are evolving under medium to strong purifying selection. Overall, these CNSs highlight both similarities and several key differences between the regulatory DNA of plants and other species.
|
28
|
Mutations within lncRNAs are effectively selected against in fruitfly but not in human. Genome Biol 2013; 14:R49. [PMID: 23710818 PMCID: PMC4053968 DOI: 10.1186/gb-2013-14-5-r49] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Received: 02/25/2013] [Accepted: 05/27/2013] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND: Previous studies in Drosophila and mammals have revealed levels of long non-coding RNA (lncRNA) sequence conservation that are intermediate between those of neutrally evolving and protein-coding sequence. These analyses compared conservation between species that diverged up to 75 million years ago. However, analysis of sequence polymorphisms within a species' population can provide an understanding of essentially contemporaneous selective constraints that are acting on lncRNAs and can quantify the deleterious effect of mutations occurring within these loci. RESULTS: We took advantage of polymorphisms derived from the genome sequences of 163 Drosophila melanogaster strains and 174 human individuals to calculate the distribution of fitness effects of single nucleotide polymorphisms (SNPs) occurring within intergenic lncRNAs and compared this to distributions for SNPs present within putatively neutral or protein-coding sequences. Our observations show that in D. melanogaster there is a significant excess of rare variants within intergenic lncRNAs relative to neutrally evolving sequences, whereas human intergenic lncRNAs appear to be evolving effectively neutrally. Approximately 30% of mutations within these fruitfly lncRNAs are estimated as being weakly deleterious. CONCLUSIONS: These contrasting results can be attributed to the large difference in effective population sizes between the two species. Our results suggest that while the sequences of lncRNAs will be well conserved across insect species, such loci in mammals will accumulate greater proportions of deleterious changes through genetic drift.
|
29
|
Evolutionary Genomics of Colias Phosphoglucose Isomerase (PGI) Introns. J Mol Evol 2012; 74:96-111. [DOI: 10.1007/s00239-012-9492-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 06/24/2011] [Accepted: 02/15/2012] [Indexed: 10/28/2022]
|
30
|
Abstract
The metabolome of a plant comprises all small molecule metabolites, which are produced during cellular processes. The genetic basis for metabolites in nonmodel plants is unknown, despite frequently observed correlations between metabolite concentrations and stress responses. A quantitative genetic analysis of metabolites in a nonmodel plant species is thus warranted. Here, we use standard association genetic methods to correlate 3563 single nucleotide polymorphisms (SNPs) to concentrations of 292 metabolites measured in a single loblolly pine (Pinus taeda) association population. A total of 28 single locus associations were detected, representing 24 and 20 unique SNPs and metabolites, respectively. Multilocus Bayesian mixed linear models identified 2998 additional associations for a total of 1617 unique SNPs associated to 255 metabolites. These SNPs explained sizeable fractions of metabolite heritabilities when considered jointly (56.6% on average) and had lower minor allele frequencies and magnitudes of population structure as compared with random SNPs. Modest sets of SNPs (n = 1-23) explained sizeable portions of genetic effects for many metabolites, thus highlighting the importance of multi-SNP models to association mapping, and exhibited patterns of polymorphism consistent with being linked to targets of natural selection. The implications for association mapping in forest trees are discussed.
|
31
|
The role of background selection in shaping patterns of molecular evolution and variation: evidence from variability on the Drosophila X chromosome. Genetics 2012; 191:233-46. [PMID: 22377629 DOI: 10.1534/genetics.111.138073] [Citation(s) in RCA: 94] [Impact Index Per Article: 7.8] [Indexed: 11/18/2022] Open
Abstract
In the putatively ancestral population of Drosophila melanogaster, the ratio of silent DNA sequence diversity for X-linked loci to that for autosomal loci is approximately one, instead of the expected "null" value of 3/4. One possible explanation is that background selection (the hitchhiking effect of deleterious mutations) is more effective on the autosomes than on the X chromosome, because of the lack of crossing over in male Drosophila. The expected effects of background selection on neutral variability at sites in the middle of an X chromosome or an autosomal arm were calculated for different models of chromosome organization and methods of approximation, using current estimates of the deleterious mutation rate and distributions of the fitness effects of deleterious mutations. The robustness of the results to different distributions of fitness effects, dominance coefficients, mutation rates, mapping functions, and chromosome size was investigated. The predicted ratio of X-linked to autosomal variability is relatively insensitive to these variables, except for the mutation rate and map length. Provided that the deleterious mutation rate per genome is sufficiently large, it seems likely that background selection can account for the observed X to autosome ratio of variability in the ancestral population of D. melanogaster. The fact that this ratio is much less than one in D. pseudoobscura is also consistent with the model's predictions, since this species has a high rate of crossing over. The results suggest that background selection may play a major role in shaping patterns of molecular evolution and variation.
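The "null" value of 3/4 and the way background selection can shift it can be sketched as follows. The effective-size formulas are the standard ones for a population of Nf breeding females and Nm breeding males; the B values in the example are hypothetical placeholders, not estimates from the paper.

```python
def ne_autosome(n_f, n_m):
    """Effective size for autosomal loci with n_f females and n_m males."""
    return 4 * n_f * n_m / (n_f + n_m)


def ne_x(n_f, n_m):
    """Effective size for X-linked loci with n_f females and n_m males."""
    return 9 * n_f * n_m / (4 * n_m + 2 * n_f)


def null_xa_ratio(n_f, n_m):
    """Expected X/autosome neutral diversity ratio without linked selection."""
    return ne_x(n_f, n_m) / ne_autosome(n_f, n_m)


def xa_ratio_with_bgs(b_x, b_a, null_ratio=0.75):
    """X/autosome ratio when diversity is scaled by BGS factors b_x and b_a."""
    return null_ratio * b_x / b_a


print(null_xa_ratio(1000, 1000))      # equal sex ratio gives the null 3/4
print(xa_ratio_with_bgs(0.8, 0.6))    # hypothetical B values
```

A ratio near one, as observed in the putatively ancestral population, would require autosomal diversity to be reduced about one-third more strongly than X-linked diversity, that is, B_X / B_A close to 4/3.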
|
32
|
Heterogeneity in genetic diversity among non-coding loci fails to fit neutral coalescent models of population history. PLoS One 2012; 7:e31972. [PMID: 22384117 PMCID: PMC3285185 DOI: 10.1371/journal.pone.0031972] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 09/16/2011] [Accepted: 01/17/2012] [Indexed: 12/26/2022] Open
Abstract
Inferring aspects of the population histories of species using coalescent analyses of non-coding nuclear DNA has grown in popularity. These inferences, such as divergence, gene flow, and changes in population size, assume that genetic data reflect simple population histories and neutral evolutionary processes. However, violating model assumptions can result in a poor fit between empirical data and the models. We sampled 22 nuclear intron sequences from at least 19 different chromosomes (a genomic transect) to test for deviations from selective neutrality in the gadwall (Anas strepera), a Holarctic duck. Nucleotide diversity among these loci varied by nearly two orders of magnitude (from 0.0004 to 0.029), and this heterogeneity could not be explained by differences in substitution rates alone. Using two different coalescent methods to infer models of population history and then simulating neutral genetic diversity under these models, we found that the observed among-locus heterogeneity in nucleotide diversity was significantly higher than expected for these simple models. Defining more complex models of population history demonstrated that a pre-divergence bottleneck was also unlikely to explain this heterogeneity. However, both selection and interspecific hybridization could account for the heterogeneity observed among loci. Regardless of the cause of the deviation, our results illustrate that violating key assumptions of coalescent models can mislead inferences of population history.
|
33
|
Abstract
A major challenge of biology is understanding the relationship between molecular genetic variation and variation in quantitative traits, including fitness. This relationship determines our ability to predict phenotypes from genotypes and to understand how evolutionary forces shape variation within and between species. Previous efforts to dissect the genotype-phenotype map were based on incomplete genotypic information. Here, we describe the Drosophila melanogaster Genetic Reference Panel (DGRP), a community resource for analysis of population genomics and quantitative traits. The DGRP consists of fully sequenced inbred lines derived from a natural population. Population genomic analyses reveal reduced polymorphism in centromeric autosomal regions and the X chromosome, evidence for positive and negative selection, and rapid evolution of the X chromosome. Many variants in novel genes, most at low frequency, are associated with quantitative traits and explain a large fraction of the phenotypic variance. The DGRP facilitates genotype-phenotype mapping using the power of Drosophila genetics.
|
34
|
Abstract
Vast tracts of noncoding DNA contain elements that regulate gene expression in higher eukaryotes. Describing these regulatory elements and understanding how they evolve represent major challenges for biologists. Advances in the ability to survey genome-scale DNA sequence data are providing unprecedented opportunities to use evolutionary models and computational tools to identify functionally important elements and the mode of selection acting on them in multiple species. This chapter reviews some of the current methods that have been developed and applied on noncoding DNA, what they have shown us, and how they are limited. Results of several recent studies reveal that a significantly larger fraction of noncoding DNA in eukaryotic organisms is likely to be functional than previously believed, implying that the functional annotation of most noncoding DNA in these organisms is largely incomplete. In Drosophila, recent studies have further suggested that a large fraction of noncoding DNA divergence observed between species may be the product of recurrent adaptive substitution. Similar studies in humans have revealed a more complex pattern, with signatures of recurrent positive selection being largely concentrated in conserved noncoding DNA elements. Understanding these patterns and the extent to which they generalize to other organisms awaits the analysis of forthcoming genome-scale polymorphism and divergence data from more species.
|
35
|
Abstract
We tested whether functionally important sites in bacterial, yeast, and animal promoters are more conserved than their neighbors. We found that substitutions are predominantly seen in less important sites and that those that occurred tended to have less impact on gene expression than possible alternatives. These results suggest that purifying selection operates on promoter sequences.
|
36
|
Morphological evolution caused by many subtle-effect substitutions in regulatory DNA. Nature 2011; 474:598-603. [PMID: 21720363 PMCID: PMC3170772 DOI: 10.1038/nature10200] [Citation(s) in RCA: 147] [Impact Index Per Article: 11.3] [Received: 02/04/2011] [Accepted: 05/13/2011] [Indexed: 11/09/2022]
Abstract
Morphology evolves often through changes in developmental genes, but the causal mutations, and their effects, remain largely unknown. The evolution of naked cuticle—rather than trichomes—on larvae of Drosophila sechellia resulted from changes in five transcriptional enhancers of shavenbaby, a gene encoding a transcription factor that governs trichome morphogenesis. Here we show that the function of one of these enhancers evolved through multiple single nucleotide substitutions that altered both the timing and level of shavenbaby expression. The consequences of these nucleotide substitutions on larval morphology were quantified with a novel functional assay. We found that each substitution had a relatively small phenotypic effect, and that many nucleotide changes account for this large morphological difference. In addition, we observed that the substitutions displayed non-additive effects to generate a large phenotypic change. These data provide unprecedented resolution of the phenotypic effects of substitutions and show how individual nucleotide changes in a transcriptional enhancer have caused morphological evolution.
|
37
|
Effective population size and the efficacy of selection on the X chromosomes of two closely related Drosophila species. Genome Biol Evol 2010; 3:114-28. [PMID: 21173424 PMCID: PMC3038356 DOI: 10.1093/gbe/evq086] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.0] [Indexed: 11/25/2022] Open
Abstract
The prevalence of natural selection relative to genetic drift is of central interest in evolutionary biology. Depending on the distribution of fitness effects of new mutations, the importance of these evolutionary forces may differ in species with different effective population sizes. Here, we survey population genetic variation at 105 orthologous X-linked protein coding regions in Drosophila melanogaster and its sister species D. simulans, two closely related species with distinct demographic histories. We observe significantly higher levels of polymorphism and evidence for stronger selection on codon usage bias in D. simulans, consistent with a larger historical effective population size on average for this species. Despite these differences, we estimate that <10% of newly arising nonsynonymous mutations have deleterious fitness effects in the nearly neutral range (i.e., −10 < Nes < 0) in both species. The inferred distributions of fitness effects and demographic models translate into surprisingly high estimates of the fraction of “adaptive” protein divergence in both species (∼85–90%). Despite evidence for different demographic histories, differences in population size have apparently played little role in the dynamics of protein evolution in these two species, and estimates of the adaptive fraction (α) of protein divergence in both species remain high even if we account for recent 10-fold growth. Furthermore, although several recent studies have noted strong signatures of recurrent adaptive protein evolution at genes involved in immunity, reproduction, sexual conflict, and intragenomic conflict, our finding of high levels of adaptive protein divergence at randomly chosen proteins (with respect to function) suggests that many other factors likely contribute to the adaptive protein divergence signature in Drosophila.
|
38
|
When needles look like hay: how to find tissue-specific enhancers in model organism genomes. Dev Biol 2010; 350:239-54. [PMID: 21130761 DOI: 10.1016/j.ydbio.2010.11.026] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 04/14/2010] [Revised: 11/11/2010] [Accepted: 11/22/2010] [Indexed: 01/22/2023]
Abstract
A major prerequisite for the investigation of tissue-specific processes is the identification of cis-regulatory elements. No generally applicable technique is available to distinguish them from any other type of genomic non-coding sequence. Therefore, researchers often have to identify these elements by elaborate in vivo screens, testing individual regions until the right one is found. Here, based on many examples from the literature, we summarize how functional enhancers have been isolated from other elements in the genome and how they have been characterized in transgenic animals. Covering computational and experimental studies, we provide an overview of the global properties of cis-regulatory elements, like their specific interactions with promoters and target gene distances. We describe conserved non-coding elements (CNEs) and their internal structure, nucleotide composition, binding site clustering and overlap, with a special focus on developmental enhancers. Conflicting data and unresolved questions on the nature of these elements are highlighted. Our comprehensive overview of the experimental shortcuts that have been found in the different model organism communities and the new field of high-throughput assays should help during the preparation phase of a screen for enhancers. The review is accompanied by a list of general guidelines for such a project.
|
39
|
Genome-wide functional element detection using pairwise statistical alignment outperforms multiple genome footprinting techniques. Bioinformatics 2010; 26:2116-20. [PMID: 20610610 DOI: 10.1093/bioinformatics/btq360] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Comparative genomic sequence analysis is a powerful approach for identifying putative functional elements in silico. The availability of full-genome sequences from many vertebrate species has resulted in the development of popular tools, such as the phastCons software package, that search large numbers of genomes to identify conserved elements. While phastCons can analyze many genomes simultaneously, it ignores potentially informative insertion and deletion events and relies on a fixed, precomputed multiple sequence alignment. RESULTS: We have developed a new method, GRAPeFoot, which simultaneously aligns two full genomes and annotates a set of conserved regions exhibiting reduced rates of insertion, deletion and substitution mutations. We tested GRAPeFoot using the human and mouse genomes and compared its performance to a set of phastCons predictions hosted on the UCSC genome browser. Our results demonstrate that despite the use of only two genomes, GRAPeFoot identified constrained elements at rates comparable with phastCons, which analyzed data from 28 vertebrate genomes. This study demonstrates how integrated modelling of substitutions, indels and purifying selection allows a pairwise analysis to exhibit a sensitivity similar to a heuristic analysis of many genomes. AVAILABILITY: The GRAPeFoot software and set of genome-wide functional element predictions are freely available to download online at http://www.stats.ox.ac.uk/~satija/GRAPeFoot/.
|
40
|
On the utility of short intron sequences as a reference for the detection of positive and negative selection in Drosophila. Mol Biol Evol 2010; 27:1226-34. [PMID: 20150340 DOI: 10.1093/molbev/msq046] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.9] [Indexed: 11/14/2022] Open
Abstract
The detection of selection, both positive and negative, acting on a DNA sequence or class of nucleotide sites requires comparison with a reference sequence that is unaffected by selection. In Drosophila, recent findings of widespread selective constraint, as well as adaptive evolution, in both coding and noncoding regions highlight the difficulties in choosing such a reference sequence. Here, we investigate the utility of short intron sequences as a reference for the detection of selection. For a set of 119 Drosophila melanogaster genes containing 195 short introns (≤120 bp), we analyzed polymorphism and divergence at 1) 4-fold synonymous sites, 2) all sites of introns ≤120 bp, 3) all sites of introns ≤65 bp, 4) bases 8-30 of introns ≤120 bp, and 5) bases 8-30 of introns ≤65 bp. The last class of sites shows the highest levels of both interspecific divergence and intraspecific polymorphism, suggesting that these sites are under the least selective constraint. Bases 8-30 of introns ≤65 bp also have the lowest ratio of divergence to polymorphism, which may indicate that a small proportion of substitutions in the other classes of sites are the result of adaptive evolution. Although there is little signal of selection on the primary sequence of short introns, patterns of insertion-deletion polymorphism and divergence suggest that both positive and negative selection act to maintain an optimal intron length.
|
41
|
Studying patterns of recent evolution at synonymous sites and intronic sites in Drosophila melanogaster. J Mol Evol 2009; 70:116-28. [PMID: 20041239 DOI: 10.1007/s00239-009-9314-6] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Received: 08/06/2009] [Accepted: 12/07/2009] [Indexed: 10/20/2022]
Abstract
Most previous studies of the evolution of codon usage bias (CUB) and intronic GC content (iGC) in Drosophila melanogaster were based on between-species comparisons, reflecting long-term evolutionary events. However, a complete picture of the evolution of CUB and iGC cannot be drawn without knowledge of their more recent evolutionary history. Here, we used a polymorphism dataset collected from Zimbabwe to study patterns of the recent evolution of CUB and iGC. Analyzing coding and intronic data jointly with a model which can simultaneously estimate selection, mutational, and demographic parameters, we have found that: (1) natural selection is probably acting on synonymous codons; (2) a constant population size model seems to be sufficient to explain most of the observed synonymous polymorphism patterns; (3) GC is favored over AT in introns. In agreement with the long-term evolutionary patterns, ongoing selection acting on X-linked synonymous codons is stronger than that acting on autosomal codons. The selective differences between preferred and unpreferred codons tend to be greater than the differences between GC and AT in introns, suggesting that natural selection, not just biased gene conversion, may have influenced the evolution of CUB. Interestingly, evidence for non-equilibrium evolution comes exclusively from the intronic data. However, three different models, an equilibrium model with two classes of selected sites and two non-equilibrium models with changes in either population size or mutational parameters, fit the intronic data equally well. These results show that using inadequate selection (or demographic) models can result in incorrect estimates of demographic (or selection) parameters.
|
42
|
Patterns of DNA-sequence divergence between Drosophila miranda and D. pseudoobscura. J Mol Evol 2009; 69:601-11. [PMID: 19859648 DOI: 10.1007/s00239-009-9298-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 02/03/2009] [Accepted: 10/07/2009] [Indexed: 12/22/2022]
Abstract
Contrary to the classical view, a large amount of non-coding DNA seems to be selectively constrained in Drosophila and other species. Here, using Drosophila miranda BAC sequences and the Drosophila pseudoobscura genome sequence, we aligned coding and non-coding sequences between D. pseudoobscura and D. miranda, and investigated their patterns of evolution. We found two patterns that have previously been observed in comparisons between Drosophila melanogaster and its relatives. First, there is a negative correlation between intron divergence and intron length, suggesting that longer non-coding sequences may contain more regulatory elements than shorter sequences. Our other main finding is a negative correlation between the rate of non-synonymous substitutions (d(N)) and codon usage bias (F(op)), showing that fast-evolving genes have a lower codon usage bias, consistent with strong positive selection interfering with weak selection for codon usage.
|
43
|
Abstract
The genomes of vertebrates, flies, and nematodes contain highly conserved noncoding elements (CNEs). CNEs cluster around genes that regulate development, and where tested, they can act as transcriptional enhancers. Within an animal group CNEs are the most conserved sequences but between groups they are normally diverged beyond recognition. Alternative CNEs are, however, associated with an overlapping set of genes that control development in all animals. Here, we discuss the evidence that CNEs are part of the core gene regulatory networks (GRNs) that specify alternative animal body plans. The major animal groups arose >550 million years ago. We propose that the cis-regulatory inputs identified by CNEs arose during the "re-wiring" of regulatory interactions that occurred during early animal evolution. Consequently, different animal groups, with different core GRNs, contain alternative sets of CNEs. Due to the subsequent stability of animal body plans, these core regulatory sequences have been evolving in parallel under strong purifying selection in different animal groups.
|
44
|
More radical amino acid replacements in primates than in rodents: support for the evolutionary role of effective population size. Gene 2009; 440:50-6. [PMID: 19332110 DOI: 10.1016/j.gene.2009.03.012] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 02/16/2009] [Revised: 03/16/2009] [Accepted: 03/19/2009] [Indexed: 02/04/2023]
Abstract
We examined the pattern of nucleotide substitution in 4933 conserved single-copy orthologous protein-coding genes of human, rhesus, mouse, and rat. Consistent with previous studies, the median ratio of the number of nonsynonymous substitutions per nonsynonymous site (d(N)) to the number of synonymous substitutions per synonymous site (d(S)) was significantly higher in the comparison between the two primates than in the comparison between the two rodents. This pattern was particularly strong in the case of genes expressed in the immune system, but also occurred in other genes, including a set of highly conserved genes involved in the regulation of transcription. Both synonymous and nonsynonymous differences occurred independently in the same codons in the primates and in the rodents to a greater extent than expected by chance, but the extent of the deviation from random expectation was much greater in the case of nonsynonymous differences. Parallel amino acid replacements occurred at the same sites in the primates and rodents far more frequently than expected by chance, but tended to involve very conservative amino acid changes. Divergent amino acid changes involved more chemically different amino acids than parallel changes, and divergent amino acid replacements between the primates were significantly more radical than those between the rodents. These results are most easily explained on the hypothesis that the evolution of these genes has been shaped largely by purifying selection, which has been less effective in primates than in rodents, presumably as a consequence of lower long-term effective population sizes in the former.
|
45
|
Elevated levels of expression associated with regions of the Drosophila genome that lack crossing over. Biol Lett 2009; 4:758-61. [PMID: 18782733 DOI: 10.1098/rsbl.2008.0376] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 12/18/2022] Open
Abstract
The recombinational environment influences patterns of molecular evolution through the effects of Hill-Robertson interference. Here, we examine genome-wide patterns of gene expression with respect to recombinational environment in Drosophila melanogaster. We find that regions of the genome lacking crossing over exhibit elevated levels of expression, and this is most pronounced for genes on the entirely non-crossing over fourth chromosome. We find no evidence for differences in the patterns of gene expression between regions of high, intermediate and low crossover frequencies. These results suggest that, in the absence of crossing over, selection to maintain control of expression may be compromised, perhaps due to the accumulation of deleterious mutations in regulatory regions. Alternatively, higher gene expression may be evolving to compensate for defective protein products or reduced translational efficiency.
|
46
|
Genomic complexity of the variable region-containing chitin-binding proteins in amphioxus. BMC Genet 2008; 9:78. [PMID: 19046437 PMCID: PMC2632668 DOI: 10.1186/1471-2156-9-78] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Received: 07/25/2008] [Accepted: 12/01/2008] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND: The variable region-containing chitin-binding proteins (VCBPs) are found in protochordates and consist of two tandem immunoglobulin variable (V)-type domains and a chitin-binding domain. We previously have shown that these polymorphic genes, which primarily are expressed in the gut, exhibit characteristics of immune genes. In this report, we describe VCBP genomic organization and characterize adjacent and intervening genetic features which may influence both their polymorphism and complex transcriptional repertoire. RESULTS: VCBP genes 1, 2, 4, and 5 are encoded in a single contiguous gene-rich chromosomal region and VCBP3 is encoded in a separate locus. The VCBPs exhibit extensive haplotype variation, including copy number variation (CNV), indel polymorphism and a markedly elevated variation in repeat type and density. In at least one haplotype, inverted repeats occur more frequently than elsewhere in the genome. Multi-animal cDNA screening, as well as transcriptional profiling using a novel transfection system, suggests that haplotype-specific transcriptional variants may contribute to VCBP genetic diversity. CONCLUSION: The availability of the Branchiostoma floridae genome (Joint Genome Institute, Brafl1), along with BAC and PAC screening and sequencing described here, reveals that the relatively limited number of VCBP genes present in the amphioxus genome exhibit exceptionally high haplotype variation. These VCBP haplotypes contribute a diverse pool of allelic variants, which includes gene copy number variation, pseudogenes, and other polymorphisms, while contributing secondary effects on gene transcription as well.
|
47
|
The Impact of Natural Selection on the Genome: Emerging Patterns in Drosophila and Arabidopsis. Annual Review of Ecology, Evolution, and Systematics 2008. [DOI: 10.1146/annurev.ecolsys.39.110707.173342] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.4] [Indexed: 11/09/2022]
|
48
|
Controlling type-I error of the McDonald-Kreitman test in genomewide scans for selection on noncoding DNA. Genetics 2008; 180:1767-71. [PMID: 18791238 DOI: 10.1534/genetics.108.091850] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Indexed: 11/18/2022] Open
Abstract
Departures from the assumption of homogeneously interdigitated neutral and putatively selected sites in the McDonald-Kreitman test can lead to false rejections of the neutral model in the presence of intermediate levels of recombination. This problem is exacerbated by small sample sizes, nonequilibrium demography, recombination rate variation, and in comparisons involving more recently diverged species. I propose that establishing significance levels by coalescent simulation with recombination can improve the fidelity of the test in genomewide scans for selection on noncoding DNA.
|
49
|
Accelerated sequence divergence of conserved genomic elements in Drosophila melanogaster. Genome Res 2008; 18:1592-601. [PMID: 18583644 DOI: 10.1101/gr.077131.108] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 02/07/2023]
Abstract
Recent genomic sequencing of 10 additional Drosophila genomes provides a rich resource for comparative genomics analyses aimed at understanding the similarities and differences between species and between Drosophila and mammals. Using a phylogenetic approach, we identified 64 genomic elements that have been highly conserved over most of the Drosophila tree, but that have experienced a recent burst of evolution along the Drosophila melanogaster lineage. Compared to similarly defined elements in humans, these regions of rapid lineage-specific evolution in Drosophila differ dramatically in location, mechanism of evolution, and functional properties of associated genes. Notably, the majority reside in protein-coding regions and primarily result from rapid adaptive synonymous site evolution. In fact, adaptive evolution appears to be driving substitutions to unpreferred codons. Our analysis also highlights interesting noncoding genomic regions, such as regulatory regions in the gene gooseberry-neuro and a putative novel miRNA.
|
50
|
Standard and generalized McDonald-Kreitman test: a website to detect selection by comparing different classes of DNA sites. Nucleic Acids Res 2008; 36:W157-62. [PMID: 18515345 PMCID: PMC2447769 DOI: 10.1093/nar/gkn337] [Citation(s) in RCA: 107] [Impact Index Per Article: 6.7] [Indexed: 11/30/2022] Open
Abstract
The McDonald and Kreitman test (MKT) is one of the most powerful and extensively used tests to detect the signature of natural selection at the molecular level. Here, we present the standard and generalized MKT website, a novel website that allows MKTs to be performed not only for synonymous and nonsynonymous changes, as the test was initially described, but also for other classes of regions and/or several loci. The website has three different interfaces: (i) the standard MKT, where users can analyze several types of sites in a coding region, (ii) the advanced MKT, where users can compare two closely linked regions in the genome that can be either coding or noncoding, and (iii) the multi-locus MKT, where users can analyze many separate loci in a single multi-locus test. The website has already been used to show that selection efficiency is positively correlated with effective population size in the Drosophila genus and it has been applied to include estimates of selection in DPDB. This website is a timely resource, which will presumably be widely used by researchers in the field and will contribute to enlarging the catalogue of cases of adaptive evolution. It is available at http://mkt.uab.es.
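For readers unfamiliar with the standard MKT named in this abstract, the following is a minimal Python sketch of the usual 2x2 calculation. It is not the website's implementation. The function names (`mk_test`, `fisher_exact_two_sided`) and the input counts are hypothetical and chosen only for illustration. The sketch assumes the common formulation: the neutrality index is NI = (Pn/Ps)/(Dn/Ds), the estimated adaptive fraction is α = 1 − NI, and significance comes from a two-sided Fisher's exact test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def mk_test(dn, ds, pn, ps):
    """Standard MK test from counts of fixed (D) and polymorphic (P)
    nonsynonymous (n) and synonymous (s) differences."""
    if dn == 0 or ps == 0 or ds == 0:
        raise ValueError("dn, ds and ps must be non-zero for NI and alpha")
    ni = (pn / ps) / (dn / ds)  # neutrality index
    return {
        "neutrality_index": ni,
        "alpha": 1 - ni,  # estimated fraction of adaptive substitutions
        "p_value": fisher_exact_two_sided(dn, ds, pn, ps),
    }


# Hypothetical counts, for illustration only.
result = mk_test(dn=7, ds=17, pn=2, ps=42)
print(result)
```

An NI below 1 (α above 0) indicates an excess of nonsynonymous divergence relative to polymorphism, which is the pattern usually read as evidence of adaptive substitution. The generalized tests described in the abstract replace the synonymous and nonsynonymous classes with other site classes.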
|