1
Jeffares DC, Tomiczek B, Sojo V, dos Reis M. A beginners guide to estimating the non-synonymous to synonymous rate ratio of all protein-coding genes in a genome. Methods Mol Biol 2015; 1201:65-90. [PMID: 25388108 DOI: 10.1007/978-1-4939-1438-8_4] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Indexed: 06/04/2023]
Abstract
The ratio of non-synonymous to synonymous substitutions (dN/dS) is a useful measure of the strength and mode of natural selection acting on protein-coding genes. It is widely used to study patterns of selection on protein genes on a genomic scale, from the small genomes of viruses, bacteria, and parasitic eukaryotes to the largest eukaryotic genomes. In this chapter we describe all the steps necessary to calculate the dN/dS of all the genes using at least two genomes. We include a brief discussion on assigning orthologs, and of codon-aware alignment of orthologs. We then describe how to use the CODEML program of the PAML package for phylogenetic analysis to calculate the dN/dS and how to perform some statistical tests for positive selection. We then outline some methods for interpreting output and describe how one may use this data to make discoveries about the biology of your species. Finally, as a worked example we show all the steps we used to calculate dN/dS for 3,261 orthologs from six Plasmodium species, including tests for adaptive evolution (see worked_example.pdf).
Affiliation(s)
- Daniel C Jeffares
- Research Department of Genetics, Evolution and Environment, University College London, Gower Street, London, WC1E 6BT, UK
2
Wang Y, Wang X, Tang H, Tan X, Ficklin SP, Feltus FA, Paterson AH. Modes of gene duplication contribute differently to genetic novelty and redundancy, but show parallels across divergent angiosperms. PLoS One 2011; 6:e28150. [PMID: 22164235 PMCID: PMC3229532 DOI: 10.1371/journal.pone.0028150] [Citation(s) in RCA: 114] [Impact Index Per Article: 8.1] [Received: 09/07/2011] [Accepted: 11/02/2011] [Indexed: 11/18/2022] Open
Abstract
Background: Both single gene and whole genome duplications (WGD) have recurred in angiosperm evolution. However, the evolutionary effects of different modes of gene duplication, especially regarding their contributions to genetic novelty or redundancy, have been inadequately explored. Results: In Arabidopsis thaliana and Oryza sativa (rice), species that deeply sample botanical diversity and for which expression data are available from a wide range of tissues and physiological conditions, we have compared expression divergence between genes duplicated by six different mechanisms (WGD, tandem, proximal, DNA based transposed, retrotransposed and dispersed), and between positional orthologs. Both neo-functionalization and genetic redundancy appear to contribute to retention of duplicate genes. Genes resulting from WGD and tandem duplications diverge slowest in both coding sequences and gene expression, and contribute most to genetic redundancy, while other duplication modes contribute more to evolutionary novelty. WGD duplicates may more frequently be retained due to dosage amplification, while inferred transposon mediated gene duplications tend to reduce gene expression levels. The extent of expression divergence between duplicates is discernibly related to duplication modes, different WGD events, amino acid divergence, and putatively neutral divergence (time), but the contribution of each factor is heterogeneous among duplication modes. Gene loss may retard inter-species expression divergence. Members of different gene families may have non-random patterns of origin that are similar in Arabidopsis and rice, suggesting the action of pan-taxon principles of molecular evolution. Conclusion: Gene duplication modes differ in contribution to genetic novelty and redundancy, but show some parallels in taxa separated by hundreds of millions of years of evolution.
Affiliation(s)
- Yupeng Wang
- Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia, United States of America
- Institute of Bioinformatics, University of Georgia, Athens, Georgia, United States of America
- Xiyin Wang
- Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia, United States of America
- College of Life Sciences, Hebei United University, Tangshan, Hebei, China
- Haibao Tang
- Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia, United States of America
- Department of Plant Biology, University of Georgia, Athens, Georgia, United States of America
- Xu Tan
- Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia, United States of America
- Department of Plant Biology, University of Georgia, Athens, Georgia, United States of America
- Stephen P. Ficklin
- Plant and Environmental Sciences, Clemson University, Clemson, South Carolina, United States of America
- F. Alex Feltus
- Plant and Environmental Sciences, Clemson University, Clemson, South Carolina, United States of America
- Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina, United States of America
- Andrew H. Paterson
- Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia, United States of America
- Institute of Bioinformatics, University of Georgia, Athens, Georgia, United States of America
- Department of Plant Biology, University of Georgia, Athens, Georgia, United States of America
- Department of Crop and Soil Sciences, University of Georgia, Athens, Georgia, United States of America
- Department of Genetics, University of Georgia, Athens, Georgia, United States of America
3
Vishnoi A, Sethupathy P, Simola D, Plotkin JB, Hannenhalli S. Genome-wide survey of natural selection on functional, structural, and network properties of polymorphic sites in Saccharomyces paradoxus. Mol Biol Evol 2011; 28:2615-27. [PMID: 21478372 DOI: 10.1093/molbev/msr085] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND: To characterize the genetic basis of phenotypic evolution, numerous studies have identified individual genes that have likely evolved under natural selection. However, phenotypic changes may represent the cumulative effect of similar evolutionary forces acting on functionally related groups of genes. Phylogenetic analyses of divergent yeast species have identified functional groups of genes that have evolved at significantly different rates, suggestive of differential selection on the functional properties. However, due to environmental heterogeneity over long evolutionary timescales, selection operating within a single lineage may be dramatically different, and it is not detectable via interspecific comparisons alone. Moreover, interspecific studies typically quantify selection on protein-coding regions using the D(n)/D(s) ratio, which cannot be extended easily to study selection on noncoding regions or synonymous sites. The population genetic-based analysis of selection operating within a single lineage ameliorates these limitations. FINDINGS: We investigated selection on several properties associated with genes, promoters, or polymorphic sites, by analyzing the derived allele frequency spectrum of single nucleotide polymorphisms (SNPs) in 28 strains of Saccharomyces paradoxus. We found evidence for significant differential selection between many functionally relevant categories of SNPs, underscoring the utility of function-centric approaches for discovering signatures of natural selection. When comparable, our findings are largely consistent with previous studies based on interspecific comparisons, with one notable exception: our study finds that mutations from an ancient amino acid to a relatively new amino acid are selectively disfavored, whereas interspecific comparisons have found selection against ancient amino acids. Several of our findings have not been addressed through prior interspecific studies: we find that synonymous mutations from preferred to unpreferred codons are selected against and that synonymous SNPs in the linker regions of proteins are relatively less constrained than those within protein domains. CONCLUSIONS: We present the first global survey of selection acting on various functional properties in S. paradoxus. We found that selection pressures previously detected over long evolutionary timescales have also shaped the evolution of S. paradoxus. Importantly, we also make novel discoveries untenable via conventional interspecific analyses.
4
Wang Y, Robbins KR, Rekaya R. Comparison of computational models for assessing conservation of gene expression across species. PLoS One 2010; 5:e13239. [PMID: 20949029 PMCID: PMC2951896 DOI: 10.1371/journal.pone.0013239] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 06/16/2010] [Accepted: 09/10/2010] [Indexed: 11/22/2022] Open
Abstract
Assessing conservation/divergence of gene expression across species is important for the understanding of gene regulation evolution. Although advances in microarray technology have provided massive high-dimensional gene expression data, the analysis of such data is still challenging. To date, assessing cross-species conservation of gene expression using microarray data has been mainly based on comparison of expression patterns across corresponding tissues, or comparison of co-expression of a gene with a reference set of genes. Because direct and reliable high-throughput experimental data on conservation of gene expression are often unavailable, the assessment of these two computational models is very challenging and has not been reported yet. In this study, we compared one corresponding tissue based method and three co-expression based methods for assessing conservation of gene expression, in terms of their pair-wise agreements, using a frequently used human-mouse tissue expression dataset. We find that 1) the co-expression based methods are only moderately correlated with the corresponding tissue based methods, 2) the reliability of co-expression based methods is affected by the size of the reference ortholog set, and 3) the corresponding tissue based methods may lose some information for assessing conservation of gene expression. We suggest that the use of either of these two computational models to study the evolution of a gene's expression may be subject to great uncertainty, and the investigation of changes in both gene expression patterns over corresponding tissues and co-expression of the gene with other genes is necessary.
Affiliation(s)
- Yupeng Wang
- Department of Animal and Dairy Science, University of Georgia, Athens, Georgia, United States of America.
5
Essien K, Stoeckert CJ. Conservation and divergence of known apicomplexan transcriptional regulons. BMC Genomics 2010; 11:147. [PMID: 20199665 PMCID: PMC2841118 DOI: 10.1186/1471-2164-11-147] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 05/25/2009] [Accepted: 03/03/2010] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND: The apicomplexans are a diverse phylum of parasites causing an assortment of diseases including malaria in a wide variety of animals and lymphoproliferation in cattle. Little is known about how these varied parasites regulate their transcriptional regulons. Even less is known about how regulon systems, consisting of transcription factors and target genes together with their associated biological process, evolve in these diverse parasites. RESULTS: In order to obtain insights into the differences in transcriptional regulation between these parasites we compared the orthology profiles of putative malaria transcription factors across species and examined the enrichment patterns of four binding sites across eleven apicomplexans. About three-fifths of the factors are broadly conserved in several phylogenetic orders of sequenced apicomplexans. This observation suggests the existence of regulons whose regulation is conserved across this ancient phylum. Transcription factors not broadly conserved across the phylum are possibly involved in regulon systems that have diverged between species. Examining binding site enrichment patterns in light of transcription factor conservation patterns suggests a second mode via which regulon systems may diverge: rewiring of existing transcription factors and their associated binding sites in specific ways. Integrating binding sites with transcription factor conservation patterns also facilitated prediction of putative regulators for one of the binding sites. CONCLUSIONS: Even though transcription factors are underrepresented in apicomplexans, the distribution of these factors and their associated regulons reflect common and family-specific transcriptional regulatory processes.
Affiliation(s)
- Kobby Essien
- Department of Bioengineering, University of Pennsylvania, 240 Skirkanich Hall, Philadelphia, Pennsylvania 19104, USA
6
Jurgelenaite R, Dijkstra TMH, Kocken CHM, Heskes T. Gene regulation in the intraerythrocytic cycle of Plasmodium falciparum. Bioinformatics 2009; 25:1484-91. [PMID: 19336444 DOI: 10.1093/bioinformatics/btp179] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Abstract
MOTIVATION: To date, there is little knowledge about one of the processes fundamental to the biology of Plasmodium falciparum, gene regulation including transcriptional control. We use noisy threshold models to identify regulatory sequence elements explaining membership to a gene expression cluster where each cluster consists of genes active during the part of the developmental cycle inside a red blood cell. Our approach is both able to capture the combinatorial nature of gene regulation and to incorporate uncertainty about the functionality of putative regulatory sequence elements. RESULTS: We find a characteristic pattern where the most common motifs tend to be absent upstream of genes active in the first half of the cycle and present upstream of genes active in the second half. We find no evidence that motif's score, orientation, location and multiplicity improves prediction of gene expression. Through comparative genome analysis, we find a list of potential transcription factors and their associated motifs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rasa Jurgelenaite
- Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands.
7
Horrocks P, Wong E, Russell K, Emes RD. Control of gene expression in Plasmodium falciparum - ten years on. Mol Biochem Parasitol 2008; 164:9-25. [PMID: 19110008 DOI: 10.1016/j.molbiopara.2008.11.010] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.2] [Received: 10/26/2008] [Revised: 11/25/2008] [Accepted: 11/26/2008] [Indexed: 01/24/2023]
Abstract
Ten years ago this journal published a review with an almost identical title detailing how the then recent introduction of transfection technology had advanced our understanding of the molecular control of transcriptional processes in Plasmodium falciparum, particularly in terms of promoter structure and function. In the succeeding years, sequencing of several Plasmodium spp. genomes and application of high throughput global postgenomic technologies have proven as significant, if not more, as has the ability to genetically manipulate these parasites in dissecting the molecular control of gene expression. Here we aim to review our current understanding of the control of gene expression in P. falciparum, including evidence available from other Plasmodium spp. and apicomplexan parasites. Specifically, however, we will address the current polarised debate regarding the level at which control is mediated, and attempt to identify some of the challenges this field faces in the next 10 years.
Affiliation(s)
- Paul Horrocks
- Institute for Science and Technology in Medicine, Keele University, Staffordshire ST5 5BG, United Kingdom.