1
Wang A, Chen W, Tao S. Genome-wide characterization, evolution, structure, and expression analysis of the F-box genes in Caenorhabditis. BMC Genomics 2021; 22:889. [PMID: 34895149 PMCID: PMC8665587 DOI: 10.1186/s12864-021-08189-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/15/2020] [Accepted: 11/19/2021] [Indexed: 12/23/2022]
Abstract
BACKGROUND F-box proteins represent a diverse class of adaptor proteins of the ubiquitin-proteasome system (UPS) that play critical roles in the cell cycle, signal transduction, and immune response by removing or modifying cellular regulators. Among closely related organisms of the genus Caenorhabditis, remarkable divergence in F-box gene copy numbers was caused by sizeable species-specific expansion and contraction. Although F-box gene number expansion plays a vital role in shaping genomic diversity, little is known about the molecular evolutionary mechanisms responsible for the substantial differences in F-box gene number and for the functional diversification of these genes in Caenorhabditis. Here, we comprehensively analyzed the evolution of F-box genes and its underlying mechanisms in five species of the genus Caenorhabditis: C. brenneri, C. briggsae, C. elegans, C. japonica, and C. remanei.
RESULTS We identified and characterized 594, 192, 377, 39, and 1426 F-box homologs encoding putative F-box proteins in the genomes of C. brenneri, C. briggsae, C. elegans, C. japonica, and C. remanei, respectively. Our work suggested that extensive species-specific tandem duplication followed by a small amount of gene loss was the primary mechanism responsible for F-box gene number divergence in the genus Caenorhabditis. After F-box gene duplication events occurred, multiple mechanisms contributed to gene structure divergence, including exon/intron gain/loss, exonization/pseudoexonization, exon/intron boundary alteration, exon splits, and intron elongation by tandem repeats. Based on high-throughput RNA sequencing data analysis, we proposed that F-box gene functions have diversified by sub-functionalization through highly divergent stage-specific expression patterns in Caenorhabditis species.
CONCLUSIONS Massive species-specific tandem duplications and occasional gene loss drove the rapid evolution of the F-box gene family in Caenorhabditis, leading to complex gene structural variation and diversified functions affecting growth and development within and among Caenorhabditis species. In summary, our findings outline the evolution of F-box genes in the Caenorhabditis genome and lay the foundation for future functional studies.
Affiliation(s)
- Ailan Wang
  - State Key Laboratory of Crop Stress Biology in Arid Areas and College of Life Sciences, Northwest A&F University, Yangling, 712100 Shaanxi, China
  - Bioinformatics Center, Northwest A&F University, Yangling, Shaanxi, China
  - Geneis (Beijing) Co., Beijing, China
- Wei Chen
  - State Key Laboratory of Crop Stress Biology in Arid Areas and College of Life Sciences, Northwest A&F University, Yangling, 712100 Shaanxi, China
  - Bioinformatics Center, Northwest A&F University, Yangling, Shaanxi, China
- Shiheng Tao
  - State Key Laboratory of Crop Stress Biology in Arid Areas and College of Life Sciences, Northwest A&F University, Yangling, 712100 Shaanxi, China
  - Bioinformatics Center, Northwest A&F University, Yangling, Shaanxi, China

2
Wang X, Wang SM. DNA damage repair system in C57BL/6 J mice is evolutionarily stable. BMC Genomics 2021; 22:669. [PMID: 34535077 PMCID: PMC8447752 DOI: 10.1186/s12864-021-07983-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/03/2021] [Accepted: 09/03/2021] [Indexed: 11/24/2022]
Abstract
Background The DNA damage repair (DDR) system is vital in maintaining genome stability and survival. DDR consists of over 160 genes in 7 different pathways that repair specific types of DNA damage caused by external and internal damaging factors. The functional importance of the DDR system implies that evolution could play an important role in keeping it functionally intact. Indeed, positive selection has been observed in BRCA1 and BRCA2 (BRCA), which are key genes in the homologous recombination pathway of the DDR system, in humans and their close relatives, chimpanzees and bonobos. Efforts to investigate whether the same selection exists for BRCA in other mammals have found no evidence so far. However, as most studies in non-human mammals analyzed only a single or a few individuals of the studied species, the observation may not reflect the true status in the given species. Furthermore, few studies have examined evolutionary selection in DDR genes other than BRCA. In the current study, we used the laboratory mouse C57BL/6 J as a model to address evolutionary selection on DDR genes in non-primate mammals by dynamically monitoring genetic variation across 30 generations in C57BL/6 J.
Results Using exome sequencing, we collected coding sequences of 169 DDR genes from 44 C57BL/6 J individual genomes in 2018. We compared the coding sequences with the mouse reference genome sequences derived from 1998 C57BL/6 J DNA, and with the mouse Eve6B reference genome sequences derived from 2003 C57BL/6 J DNA, covering 30 generations of C57BL/6 J from 1998 to 2018. We did not identify meaningful coding variation in either Brca1 or Brca2, or in the 167 other DDR genes, across the 30 generations. Over the same period, we did identify 812 coding variants in 116 non-DNA damage repair genes, which served as a quality control to validate the reliability of our analytic pipeline and the negative results in DDR genes.
Conclusions DDR genes in the laboratory mouse strain C57BL/6 J were not under positive selection across its 30-generation period, highlighting the possibility that the DDR system in rodents could be evolutionarily stable. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07983-7.
Affiliation(s)
- Xiaoyu Wang
  - Cancer Centre and Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Taipa, Macau
- San Ming Wang
  - Cancer Centre and Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Taipa, Macau

3
Tuteja R, McKeown PC, Ryan P, Morgan CC, Donoghue MTA, Downing T, O'Connell MJ, Spillane C. Paternally Expressed Imprinted Genes under Positive Darwinian Selection in Arabidopsis thaliana. Mol Biol Evol 2019; 36:1239-1253. [PMID: 30913563 PMCID: PMC6526901 DOI: 10.1093/molbev/msz063] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 12/13/2022]
Abstract
Genomic imprinting is an epigenetic phenomenon where autosomal genes display uniparental expression depending on whether they are maternally or paternally inherited. Genomic imprinting can arise from parental conflicts over resource allocation to the offspring, which could drive imprinted loci to evolve by positive selection. We investigate whether positive selection is associated with genomic imprinting in the inbreeding species Arabidopsis thaliana. Our analysis of 140 genes regulated by genomic imprinting in the A. thaliana seed endosperm demonstrates they are evolving more rapidly than expected. To investigate whether positive selection drives this evolutionary acceleration, we identified orthologs of each imprinted gene across 34 plant species and elucidated their evolutionary trajectories. Increased positive selection was sought by comparing its incidence among imprinted genes with nonimprinted controls. Strikingly, we find a statistically significant enrichment of imprinted paternally expressed genes (iPEGs) evolving under positive selection, 50.6% of the total, but no such enrichment for positive selection among imprinted maternally expressed genes (iMEGs). This suggests that maternally- and paternally expressed imprinted genes are subject to different selective pressures. Almost all positively selected amino acids were fixed across 80 sequenced A. thaliana accessions, suggestive of selective sweeps in the A. thaliana lineage. The imprinted genes under positive selection are involved in processes important for seed development including auxin biosynthesis and epigenetic regulation. Our findings support a genomic imprinting model for plants where positive selection can affect paternally expressed genes due to continued conflict with maternal sporophyte tissues, even when parental conflict is reduced in predominantly inbreeding species.
Affiliation(s)
- Reetu Tuteja
  - Genetics & Biotechnology Lab, Plant & AgriBiosciences Research Centre (PABC), School of Natural Sciences, Ryan Institute, National University of Ireland Galway, Galway, Ireland
  - Center for Genomics and Systems Biology, New York University, New York, NY
- Peter C McKeown
  - Genetics & Biotechnology Lab, Plant & AgriBiosciences Research Centre (PABC), School of Natural Sciences, Ryan Institute, National University of Ireland Galway, Galway, Ireland
- Pat Ryan
  - Genetics & Biotechnology Lab, Plant & AgriBiosciences Research Centre (PABC), School of Natural Sciences, Ryan Institute, National University of Ireland Galway, Galway, Ireland
- Claire C Morgan
  - School of Biotechnology, Faculty of Biological Sciences, Dublin City University, Dublin, Ireland
  - Division of Diabetes, Endocrinology and Metabolism, Imperial College London, London, United Kingdom
- Mark T A Donoghue
  - Genetics & Biotechnology Lab, Plant & AgriBiosciences Research Centre (PABC), School of Natural Sciences, Ryan Institute, National University of Ireland Galway, Galway, Ireland
  - Memorial Sloan Kettering Cancer Center, New York, NY
- Tim Downing
  - School of Biotechnology, Faculty of Biological Sciences, Dublin City University, Dublin, Ireland
- Mary J O'Connell
  - Computational and Molecular Evolutionary Biology Research Group, School of Biology, Faculty of Biological Sciences, The University of Leeds, Leeds, United Kingdom
  - Computational and Molecular Evolutionary Biology Group, School of Life Sciences, University of Nottingham, Nottingham, United Kingdom
- Charles Spillane
  - Genetics & Biotechnology Lab, Plant & AgriBiosciences Research Centre (PABC), School of Natural Sciences, Ryan Institute, National University of Ireland Galway, Galway, Ireland

4
Cell-Derived Viral Genes Evolve under Stronger Purifying Selection in Rhadinoviruses. J Virol 2018; 92:JVI.00359-18. [PMID: 29997213 DOI: 10.1128/jvi.00359-18] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/02/2018] [Accepted: 06/01/2018] [Indexed: 12/20/2022]
Abstract
Like many other large double-stranded DNA (dsDNA) viruses, herpesviruses are known to capture host genes to evade host defenses. Little is known about the detailed natural history of such genes, nor do we fully understand their evolutionary dynamics. A major obstacle is that they are often highly divergent, maintaining very low sequence similarity to host homologs. Here we use the herpesvirus genus Rhadinovirus as a model system to develop an analytical approach that combines complementary evolutionary and bioinformatic techniques, offering results that are both detailed and robust for a range of genes. Using a systematic phylogenetic strategy, we identify the original host lineage of viral genes with high confidence. We show that although host immunomodulatory genes evolve rapidly compared to other host genes, they undergo a clear increase in purifying selection once captured by a virus. To characterize this shift in detail, we developed a novel technique to identify changes in selection pressure that can be attributable to particular domains. These findings will inform us on how viruses develop strategies to evade the immune system, and our synthesis of techniques can be reapplied to other viruses or biological systems with similar analytical challenges.
IMPORTANCE Viruses and hosts have been shown to capture genes from one another as part of the evolutionary arms race. Such genes offer a natural experiment on the effects of evolutionary pressure, since the same gene exists in vastly different selective environments. However, sequences of viral homologs often bear little similarity to the original sequence, complicating the reconstruction of their shared evolutionary history with host counterparts. In this study, we use a genus of herpesviruses as a model system to comprehensively investigate the evolution of host-derived viral genes, using a synthesis of genomics, phylogenetics, selection analysis, and nucleotide and amino acid modeling.

5
Yin H, Guo HB, Weston DJ, Borland AM, Ranjan P, Abraham PE, Jawdy SS, Wachira J, Tuskan GA, Tschaplinski TJ, Wullschleger SD, Guo H, Hettich RL, Gross SM, Wang Z, Visel A, Yang X. Diel rewiring and positive selection of ancient plant proteins enabled evolution of CAM photosynthesis in Agave. BMC Genomics 2018; 19:588. [PMID: 30081833 PMCID: PMC6090859 DOI: 10.1186/s12864-018-4964-7] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 11/16/2017] [Accepted: 07/26/2018] [Indexed: 12/22/2022]
Abstract
Background Crassulacean acid metabolism (CAM) enhances plant water-use efficiency through an inverse day/night pattern of stomatal closure/opening that facilitates nocturnal CO2 uptake. CAM has evolved independently in over 35 plant lineages, accounting for ~6% of all higher plants. Agave species are highly heat- and drought-tolerant, and have been domesticated as model CAM crops for beverage, fiber, and biofuel production in semi-arid and arid regions. However, the genomic basis of evolutionary innovation of CAM in the genus Agave is largely unknown.
Results Using an approach that integrated genomics, gene co-expression networks, comparative genomics and protein structure analyses, we investigated the molecular evolution of CAM as exemplified in Agave. Comparative genomics analyses among C3, C4 and CAM species revealed that core metabolic components required for CAM have ancient genomic origins traceable to non-vascular plants while regulatory proteins required for diel re-programming of metabolism have a more recent origin shared among C3, C4 and CAM species. We showed that accelerated evolution of key functional domains in proteins responsible for primary metabolism and signaling, together with a diel re-programming of the transcription of genes involved in carbon fixation, carbohydrate processing, redox homeostasis, and circadian control, is required for the evolution of CAM in Agave. Furthermore, we highlighted the potential candidates contributing to the adaptation of CAM functional modules.
Conclusions This work provides evidence of adaptive evolution of CAM-related pathways. We showed that the core metabolic components required for CAM are shared by non-vascular plants, but regulatory proteins involved in re-programming of carbon fixation and metabolite transportation appeared more recently. We propose that the accelerated evolution of key proteins together with a diel re-programming of gene expression were required for CAM evolution from C3 ancestors in Agave. Electronic supplementary material The online version of this article (10.1186/s12864-018-4964-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Hengfu Yin
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
  - Present address: Research Institute of Subtropical Forestry, Chinese Academy of Forestry, Zhejiang, 311400, Hangzhou, China
- Hao-Bo Guo
  - Department of Biology, University of Tennessee, Knoxville, TN, 37996, USA
- David J Weston
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Anne M Borland
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
- Priya Ranjan
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
  - DOE-Center for Bioenergy Innovation (CBI), Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Paul E Abraham
  - DOE-Center for Bioenergy Innovation (CBI), Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
  - Chemical Sciences Division, Oak Ridge National Laboratory, 37831, Oak Ridge, TN, USA
- Sara S Jawdy
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
  - DOE-Center for Bioenergy Innovation (CBI), Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- James Wachira
  - Department of Biology, Morgan State University, Baltimore, MD, 21251, USA
- Gerald A Tuskan
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
  - DOE-Center for Bioenergy Innovation (CBI), Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Timothy J Tschaplinski
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
  - DOE-Center for Bioenergy Innovation (CBI), Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Stan D Wullschleger
  - Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Hong Guo
  - Department of Biology, University of Tennessee, Knoxville, TN, 37996, USA
- Robert L Hettich
  - DOE-Center for Bioenergy Innovation (CBI), Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
  - Chemical Sciences Division, Oak Ridge National Laboratory, 37831, Oak Ridge, TN, USA
- Stephen M Gross
  - DOE Joint Genome Institute, Walnut Creek, CA, 94598, USA
  - Present address: Illumina, Inc., San Diego, CA, 92122, USA
- Zhong Wang
  - DOE Joint Genome Institute, Walnut Creek, CA, 94598, USA
  - School of Natural Sciences, University of California, Merced, CA, 95343, USA
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Axel Visel
  - DOE Joint Genome Institute, Walnut Creek, CA, 94598, USA
  - School of Natural Sciences, University of California, Merced, CA, 95343, USA
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Xiaohan Yang
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
  - DOE-Center for Bioenergy Innovation (CBI), Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA

6
Llinares-López F, Papaxanthos L, Bodenham D, Roqueiro D, Borgwardt K. Genome-wide genetic heterogeneity discovery with categorical covariates. Bioinformatics 2018; 33:1820-1828. [PMID: 28200033 PMCID: PMC5870548 DOI: 10.1093/bioinformatics/btx071] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 04/01/2016] [Accepted: 02/08/2017] [Indexed: 12/30/2022]
Abstract
Motivation Genetic heterogeneity is the phenomenon that distinct genetic variants may give rise to the same phenotype. The recently introduced algorithm Fast Automatic Interval Search (FAIS) enables the genome-wide search of candidate regions for genetic heterogeneity in the form of any contiguous sequence of variants, and achieves high computational efficiency and statistical power. Although FAIS can test all possible genomic regions for association with a phenotype, a key limitation is its inability to correct for confounders such as gender or population structure, which may lead to numerous false-positive associations. Results We propose FastCMH, a method that overcomes this problem by properly accounting for categorical confounders, while still retaining statistical power and computational efficiency. Experiments comparing FastCMH with FAIS and multiple kinds of burden tests on simulated data, as well as on human and Arabidopsis samples, demonstrate that FastCMH can drastically reduce genomic inflation and discover associations that are missed by standard burden tests. Availability and Implementation An R package fastcmh is available on CRAN and the source code can be found at: https://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/fastcmh.html Supplementary information Supplementary data are available at Bioinformatics online.
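The abstract does not spell out how categorical confounders are handled. As a general illustration only, the sketch below shows the Cochran-Mantel-Haenszel (CMH) test, a standard way to test association across strata of a categorical covariate. It is not the FastCMH implementation and performs no interval search; the function name `cmh_test` and all counts are hypothetical.

```python
import math

def cmh_test(strata):
    """Cochran-Mantel-Haenszel test over a list of 2x2 tables.

    Each table is ((a, b), (c, d)): rows are variant present/absent,
    columns are case/control. Returns (statistic, p_value) without a
    continuity correction.
    """
    observed = expected = variance = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        observed += a
        expected += row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    stat = (observed - expected) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: the variant is common in population A, which also
# has more cases, so pooling the samples creates a spurious association.
population_a = ((40, 10), (8, 2))
population_b = ((2, 8), (10, 40))
pooled = ((42, 18), (18, 42))

pooled_stat, pooled_p = cmh_test([pooled])
strat_stat, strat_p = cmh_test([population_a, population_b])
print(f"pooled:     stat={pooled_stat:.2f}, p={pooled_p:.2g}")
print(f"stratified: stat={strat_stat:.2f}, p={strat_p:.2g}")
```

In this toy data the odds ratio is 1 within each population, so the stratified test finds no association, while the pooled table suggests a strong one. This is the kind of false positive that correcting for population structure is meant to remove.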
Affiliation(s)
- Felipe Llinares-López
  - Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Laetitia Papaxanthos
  - Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Dean Bodenham
  - Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Damian Roqueiro
  - Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Karsten Borgwardt
  - Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland

7
Mondragón-Palomino M, Stam R, John-Arputharaj A, Dresselhaus T. Diversification of defensins and NLRs in Arabidopsis species by different evolutionary mechanisms. BMC Evol Biol 2017; 17:255. [PMID: 29246101 PMCID: PMC5731061 DOI: 10.1186/s12862-017-1099-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 05/30/2017] [Accepted: 11/24/2017] [Indexed: 12/21/2022]
Abstract
BACKGROUND Genes encoding proteins that underlie host-pathogen co-evolution and are selected for new resistance specificities are frequently under positive selection, a process that maintains diversity. Here, we tested the contribution of natural selection, recombination and transcriptional divergence to the evolutionary diversification of the plant defensin superfamily in three Arabidopsis species. The intracellular NOD-like receptor (NLR) family was used for comparison because positive selection has been well documented in its members. Similar to defensins, NLRs are encoded by a large and polymorphic gene family and many of their members are involved in the immune response.
RESULTS Gene trees of Arabidopsis defensins (DEFLs) show a high prevalence of clades containing orthologs. This indicates that their diversity dates back to a common ancestor and species-specific duplications did not significantly contribute to gene family expansion. DEFLs are characterized by a pervasive pattern of neutral evolution with infrequent positive and negative selection as well as recombination. In comparison, most NLR alignment groups are characterized by frequent occurrence of positive selection and recombination in their leucine-rich repeat (LRR) domain as well as negative selection in their nucleotide-binding (NB-ARC) domain. While major NLR subgroups are expressed in pistils and leaves both in the presence and absence of pathogen infection, the members of DEFL alignment groups are predominantly transcribed in pistils. Furthermore, conserved groups of NLRs and DEFLs are differentially expressed in response to Fusarium graminearum regardless of whether these genes are under positive selection or not.
CONCLUSIONS The present analysis of NLRs expands previous studies in Arabidopsis thaliana and highlights contrasting patterns of purifying and diversifying selection affecting different gene regions. DEFL genes show a different evolutionary trend, with fewer recombination events and significantly fewer instances of natural selection. Their heterogeneous expression pattern suggests that transcriptional divergence probably made the major contribution to functional diversification. In comparison to smaller families encoding pathogenesis-related (PR) proteins under positive selection, DEFLs are involved in a wide variety of processes that altogether might pose structural and functional trade-offs to their family-wide pattern of evolution.
Affiliation(s)
- Mariana Mondragón-Palomino
  - Cell Biology and Plant Biochemistry, Biochemie-Zentrum Regensburg, University of Regensburg, Universitätstraße 31, 93053, Regensburg, Germany
- Remco Stam
  - Chair of Phytopathology, Technical University of Munich, School of Life Sciences Weihenstephan, Emil-Ramann-Str. 2, 85354, Freising, Germany
- Ajay John-Arputharaj
  - Cell Biology and Plant Biochemistry, Biochemie-Zentrum Regensburg, University of Regensburg, Universitätstraße 31, 93053, Regensburg, Germany
- Thomas Dresselhaus
  - Cell Biology and Plant Biochemistry, Biochemie-Zentrum Regensburg, University of Regensburg, Universitätstraße 31, 93053, Regensburg, Germany

8
Chen C, Steibel JP, Tempelman RJ. Genome-Wide Association Analyses Based on Broadly Different Specifications for Prior Distributions, Genomic Windows, and Estimation Methods. Genetics 2017; 206:1791-1806. [PMID: 28637709 PMCID: PMC5560788 DOI: 10.1534/genetics.117.202259] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 03/26/2017] [Accepted: 06/19/2017] [Indexed: 11/18/2022]
Abstract
A currently popular strategy (EMMAX) for genome-wide association (GWA) analysis infers association for the specific marker of interest by treating its effect as fixed while treating all other marker effects as classical Gaussian random effects. It may be more statistically coherent to specify all markers as sharing the same prior distribution, whether that distribution is Gaussian, heavy-tailed (BayesA), or has variable selection specifications based on a mixture of, say, two Gaussian distributions [stochastic search and variable selection (SSVS)]. Furthermore, all such GWA inference should be formally based on posterior probabilities or test statistics as we present here, rather than merely being based on point estimates. We compared these three broad categories of priors within a simulation study to investigate the effects of different degrees of skewness for quantitative trait loci (QTL) effects and numbers of QTL using 43,266 SNP marker genotypes from 922 Duroc-Pietrain F2-cross pigs. Genomic regions were based either on single SNP associations, on nonoverlapping windows of various fixed sizes (0.5-3 Mb), or on adaptively determined windows that cluster the genome into blocks based on linkage disequilibrium. We found that SSVS and BayesA lead to the best receiver operating curve properties in almost all cases. We also evaluated approximate maximum a posteriori (MAP) approaches to BayesA and SSVS as potential computationally feasible alternatives; however, MAP inferences were not promising, particularly due to their sensitivity to starting values. We determined that it is advantageous to use variable selection specifications based on adaptively constructed genomic window lengths for GWA studies.
Affiliation(s)
- Chunyu Chen
  - Department of Animal Science, Michigan State University, East Lansing, Michigan 48824
- Juan P Steibel
  - Department of Animal Science, Michigan State University, East Lansing, Michigan 48824
- Robert J Tempelman
  - Department of Animal Science, Michigan State University, East Lansing, Michigan 48824

9
Evolutionary Analysis of the Mammalian Tuftelin Sequence Reveals Features of Functional Importance. J Mol Evol 2017; 84:214-224. [PMID: 28409196 DOI: 10.1007/s00239-017-9789-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 06/21/2016] [Accepted: 03/22/2017] [Indexed: 12/31/2022]
Abstract
Tuftelin (TUFT1) is an acidic, phosphorylated glycoprotein, initially discovered in developing enamel matrix. TUFT1 is expressed in many mineralized and non-mineralized tissues. We performed an evolutionary analysis of 82 mammalian TUFT1 sequences to identify residues and motifs that were conserved during 220 million years (Ma) of evolution. We showed that 168 residues (out of the 390 residues composing the human TUFT1 sequence) are under purifying selection. Our analyses identified several, new, putatively functional domains and confirmed previously described functional domains, such as the TIP39 interaction domain, which correlates with nuclear localization of the TUFT1 protein, that was demonstrated in several tissues. We also identified several sites under positive selection, which could indicate evolutionary changes possibly related to the functional diversification of TUFT1 during evolution in some lineages. We discovered that TUFT1 and MYZAP (myocardial zonula adherens protein) share a common ancestor that was duplicated circa 500 million years ago. Taken together, these findings expand our knowledge of TUFT1 evolution and provide new information that will be useful for further investigation of TUFT1 functions.

10
Oh S, Zhang R, Wu QL, Liu WT. Evolution and adaptation of SAR11 and Cyanobium in a saline Tibetan lake. Environ Microbiol Rep 2016; 8:595-604. [PMID: 27084571 DOI: 10.1111/1758-2229.12408] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/16/2015] [Accepted: 03/15/2016] [Indexed: 06/05/2023]
Abstract
Lake Qinghai is a unique lacustrine ecosystem located on the Tibetan Plateau and exhibits oligotrophic, alkaline, and saline conditions. Previous studies have focused on the community phylogenetic diversity of bacterioplankton in the ecosystem. This study aimed to address the ecotype diversity of bacterioplankton populations in the unique microbial habitat, using metagenomic sequencing and analysis. Phylogenetic analysis revealed two major bacterial populations: SAR11 IIIa (14% of the total) and Cyanobium (14%). Although the two populations shared high 16S rRNA gene sequence identity (> 98% identity) with their closest marine counterparts, they displayed substantial genomic divergence (≤ 80% average amino acid sequence identity). Comparative genomic analysis identified conservation of carbon and energy storage metabolism (biosynthesis of polyphosphate and polyhydroxyalkanoate) gene operons in the SAR11 IIIa and a cyanate (potential nitrogen source in alkaline conditions) transporter gene operon in the Cyanobium. We further identified genetic signature of positive selection acting on an exodeoxyribonuclease gene of the SAR11 IIIa population, which is potentially associated with DNA repair responsive to strong UV radiation on the high altitude mountain. Taken together, our results revealed the ecosystem-specific gene content of the bacterioplankton populations and provided new insights into their adaptations unique to the Tibetan lake.
Affiliation(s)
- Seungdae Oh
  - School of Civil and Environmental Engineering, Nanyang Technological University, Singapore, Singapore
  - Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Rui Zhang
  - State Key Laboratory of Marine Environmental Science, Xiamen University, Fujian, China
  - Institute of Marine Microbes and Ecospheres, Xiamen University, Fujian, China
- Qinglong L Wu
  - State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing, China
- Wen-Tso Liu
  - Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
|
11
|
|
12
|
Mäkinen H, Vasemägi A, McGinnity P, Cross TF, Primmer CR. Population genomic analyses of early-phase Atlantic Salmon (Salmo salar) domestication/captive breeding. Evol Appl 2014; 8:93-107. [PMID: 25667605 PMCID: PMC4310584 DOI: 10.1111/eva.12230] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2014] [Accepted: 10/10/2014] [Indexed: 12/28/2022] Open
Abstract
Domestication can have adverse genetic consequences, which may reduce the fitness of individuals once released back into the wild. Many wild Atlantic salmon (Salmo salar L.) populations are threatened by anthropogenic influences, and they are supplemented with captively bred fish. The Atlantic salmon is also widely used in selective breeding programs to increase the mean trait values for desired phenotypic traits. We analyzed a genomewide set of SNPs in three domesticated Atlantic salmon strains and their wild conspecifics to identify loci underlying domestication. The genetic differentiation between domesticated strains and wild populations was low (FST < 0.03), and domesticated strains harbored similar levels of genetic diversity compared to their wild conspecifics. Only a few loci showed footprints of selection, and these loci were located in different linkage groups among the different wild population/hatchery strain comparisons. Simulated scenarios indicated that differentiation in quantitative trait loci exceeded that in neutral markers during the early phases of divergence only when the difference in the phenotypic optimum between populations was large. This study indicates that detecting selection using standard approaches in the early phases of domestication might be challenging unless selection is strong and the traits under selection show simple inheritance patterns.
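For readers unfamiliar with the FST statistic quoted in this abstract, the following is a minimal sketch of Wright's FST for a single biallelic SNP, computed from per-population allele frequencies with equal population weights. It is an illustration only and not the estimator used by the authors, which the abstract does not specify; the function name `fst_biallelic` and the example frequencies are hypothetical.

```python
def fst_biallelic(freqs):
    """Wright's F_ST for one biallelic locus.

    `freqs` holds the frequency of one allele in each population.
    Populations are weighted equally (an assumption of this sketch).
    """
    if len(freqs) < 2:
        raise ValueError("need allele frequencies from at least two populations")
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity in the pooled (total) population.
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # locus is monomorphic everywhere; no differentiation
    # Mean expected heterozygosity within populations.
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


# Hypothetical frequencies for one hatchery strain and one wild population.
print(fst_biallelic([0.45, 0.55]))
```

With these made-up frequencies the value is about 0.01, the same order of magnitude as the genome-wide differentiation the abstract reports as low.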
Affiliation(s)
- Hannu Mäkinen
  - Division of Genetics and Physiology, Department of Biology, University of Turku, Turku, Finland
- Anti Vasemägi
  - Division of Genetics and Physiology, Department of Biology, University of Turku, Turku, Finland; Department of Aquaculture, Estonian University of Life Sciences, Tartu, Estonia
- Philip McGinnity
  - Aquaculture and Fisheries Development Centre, School of Biological, Earth, and Environmental Sciences, University College Cork, Cork, Ireland; Marine Institute, Furnace, Newport, Co. Mayo, Ireland
- Tom F Cross
  - Aquaculture and Fisheries Development Centre, School of Biological, Earth, and Environmental Sciences, University College Cork, Cork, Ireland
- Craig R Primmer
  - Division of Genetics and Physiology, Department of Biology, University of Turku, Turku, Finland
|
13
|
Choudhury A, Hazelhurst S, Meintjes A, Achinike-Oduaran O, Aron S, Gamieldien J, Jalali Sefid Dashti M, Mulder N, Tiffin N, Ramsay M. Population-specific common SNPs reflect demographic histories and highlight regions of genomic plasticity with functional relevance. BMC Genomics 2014; 15:437. [PMID: 24906912 PMCID: PMC4092225 DOI: 10.1186/1471-2164-15-437] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2013] [Accepted: 05/19/2014] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND Population differentiation is the result of demographic and evolutionary forces. Whole genome datasets from the 1000 Genomes Project (October 2012) provide an unbiased view of genetic variation across populations from Europe, Asia, Africa and the Americas. Common population-specific SNPs (MAF > 0.05) reflect a deep history and may have important consequences for health and wellbeing. Their interpretation is contextualised by currently available genome data. RESULTS The identification of common population-specific (CPS) variants (SNPs and SSV) is influenced by admixture and the sample size under investigation. Nine of the populations in the 1000 Genomes Project (2 African, 2 Asian (including a merged Chinese group) and 5 European) revealed that the African populations (LWK and YRI), followed by the Japanese (JPT) have the highest number of CPS SNPs, in concordance with their histories and given the populations studied. Using two methods, sliding 50-SNP and 5-kb windows, the CPS SNPs showed distinct clustering across large genome segments and little overlap of clusters between populations. iHS enrichment score and the population branch statistic (PBS) analyses suggest that selective sweeps are unlikely to account for the clustering and population specificity. Of interest is the association of clusters close to recombination hotspots. Functional analysis of genes associated with the CPS SNPs revealed over-representation of genes in pathways associated with neuronal development, including axonal guidance signalling and CREB signalling in neurones. CONCLUSIONS Common population-specific SNPs are non-randomly distributed throughout the genome and are significantly associated with recombination hotspots. Since the variant alleles of most CPS SNPs are the derived allele, they likely arose in the specific population after a split from a common ancestor. 
Their proximity to genes involved in specific pathways, including neuronal development, suggests evolutionary plasticity of selected genomic regions. Contrary to expectation, selective sweeps did not play a large role in the persistence of population-specific variation. This suggests a stochastic process towards population-specific variation which reflects demographic histories and may have some interesting implications for health and susceptibility to disease.
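The filtering step described in this abstract can be sketched as follows. The sketch assumes that a common population-specific (CPS) SNP is one whose variant allele has a frequency above 0.05 in exactly one population and is absent from all others; the abstract does not give the authors' exact criteria, so that definition, the function names, and the example data are all hypothetical. The authors report sliding 50-SNP and 5-kb windows; the second helper uses non-overlapping 5-kb bins as a simpler stand-in.

```python
from collections import Counter


def common_population_specific(freqs, threshold=0.05):
    """Map each qualifying SNP to the single population that carries it.

    `freqs` maps SNP id -> {population: variant allele frequency}.
    A SNP qualifies when the variant allele is present in exactly one
    population and its frequency there exceeds `threshold`.
    """
    result = {}
    for snp, by_pop in freqs.items():
        present = [pop for pop, f in by_pop.items() if f > 0]
        if len(present) == 1 and by_pop[present[0]] > threshold:
            result[snp] = present[0]
    return result


def count_per_window(positions, window=5000):
    """Count SNP positions in each non-overlapping `window`-bp bin."""
    return Counter(pos // window for pos in positions)


# Hypothetical frequencies for three SNPs in three populations.
example = {
    "snp_a": {"YRI": 0.12, "JPT": 0.0, "CEU": 0.0},  # common and specific
    "snp_b": {"YRI": 0.20, "JPT": 0.10, "CEU": 0.0},  # shared, not specific
    "snp_c": {"YRI": 0.0, "JPT": 0.03, "CEU": 0.0},  # specific but rare
}
print(common_population_specific(example))
print(count_per_window([100, 4999, 5000, 12000]))
```

Bins with unusually high counts would be the candidates for the clustering the abstract describes; testing whether such clusters exceed chance expectation needs a null model that this sketch does not attempt.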
Affiliation(s)
- Ananyo Choudhury
  - Sydney Brenner Institute of Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
  - Division of Human Genetics, National Health Laboratory Service, School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Scott Hazelhurst
  - Sydney Brenner Institute of Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
  - School of Electrical & Information Engineering, University of the Witwatersrand, Johannesburg, South Africa
- Ayton Meintjes
  - Department Clinical Laboratory Sciences, Computational Biology Group, IDM, University of Cape Town, Cape Town, South Africa
- Ovokeraye Achinike-Oduaran
  - Sydney Brenner Institute of Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
  - Division of Human Genetics, National Health Laboratory Service, School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Shaun Aron
  - Sydney Brenner Institute of Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
- Junaid Gamieldien
  - South African National Bioinformatics Institute/Medical Research Council of South Africa Bioinformatics Unit, University of the Western Cape, Bellville, South Africa
- Mahjoubeh Jalali Sefid Dashti
  - South African National Bioinformatics Institute/Medical Research Council of South Africa Bioinformatics Unit, University of the Western Cape, Bellville, South Africa
- Nicola Mulder
  - Department Clinical Laboratory Sciences, Computational Biology Group, IDM, University of Cape Town, Cape Town, South Africa
- Nicki Tiffin
  - South African National Bioinformatics Institute/Medical Research Council of South Africa Bioinformatics Unit, University of the Western Cape, Bellville, South Africa
- Michèle Ramsay
  - Sydney Brenner Institute of Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
  - Division of Human Genetics, National Health Laboratory Service, School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
|
14
|
Meyerson NR, Rowley PA, Swan CH, Le DT, Wilkerson GK, Sawyer SL. Positive selection of primate genes that promote HIV-1 replication. Virology 2014; 454-455:291-8. [PMID: 24725956 DOI: 10.1016/j.virol.2014.02.029] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2014] [Revised: 02/19/2014] [Accepted: 02/28/2014] [Indexed: 12/22/2022]
Abstract
Evolutionary analyses have revealed that most host-encoded restriction factors against HIV have experienced virus-driven selection during primate evolution. However, HIV also depends on the function of many human proteins, called host factors, for its replication. It is not clear whether virus-driven selection shapes the evolution of host factor genes to the extent that it is known to shape restriction factor genes. We show that five out of 40 HIV host factor genes (13%) analyzed do bear strong signatures of positive selection. Some of these genes (CD4, NUP153, RANBP2/NUP358) have been characterized with respect to the HIV lifecycle, while others (ANKRD30A/NY-BR-1 and MAP4) remain relatively uncharacterized. One of these, ANKRD30A, shows the most rapid evolution within this set of genes and is induced by interferon stimulation. We discuss how evolutionary analysis can aid the study of host factors for viral replication, just as it has the study of host immunity systems.
Affiliation(s)
- Nicholas R Meyerson
  - Department of Molecular Biosciences, Institute for Cellular and Molecular Biology, University of Texas at Austin, 2500 Speedway, Austin, TX 78712-1191, USA
- Paul A Rowley
  - Department of Molecular Biosciences, Institute for Cellular and Molecular Biology, University of Texas at Austin, 2500 Speedway, Austin, TX 78712-1191, USA
- Christina H Swan
  - Department of Molecular Biosciences, Institute for Cellular and Molecular Biology, University of Texas at Austin, 2500 Speedway, Austin, TX 78712-1191, USA; Regents School of Austin, 3230 Travis Country Circle, Austin, TX, USA
- Dona T Le
  - Department of Molecular Biosciences, Institute for Cellular and Molecular Biology, University of Texas at Austin, 2500 Speedway, Austin, TX 78712-1191, USA
- Gregory K Wilkerson
  - Department of Veterinary Sciences, Michale E Keeling Center for Comparative Medicine and Research, University of Texas, MD Anderson Cancer Center, Bastrop, TX, USA
- Sara L Sawyer
  - Department of Molecular Biosciences, Institute for Cellular and Molecular Biology, University of Texas at Austin, 2500 Speedway, Austin, TX 78712-1191, USA.
|
15
|
Mayrose I, Stern A, Burdelova EO, Sabo Y, Laham-Karam N, Zamostiano R, Bacharach E, Pupko T. Synonymous site conservation in the HIV-1 genome. BMC Evol Biol 2013; 13:164. [PMID: 23914950 PMCID: PMC3750384 DOI: 10.1186/1471-2148-13-164] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2013] [Accepted: 07/25/2013] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND Synonymous or silent mutations are usually thought to evolve neutrally. However, accumulating recent evidence has demonstrated that silent mutations may destabilize RNA structures or disrupt cis regulatory motifs superimposed on coding sequences. Such observations suggest the existence of stretches of codon sites that are evolutionary conserved at both DNA-RNA and protein levels. Such stretches may point to functionally important regions within protein coding sequences not necessarily reflecting functional constraints on the amino-acid sequence. The HIV-1 genome is highly compact, and often harbors overlapping functional elements at the protein, RNA, and DNA levels. This superimposition of functions leads to complex selective forces acting on all levels of the genome and proteome. Considering the constraints on HIV-1 to maintain such a highly compact genome, we hypothesized that stretches of synonymous conservation would be common within its genome. RESULTS We used a combined computational-experimental approach to detect and characterize regions exhibiting strong purifying selection against synonymous substitutions along the HIV-1 genome. Our methodology is based on advanced probabilistic evolutionary models that explicitly account for synonymous rate variation among sites and rate dependencies among adjacent sites. These models are combined with a randomization procedure to automatically identify the most statistically significant regions of conserved synonymous sites along the genome. Using this procedure we identified 21 conserved regions. Twelve of these are mapped to regions within overlapping genes, seven correlate with known functional elements, while the functions of the remaining four are yet unknown. Among these four regions, we chose the one that deviates most from synonymous rate homogeneity for in-depth computational and experimental characterization. 
In our assays aiming to quantify viral fitness in both early and late stages of the replication cycle, no differences were observed between the mutated and the wild type virus following the introduction of synonymous mutations. CONCLUSIONS The contradiction between the inferred purifying selective forces and the lack of effect of these mutations on viral replication may be explained by the fact that the phenotype was measured in single-cycle infection assays in cell culture. Such a system does not account for the complexity of HIV-1 infections in vivo, which involves multiple infection cycles and interaction with the host immune system.
Affiliation(s)
- Itay Mayrose
- Department of Molecular Biology and Ecology of Plants, Tel Aviv University, Tel-Aviv 69978, Israel.
|
16
|
Morgan CC, Shakya K, Webb A, Walsh TA, Lynch M, Loscher CE, Ruskin HJ, O'Connell MJ. Colon cancer associated genes exhibit signatures of positive selection at functionally significant positions. BMC Evol Biol 2012; 12:114. [PMID: 22788692 PMCID: PMC3563467 DOI: 10.1186/1471-2148-12-114] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2012] [Accepted: 06/22/2012] [Indexed: 12/17/2022] Open
Abstract
Background Cancer, much like most human disease, is routinely studied by utilizing model organisms. Of these model organisms, mice are often dominant. However, our assumptions of functional equivalence fail to consider the opportunity for divergence conferred by ~180 Million Years (MY) of independent evolution between these species. For a given set of human disease related genes, it is therefore important to determine if functional equivalency has been retained between species. In this study we test the hypothesis that cancer associated genes have different patterns of substitution akin to adaptive evolution in different mammal lineages. Results Our analysis of the current literature and colon cancer databases identified 22 genes exhibiting colon cancer associated germline mutations. We identified orthologs for these 22 genes across a set of high coverage (>6X) vertebrate genomes. Analysis of these orthologous datasets revealed significant levels of positive selection. Evidence of lineage-specific positive selection was identified in 14 genes in both ancestral and extant lineages. Lineage-specific positive selection was detected in the ancestral Euarchontoglires and Hominidae lineages for STK11, in the ancestral primate lineage for CDH1, in the ancestral Murinae lineage for both SDHC and MSH6 genes and the ancestral Muridae lineage for TSC1. Conclusion Identifying positive selection in the Primate, Hominidae, Muridae and Murinae lineages suggests an ancestral functional shift in these genes between the rodent and primate lineages. Analyses such as this, combining evolutionary theory and predictions - along with medically relevant data, can thus provide us with important clues for modeling human diseases.
Affiliation(s)
- Claire C Morgan
- Bioinformatics and Molecular Evolution Group, School of Biotechnology, Dublin City University, Glasnevin, Dublin 9, Ireland
|
17
|
Engle EK, Fay JC. Divergence of the yeast transcription factor FZF1 affects sulfite resistance. PLoS Genet 2012; 8:e1002763. [PMID: 22719269 PMCID: PMC3375221 DOI: 10.1371/journal.pgen.1002763] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2012] [Accepted: 04/26/2012] [Indexed: 01/06/2023] Open
Abstract
Changes in gene expression are commonly observed during evolution. However, the phenotypic consequences of expression divergence are frequently unknown and difficult to measure. Transcriptional regulators provide a mechanism by which phenotypic divergence can occur through multiple, coordinated changes in gene expression during development or in response to environmental changes. Yet, some changes in transcriptional regulators may be constrained by their pleiotropic effects on gene expression. Here, we use a genome-wide screen for promoters that are likely to have diverged in function and identify a yeast transcription factor, FZF1, that has evolved substantial differences in its ability to confer resistance to sulfites. Chimeric alleles from four Saccharomyces species show that divergence in FZF1 activity is due to changes in both its coding and upstream noncoding sequence. Between the two closest species, noncoding changes affect the expression of FZF1, whereas coding changes affect the expression of SSU1, a sulfite efflux pump activated by FZF1. Both coding and noncoding changes also affect the expression of many other genes. Our results show how divergence in the coding and promoter region of a transcription factor alters the response to an environmental stress. Changes in gene regulation are thought to play an important role in evolution. While variation in gene expression between species is common, it is hard to identify the phenotypic consequences of this variation since many changes in gene expression may have subtle or no phenotypic effects. In this study, we investigate changes in sulfite resistance and gene expression caused by the transcription factor, FZF1, that has evolved rapidly during the divergence of related yeast species. We find that divergence in the ability of FZF1 to confer sulfite resistance is mediated by changes in its expression as well as changes in its protein structure, both of which cause changes in the expression of other genes. 
Our results show how the combination of multiple changes within a transcription factor can produce substantial changes in phenotype and the expression of many genes.
Affiliation(s)
- Elizabeth K. Engle
  - Molecular Genetics and Genomics Program, Washington University, St. Louis, Missouri, United States of America
- Justin C. Fay
  - Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri, United States of America
|
18
|
Hofer T, Foll M, Excoffier L. Evolutionary forces shaping genomic islands of population differentiation in humans. BMC Genomics 2012; 13:107. [PMID: 22439654 PMCID: PMC3317871 DOI: 10.1186/1471-2164-13-107] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2011] [Accepted: 03/22/2012] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Levels of differentiation among populations depend both on demographic and selective factors: genetic drift and local adaptation increase population differentiation, which is eroded by gene flow and balancing selection. We describe here the genomic distribution and the properties of genomic regions with unusually high and low levels of population differentiation in humans to assess the influence of selective and neutral processes on human genetic structure. METHODS Individual SNPs of the Human Genome Diversity Panel (HGDP) showing significantly high or low levels of population differentiation were detected under a hierarchical-island model (HIM). A Hidden Markov Model allowed us to detect genomic regions or islands of high or low population differentiation. RESULTS Under the HIM, only 1.5% of all SNPs are significant at the 1% level, but their genomic spatial distribution is significantly non-random. We find evidence that local adaptation shaped high-differentiation islands, as they are enriched for non-synonymous SNPs and overlap with previously identified candidate regions for positive selection. Moreover there is a negative relationship between the size of islands and recombination rate, which is stronger for islands overlapping with genes. Gene ontology analysis supports the role of diet as a major selective pressure in those highly differentiated islands. Low-differentiation islands are also enriched for non-synonymous SNPs, and contain an overly high proportion of genes belonging to the 'Oncogenesis' biological process. 
CONCLUSIONS Even though selection seems to be acting in shaping islands of high population differentiation, neutral demographic processes might have promoted the appearance of some genomic islands since i) as much as 20% of islands are in non-genic regions ii) these non-genic islands are on average two times shorter than genic islands, suggesting a more rapid erosion by recombination, and iii) most loci are strongly differentiated between Africans and non-Africans, a result consistent with known human demographic history.
Affiliation(s)
- Tamara Hofer
  - Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, 3012 Bern, Switzerland
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Matthieu Foll
  - Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, 3012 Bern, Switzerland
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Laurent Excoffier
  - Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, 3012 Bern, Switzerland
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
|
19
|
Huang YF, Golding GB. Inferring sequence regions under functional divergence in duplicate genes. Bioinformatics 2011; 28:176-83. [DOI: 10.1093/bioinformatics/btr635] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
|
20
|
Lin MF, Kheradpour P, Washietl S, Parker BJ, Pedersen JS, Kellis M. Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes. Genome Res 2011; 21:1916-28. [PMID: 21994248 DOI: 10.1101/gr.108753.110] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
Abstract
The degeneracy of the genetic code allows protein-coding DNA and RNA sequences to simultaneously encode additional, overlapping functional elements. A sequence in which both protein-coding and additional overlapping functions have evolved under purifying selection should show increased evolutionary conservation compared to typical protein-coding genes--especially at synonymous sites. In this study, we use genome alignments of 29 placental mammals to systematically locate short regions within human ORFs that show conspicuously low estimated rates of synonymous substitution across these species. The 29-species alignment provides statistical power to locate more than 10,000 such regions with resolution down to nine-codon windows, which are found within more than a quarter of all human protein-coding genes and contain ∼2% of their synonymous sites. We collect numerous lines of evidence that the observed synonymous constraint in these regions reflects selection on overlapping functional elements including splicing regulatory elements, dual-coding genes, RNA secondary structures, microRNA target sites, and developmental enhancers. Our results show that overlapping functional elements are common in mammalian genes, despite the vast genomic landscape.
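The windowed scan described in this abstract can be illustrated with a short sketch. It assumes a per-codon list of relative synonymous substitution rate estimates (1.0 being the gene-wide average) is already available, and it flags nine-codon windows whose mean rate falls below a cutoff. The authors' actual procedure is statistical and is not detailed in the abstract; the simple mean threshold, the function name `low_rate_windows`, and the cutoff value of 0.3 are hypothetical simplifications.

```python
def low_rate_windows(rates, width=9, cutoff=0.3):
    """Return start indices of `width`-codon windows with a low mean rate.

    `rates` is a per-codon list of relative synonymous substitution
    rates. A window is reported when its mean rate is below `cutoff`.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    starts = []
    for start in range(len(rates) - width + 1):
        window = rates[start:start + width]
        if sum(window) / width < cutoff:
            starts.append(start)
    return starts


# Hypothetical gene: nine fully conserved synonymous sites flanked by
# codons evolving at the average rate.
rates = [1.0] * 5 + [0.0] * 9 + [1.0] * 5
print(low_rate_windows(rates))
```

Overlapping windows that pass the cutoff would normally be merged into a single candidate region before further analysis.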
Affiliation(s)
- Michael F Lin
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
|
21
|
Pertea M, Pertea GM, Salzberg SL. Detection of lineage-specific evolutionary changes among primate species. BMC Bioinformatics 2011; 12:274. [PMID: 21726447 PMCID: PMC3143108 DOI: 10.1186/1471-2105-12-274] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2011] [Accepted: 07/04/2011] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Comparison of the human genome with other primates offers the opportunity to detect evolutionary events that created the diverse phenotypes among the primate species. Because the primate genomes are highly similar to one another, methods developed for analysis of more divergent species do not always detect signs of evolutionary selection. RESULTS We have developed a new method, called DivE, specifically designed to find regions that have evolved either more or less rapidly than expected, for any clade within a set of very closely related species. Unlike some previous methods, DivE does not rely on rates of synonymous and nonsynonymous substitution, which enables it to detect evolutionary events in noncoding regions. We demonstrate using simulated data that DivE compares favorably to alternative methods, and we then apply DivE to the ENCODE regions in 14 primate species. We identify thousands of regions in these primates, ranging from 50 to >10000 bp in length, that appear to have experienced either constrained or accelerated rates of evolution. In particular, we detected 4942 regions that have potentially undergone positive selection in one or more primate species. Most of these regions occur outside of protein-coding genes, although we identified 20 proteins that have experienced positive selection. CONCLUSIONS DivE provides an easy-to-use method to predict both positive and negative selection in noncoding DNA, that is particularly well-suited to detecting lineage-specific selection in large genomes.
Affiliation(s)
- Mihaela Pertea
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA
- Geo M Pertea
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA
- Steven L Salzberg
  - McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
|
22
|
Selection and the cell cycle: positive Darwinian selection in a well-known DNA damage response pathway. J Mol Evol 2010; 71:444-57. [PMID: 21057781 DOI: 10.1007/s00239-010-9399-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2010] [Accepted: 10/06/2010] [Indexed: 10/18/2022]
Abstract
Cancer is a common occurrence in multi-cellular organisms and is not strictly limited to the elderly in a population. It is therefore possible that individuals with genotypes that protect against early onset cancers have a selective advantage. In this study the patterns of mutation in the proteins of a well-studied DNA damage response pathway have been examined for evidence of adaptive evolutionary change. Using a maximum likelihood framework and the mammalian species phylogeny, together with codon models of evolution, selective pressure variation across the interacting network of proteins has been detected. The presence of signatures of adaptive evolution in BRCA1 and BRCA2 has already been documented but the effect on the entire network of interacting proteins in this damage response pathway has, until now, been unknown. Positive selection is evident throughout the network with a total of 11 proteins out of 15 examined displaying patterns of substitution characteristic of positive selection. It is also shown here that modern human populations display evidence of an ongoing selective sweep in 9 of these DNA damage repair proteins. The results presented here provide the community with new residues that may be relevant to cancer susceptibility while also highlighting those proteins where human and mouse have undergone lineage-specific functional shift. An understanding of this damage response pathway from an evolutionary perspective will undoubtedly contribute to future cancer treatment approaches.
|
23
|
Demogines A, East AM, Lee JH, Grossman SR, Sabeti PC, Paull TT, Sawyer SL. Ancient and recent adaptive evolution of primate non-homologous end joining genes. PLoS Genet 2010; 6:e1001169. [PMID: 20975951 PMCID: PMC2958818 DOI: 10.1371/journal.pgen.1001169] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2010] [Accepted: 09/21/2010] [Indexed: 02/07/2023] Open
Abstract
In human cells, DNA double-strand breaks are repaired primarily by the non-homologous end joining (NHEJ) pathway. Given their critical nature, we expected NHEJ proteins to be evolutionarily conserved, with relatively little sequence change over time. Here, we report that while critical domains of these proteins are conserved as expected, the sequence of NHEJ proteins has also been shaped by recurrent positive selection, leading to rapid sequence evolution in other protein domains. In order to characterize the molecular evolution of the human NHEJ pathway, we generated large simian primate sequence datasets for NHEJ genes. Codon-based models of gene evolution yielded statistical support for the recurrent positive selection of five NHEJ genes during primate evolution: XRCC4, NBS1, Artemis, POLλ, and CtIP. Analysis of human polymorphism data using the composite of multiple signals (CMS) test revealed that XRCC4 has also been subjected to positive selection in modern humans. Crystal structures are available for XRCC4, Nbs1, and Polλ; and residues under positive selection fall exclusively on the surfaces of these proteins. Despite the positive selection of such residues, biochemical experiments with variants of one positively selected site in Nbs1 confirm that functions necessary for DNA repair and checkpoint signaling have been conserved. However, many viruses interact with the proteins of the NHEJ pathway as part of their infectious lifecycle. We propose that an ongoing evolutionary arms race between viruses and NHEJ genes may be driving the surprisingly rapid evolution of these critical genes. Because all cells experience DNA damage, they must also have mechanisms for repairing DNA. When the proteins that repair DNA malfunction, mutation and disease often result. Based on their fundamental importance, DNA repair proteins would be expected to be well preserved over evolutionary time in order to ensure optimal DNA repair function. 
However, a previous genome-wide study of molecular evolution in Saccharomyces yeast identified the non-homologous end joining (NHEJ) DNA repair pathway as one of the two most rapidly evolving pathways in the yeast genome. In order to analyze the evolution of this pathway in humans, we have generated large evolutionary sequence sets of NHEJ genes from our primate relatives. Similar to the scenario in yeast, several genes in this pathway are evolving rapidly in primate genomes and in modern human populations. Thus, complex and seemingly opposite selective forces are shaping the evolution of these important DNA repair genes. The finding that NHEJ genes are rapidly evolving in species groups as diverse as yeasts and primates indicates a systematic perturbation of the NHEJ pathway, one that is potentially important to human health.
Affiliation(s)
- Ann Demogines
- Section of Molecular Genetics and Microbiology, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Alysia M. East
- Section of Molecular Genetics and Microbiology, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Ji-Hoon Lee
- Section of Molecular Genetics and Microbiology, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- The Howard Hughes Medical Institute, Chevy Chase, Maryland, United States of America
- Sharon R. Grossman
- FAS Center for Systems Biology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Pardis C. Sabeti
- FAS Center for Systems Biology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Tanya T. Paull
- Section of Molecular Genetics and Microbiology, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- The Howard Hughes Medical Institute, Chevy Chase, Maryland, United States of America
- Sara L. Sawyer
- Section of Molecular Genetics and Microbiology, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
|
24
|
Steinway SN, Dannenfelser R, Laucius CD, Hayes JE, Nayak S. JCoDA: a tool for detecting evolutionary selection. BMC Bioinformatics 2010; 11:284. [PMID: 20507581 PMCID: PMC2887424 DOI: 10.1186/1471-2105-11-284] [Received: 01/12/2010] [Accepted: 05/27/2010]
Abstract
BACKGROUND The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site specific and regional positive/negative evolutionary selection amongst homologous coding sequences. RESULTS JCoDA accepts user-inputted unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and determines the dN/dS calculations using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well studied sex determination pathway in nematodes as well as the extensive sequence information available to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway. 
CONCLUSIONS JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda.
Affiliation(s)
- Steven N Steinway
- Department of Biology, The College of New Jersey, 2000 Pennington Road, Ewing, NJ 08628, USA
|
25
|
Al-Hashimi N, Sire JY, Delgado S. Evolutionary analysis of mammalian enamelin, the largest enamel protein, supports a crucial role for the 32-kDa peptide and reveals selective adaptation in rodents and primates. J Mol Evol 2010; 69:635-56. [PMID: 20012271 DOI: 10.1007/s00239-009-9302-x] [Received: 07/03/2009] [Accepted: 11/06/2009]
Abstract
Enamelin (ENAM) plays an important role in the mineralization of the forming enamel matrix. We have performed an evolutionary analysis of mammalian ENAM to identify highly conserved residues or regions that could have important functions (selective pressure), to predict mutations that could be associated with amelogenesis imperfecta in humans, and to identify possible adaptive evolution of ENAM during 200 million years of mammalian evolution. In order to fulfil these objectives, we obtained 36 ENAM sequences that are representative of the mammalian lineages. Our results show a remarkably high conservation pattern in the region of the 32-kDa fragment of ENAM, especially its phosphorylation, glycosylation, and proteolytic sites. In primates and rodents we also identified several sites under positive selection, which could indicate recent evolutionary changes in ENAM function. Furthermore, the analysis of the unusual signal peptide provided new insights on the possible regulation of ENAM secretion, a hypothesis that should be tested in the near future. Taken together, these findings improve our understanding of ENAM evolution and provide new information that would be useful for further investigation of ENAM function as well as for the validation of mutations leading to amelogenesis imperfecta.
Affiliation(s)
- Nawfal Al-Hashimi
- Université Pierre et Marie Curie, UMR 7138-Systématique, Adaptation, Evolution, Case 5, 7 Quai Saint-Bernard, Bâtiment A, 4e étage, 75005, Paris, France
|
26
|
Schueler MG, Swanson W, Thomas PJ, Green ED. Adaptive evolution of foundation kinetochore proteins in primates. Mol Biol Evol 2010; 27:1585-97. [PMID: 20142441 DOI: 10.1093/molbev/msq043]
Abstract
Rapid evolution is a hallmark of centromeric DNA in eukaryotic genomes. Yet, the centromere itself has a conserved functional role that is mediated by the kinetochore protein complex. To broaden our understanding about both the DNA and proteins that interact at the functional centromere, we sought to gain a detailed view of the evolutionary events that have shaped the primate kinetochore. Specifically, we performed comparative mapping and sequencing of the genomic regions encompassing the genes encoding three foundation kinetochore proteins: Centromere Proteins A, B, and C (CENP-A, CENP-B, and CENP-C). A histone H3 variant, CENP-A provides the foundation of the centromere-specific nucleosome. Comparative sequence analyses of the CENP-A gene in 14 primate species revealed encoded amino-acid residues within both the histone-fold domain and the N-terminal tail that are under strong positive selection. Similar comparative analyses of CENP-C, another foundation protein essential for centromere function, identified amino-acid residues throughout the protein under positive selection in the primate lineage, including several in the centromere localization and DNA-binding regions. Perhaps surprisingly, the gene encoding CENP-B, a kinetochore protein that binds specifically to alpha-satellite DNA, was not found to be associated with signatures of positive selection. These findings point to important and distinct evolutionary forces operating on the DNA and proteins of the primate centromere.
Affiliation(s)
- Mary G Schueler
- Genome Technology Branch, National Institutes of Health, Bethesda, MD, USA.
|
27
|
Raponi M, Baralle D. Alternative splicing: good and bad effects of translationally silent substitutions. FEBS J 2010; 277:836-40. [DOI: 10.1111/j.1742-4658.2009.07519.x]
|
28
|
Genomic insights into the convergence and pathogenicity factors of Campylobacter jejuni and Campylobacter coli species. J Bacteriol 2009; 191:5824-31. [PMID: 19617370 DOI: 10.1128/jb.00519-09]
Abstract
Whether or not bacteria form coherent evolutionary groups via means of genetic exchange and, hence, elicit distinct species boundaries remains an unsettled issue. A recent report implied that not only may the former be true but also, in fact, the clearly distinct Campylobacter jejuni and Campylobacter coli species may be converging as a consequence of increased interspecies gene flow fostered, presumably, by the recent invasion of an overlapping ecological niche (S. K. Sheppard, N. D. McCarthy, D. Falush, and M. C. Maiden, Science 320:237-239, 2008). We have reanalyzed the Campylobacter multilocus sequence typing database used in the previous study and found that the number of interspecies gene transfer events may actually be too infrequent to account, unequivocally, for species convergence. For instance, only 1 to 2% of the 4,507 Campylobacter isolates examined appeared to have imported gene alleles from another Campylobacter species. Furthermore, by analyzing the available Campylobacter genomic sequences, we show that although there seems to be a slightly higher number of exchanged genes between C. jejuni and C. coli relative to other comparable species (approximately 10% versus 2 to 3% of the total genes in the genome, respectively), the function and spatial distribution in the genome of the exchanged genes are far from random, and hence, inconsistent with the species convergence hypothesis. In fact, the exchanged genes appear to be limited to a few environmentally selected cellular functions. Accordingly, these genes may represent important pathogenic determinants of pathogenic Campylobacter, and convergence of (any) two bacterial species remains to be seen.
|
29
|
Zhang Z, Townsend JP. Maximum-likelihood model averaging to profile clustering of site types across discrete linear sequences. PLoS Comput Biol 2009; 5:e1000421. [PMID: 19557160 PMCID: PMC2695770 DOI: 10.1371/journal.pcbi.1000421] [Received: 01/12/2009] [Accepted: 05/21/2009]
Abstract
A major analytical challenge in computational biology is the detection and description of clusters of specified site types, such as polymorphic or substituted sites within DNA or protein sequences. Progress has been stymied by a lack of suitable methods to detect clusters and to estimate the extent of clustering in discrete linear sequences, particularly when there is no a priori specification of cluster size or cluster count. Here we derive and demonstrate a maximum likelihood method of hierarchical clustering. Our method incorporates a tripartite divide-and-conquer strategy that models sequence heterogeneity, delineates clusters, and yields a profile of the level of clustering associated with each site. The clustering model may be evaluated via model selection using the Akaike Information Criterion, the corrected Akaike Information Criterion, and the Bayesian Information Criterion. Furthermore, model averaging using weighted model likelihoods may be applied to incorporate model uncertainty into the profile of heterogeneity across sites. We evaluated our method by examining its performance on a number of simulated datasets as well as on empirical polymorphism data from diverse natural alleles of the Drosophila alcohol dehydrogenase gene. Our method yielded greater power for the detection of clustered sites across a breadth of parameter ranges, and achieved better accuracy and precision of estimation of clusters, than did the existing empirical cumulative distribution function statistics.
Affiliation(s)
- Zhang Zhang
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- Jeffrey P. Townsend
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
|
30
|
Studer RA, Robinson-Rechavi M. Large-Scale Analyses of Positive Selection Using Codon Models. Evol Biol 2009. [DOI: 10.1007/978-3-642-00952-5_13]
|