1. Nelson MG, Talavera D. Identification of coevolving positions by ancestral reconstruction. Commun Biol 2025; 8:329. [PMID: 40021815 PMCID: PMC11871020 DOI: 10.1038/s42003-025-07676-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Accepted: 02/05/2025] [Indexed: 03/03/2025] Open
Abstract
Coevolution within proteins occurs when changes in one position affect the selective pressure in another position to preserve the protein structure or function. The identification of coevolving positions within proteins remains contentious, with most methods disregarding the phylogenetic information. Here, we present a time-efficient approach for detecting coevolving pairs, which is almost perfect in terms of precision and specificity. It is based on maximum parsimony-based ancestral reconstruction followed by the identification of pairs with a depletion of separate changes when compared to their number of concurrent changes. Our analysis of a previously characterised biological dataset shows that the coevolving pairs that we identified tend to be close in the protein sequence and structure, slightly less solvent exposed and have a higher mutation rate. We also show how the ancestral reconstruction can be used to detect favourable and unfavourable amino acid combinations. Altogether, we demonstrate how this approach is essential for identifying pairs of positions with weak covariation patterns.
Affiliation(s)
- Michael G Nelson
- Division of Cardiovascular Sciences, School of Medical Sciences, The University of Manchester, Oxford Road, Manchester, UK
- David Talavera
- Division of Cardiovascular Sciences, School of Medical Sciences, The University of Manchester, Oxford Road, Manchester, UK.
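The idea in the abstract above (reconstruct ancestral states by maximum parsimony, then compare concurrent with separate changes for a pair of positions) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a rooted binary tree written as nested tuples, uses Fitch parsimony with an arbitrary tie-break, and only counts changes without applying any statistical test. All names (`fitch_sets`, `changed_branches`, `change_counts`, the toy tree and alignment) are hypothetical.

```python
def fitch_sets(node, column, sets):
    """Bottom-up Fitch pass: store the candidate state set of every node."""
    if isinstance(node, str):              # leaf: the observed residue
        sets[node] = {column[node]}
    else:
        left, right = node
        fitch_sets(left, column, sets)
        fitch_sets(right, column, sets)
        common = sets[left] & sets[right]
        sets[node] = common if common else sets[left] | sets[right]
    return sets


def changed_branches(node, column, sets, parent_state=None, out=None):
    """Top-down pass: return the branches (named by their child node) with a change."""
    if out is None:
        out = set()
    options = sets[node]
    # Keep the parent's state when allowed, else pick one deterministically.
    state = parent_state if parent_state in options else min(options)
    if parent_state is not None and state != parent_state:
        out.add(node)
    if not isinstance(node, str):
        for child in node:
            changed_branches(child, column, sets, state, out)
    return out


def change_counts(tree, alignment, i, j):
    """Count branches where positions i and j change together or alone."""
    branches = []
    for k in (i, j):
        column = {name: seq[k] for name, seq in alignment.items()}
        sets = fitch_sets(tree, column, {})
        branches.append(changed_branches(tree, column, sets))
    concurrent = len(branches[0] & branches[1])
    separate = len(branches[0] ^ branches[1])
    return concurrent, separate


# Toy data (assumed): positions 0 and 1 change on the same branch,
# position 2 changes independently on two terminal branches.
tree = ((("A", "B"), ("C", "D")), ("E", "F"))
alignment = {"A": "KEA", "B": "KEG", "C": "RDA",
             "D": "RDA", "E": "KEA", "F": "KEG"}

print(change_counts(tree, alignment, 0, 1))
print(change_counts(tree, alignment, 0, 2))
```

A pair with many concurrent and few separate changes is the kind of pattern the abstract describes; how that depletion is scored is left to the paper.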
2. Tinh NH, Dang CC, Vinh LS. nT4X and nT4M: Novel Time Non-reversible Mixture Amino Acid Substitution Models. J Mol Evol 2025; 93:136-148. [PMID: 39832000 DOI: 10.1007/s00239-024-10230-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 12/16/2024] [Indexed: 01/22/2025]
Abstract
One of the most important and difficult challenges in the research of molecular evolution is modeling the process of amino acid substitutions. Although single-matrix models, such as the LG model, are popular, their capability to properly capture the heterogeneity of the substitution process across sites is still questioned. Several mixture models with multiple matrices have been introduced and shown to offer advantages over single-matrix models. Current general mixture models assume the reversibility of the evolutionary process, implying that substitution rates between any two amino acids are equal in both forward and backward directions. This assumption is not based on biological properties but rather on computational simplicity. The well-known hypothesis is that more realistic models can yield more accurate evolutionary inferences; therefore, our aim is to estimate more biologically realistic models. To this end, we relax the assumption of reversibility and introduce two new general non-reversible 4-matrix mixture models, called nT4M and nT4X. Using alignments from HSSP and TreeBASE databases as data, our newly estimated models outperformed all single-matrix models and almost all reversible mixture models. Moreover, the new non-reversible mixture models enable us to infer rooted trees.
Affiliation(s)
- Nguyen Huy Tinh
- University of Engineering and Technology, Vietnam National University, 144 Xuan Thuy, Cau Giay, 10000, Hanoi, Vietnam
- Cuong Cao Dang
- University of Engineering and Technology, Vietnam National University, 144 Xuan Thuy, Cau Giay, 10000, Hanoi, Vietnam
- Le Sy Vinh
- University of Engineering and Technology, Vietnam National University, 144 Xuan Thuy, Cau Giay, 10000, Hanoi, Vietnam.
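The reversibility assumption mentioned in the abstract above is usually expressed as detailed balance: for a rate matrix Q with stationary frequencies π, π_i·Q_ij = π_j·Q_ji for every pair of states. The sketch below checks this on two toy 3-state matrices (real amino acid models are 20×20). The matrices, the frequencies and the function names are illustrative assumptions, not taken from nT4X or nT4M.

```python
def is_stationary(Q, pi, tol=1e-12):
    """True if pi * Q = 0, i.e. pi is a stationary distribution of Q."""
    n = len(Q)
    return all(abs(sum(pi[i] * Q[i][j] for i in range(n))) <= tol
               for j in range(n))


def is_reversible(Q, pi, tol=1e-12):
    """True if detailed balance pi_i * Q_ij == pi_j * Q_ji holds for all pairs."""
    n = len(Q)
    return all(abs(pi[i] * Q[i][j] - pi[j] * Q[j][i]) <= tol
               for i in range(n) for j in range(i + 1, n))


uniform = [1 / 3, 1 / 3, 1 / 3]

# Symmetric exchange rates: forward and backward flows are equal.
Q_reversible = [[-2.0, 1.0, 1.0],
                [1.0, -2.0, 1.0],
                [1.0, 1.0, -2.0]]

# Cyclic bias (0 -> 1 -> 2 -> 0 is faster than the reverse direction):
# the uniform distribution is still stationary, but detailed balance fails.
Q_nonreversible = [[-3.0, 2.0, 1.0],
                   [1.0, -3.0, 2.0],
                   [2.0, 1.0, -3.0]]

print(is_stationary(Q_reversible, uniform), is_reversible(Q_reversible, uniform))
print(is_stationary(Q_nonreversible, uniform), is_reversible(Q_nonreversible, uniform))
```

Under a non-reversible matrix the likelihood depends on where the root is placed, which is why such models can be used to infer rooted trees.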
3.
Abstract
Compensatory substitutions happen when one mutation is advantageously selected because it restores the loss of fitness induced by a previous deleterious mutation. How frequently such mutations occur in evolution and what structural and functional context permits their emergence remain open questions. We built an atlas of intra-protein compensatory substitutions using a phylogenetic approach and a dataset of 1,630 bacterial protein families for which high-quality sequence alignments and experimentally derived protein structures were available. We identified more than 51,000 positions coevolving by means of predicted compensatory mutations. Using the evolutionary and structural properties of the analyzed positions, we demonstrate that compensatory mutations are scarce (typically only a few in the protein history) but widespread (the majority of proteins experienced at least one). Typical coevolving residues are evolving slowly, are located in the protein core outside secondary structure motifs, and are more often in contact than expected by chance, even after accounting for their evolutionary rate and solvent exposure. An exception to this general scheme is residues coevolving for charge compensation, which are evolving faster than noncoevolving sites, in contradiction with predictions from simple coevolutionary models, but similar to stem pairs in RNA. While sites with a significant pattern of coevolution by compensatory mutations are rare, the comparative analysis of hundreds of structures ultimately permits a better understanding of the link between the three-dimensional structure of a protein and its fitness landscape.
Affiliation(s)
- Shilpi Chaurasia
- RG Molecular Systems Evolution, Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, August-Thienemann-Straße 2, 24306 Plön, Germany; Excelra Knowledge Solutions Pvt Ltd, Hyderabad, India
- Julien Y Dutheil
- RG Molecular Systems Evolution, Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, August-Thienemann-Straße 2, 24306 Plön, Germany; Institute of Evolution Sciences of Montpellier (ISEM), CNRS, University of Montpellier, IRD, EPHE, 34095 Montpellier, France
4. Behdenna A, Godfroid M, Petot P, Pothier J, Lambert A, Achaz G. A minimal yet flexible likelihood framework to assess correlated evolution. Syst Biol 2021; 71:823-838. [PMID: 34792608 DOI: 10.1093/sysbio/syab092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2020] [Revised: 11/04/2021] [Accepted: 11/09/2021] [Indexed: 11/14/2022] Open
Abstract
An evolutionary process is reflected in the sequence of changes of any trait (e.g. morphological or molecular) through time. Yet, a better understanding of evolution would be procured by characterizing correlated evolution, or when two or more evolutionary processes interact. Previously developed parametric methods often require significant computing time as they rely on the estimation of many parameters. Here we propose a minimal likelihood framework modelling the joint evolution of two traits on a known phylogenetic tree. The type and strength of correlated evolution is characterized by a few parameters tuning mutation rates of each trait and interdependencies between these rates. The framework can be applied to study any discrete trait or character ranging from nucleotide substitution to gain or loss of a biological function. More specifically, it can be used to 1) test for independence between two evolutionary processes, 2) identify the type of interaction between them and 3) estimate parameter values of the most likely model of interaction. In the current implementation, the method takes as input a phylogenetic tree with discrete evolutionary events mapped on its branches. The method then maximizes the likelihood for one or several chosen scenarios. The strengths and limits of the method, as well as its relative power compared to a few other methods, are assessed using both simulations and data from 16S rRNA sequences in a sample of 54 γ-enterobacteria. We show that, even with datasets of fewer than 100 species, the method performs well in parameter estimation and in evolutionary model selection.
Affiliation(s)
- Abdelkader Behdenna
- Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS UMR 7205, Sorbonne Université, École Pratique des Hautes Études, Université des Antilles, 45 rue Buffon, 75005 Paris, France
- SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, 11, place Marcellin Berthelot, 75005 Paris, France
- Epigene Labs, 7 Square Gabriel Fauré, 75017 Paris, France
- Maxime Godfroid
- SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, 11, place Marcellin Berthelot, 75005 Paris, France
- Patrice Petot
- Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS UMR 7205, Sorbonne Université, École Pratique des Hautes Études, Université des Antilles, 45 rue Buffon, 75005 Paris, France
- SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, 11, place Marcellin Berthelot, 75005 Paris, France
- Joël Pothier
- Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS UMR 7205, Sorbonne Université, École Pratique des Hautes Études, Université des Antilles, 45 rue Buffon, 75005 Paris, France
- Amaury Lambert
- SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, 11, place Marcellin Berthelot, 75005 Paris, France
- Laboratoire de Probabilités, Statistique et Modélisation (LPSM), Sorbonne Université, CNRS UMR 8001, Université de Paris, 4, place Jussieu, 75005 Paris, France
- Guillaume Achaz
- Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS UMR 7205, Sorbonne Université, École Pratique des Hautes Études, Université des Antilles, 45 rue Buffon, 75005 Paris, France
- SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, 11, place Marcellin Berthelot, 75005 Paris, France
- Éco-anthropologie, Muséum National d'Histoire Naturelle, CNRS UMR 7206, Université de Paris, place du Trocadéro, 75016 Paris, France
5. Magee AF, Hilton SK, DeWitt WS. Robustness of phylogenetic inference to model misspecification caused by pairwise epistasis. Mol Biol Evol 2021; 38:4603-4615. [PMID: 34043795 PMCID: PMC8476159 DOI: 10.1093/molbev/msab163] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/17/2022] Open
Abstract
Likelihood-based phylogenetic inference posits a probabilistic model of character state change along branches of a phylogenetic tree. These models typically assume statistical independence of sites in the sequence alignment. This is a restrictive assumption that facilitates computational tractability, but ignores how epistasis, the effect of genetic background on mutational effects, influences the evolution of functional sequences. We consider the effect of using a misspecified site-independent model on the accuracy of Bayesian phylogenetic inference in the setting of pairwise-site epistasis. Previous work has shown that as alignment length increases, tree reconstruction accuracy also increases. Here, we present a simulation study demonstrating that accuracy increases with alignment size even if the additional sites are epistatically coupled. We introduce an alignment-based test statistic that is a diagnostic for pairwise epistasis and can be used in posterior predictive checks.
Affiliation(s)
- Andrew F Magee
- Departments of Biology; Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA
- Sarah K Hilton
- Departments of Genome Sciences, University of Washington, Seattle, USA; Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA
- William S DeWitt
- Departments of Genome Sciences, University of Washington, Seattle, USA; Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA
6. Tomiczek B, Delewski W, Nierzwicki L, Stolarska M, Grochowina I, Schilke B, Dutkiewicz R, Uzarska MA, Ciesielski SJ, Czub J, Craig EA, Marszalek J. Two-step mechanism of J-domain action in driving Hsp70 function. PLoS Comput Biol 2020; 16:e1007913. [PMID: 32479549 PMCID: PMC7289447 DOI: 10.1371/journal.pcbi.1007913] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 01/27/2020] [Revised: 06/11/2020] [Accepted: 04/28/2020] [Indexed: 12/02/2022] Open
Abstract
J-domain proteins (JDPs), obligatory Hsp70 cochaperones, play critical roles in protein homeostasis. They promote key allosteric transitions that stabilize Hsp70 interaction with substrate polypeptides upon hydrolysis of its bound ATP. Although a recent crystal structure revealed the physical mode of interaction between a J-domain and an Hsp70, the structural and dynamic consequences of J-domain action once bound and how Hsp70s discriminate among their multiple JDP partners remain enigmatic. We combined free energy simulations, biochemical assays and evolutionary analyses to address these issues. Our results indicate that the invariant aspartate of the J-domain perturbs a conserved intramolecular Hsp70 network of contacts that crosses domains. This perturbation leads to destabilization of the domain-domain interface, thereby promoting the allosteric transition that triggers ATP hydrolysis. While this mechanistic step is driven by conserved residues, evolutionarily variable residues are key to initial JDP/Hsp70 recognition via electrostatic interactions between oppositely charged surfaces. We speculate that these variable residues allow an Hsp70 to discriminate amongst JDP partners, as many of them have coevolved. Together, our data point to a two-step mode of J-domain action, a recognition stage followed by a mechanistic stage.
It is well appreciated that Hsp70-based systems are the most versatile among molecular chaperones, functioning in all cell types and in all subcellular compartments. Via cyclic binding to protein substrates, Hsp70s facilitate their folding, trafficking, degradation and ability to interact with other proteins. Hsp70 function, however, depends on transient interaction with J-domain protein cochaperones that not only deliver substrates, but also activate the structural changes needed for efficient Hsp70 binding to substrate. But how J-domain proteins mechanistically function to drive these changes and how an Hsp70 discriminates among multiple J-domain partners have remained challenging central questions. Here, by using a combination of computational, evolutionary and experimental approaches, we provide evidence for a two-step mechanism. The initial recognition step involves variable residues that allow fine tuning of both the specificity and strength of J-domain protein interaction with Hsp70. The second, that is the mechanistic step, involves conserved residues that act to disrupt a conserved network of intramolecular interactions within Hsp70, thus ensuring robust activation of the structural changes necessary for effective substrate binding. We suggest that our findings are likely applicable to most Hsp70 systems.
Affiliation(s)
- Bartlomiej Tomiczek
- Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
- Wojciech Delewski
- Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
- Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Lukasz Nierzwicki
- Department of Physical Chemistry, Gdansk University of Technology, Gdansk, Poland
- Milena Stolarska
- Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
- Igor Grochowina
- Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
- Brenda Schilke
- Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Rafal Dutkiewicz
- Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
- Marta A. Uzarska
- Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
- Szymon J. Ciesielski
- Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Jacek Czub
- Department of Physical Chemistry, Gdansk University of Technology, Gdansk, Poland
- * E-mail: (JC); (EAC); (JM)
- Elizabeth A. Craig
- Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- * E-mail: (JC); (EAC); (JM)
- Jaroslaw Marszalek
- Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
- Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- * E-mail: (JC); (EAC); (JM)
7. Meyer X, Dib L, Salamin N. CoevDB: a database of intramolecular coevolution among protein-coding genes of the bony vertebrates. Nucleic Acids Res 2020; 47:D50-D54. [PMID: 30357342 PMCID: PMC6324051 DOI: 10.1093/nar/gky986] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/14/2018] [Accepted: 10/10/2018] [Indexed: 01/15/2023] Open
Abstract
The study of molecular coevolution, due to its potential to identify gene regions under functional or structural constraints, has recently been subject to numerous scientific inquiries. Particular efforts have been made to develop methods predicting the presence of coevolution in molecular sequences. Among these methods, a few aim to model the underlying evolutionary process of coevolution, which enables them to distinguish the shared history of genes from coevolution and thus improves their accuracy. However, the usage of such methods remains sparse due to their expensive computational cost and the lack of resources alleviating this issue. Here we present CoevDB (http://phylodb.unil.ch/CoevDB), a database containing the results of a large-scale analysis of intramolecular coevolution of 8201 protein-coding genes of bony vertebrates. The web interface of CoevDB gives access to the results of 800 million statistical tests corresponding to all the pairs of sites analyzed. Several types of queries enable users to explore the database by either targeting specific genes or by discovering genes having promising estimations of coevolution.
Affiliation(s)
- Xavier Meyer
- Department of Computational Biology, University of Lausanne, Biophore, 1015 Lausanne, Switzerland; Department of Integrative Biology, University of California, 3060 Valley Life Sciences Bldg, Berkeley, CA 94720-3140, USA
- Linda Dib
- Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
- Nicolas Salamin
- Department of Computational Biology, University of Lausanne, Biophore, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
8. Yamada K, Davydov II, Besnard G, Salamin N. Duplication history and molecular evolution of the rbcS multigene family in angiosperms. J Exp Bot 2019; 70:6127-6139. [PMID: 31498865 PMCID: PMC6859733 DOI: 10.1093/jxb/erz363] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 11/29/2018] [Accepted: 08/12/2019] [Indexed: 05/22/2023]
Abstract
Ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) is considered to be the main enzyme determining the rate of photosynthesis. The small subunit of the protein, encoded by the rbcS gene, has been shown to influence the catalytic efficiency, CO2 specificity, assembly, activity, and stability of RuBisCO. However, the evolution of the rbcS gene remains poorly studied. We inferred the phylogenetic tree of the rbcS gene in angiosperms using the nucleotide sequences and found that it is composed of two lineages that may have existed before the divergence of land plants. Although almost all species sampled carry at least one copy of lineage 1, genes of lineage 2 were lost in most angiosperm species. We found the specific residues that have undergone positive selection during the evolution of the rbcS gene. We detected intensive coevolution between each rbcS gene copy and the rbcL gene encoding the large subunit of RuBisCO. We tested the role played by each rbcS gene copy on the stability of the RuBisCO protein through homology modelling. Our results showed that this evolutionary constraint could limit the level of divergence seen in the rbcS gene, which leads to the similarity among the rbcS gene copies of lineage 1 within species.
Affiliation(s)
- Kana Yamada
- Department of Computational Biology, Génopode, University of Lausanne, Lausanne, Switzerland
- Iakov I Davydov
- Department of Computational Biology, Génopode, University of Lausanne, Lausanne, Switzerland
- Department of Ecology and Evolution, Biophore, University of Lausanne, Lausanne, Switzerland
- Guillaume Besnard
- Laboratoire Evolution et Diversité Biologique (EDB UMR5174), CNRS-UPS-IRD, University of Toulouse III, Toulouse Cedex, France
- Nicolas Salamin
- Department of Computational Biology, Génopode, University of Lausanne, Lausanne, Switzerland
9. Simultaneous Bayesian inference of phylogeny and molecular coevolution. Proc Natl Acad Sci U S A 2019; 116:5027-5036. [PMID: 30808804 DOI: 10.1073/pnas.1813836116] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022] Open
Abstract
Patterns of molecular coevolution can reveal structural and functional constraints within or among organic molecules. These patterns are better understood when considering the underlying evolutionary process, which enables us to disentangle the signal of the dependent evolution of sites (coevolution) from the effects of shared ancestry of genes. Conversely, disregarding the dependent evolution of sites when studying the history of genes negatively impacts the accuracy of the inferred phylogenetic trees. Although molecular coevolution and phylogenetic history are interdependent, analyses of the two processes are conducted separately, a choice dictated by computational convenience, but at the expense of accuracy. We present a Bayesian method and associated software to infer how many and which sites of an alignment evolve according to an independent or a pairwise dependent evolutionary process, and to simultaneously estimate the phylogenetic relationships among sequences. We validate our method on synthetic datasets and challenge our predictions of coevolution on the 16S rRNA molecule by comparing them with its known molecular structure. Finally, we assess the accuracy of phylogenetic trees inferred under the assumption of independence among sites using synthetic datasets, the 16S rRNA molecule and 10 additional alignments of protein-coding genes of eukaryotes. Our results demonstrate that inferring phylogenetic trees while accounting for dependent site evolution significantly impacts the estimates of the phylogeny and the evolutionary process.
10. Dib L, Salamin N, Gfeller D. Polymorphic sites preferentially avoid co-evolving residues in MHC class I proteins. PLoS Comput Biol 2018; 14:e1006188. [PMID: 29782520 PMCID: PMC5983860 DOI: 10.1371/journal.pcbi.1006188] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/13/2017] [Revised: 06/01/2018] [Accepted: 05/09/2018] [Indexed: 01/11/2023] Open
Abstract
Major histocompatibility complex class I (MHC-I) molecules are critical to adaptive immune defence mechanisms in vertebrate species and are encoded by highly polymorphic genes. Polymorphic sites are located close to the ligand-binding groove and entail MHC-I alleles with distinct binding specificities. Some efforts have been made to investigate the relationship between polymorphism and protein stability. However, less is known about the relationship between polymorphism and MHC-I co-evolutionary constraints. Using Direct Coupling Analysis (DCA) we found that co-evolution analysis accurately pinpoints structural contacts, although the protein family is restricted to vertebrates and comprises fewer than five hundred species, and that the co-evolutionary signal is mainly driven by inter-species changes, and not intra-species polymorphism. Moreover, we show that polymorphic sites in human preferentially avoid co-evolving residues, as well as residues involved in protein stability. These results suggest that sites displaying high polymorphism may have been selected during vertebrates’ evolution to avoid co-evolutionary constraints and thereby maximize their mutability.
Amino acid co-evolution represents cases of simultaneous substitution of amino acids at distinct positions in protein sequences. In the MHC-I protein family, such co-evolution could result from either amino acid changes across species or changes within species due to the high polymorphism of MHC-I molecules. Here we show that signals captured by global methods such as Direct Coupling Analysis (DCA) to estimate co-evolution primarily result from changes across species. Moreover, our results indicate that polymorphic sites in MHC-I molecules tend to be decoupled from co-evolving ones. This could suggest that they have been selected to maximize their mutability, which is known to be functionally important to entail MHC-I molecules with a wide repertoire of binding specificities for antigen presentation.
Affiliation(s)
- Linda Dib
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Switzerland
- Swiss Institutes of Bioinformatics, Quartier Sorge, Lausanne, Switzerland
- Nicolas Salamin
- Swiss Institutes of Bioinformatics, Quartier Sorge, Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- David Gfeller
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Switzerland
- Swiss Institutes of Bioinformatics, Quartier Sorge, Lausanne, Switzerland
11. Dib L, San-Jose LM, Ducrest AL, Salamin N, Roulin A. Selection on the Major Color Gene Melanocortin-1-Receptor Shaped the Evolution of the Melanocortin System Genes. Int J Mol Sci 2017; 18:ijms18122618. [PMID: 29206201 PMCID: PMC5751221 DOI: 10.3390/ijms18122618] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 10/31/2017] [Revised: 11/28/2017] [Accepted: 11/29/2017] [Indexed: 12/20/2022] Open
Abstract
Modular genetic systems and networks have complex evolutionary histories shaped by selection acting on single genes as well as on their integrated function within the network. However, uncovering molecular coevolution requires the detection of coevolving sites in sequences. Detailed knowledge of the functions of each gene in the system is also necessary to identify the selective agents driving coevolution. Using recently developed computational tools, we investigated the effect of positive selection on the coevolution of ten major genes in the melanocortin system, responsible for multiple physiological functions and human diseases. Substitutions driven by positive selection at the melanocortin-1-receptor (MC1R) induced more coevolutionary changes on the system than positive selection on other genes in the system. In contrast, selection on the highly pleiotropic POMC gene, which orchestrates the activation of the different melanocortin receptors, had the lowest coevolutionary influence. MC1R and possibly its main function, melanin pigmentation, seem to have influenced the evolution of the melanocortin system more than functions regulated by MC2-5Rs such as energy homeostasis, glucocorticoid-dependent stress and anti-inflammatory responses. Although replication in other regulatory systems is needed, this suggests that single functional aspects of a genetic network or system can be of higher importance than others in shaping coevolution among the genes that integrate it.
Affiliation(s)
- Linda Dib
- Department of Ecology and Evolution, Biophore, University of Lausanne, 1015 Lausanne, Switzerland.
- Laboratoire de Recherche en Neuroimagerie, Centre Hospitalier Universitaire Vaudois, 1015 Lausanne, Switzerland.
- Luis M San-Jose
- Department of Ecology and Evolution, Biophore, University of Lausanne, 1015 Lausanne, Switzerland.
- Anne-Lyse Ducrest
- Department of Ecology and Evolution, Biophore, University of Lausanne, 1015 Lausanne, Switzerland.
- Nicolas Salamin
- Department of Ecology and Evolution, Biophore, University of Lausanne, 1015 Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland.
- Department of Computational Biology, University of Lausanne, Rue du Bugnon 27, 1011 Lausanne, Switzerland.
- Alexandre Roulin
- Department of Ecology and Evolution, Biophore, University of Lausanne, 1015 Lausanne, Switzerland.
- Department of Computational Biology, University of Lausanne, Rue du Bugnon 27, 1011 Lausanne, Switzerland.
12. Meyer X, Chopard B, Salamin N. Accelerating Bayesian inference for evolutionary biology models. Bioinformatics 2017; 33:669-676. [PMID: 28025203 PMCID: PMC5408833 DOI: 10.1093/bioinformatics/btw712] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/09/2016] [Accepted: 11/11/2016] [Indexed: 12/02/2022] Open
Abstract
Motivation: Bayesian inference is widely used nowadays and relies largely on Markov chain Monte Carlo (MCMC) methods. Evolutionary biology has greatly benefited from the developments of MCMC methods, but the design of more complex and realistic models and the ever growing availability of novel data is pushing the limits of the current use of these methods.
Results: We present a parallel Metropolis-Hastings (M-H) framework built with a novel combination of enhancements aimed towards parameter-rich and complex models. We show on a parameter-rich macroevolutionary model increases of the sampling speed up to 35 times with 32 processors when compared to a sequential M-H process. More importantly, our framework achieves up to a twentyfold faster convergence to estimate the posterior probability of phylogenetic trees using 32 processors when compared to the well-known software MrBayes for Bayesian inference of phylogenetic trees.
Availability and Implementation: https://bitbucket.org/XavMeyer/hogan
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xavier Meyer
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland.
- Department of Computer Science, University of Geneva, 1211 Geneva, Switzerland.
| | - Bastien Chopard
- Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland.
- Department of Computer Science, University of Geneva, 1211 Geneva, Switzerland.
| | - Nicolas Salamin
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland.
| |
|
13
|
Davydov II, Robinson-Rechavi M, Salamin N. State aggregation for fast likelihood computations in molecular evolution. Bioinformatics 2017; 33:354-362. [PMID: 28172542 PMCID: PMC5408795 DOI: 10.1093/bioinformatics/btw632] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/13/2016] [Revised: 09/07/2016] [Accepted: 09/23/2016] [Indexed: 12/24/2022] Open
Abstract
Motivation: Codon models are widely used to identify the signature of selection at the molecular level and to test for changes in selective pressure during the evolution of genes encoding proteins. The large size of the state space of the Markov processes used to model codon evolution makes it difficult to use these models with large biological datasets. We propose here to use state aggregation to reduce the state space of codon models and, thus, improve the computational performance of likelihood estimation on these models. Results: We show that this heuristic speeds up the computations of the M0 and branch-site models by up to 6.8 times. We also show through simulations that state aggregation does not introduce a detectable bias. We analyzed a real dataset and show that aggregation provides predictions that are highly correlated with those of the full likelihood computations. Finally, state aggregation is a very general approach and can be applied to any continuous-time Markov process-based model with a large state space, such as amino acid and coevolution models. We therefore discuss different ways to apply state aggregation to Markov models used in phylogenetics. Availability and Implementation: The heuristic is implemented in the godon package (https://bitbucket.org/Davydov/godon) and in a version of FastCodeML (https://gitlab.isb-sib.ch/phylo/fastcodeml). Contact: nicolas.salamin@unil.ch. Supplementary Information: Supplementary data are available at Bioinformatics online.
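The general idea of state aggregation, lumping several states of a continuous-time Markov process into one to shrink the rate matrix, can be sketched as follows. This is a minimal illustration under stated assumptions and not the godon or FastCodeML implementation: the function name is hypothetical, the frequency-weighted rule for leaving the aggregated state is one common choice assumed here, and the 4-state matrix is a toy stand-in for a 61-state codon model.

```python
def aggregate_rate_matrix(Q, pi, keep):
    """Lump every state not in `keep` into a single aggregated state (placed last).

    Q    : square rate matrix (list of lists), rows summing to zero
    pi   : stationary frequencies of the original states
    keep : indices of the states that stay separate
    """
    n = len(Q)
    keep = list(keep)
    lumped = [s for s in range(n) if s not in keep]
    if not lumped:
        raise ValueError("nothing to aggregate")
    pi_lumped = sum(pi[s] for s in lumped)
    m = len(keep) + 1
    A = [[0.0] * m for _ in range(m)]
    for a, i in enumerate(keep):
        for b, j in enumerate(keep):
            if a != b:
                A[a][b] = Q[i][j]  # rates between kept states are unchanged
        # Rate into the aggregate: sum of the rates into its member states.
        A[a][m - 1] = sum(Q[i][j] for j in lumped)
    for b, j in enumerate(keep):
        # Rate out of the aggregate: frequency-weighted average over its members
        # (an assumed weighting, used here for illustration).
        A[m - 1][b] = sum(pi[i] * Q[i][j] for i in lumped) / pi_lumped
    for a in range(m):
        A[a][a] = -sum(A[a][b] for b in range(m) if b != a)  # rows sum to zero
    return A


# Toy 4-state process with equal rates and uniform frequencies.
Q = [[-3.0, 1.0, 1.0, 1.0],
     [1.0, -3.0, 1.0, 1.0],
     [1.0, 1.0, -3.0, 1.0],
     [1.0, 1.0, 1.0, -3.0]]
pi = [0.25, 0.25, 0.25, 0.25]
A = aggregate_rate_matrix(Q, pi, keep=[0, 1])
for row in A:
    print(row)
```

Here the 4 × 4 matrix shrinks to 3 × 3; the cost of matrix exponentiation grows roughly with the cube of the dimension, so reductions of this kind are where the speed-up comes from.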
Affiliation(s)
- Iakov I Davydov
- Department of Ecology and Evolution, Biophore, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Genopode, Quartier Sorge, Lausanne, Switzerland
| | - Marc Robinson-Rechavi
- Department of Ecology and Evolution, Biophore, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Genopode, Quartier Sorge, Lausanne, Switzerland
| | - Nicolas Salamin
- Department of Ecology and Evolution, Biophore, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Genopode, Quartier Sorge, Lausanne, Switzerland
| |
|
14
|
Nshogozabahizi JC, Dench J, Aris-Brosou S. Widespread Historical Contingency in Influenza Viruses. Genetics 2017; 205:409-420. [PMID: 28049709 PMCID: PMC5223518 DOI: 10.1534/genetics.116.193979] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 07/18/2016] [Accepted: 11/04/2016] [Indexed: 11/18/2022] Open
Abstract
In systems biology and genomics, epistasis characterizes the impact that a substitution at a particular location in a genome can have on a substitution at another location. This phenomenon is often implicated in the evolution of drug resistance or invoked to explain why particular "disease-causing" mutations do not have the same outcome in all individuals. Hence, uncovering these mutations and their locations in a genome is a central question in biology. However, epistasis is notoriously difficult to uncover, especially in fast-evolving organisms. Here, we present a novel statistical approach that relies on a model developed in ecology and that we adapt to analyze genetic data in fast-evolving systems such as the influenza A virus. We validate the approach using a two-pronged strategy: extensive simulations demonstrate a low-to-moderate sensitivity with excellent specificity and precision, while analyses of experimentally validated data recover known interactions, including in a eukaryotic system. We further evaluate the ability of our approach to detect correlated evolution during antigenic shifts or at the emergence of drug resistance. We show that in all cases, correlated evolution is prevalent in influenza A viruses, involving many pairs of sites linked together in chains, a hallmark of historical contingency. Strikingly, interacting sites are separated by large physical distances, which entails either long-range conformational changes or functional tradeoffs, for which we find support with the emergence of drug resistance. Our work paves a new way for the unbiased detection of epistasis in a wide range of organisms by performing whole-genome scans.
Affiliation(s)
| | - Jonathan Dench
- Department of Biology, University of Ottawa, Ontario K1N 6N5, Canada
| | - Stéphane Aris-Brosou
- Department of Biology, University of Ottawa, Ontario K1N 6N5, Canada
- Department of Mathematics and Statistics, University of Ottawa, Ontario K1N 6N5, Canada
| |
|
15
|
Dib L, Meyer X, Artimo P, Ioannidis V, Stockinger H, Salamin N. Coev-web: a web platform designed to simulate and evaluate coevolving positions along a phylogenetic tree. BMC Bioinformatics 2015; 16:394. [PMID: 26597459 PMCID: PMC4657261 DOI: 10.1186/s12859-015-0785-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 05/04/2015] [Accepted: 10/20/2015] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND: Available methods to simulate nucleotide or amino acid data typically use Markov models to simulate each position independently. These approaches are not appropriate for assessing the performance of combinatorial and probabilistic methods that look for coevolving positions in nucleotide or amino acid sequences. RESULTS: We have developed a web-based platform that gives user-friendly access to two phylogeny-based methods implementing the Coev model: the evaluation of coevolving scores and the simulation of coevolving positions. We have also extended the capabilities of the Coev model to allow for the generalization of the alphabet used in the Markov model, which can now analyse both nucleotide and amino acid data sets. The simulation of coevolving positions is novel and builds upon the developments of the Coev model. It allows users to simulate pairs of dependent nucleotide or amino acid positions. CONCLUSIONS: The main focus of our paper is the new simulation method we present for coevolving positions. The implementation of this method is embedded within the web platform Coev-web, which is freely accessible at http://coev.vital-it.ch/ and has been tested in most modern web browsers.
Affiliation(s)
- Linda Dib
- Department of Ecology and Evolution, University of Lausanne, Lausanne, 1015, Switzerland.
- SIB Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland.
- Laboratoire de recherche en neuroimagerie, CHUV, Lausanne, 1011, Switzerland.
| | - Xavier Meyer
- Department of Ecology and Evolution, University of Lausanne, Lausanne, 1015, Switzerland.
- SIB Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland.
- Computer Science department, University of Geneva, Carouge, 1227, Switzerland.
| | - Panu Artimo
- SIB Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland.
| | | | - Heinz Stockinger
- SIB Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland.
| | - Nicolas Salamin
- Department of Ecology and Evolution, University of Lausanne, Lausanne, 1015, Switzerland.
- SIB Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland.
| |
|
16
|
Trollope KM, van Wyk N, Kotjomela MA, Volschenk H. Sequence and structure-based prediction of fructosyltransferase activity for functional subclassification of fungal GH32 enzymes. FEBS J 2015; 282:4782-96. [DOI: 10.1111/febs.13536] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 05/07/2015] [Revised: 09/03/2015] [Accepted: 09/25/2015] [Indexed: 11/27/2022]
Affiliation(s)
- Kim M. Trollope
- Department of Microbiology; Stellenbosch University; South Africa
| | - Niël van Wyk
- Department of Microbiology; Stellenbosch University; South Africa
| | | | | |
|
17
|
Abstract
Models of codon evolution have attracted particular interest because of their unique capabilities to detect selection forces and their high fit when applied to sequence evolution. We describe here a novel approach for modeling codon evolution, which is based on the Kronecker product of matrices. The 61 × 61 codon substitution rate matrix is created using the Kronecker product of three 4 × 4 nucleotide substitution matrices, the equilibrium frequency of codons, and the selection rate parameter. The entries of the nucleotide substitution matrices and the selection rate are considered as parameters of the model, which are optimized by maximum likelihood. Our fully mechanistic model allows the instantaneous substitution matrix between codons to be fully estimated with only 19 parameters instead of 3,721, by using the biological interdependence existing between positions within codons. We illustrate the properties of our model using computer simulations and assess its relevance by comparing the AICc measures of our model and other models of codon evolution on simulations and a large range of empirical data sets. We show that our model fits most biological data better than current codon models. Furthermore, the parameters in our model can be interpreted in a similar way as the exchangeability rates found in empirical codon models.
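The dimensional bookkeeping behind this construction can be sketched as follows: the Kronecker product of three 4 × 4 matrices is a 64 × 64 matrix indexed by nucleotide triplets, and dropping the three stop codons leaves 61 × 61 = 3,721 entries. The sketch covers only that step. It omits the codon equilibrium frequencies and the selection parameter mentioned in the abstract, uses arbitrary integer placeholders rather than estimated rates, and assumes the standard genetic code; all names are hypothetical.

```python
from itertools import product

NUCS = "TCAG"
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code (assumed)


def kron(A, B):
    """Kronecker product of two matrices given as lists of lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


# Placeholder 4 x 4 matrices, one per codon position (arbitrary integers).
N1 = [[4 * i + j + 1 for j in range(4)] for i in range(4)]
N2 = [[2 * (4 * i + j) + 1 for j in range(4)] for i in range(4)]
N3 = [[(i + 1) * (j + 2) for j in range(4)] for i in range(4)]

# Triplets in the same order as the rows and columns of the Kronecker product:
# the first codon position varies slowest, the third varies fastest.
codons = ["".join(c) for c in product(NUCS, repeat=3)]
K = kron(kron(N1, N2), N3)  # 64 x 64, one row and column per triplet

# Keep only sense codons to obtain the 61 x 61 matrix.
sense = [k for k, c in enumerate(codons) if c not in STOPS]
K61 = [[K[r][c] for c in sense] for r in sense]

print(len(K), len(K61), len(K61) ** 2)
```

Each entry of `K` is the product of one entry from each position-specific matrix, which is how three small matrices can parameterise the full codon-by-codon table.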
Affiliation(s)
- Maryam Zaheri
- Department of Ecology and Evolution, Biophore, University of Lausanne, 1015 Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Genopode, Quartier Sorge, 1015 Lausanne, Switzerland
| | - Linda Dib
- Department of Ecology and Evolution, Biophore, University of Lausanne, 1015 Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Genopode, Quartier Sorge, 1015 Lausanne, Switzerland
| | - Nicolas Salamin
- Department of Ecology and Evolution, Biophore, University of Lausanne, 1015 Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Genopode, Quartier Sorge, 1015 Lausanne, Switzerland
| |
|