1
|
Ferreiro D, Branco C, Arenas M. Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation. Bioinformatics 2024; 40:btae096. [PMID: 38374231 PMCID: PMC10914458 DOI: 10.1093/bioinformatics/btae096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2023] [Revised: 01/15/2024] [Accepted: 02/16/2024] [Indexed: 02/21/2024] Open
Abstract
MOTIVATION The selection among substitution models of molecular evolution is fundamental for obtaining accurate phylogenetic inferences. At the protein level, evolutionary analyses are traditionally based on empirical substitution models but these models make unrealistic assumptions and are being surpassed by structurally constrained substitution (SCS) models. The SCS models often consider site-dependent evolution, a process that provides realism but complicates their implementation into likelihood functions that are commonly used for substitution model selection. RESULTS We present a method to perform selection among site-dependent SCS models, also among empirical and site-dependent SCS models, based on the approximate Bayesian computation (ABC) approach and its implementation into the computational framework ProteinModelerABC. The framework implements ABC with and without regression adjustments and includes diverse empirical and site-dependent SCS models of protein evolution. Using extensive simulated data, we found that it provides selection among SCS and empirical models with acceptable accuracy. As illustrative examples, we applied the framework to analyze a variety of protein families observing that SCS models fit them better than the corresponding best-fitting empirical substitution models. AVAILABILITY AND IMPLEMENTATION ProteinModelerABC is freely available from https://github.com/DavidFerreiro/ProteinModelerABC, can run in parallel and includes a graphical user interface. The framework is distributed with detailed documentation and ready-to-use examples.
Collapse
Affiliation(s)
- David Ferreiro
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
| | - Catarina Branco
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
| | - Miguel Arenas
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
| |
Collapse
|
2
|
Ferreiro D, Khalil R, Sousa SF, Arenas M. Substitution Models of Protein Evolution with Selection on Enzymatic Activity. Mol Biol Evol 2024; 41:msae026. [PMID: 38314876 PMCID: PMC10873502 DOI: 10.1093/molbev/msae026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2023] [Revised: 01/25/2024] [Accepted: 01/31/2024] [Indexed: 02/07/2024] Open
Abstract
Substitution models of evolution are necessary for diverse evolutionary analyses including phylogenetic tree and ancestral sequence reconstructions. At the protein level, empirical substitution models are traditionally used due to their simplicity, but they ignore the variability of substitution patterns among protein sites. Next, in order to improve the realism of the modeling of protein evolution, a series of structurally constrained substitution models were presented, but still they usually ignore constraints on the protein activity. Here, we present a substitution model of protein evolution with selection on both protein structure and enzymatic activity, and that can be applied to phylogenetics. In particular, the model considers the binding affinity of the enzyme-substrate complex as well as structural constraints that include the flexibility of structural flaps, hydrogen bonds, amino acids backbone radius of gyration, and solvent-accessible surface area that are quantified through molecular dynamics simulations. We applied the model to the HIV-1 protease and evaluated it by phylogenetic likelihood in comparison with the best-fitting empirical substitution model and a structurally constrained substitution model that ignores the enzymatic activity. We found that accounting for selection on the protein activity improves the fitting of the modeled functional regions with the real observations, especially in data with high molecular identity, which recommends considering constraints on the protein activity in the development of substitution models of evolution.
Collapse
Affiliation(s)
- David Ferreiro
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
| | - Ruqaiya Khalil
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
| | - Sergio F Sousa
- UCIBIO/REQUIMTE, BioSIM, Departamento de Biomedicina, Faculdade de Medicina da Universidade do Porto, 4200-319 Porto, Portugal
| | - Miguel Arenas
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
| |
Collapse
|
3
|
Del Amparo R, Arenas M. Influence of substitution model selection on protein phylogenetic tree reconstruction. Gene 2023; 865:147336. [PMID: 36871672 DOI: 10.1016/j.gene.2023.147336] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2023] [Revised: 02/22/2023] [Accepted: 02/28/2023] [Indexed: 03/06/2023]
Abstract
Probabilistic phylogenetic tree reconstruction is traditionally performed under a best-fitting substitution model of molecular evolution previously selected according to diverse statistical criteria. Interestingly, some recent studies proposed that this procedure is unnecessary for phylogenetic tree reconstruction leading to a debate in the field. In contrast to DNA sequences, phylogenetic tree reconstruction from protein sequences is traditionally based on empirical exchangeability matrices that can differ among taxonomic groups and protein families. Considering this aspect, here we investigated the influence of selecting a substitution model of protein evolution on phylogenetic tree reconstruction by the analyses of real and simulated data. We found that phylogenetic tree reconstructions based on a selected best-fitting substitution model of protein evolution are the most accurate, in terms of topology and branch lengths, compared with those derived from substitution models with amino acid replacement matrices far from the selected best-fitting model, especially when the data has large genetic diversity. Indeed, we found that substitution models with similar amino acid replacement matrices produce similar reconstructed phylogenetic trees, suggesting the use of substitution models as similar as possible to a selected best-fitting model when the latter cannot be used. Therefore, we recommend the use of the traditional protocol of selection among substitution models of evolution for protein phylogenetic tree reconstruction.
Collapse
Affiliation(s)
- Roberto Del Amparo
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain.
| | - Miguel Arenas
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain; Galicia Sur Health Research Institute (IIS Galicia Sur), 36310 Vigo, Spain.
| |
Collapse
|
4
|
Aledo P, Aledo JC. Proteome-Wide Structural Computations Provide Insights into Empirical Amino Acid Substitution Matrices. Int J Mol Sci 2023; 24:ijms24010796. [PMID: 36614247 PMCID: PMC9821064 DOI: 10.3390/ijms24010796] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2022] [Revised: 12/24/2022] [Accepted: 12/29/2022] [Indexed: 01/04/2023] Open
Abstract
The relative contribution of mutation and selection to the amino acid substitution rates observed in empirical matrices is unclear. Herein, we present a neutral continuous fitness-stability model, inspired by the Arrhenius law (qij=aije-ΔΔGij). The model postulates that the rate of amino acid substitution (i→j) is determined by the product of a pre-exponential factor, which is influenced by the genetic code structure, and an exponential term reflecting the relative fitness of the amino acid substitutions. To assess the validity of our model, we computed changes in stability of 14,094 proteins, for which 137,073,638 in silico mutants were analyzed. These site-specific data were summarized into a 20 square matrix, whose entries, ΔΔGij, were obtained after averaging through all the sites in all the proteins. We found a significant positive correlation between these energy values and the disease-causing potential of each substitution, suggesting that the exponential term accurately summarizes the fitness effect. A remarkable observation was that amino acids that were highly destabilizing when acting as the source, tended to have little effect when acting as the destination, and vice versa (source → destination). The Arrhenius model accurately reproduced the pattern of substitution rates collected in the empirical matrices, suggesting a relevant role for the genetic code structure and a tuning role for purifying selection exerted via protein stability.
Collapse
|
5
|
Del Amparo R, González-Vázquez LD, Rodríguez-Moure L, Bastolla U, Arenas M. Consequences of Genetic Recombination on Protein Folding Stability. J Mol Evol 2023; 91:33-45. [PMID: 36463317 PMCID: PMC9849154 DOI: 10.1007/s00239-022-10080-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2022] [Accepted: 11/25/2022] [Indexed: 12/05/2022]
Abstract
Genetic recombination is a common evolutionary mechanism that produces molecular diversity. However, its consequences on protein folding stability have not attracted the same attention as in the case of point mutations. Here, we studied the effects of homologous recombination on the computationally predicted protein folding stability for several protein families, finding less detrimental effects than we previously expected. Although recombination can affect multiple protein sites, we found that the fraction of recombined proteins that are eliminated by negative selection because of insufficient stability is not significantly larger than the corresponding fraction of proteins produced by mutation events. Indeed, although recombination disrupts epistatic interactions, the mean stability of recombinant proteins is not lower than that of their parents. On the other hand, the difference of stability between recombined proteins is amplified with respect to the parents, promoting phenotypic diversity. As a result, at least one third of recombined proteins present stability between those of their parents, and a substantial fraction have higher or lower stability than those of both parents. As expected, we found that parents with similar sequences tend to produce recombined proteins with stability close to that of the parents. Finally, the simulation of protein evolution along the ancestral recombination graph with empirical substitution models commonly used in phylogenetics, which ignore constraints on protein folding stability, showed that recombination favors the decrease of folding stability, supporting the convenience of adopting structurally constrained models when possible for inferences of protein evolutionary histories with recombination.
Collapse
Affiliation(s)
- Roberto Del Amparo
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain ,Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
| | - Luis Daniel González-Vázquez
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain ,Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
| | - Laura Rodríguez-Moure
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain ,Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
| | - Ugo Bastolla
- Centre for Molecular Biology Severo Ochoa (CSIC-UAM), 28049 Madrid, Spain
| | - Miguel Arenas
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain ,Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain ,Galicia Sur Health Research Institute (IIS Galicia Sur), 36310 Vigo, Spain
| |
Collapse
|
6
|
Del Amparo R, Arenas M. Consequences of Substitution Model Selection on Protein Ancestral Sequence Reconstruction. Mol Biol Evol 2022; 39:6628884. [PMID: 35789388 PMCID: PMC9254009 DOI: 10.1093/molbev/msac144] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023] Open
Abstract
The selection of the best-fitting substitution model of molecular evolution is a traditional step for phylogenetic inferences, including ancestral sequence reconstruction (ASR). However, a few recent studies suggested that applying this procedure does not affect the accuracy of phylogenetic tree reconstruction. Here, we revisited this debate topic by analyzing the influence of selection among substitution models of protein evolution, with focus on exchangeability matrices, on the accuracy of ASR using simulated and real data. We found that the selected best-fitting substitution model produces the most accurate ancestral sequences, especially if the data present large genetic diversity. Indeed, ancestral sequences reconstructed under substitution models with similar exchangeability matrices were similar, suggesting that if the selected best-fitting model cannot be used for the reconstruction, applying a model similar to the selected one is preferred. We conclude that selecting among substitution models of protein evolution is recommended for reconstructing accurate ancestral sequences.
Collapse
Affiliation(s)
- Roberto Del Amparo
- CINBIO, Universidade de Vigo, Vigo, Spain.,Departamento de Bioquímica, Xenética e Immunoloxía, Universidade de Vigo, Vigo, Spain
| | - Miguel Arenas
- CINBIO, Universidade de Vigo, Vigo, Spain.,Departamento de Bioquímica, Xenética e Immunoloxía, Universidade de Vigo, Vigo, Spain.,Galicia Sur Health Research Institute (IIS Galicia Sur), Vigo, Spain
| |
Collapse
|
7
|
Abstract
The reconstruction of genetic material of ancestral organisms constitutes a powerful application of evolutionary biology. A fundamental step in this inference is the ancestral sequence reconstruction (ASR), which can be performed with diverse methodologies implemented in computer frameworks. However, most of these methodologies ignore evolutionary properties frequently observed in microbes, such as genetic recombination and complex selection processes, that can bias the traditional ASR. From a practical perspective, here I review methodologies for the reconstruction of ancestral DNA and protein sequences, with particular focus on microbes, and including biases, recommendations, and software implementations. I conclude that microbial ASR is a complex analysis that should be carefully performed and that there is a need for methods to infer more realistic ancestral microbial sequences.
Collapse
Affiliation(s)
- Miguel Arenas
- Biomedical Research Center (CINBIO), University of Vigo, Vigo, Spain.
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain.
- Galicia Sur Health Research Institute (IIS Galicia Sur), Vigo, Spain.
| |
Collapse
|
8
|
Norn C, André I, Theobald DL. A thermodynamic model of protein structure evolution explains empirical amino acid substitution matrices. Protein Sci 2021; 30:2057-2068. [PMID: 34218472 PMCID: PMC8442976 DOI: 10.1002/pro.4155] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2021] [Revised: 06/25/2021] [Accepted: 06/29/2021] [Indexed: 12/30/2022]
Abstract
Proteins evolve under a myriad of biophysical selection pressures that collectively control the patterns of amino acid substitutions. These evolutionary pressures are sufficiently consistent over time and across protein families to produce substitution patterns, summarized in global amino acid substitution matrices such as BLOSUM, JTT, WAG, and LG, which can be used to successfully detect homologs, infer phylogenies, and reconstruct ancestral sequences. Although the factors that govern the variation of amino acid substitution rates have received much attention, the influence of thermodynamic stability constraints remains unresolved. Here we develop a simple model to calculate amino acid substitution matrices from evolutionary dynamics controlled by a fitness function that reports on the thermodynamic effects of amino acid mutations in protein structures. This hybrid biophysical and evolutionary model accounts for nucleotide transition/transversion rate bias, multi‐nucleotide codon changes, the number of codons per amino acid, and thermodynamic protein stability. We find that our theoretical model accurately recapitulates the complex yet universal pattern observed in common global amino acid substitution matrices used in phylogenetics. These results suggest that selection for thermodynamically stable proteins, coupled with nucleotide mutation bias filtered by the structure of the genetic code, is the primary driver behind the global amino acid substitution patterns observed in proteins throughout the tree of life.
Collapse
Affiliation(s)
- Christoffer Norn
- Biochemistry and Structural Biology, Lund University, Lund, Sweden
| | - Ingemar André
- Biochemistry and Structural Biology, Lund University, Lund, Sweden
| | - Douglas L Theobald
- Biochemistry Department, Brandeis University, Waltham, Massachusetts, USA
| |
Collapse
|
9
|
Del Amparo R, Branco C, Arenas J, Vicens A, Arenas M. Analysis of selection in protein-coding sequences accounting for common biases. Brief Bioinform 2021; 22:6105943. [PMID: 33479739 DOI: 10.1093/bib/bbaa431] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2020] [Revised: 12/17/2020] [Accepted: 12/22/2020] [Indexed: 12/16/2022] Open
Abstract
The evolution of protein-coding genes is usually driven by selective processes, which favor some evolutionary trajectories over others, optimizing the subsequent protein stability and activity. The analysis of selection in this type of genetic data is broadly performed with the metric nonsynonymous/synonymous substitution rate ratio (dN/dS). However, most of the well-established methodologies to estimate this metric make crucial assumptions, such as lack of recombination or invariable codon frequencies along genes, which can bias the estimation. Here, we review the most relevant biases in the dN/dS estimation and provide a detailed guide to estimate this metric using state-of-the-art procedures that account for such biases, along with illustrative practical examples and recommendations. We also discuss the traditional interpretation of the estimated dN/dS emphasizing the importance of considering complementary biological information such as the role of the observed substitutions on the stability and function of proteins. This review is oriented to help evolutionary biologists that aim to accurately estimate selection in protein-coding sequences.
Collapse
Affiliation(s)
- Roberto Del Amparo
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain.,Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| | - Catarina Branco
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain.,Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| | - Jesús Arenas
- Unit of Microbiology and Immunology, University of Zaragoza, 50013 Zaragoza, Spain
| | - Alberto Vicens
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain.,Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| | - Miguel Arenas
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain.,Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| |
Collapse
|
10
|
Abstract
Proteins are commonly used as molecular targets against pathogens such as viruses and bacteria. However, pathogens can evolve rapidly permitting their populations to increase in protein diversity over time and thus escape to the activity of a molecular therapy. Subsequently, in order to design more durable and robust therapies as well as to understand viral evolution in a host and subsequent transmission, it is central to understand the evolution of pathogen proteins. This understanding can enable the detection of protein regions that can be potential targets for therapies and predict the emergence of molecular resistance against therapies. In this direction, two articles published recently in the Journal of Molecular Evolution investigated the evolution of proteomes of diverse flaviviruses, including Zika virus, Dengue virus and West Nile virus. Here I discuss the importance of considering the evolution of viral proteins, with the use of as realistic as possible models and methods that mimic protein evolution, to improve the design of antiviral therapies.
Collapse
|
11
|
Arenas M, Bastolla U. ProtASR2: Ancestral reconstruction of protein sequences accounting for folding stability. Methods Ecol Evol 2020. [DOI: 10.1111/2041-210x.13341] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- Miguel Arenas
- Department of Biochemistry, Genetics and Immunology University of Vigo Vigo Spain
- Biomedical Research Center (CINBIO) University of Vigo Vigo Spain
| | - Ugo Bastolla
- Bioinformatics Unit Centre for Molecular Biology Severo Ochoa (CSIC) Madrid Spain
| |
Collapse
|
12
|
Del Amparo R, Vicens A, Arenas M. The influence of heterogeneous codon frequencies along sequences on the estimation of molecular adaptation. Bioinformatics 2020; 36:430-436. [PMID: 31304972 DOI: 10.1093/bioinformatics/btz558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2019] [Revised: 07/08/2019] [Accepted: 07/11/2019] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION The nonsynonymous/synonymous substitution rate ratio (dN/dS) is a commonly used parameter to quantify molecular adaptation in protein-coding data. It is known that the estimation of dN/dS can be biased if some evolutionary processes are ignored. In this concern, common ML methods to estimate dN/dS assume invariable codon frequencies among sites, despite this characteristic is rare in nature, and it could bias the estimation of this parameter. RESULTS Here we studied the influence of variable codon frequencies among genetic regions on the estimation of dN/dS. We explored scenarios varying the number of genetic regions that differ in codon frequencies, the amount of variability of codon frequencies among regions and the nucleotide frequencies at each codon position among regions. We found that ignoring heterogeneous codon frequencies among regions overall leads to underestimation of dN/dS and the bias increases with the level of heterogeneity of codon frequencies. Interestingly, we also found that varying nucleotide frequencies among regions at the first or second codon position leads to underestimation of dN/dS while variation at the third codon position leads to overestimation of dN/dS. Next, we present a methodology to reduce this bias based on the analysis of partitions presenting similar codon frequencies and we applied it to analyze four real datasets. We conclude that accounting for heterogeneous codon frequencies along sequences is required to obtain realistic estimates of molecular adaptation through this relevant evolutionary parameter. AVAILABILITY AND IMPLEMENTATION The applied frameworks for the computer simulations of protein-coding data and estimation of molecular adaptation are SGWE and PAML, respectively. Both are publicly available and referenced in the study. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Roberto Del Amparo
- Department of Biochemistry, Genetics and Immunology.,Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
| | - Alberto Vicens
- Department of Biochemistry, Genetics and Immunology.,Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
| | - Miguel Arenas
- Department of Biochemistry, Genetics and Immunology.,Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
| |
Collapse
|
13
|
Kuzminkova AA, Sokol AD, Ushakova KE, Popadin KY, Gunbin KV. mtProtEvol: the resource presenting molecular evolution analysis of proteins involved in the function of Vertebrate mitochondria. BMC Evol Biol 2019; 19:47. [PMID: 30813887 PMCID: PMC6391778 DOI: 10.1186/s12862-019-1371-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Heterotachy is the variation in the evolutionary rate of aligned sites in different parts of the phylogenetic tree. It occurs mainly due to epistatic interactions among the substitutions, which are highly complex and make it difficult to study protein evolution. The vast majority of computational evolutionary approaches for studying these epistatic interactions or their evolutionary consequences in proteins require high computational time. However, recently, it has been shown that the evolution of residue solvent accessibility (RSA) is tightly linked with changes in protein fitness and intra-protein epistatic interactions. This provides a computationally fast alternative, based on comparison of evolutionary rates of amino acid replacements with the rates of RSA evolutionary changes in order to recognize any shifts in epistatic interaction. RESULTS Based on RSA information, data randomization and phylogenetic approaches, we constructed a software pipeline, which can be used to analyze the evolutionary consequences of intra-protein epistatic interactions with relatively low computational time. We analyzed the evolution of 512 protein families tightly linked to mitochondrial function in Vertebrates and created "mtProtEvol", the web resource with data on protein evolution. In strict agreement with lifespan and metabolic rate data, we demonstrated that different functional categories of mitochondria-related proteins subjected to selection on accelerated and decelerated RSA rates in rodents and primates. For example, accelerated RSA evolution in rodents has been shown for Krebs cycle enzymes, respiratory chain and reactive oxygen species metabolism, while in primates these functions are stress-response, translation and mtDNA integrity. Decelerated RSA evolution in rodents has been demonstrated for translational machinery and oxidative stress response components. CONCLUSIONS mtProtEvol is an interactive resource focused on evolutionary analysis of epistatic interactions in protein families involved in Vertebrata mitochondria function and available at http://bioinfodbs.kantiana.ru/mtProtEvol /. This resource and the devised software pipeline may be useful tool for researchers in area of protein evolution.
Collapse
Affiliation(s)
- Anastasia A. Kuzminkova
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
| | - Anastasia D. Sokol
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
| | - Kristina E. Ushakova
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
| | - Konstantin Yu. Popadin
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Konstantin V. Gunbin
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Center of Brain Neurobiology and Neurogenetics, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
- Novosibirsk State University, Novosibirsk, Russia
| |
Collapse
|