1
Hermans P, Tsishyn M, Schwersensky M, Rooman M, Pucci F. Exploring Evolution to Uncover Insights Into Protein Mutational Stability. Mol Biol Evol 2025; 42:msae267. [PMID: 39786559; PMCID: PMC11721782; DOI: 10.1093/molbev/msae267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 11/27/2024] [Accepted: 11/28/2024] [Indexed: 01/12/2025] [Open access]
Abstract
Determining the impact of mutations on the thermodynamic stability of proteins is essential for a wide range of applications such as rational protein design and genetic variant interpretation. Since protein stability is a major driver of evolution, evolutionary data are often used to guide stability predictions. Many state-of-the-art stability predictors extract evolutionary information from multiple sequence alignments of proteins homologous to a query protein, and leverage it to predict the effects of mutations on protein stability. To evaluate the power and the limitations of such methods, we used the massive amount of stability data recently obtained by deep mutational scanning to study how best to construct multiple sequence alignments and optimally extract evolutionary information from them. We tested different evolutionary models and found that, unexpectedly, independent-site models achieve similar accuracy to more complex epistatic models. A detailed analysis of the latter models suggests that their inference often results in noisy couplings, which do not appear to add predictive power over the independent-site contribution, at least in the context of stability prediction. Interestingly, by combining any of the evolutionary features with a simple structural feature, the relative solvent accessibility of the mutated residue, we achieved similar prediction accuracy to supervised, machine learning-based, protein stability change predictors. Our results provide new insights into the relationship between protein evolution and stability, and show how evolutionary information can be exploited to improve the performance of mutational stability prediction.
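To make the notion of an independent-site model concrete, here is a minimal sketch, not the authors' implementation. It scores a substitution as the log-ratio of the mutant and wild-type amino-acid frequencies in one column of a multiple sequence alignment, ignoring all couplings between positions. The function names, the pseudocount value, and the toy alignment are assumptions made for illustration only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(msa, position, pseudocount=1.0):
    """Return pseudocount-smoothed amino-acid frequencies for one MSA column.

    Gaps and non-standard symbols are skipped. The pseudocount (an assumed
    value) keeps unseen amino acids from getting a frequency of zero.
    """
    counts = Counter(seq[position] for seq in msa if seq[position] in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def independent_site_score(msa, position, wild_type, mutant, pseudocount=1.0):
    """Log-odds of mutant versus wild-type residue at a single column.

    Negative values mean the mutant residue is rarer than the wild type
    among the aligned homologs.
    """
    freqs = column_frequencies(msa, position, pseudocount)
    return math.log(freqs[mutant] / freqs[wild_type])


# Hypothetical toy alignment: five aligned sequences of length five.
toy_msa = ["MKVLA", "MKILA", "MRVLG", "MKVLA", "M-VLA"]

# V -> I is observed once in column 2; V -> W is never observed.
score_conservative = independent_site_score(toy_msa, 2, "V", "I")
score_unseen = independent_site_score(toy_msa, 2, "V", "W")
```

In this toy case the unobserved substitution receives a lower score than the observed one. Real pipelines typically also weight sequences to reduce redundancy, which this sketch omits.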
Affiliation(s)
- Pauline Hermans
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium
- Matsvei Tsishyn
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium
- Martin Schwersensky
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium
- Marianne Rooman
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium
- Fabrizio Pucci
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium
2
Nagar N, Tubiana J, Loewenthal G, Wolfson HJ, Ben Tal N, Pupko T. EvoRator2: Predicting Site-specific Amino Acid Substitutions Based on Protein Structural Information Using Deep Learning. J Mol Biol 2023; 435:168155. [PMID: 37356902; DOI: 10.1016/j.jmb.2023.168155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 05/13/2023] [Accepted: 05/17/2023] [Indexed: 06/27/2023]
Abstract
Multiple sequence alignments (MSAs) are the workhorse of molecular evolution and structural biology research. From MSAs, the amino acids that are tolerated at each site during protein evolution can be inferred. However, little is known regarding the repertoire of tolerated amino acids in proteins when only a few or no sequence homologs are available, such as orphan and de novo designed proteins. Here we present EvoRator2, a deep-learning algorithm trained on over 15,000 protein structures that can predict which amino acids are tolerated at any given site, based exclusively on protein structural information mined from atomic coordinate files. We show that EvoRator2 obtained satisfying results for the prediction of position-weighted scoring matrices (PSSM). We further show that EvoRator2 obtained near state-of-the-art performance on proteins with high quality structures in predicting the effect of mutations in deep mutation scanning (DMS) experiments and that for certain DMS targets, EvoRator2 outperformed state-of-the-art methods. We also show that by combining EvoRator2's predictions with those obtained by a state-of-the-art deep-learning method that accounts for the information in the MSA, the prediction of the effect of mutation in DMS experiments was improved in terms of both accuracy and stability. EvoRator2 is designed to predict which amino-acid substitutions are tolerated in such proteins without many homologous sequences, including orphan or de novo designed proteins. We implemented our approach in the EvoRator web server (https://evorator.tau.ac.il).
Affiliation(s)
- Natan Nagar
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Jérôme Tubiana
  - Blavatnik School of Computer Science, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Gil Loewenthal
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Haim J Wolfson
  - Blavatnik School of Computer Science, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Nir Ben Tal
  - School of Neurobiology, Biochemistry & Biophysics, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
3
Artificial intelligence challenges for predicting the impact of mutations on protein stability. Curr Opin Struct Biol 2021; 72:161-168. [PMID: 34922207; DOI: 10.1016/j.sbi.2021.11.001] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 06/30/2021] [Revised: 09/15/2021] [Accepted: 11/08/2021] [Indexed: 01/17/2023]
Abstract
Stability is a key ingredient of protein fitness, and its modification through targeted mutations has applications in various fields, such as protein engineering, drug design, and deleterious variant interpretation. Many studies have been devoted over the past decades to build new, more effective methods for predicting the impact of mutations on protein stability based on the latest developments in artificial intelligence. We discuss their features, algorithms, computational efficiency, and accuracy estimated on an independent test set. We focus on a critical analysis of their limitations, the recurrent biases toward the training set, their generalizability, and interpretability. We found that the accuracy of the predictors has stagnated at around 1 kcal/mol for over 15 years. We conclude by discussing the challenges that need to be addressed to reach improved performance.
4
Morales AC, Rice AM, Ho AT, Mordstein C, Mühlhausen S, Watson S, Cano L, Young B, Kudla G, Hurst LD. Causes and Consequences of Purifying Selection on SARS-CoV-2. Genome Biol Evol 2021; 13:evab196. [PMID: 34427640; PMCID: PMC8504154; DOI: 10.1093/gbe/evab196] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Accepted: 08/19/2021] [Indexed: 02/06/2023] [Open access]
Abstract
Owing to a lag between a deleterious mutation's appearance and its selective removal, gold-standard methods for mutation rate estimation assume no meaningful loss of mutations between parents and offspring. Indeed, from analysis of closely related lineages, in SARS-CoV-2, the Ka/Ks ratio was previously estimated as 1.008, suggesting no within-host selection. By contrast, we find a higher number of observed SNPs at 4-fold degenerate sites than elsewhere and, allowing for the virus's complex mutational and compositional biases, estimate that the mutation rate is at least 49-67% higher than would be estimated based on the rate of appearance of variants in sampled genomes. Given the high Ka/Ks one might assume that the majority of such intrahost selection is the purging of nonsense mutations. However, we estimate that selection against nonsense mutations accounts for only ∼10% of all the "missing" mutations. Instead, classical protein-level selective filters (against chemically disparate amino acids and those predicted to disrupt protein functionality) account for many missing mutations. It is less obvious why for an intracellular parasite, amino acid cost parameters, notably amino acid decay rate, is also significant. Perhaps most surprisingly, we also find evidence for real-time selection against synonymous mutations that move codon usage away from that of humans. We conclude that there is common intrahost selection on SARS-CoV-2 that acts on nonsense, missense, and possibly synonymous mutations. This has implications for methods of mutation rate estimation, for determining times to common ancestry and the potential for intrahost evolution including vaccine escape.
Affiliation(s)
- Atahualpa Castillo Morales
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
- Alan M Rice
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
- Alexander T Ho
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
- Christine Mordstein
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
  - MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, United Kingdom
  - Department of Molecular Biology and Genetics, Aarhus University, Denmark
- Stefanie Mühlhausen
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
- Samir Watson
  - Department of Molecular Biology and Genetics, Aarhus University, Denmark
- Laura Cano
  - MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, United Kingdom
- Bethan Young
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
  - MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, United Kingdom
- Grzegorz Kudla
  - MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, United Kingdom
- Laurence D Hurst
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, United Kingdom
5
The Mutational Robustness of the Genetic Code and Codon Usage in Environmental Context: A Non-Extremophilic Preference? Life (Basel) 2021; 11:life11080773. [PMID: 34440517; PMCID: PMC8398314; DOI: 10.3390/life11080773] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/16/2021] [Revised: 07/23/2021] [Accepted: 07/28/2021] [Indexed: 12/12/2022] [Open access]
Abstract
The genetic code was evolved, to some extent, to minimize the effects of mutations. The effects of mutations depend on the amino acid repertoire, the structure of the genetic code and frequencies of amino acids in proteomes. The amino acid compositions of proteins and corresponding codon usages are still under selection, which allows us to ask what kind of environment the standard genetic code is adapted to. Using simple computational models and comprehensive datasets comprising genomic and environmental data from all three domains of Life, we estimate the expected severity of non-synonymous genomic mutations in proteins, measured by the change in amino acid physicochemical properties. We show that the fidelity in these physicochemical properties is expected to deteriorate with extremophilic codon usages, especially in thermophiles. These findings suggest that the genetic code performs better under non-extremophilic conditions, which not only explains the low substitution rates encountered in halophiles and thermophiles but the revealed relationship between the genetic code and habitat allows us to ponder on earlier phases in the history of Life.
6
Jeong HB, Kim HK. Increased mRNA Stability and Expression Level of Croceibacter atlanticus Lipase Gene Developed through Molecular Evolution Process. J Microbiol Biotechnol 2021; 31:882-889. [PMID: 34024893; PMCID: PMC9706013; DOI: 10.4014/jmb.2103.03011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/04/2021] [Revised: 04/13/2021] [Accepted: 05/17/2021] [Indexed: 12/15/2022]
Abstract
In order to use an enzyme industrially, it is necessary to increase the activity of the enzyme and optimize the reaction characteristics through molecular evolution techniques. We used the error-prone PCR method to improve the reaction characteristics of LipCA lipase discovered in Antarctic Croceibacter atlanticus. Recombinant Escherichia coli colonies showing large halo zones were selected in tributyrin-containing medium. The lipase activity of one mutant strain (M3-1) was significantly increased compared to the wild-type (WT) strain. The M3-1 strain produced about three times more lipase enzyme than the WT strain did. After confirming that the nucleotide sequence of the M3-1 gene differed from that of the WT gene at four bases (73, 381, 756, and 822), the secondary structures of WT and M3-1 mRNA were predicted and compared with the RNAfold web program. Compared to the minimum free energy (MFE) of WT mRNA, that of M3-1 mRNA was lowered by 4.4 kcal/mol, and the MFE value was significantly lowered by mutations of bases 73 and 756. Site-directed mutagenesis was performed to find out which of the four base mutations actually affected the enzyme expression level. Among them, the enzyme production of one mutant decreased to the WT level when base 73 was changed (T→C). These results show that one base change at position 73 can significantly affect protein expression level, and demonstrate that changing the mRNA sequence can increase the stability of mRNA and can increase the production of foreign protein in E. coli.
Affiliation(s)
- Han Byeol Jeong
  - Division of Biotechnology, The Catholic University of Korea, Bucheon 14662, Republic of Korea
- Hyung Kwoun Kim
  - Division of Biotechnology, The Catholic University of Korea, Bucheon 14662, Republic of Korea
  - Corresponding author. Phone: +82-2-2164-4890; Fax: +82-2-2164-4865
7
Bellacchio E. Mutations Causing Mild or No Structural Damage in Interfaces of Multimerization of the Fibrinogen γ-Module More Likely Confer Negative Dominant Behaviors. Int J Mol Sci 2020; 21:ijms21239016. [PMID: 33260935; PMCID: PMC7730044; DOI: 10.3390/ijms21239016] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/11/2020] [Revised: 11/23/2020] [Accepted: 11/25/2020] [Indexed: 02/02/2023] [Open access]
Abstract
Different pathogenic variants in the same protein or even within the same domain of a protein may differ in their patterns of disease inheritance, with some of the variants behaving as negative dominant and others as autosomal recessive mutations. Here is presented a structural analysis and comparison of the molecular characteristics of the sites in fibrinogen γ-module, a fibrinogen component critical in multimerization processes, targeted by pathogenic variants (HGMD database) and by variants found in the healthy population (gnomAD database). The main result of this study is the identification of the molecular pathogenic mechanisms defining which pattern of disease inheritance is selected by mutations at the crossroad of autosomal recessive and negative dominant modalities. The observations in this analysis also warn about the possibility that several variants reported in the non-pathogenic gnomAD database might indeed be a hidden source of diseases with autosomal recessive inheritance or requiring a combination with other disease-causing mutations. Disease presentation might remain mostly unrevealed simply because the very low variant frequency rarely results in biallelic pathogenic mutations or the coupling with mutations in other genes contributing to the same disease. The results here presented provide hints for a deeper search of pathogenic mechanisms and modalities of disease inheritance for protein mutants participating in multimerization phenomena.
Affiliation(s)
- Emanuele Bellacchio
  - Area di Ricerca Genetica e Malattie Rare, Bambino Gesù Children's Hospital, IRCCS, Piazza Sant'Onofrio 4, 00165 Rome, Italy