1. Zhang Y, Deng J, Dong M, Wu J, Zhao Q, Gao X, Xiong D. PILOT: Deep Siamese network with hybrid attention improves prediction of mutation impact on protein stability. Neural Netw 2025; 188:107476. [PMID: 40252373 DOI: 10.1016/j.neunet.2025.107476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2024] [Revised: 02/13/2025] [Accepted: 04/07/2025] [Indexed: 04/21/2025]
Abstract
Evaluating the impact of mutations on protein stability (ΔΔG) is essential for protein engineering and for understanding the molecular mechanisms of disease-associated mutations. Here, we propose a novel deep learning framework, PILOT, for improved prediction of ΔΔG using a Siamese network with a hybrid attention mechanism. The PILOT framework leverages multiple attention modules to extract representations for amino acids, atoms, and protein sequences, respectively. This approach ensures the deep fusion of structural information at both residue and atom levels, the seamless integration of structural and sequence representations, and the effective capture of both long-range and short-range dependencies among amino acids. Our extensive evaluations demonstrate that PILOT greatly outperforms other state-of-the-art methods. We also show that PILOT identifies exceptional patterns for different mutation types. Moreover, we illustrate the clinical applicability of PILOT in separating pathogenic variants from benign variants and variants of uncertain significance (VUS), and in distinguishing de novo mutations in disease cases from those in controls. In summary, PILOT is a robust deep learning tool that could offer significant insights into drug design, medical applications, and protein engineering studies.
Affiliation(s)
- Yuan Zhang
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China
- Junsheng Deng
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China
- Mingyuan Dong
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China
- Jiafeng Wu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China
- Qiuye Zhao
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA.
- Xieping Gao
- Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha 410081, China.
- Dapeng Xiong
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA.
2. Delgado J, Reche R, Cianferoni D, Orlando G, van der Kant R, Rousseau F, Schymkowitz J, Serrano L. FoldX force field revisited, an improved version. Bioinformatics 2025; 41:btaf064. [PMID: 39913370 PMCID: PMC11879241 DOI: 10.1093/bioinformatics/btaf064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 01/23/2025] [Accepted: 02/04/2025] [Indexed: 03/06/2025] [Open Access]
Abstract
MOTIVATION: The FoldX force field was originally validated with a database of 1000 mutants at a time when there were few high-resolution structures. Here, we have manually curated a database of 5556 mutants affecting protein stability, resulting in 2484 highly confident mutations denominated FoldX stability dataset (FSD), represented in non-redundant X-ray structures with <2.5 Å resolution, not involving duplicates, metals, or prosthetic groups. Using this database, we have created a new version of the FoldX force field by introducing pi stacking, pH dependency for all charged residues, improving aromatic-aromatic interactions, modifying the Ncap contribution and α-helix dipole, recalibrating the side-chain entropy of methionine, adjusting the H-bond parameters, and modifying the solvation contribution of tryptophan and others. RESULTS: These changes have led to significant improvements for the prediction of specific mutants involving the above residues/interactions and a statistically significant increase of FoldX predictions, as well as for the majority of the 20 aa. Removing all training sets data from FSD [Validation FoldX Stability Dataset (VFSD) dataset] resulted in improved predictions from R = 0.693 (RMSE = 1.277 kcal/mol) to R = 0.706 (RMSE = 1.252 kcal/mol) when compared with the previously released version. FoldX achieves 95% accuracy considering an error of ±0.85 kcal/mol in prediction and an area under the curve = 0.78 for the VFSD, predicting the sign of the energy change upon mutation. AVAILABILITY AND IMPLEMENTATION: FoldX versions 4.1 and 5.1 are freely available for academics at https://foldxsuite.crg.eu/.
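The correlation coefficient R and the RMSE quoted in this abstract are standard agreement measures between predicted and experimental ΔΔG values. The following is a minimal sketch of how such figures are typically computed; it is not FoldX code, and the function names and the toy ΔΔG values are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def rmse(predicted, observed):
    """Root-mean-square error, in the same units as the inputs (here kcal/mol)."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

# Hypothetical toy data: experimental and predicted ddG values in kcal/mol.
experimental = [1.2, -0.4, 2.5, 0.3, 3.1]
predicted = [0.9, 0.1, 2.0, 0.8, 2.6]

print("R    =", round(pearson_r(predicted, experimental), 3))
print("RMSE =", round(rmse(predicted, experimental), 3), "kcal/mol")
```

Reporting both numbers is useful because R only measures linear association, whereas RMSE also penalizes systematic offsets and scale errors.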
Affiliation(s)
- Javier Delgado
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Raul Reche
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Damiano Cianferoni
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Gabriele Orlando
- Switch Laboratory, VIB Center for Brain and Disease Research, VIB, 3000 Leuven, Belgium
- Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, 3000 Leuven, Belgium
- Switch Laboratory, VIB Center for AI & Computational Biology, VIB, 3000 Leuven, Belgium
- Rob van der Kant
- Switch Laboratory, VIB Center for Brain and Disease Research, VIB, 3000 Leuven, Belgium
- Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, 3000 Leuven, Belgium
- Switch Laboratory, VIB Center for AI & Computational Biology, VIB, 3000 Leuven, Belgium
- Frederic Rousseau
- Switch Laboratory, VIB Center for Brain and Disease Research, VIB, 3000 Leuven, Belgium
- Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, 3000 Leuven, Belgium
- Switch Laboratory, VIB Center for AI & Computational Biology, VIB, 3000 Leuven, Belgium
- Joost Schymkowitz
- Switch Laboratory, VIB Center for Brain and Disease Research, VIB, 3000 Leuven, Belgium
- Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, 3000 Leuven, Belgium
- Switch Laboratory, VIB Center for AI & Computational Biology, VIB, 3000 Leuven, Belgium
- Luis Serrano
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Universitat Pompeu Fabra (UPF), Barcelona 08002, Spain
- ICREA, Pg. Lluis Companys 23, Barcelona 08010, Spain
3. Xu Y, Liu D, Gong H. Improving the prediction of protein stability changes upon mutations by geometric learning and a pre-training strategy. Nat Comput Sci 2024; 4:840-850. [PMID: 39455825 DOI: 10.1038/s43588-024-00716-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 10/03/2024] [Indexed: 10/28/2024]
Abstract
Accurate prediction of protein mutation effects is of great importance in protein engineering and design. Here we propose GeoStab-suite, a suite of three geometric learning-based models (GeoFitness, GeoDDG and GeoDTm) for the prediction of fitness score, ΔΔG and ΔTm of a protein upon mutations, respectively. GeoFitness engages a specialized loss function to allow supervised training of a unified model using the large amount of multi-labeled fitness data in the deep mutational scanning database. To further improve the downstream tasks of ΔΔG and ΔTm prediction, the encoder of GeoFitness is reutilized as a pre-trained module in GeoDDG and GeoDTm to overcome the challenge of lacking sufficient labeled data. This pre-training strategy, in combination with data expansion, markedly improves model performance and generalizability. In the benchmark test, GeoDDG and GeoDTm outperform the other state-of-the-art methods by at least 30% and 70%, respectively, in terms of the Spearman correlation coefficient.
Affiliation(s)
- Yunxin Xu
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China
- Di Liu
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China
- Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China.
- Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China.
4. Velecký J, Berezný M, Musil M, Damborsky J, Bednar D, Mazurenko S. BenchStab: a tool for automated querying of web-based stability predictors. Bioinformatics 2024; 40:btae553. [PMID: 39259175 PMCID: PMC11427696 DOI: 10.1093/bioinformatics/btae553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2024] [Revised: 08/02/2024] [Accepted: 09/10/2024] [Indexed: 09/12/2024]
Abstract
SUMMARY: Protein design requires information about how mutations affect protein stability. Many web-based predictors are available for this purpose, yet comparing them or using them en masse is difficult. Here, we present BenchStab, a console tool/Python package for easy and quick execution of 19 predictors and result collection on a list of mutants. Moreover, the tool is easily extensible with additional predictors. We created an independent dataset derived from the FireProtDB and evaluated 24 different prediction methods. AVAILABILITY AND IMPLEMENTATION: BenchStab is an open-source Python package available at https://github.com/loschmidt/BenchStab with a detailed README and example usage at https://loschmidt.chemi.muni.cz/benchstab. The BenchStab dataset is available on Zenodo: https://zenodo.org/records/10637728.
Affiliation(s)
- Jan Velecký
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- Matej Berezný
- Department of Information Systems, Faculty of Information Technology, Brno University of Technology, 612 00 Brno, Czech Republic
- Milos Musil
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- Department of Information Systems, Faculty of Information Technology, Brno University of Technology, 612 00 Brno, Czech Republic
- International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- David Bednar
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
5. Bernett J, Blumenthal DB, Grimm DG, Haselbeck F, Joeres R, Kalinina OV, List M. Guiding questions to avoid data leakage in biological machine learning applications. Nat Methods 2024; 21:1444-1453. [PMID: 39122953 DOI: 10.1038/s41592-024-02362-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/10/2024] [Accepted: 06/26/2024] [Indexed: 08/12/2024]
Abstract
Machine learning methods for extracting patterns from high-dimensional data are very important in the biological sciences. However, in certain cases, real-world applications cannot confirm the reported prediction performance. One of the main reasons for this is data leakage, which can be seen as the illicit sharing of information between the training data and the test data, resulting in performance estimates that are far better than the performance observed in the intended application scenario. Data leakage can be difficult to detect in biological datasets due to their complex dependencies. With this in mind, we present seven questions that should be asked to prevent data leakage when constructing machine learning models in biological domains. We illustrate the usefulness of our questions by applying them to nontrivial examples. Our goal is to raise awareness of potential data leakage problems and to promote robust and reproducible machine learning-based research in biology.
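One common source of the leakage described in this abstract is splitting related samples (for example, several mutations of the same protein) across training and test sets. The sketch below illustrates a group-aware split that keeps every group on one side only. It is an illustrative assumption, not a procedure taken from the paper: the function `group_split`, the record layout, and the protein identifiers are all hypothetical.

```python
def group_split(records, group_key, test_fraction=0.25):
    """Split records so that no group appears in both train and test.

    group_key maps a record to its group label (here: a protein ID).
    Groups are assigned deterministically; a real pipeline would usually
    shuffle them with a fixed seed and might define groups by sequence
    similarity clusters rather than by exact identifier.
    """
    groups = sorted({group_key(r) for r in records})
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[-n_test:])
    train = [r for r in records if group_key(r) not in test_groups]
    test = [r for r in records if group_key(r) in test_groups]
    return train, test

# Hypothetical records: (protein ID, mutation, measured ddG in kcal/mol).
records = [
    ("P1", "A12V", 1.1), ("P1", "L45F", -0.2),
    ("P2", "K78R", 0.6), ("P3", "G90A", 2.3),
    ("P4", "D10N", 0.4), ("P4", "E33Q", -0.9),
]

train, test = group_split(records, group_key=lambda r: r[0])
train_ids = {r[0] for r in train}
test_ids = {r[0] for r in test}
print("train proteins:", sorted(train_ids))
print("test proteins: ", sorted(test_ids))
print("overlap:       ", sorted(train_ids & test_ids))
```

A purely random split over the six records could place two mutations of the same protein on opposite sides, which is the kind of dependency the abstract warns about.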
Affiliation(s)
- Judith Bernett
- TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- David B Blumenthal
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
- Dominik G Grimm
- TUM Campus Straubing for Biotechnology and Sustainability, Technical University of Munich, Straubing, Germany.
- Bioinformatics, Weihenstephan-Triesdorf University of Applied Sciences, Straubing, Germany.
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- Florian Haselbeck
- TUM Campus Straubing for Biotechnology and Sustainability, Technical University of Munich, Straubing, Germany
- Bioinformatics, Weihenstephan-Triesdorf University of Applied Sciences, Straubing, Germany
- Smart Farming, Weihenstephan-Triesdorf University of Applied Sciences, Freising, Germany
- Roman Joeres
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Olga V Kalinina
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany.
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany.
- Medical Faculty, Saarland University, Homburg, Germany.
- Markus List
- TUM School of Life Sciences, Technical University of Munich, Freising, Germany.
- Munich Data Science Institute (MDSI), Technical University of Munich, Garching, Germany.
6. Cisneros AF, Nielly-Thibault L, Mallik S, Levy ED, Landry CR. Mutational biases favor complexity increases in protein interaction networks after gene duplication. Mol Syst Biol 2024; 20:549-572. [PMID: 38499674 PMCID: PMC11066126 DOI: 10.1038/s44320-024-00030-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 02/27/2024] [Accepted: 02/28/2024] [Indexed: 03/20/2024] [Open Access]
Abstract
Biological systems can gain complexity over time. While some of these transitions are likely driven by natural selection, the extent to which they occur without providing an adaptive benefit is unknown. At the molecular level, one example is heteromeric complexes replacing homomeric ones following gene duplication. Here, we build a biophysical model and simulate the evolution of homodimers and heterodimers following gene duplication using distributions of mutational effects inferred from available protein structures. We keep the specific activity of each dimer identical, so their concentrations drift neutrally without new functions. We show that for more than 60% of tested dimer structures, the relative concentration of the heteromer increases over time due to mutational biases that favor the heterodimer. However, allowing mutational effects on synthesis rates and differences in the specific activity of homo- and heterodimers can limit or reverse the observed bias toward heterodimers. Our results show that the accumulation of more complex protein quaternary structures is likely under neutral evolution, and that natural selection would be needed to reverse this tendency.
Affiliation(s)
- Angel F Cisneros
- Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, G1V 0A6, Québec, Canada
- Institut de biologie intégrative et des systèmes, Université Laval, G1V 0A6, Québec, Canada
- PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, G1V 0A6, Québec, Canada
- Centre de recherche sur les données massives, Université Laval, G1V 0A6, Québec, Canada
- Department of Chemical and Structural Biology, Weizmann Institute of Science, 7610001, Rehovot, Israel
- Lou Nielly-Thibault
- Institut de biologie intégrative et des systèmes, Université Laval, G1V 0A6, Québec, Canada
- PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, G1V 0A6, Québec, Canada
- Centre de recherche sur les données massives, Université Laval, G1V 0A6, Québec, Canada
- Département de biologie, Faculté des sciences et de génie, Université Laval, G1V 0A6, Québec, Canada
- Saurav Mallik
- Department of Chemical and Structural Biology, Weizmann Institute of Science, 7610001, Rehovot, Israel
- Emmanuel D Levy
- Department of Chemical and Structural Biology, Weizmann Institute of Science, 7610001, Rehovot, Israel
- Christian R Landry
- Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, G1V 0A6, Québec, Canada.
- Institut de biologie intégrative et des systèmes, Université Laval, G1V 0A6, Québec, Canada.
- PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, G1V 0A6, Québec, Canada.
- Centre de recherche sur les données massives, Université Laval, G1V 0A6, Québec, Canada.
- Département de biologie, Faculté des sciences et de génie, Université Laval, G1V 0A6, Québec, Canada.
7. Lemieux P, Bradley D, Dubé AK, Dionne U, Landry CR. Dissection of the role of a Src homology 3 domain in the evolution of binding preference of paralogous proteins. Genetics 2024; 226:iyad175. [PMID: 37793087 PMCID: PMC10763533 DOI: 10.1093/genetics/iyad175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Revised: 07/07/2023] [Accepted: 08/07/2023] [Indexed: 10/06/2023] [Open Access]
Abstract
Protein-protein interactions (PPIs) drive many cellular processes. Some interactions are directed by Src homology 3 (SH3) domains that bind proline-rich motifs on other proteins. The evolution of the binding specificity of SH3 domains is not completely understood, particularly following gene duplication. Paralogous genes accumulate mutations that can modify protein functions and, for SH3 domains, their binding preferences. Here, we examined how the binding of the SH3 domains of 2 paralogous yeast type I myosins, Myo3 and Myo5, evolved following duplication. We found that the paralogs have subtly different SH3-dependent interaction profiles. However, by swapping SH3 domains between the paralogs and characterizing the SH3 domains freed from their protein context, we found that very few of the differences in interactions, if any, depend on the SH3 domains themselves. We used ancestral sequence reconstruction to resurrect the preduplication SH3 domains and examined, moving back in time, how the binding preference changed. Although the most recent ancestor of the 2 domains had a binding preference very similar to that of the extant ones, older ancestral domains displayed a gradual loss of interaction with the modern interaction partners when inserted in the extant paralogs. Molecular docking and experimental characterization of the free ancestral domains showed that their affinity for the proline motifs is likely not the cause of this loss of binding. Taken together, our results suggest that an SH3 domain and its host protein could create intramolecular or allosteric interactions essential for the SH3-dependent PPIs, making domains not functionally equivalent even when they have the same binding specificity.
Affiliation(s)
- Pascale Lemieux
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, 1030, Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Regroupement Québécois de Recherche sur la Fonction, l’Ingénierie et les Applications des Protéines, (PROTEO), Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Centre de recherche en données massives (CRDM), Université Laval, 1065, Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Département de biochimie, microbiologie et bio-informatique, Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- David Bradley
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, 1030, Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Regroupement Québécois de Recherche sur la Fonction, l’Ingénierie et les Applications des Protéines, (PROTEO), Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Centre de recherche en données massives (CRDM), Université Laval, 1065, Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Département de biochimie, microbiologie et bio-informatique, Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Département de biologie, Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Alexandre K Dubé
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, 1030, Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Regroupement Québécois de Recherche sur la Fonction, l’Ingénierie et les Applications des Protéines, (PROTEO), Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Centre de recherche en données massives (CRDM), Université Laval, 1065, Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Département de biochimie, microbiologie et bio-informatique, Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Département de biologie, Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Ugo Dionne
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, 1030, Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Regroupement Québécois de Recherche sur la Fonction, l’Ingénierie et les Applications des Protéines, (PROTEO), Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Centre de Recherche du Centre Hospitalier Universitaire (CHU) de Québec, Université Laval, Québec, QC, Canada G1R 2J6
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada M5G 1X5
- Christian R Landry
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, 1030, Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Regroupement Québécois de Recherche sur la Fonction, l’Ingénierie et les Applications des Protéines, (PROTEO), Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Centre de recherche en données massives (CRDM), Université Laval, 1065, Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Département de biochimie, microbiologie et bio-informatique, Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Département de biologie, Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
8. Zheng F, Liu Y, Yang Y, Wen Y, Li M. Assessing computational tools for predicting protein stability changes upon missense mutations using a new dataset. Protein Sci 2024; 33:e4861. [PMID: 38084013 PMCID: PMC10751734 DOI: 10.1002/pro.4861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 11/14/2023] [Accepted: 12/06/2023] [Indexed: 12/28/2023]
Abstract
Insight into how mutations affect protein stability is crucial for protein engineering, understanding genetic diseases, and exploring protein evolution. Numerous computational methods have been developed to predict the impact of amino acid substitutions on protein stability. Nevertheless, comparing these methods poses challenges due to variations in their training data. Moreover, it is observed that they tend to perform better at predicting destabilizing mutations than stabilizing ones. Here, we meticulously compiled a new dataset from three recently published databases: ThermoMutDB, FireProtDB, and ProThermDB. This dataset, which does not overlap with the well-established S2648 dataset, consists of 4038 single-point mutations, including over 1000 stabilizing mutations. We assessed these mutations using 27 computational methods, including the latest ones utilizing mega-scale stability datasets and transfer learning. We excluded entries with overlap or similarity to training datasets to ensure fairness. Pearson correlation coefficients for the tested tools ranged from 0.20 to 0.53 on unseen data, and none of the methods could accurately predict stabilizing mutations, even those performing well in anti-symmetric property analysis. While most methods present consistent trends for predicting destabilizing mutations across various properties such as solvent exposure and secondary conformation, stabilizing mutations do not exhibit a clear pattern. Our study also suggests that solely addressing training dataset bias may not significantly enhance accuracy of predicting stabilizing mutations. These findings emphasize the importance of developing precise predictive methods for stabilizing mutations.
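The "anti-symmetric property" mentioned in this abstract refers to the thermodynamic expectation that a reverse mutation has the opposite ΔΔG of the forward mutation. A minimal sketch of two checks commonly used for this purpose follows: the correlation between forward and reverse predictions (ideally -1) and the mean bias (ideally 0). The exact metrics used in the paper are not stated in the abstract, so the definitions, the function names, and the toy values here are assumptions for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def antisymmetry_metrics(forward, reverse):
    """Return (r_dr, delta) for paired forward/reverse ddG predictions.

    r_dr  : correlation between forward and reverse predictions (ideal: -1).
    delta : mean of (forward + reverse) / 2 over all pairs (ideal: 0).
    """
    n = len(forward)
    r_dr = pearson_r(forward, reverse)
    delta = sum(f + r for f, r in zip(forward, reverse)) / (2 * n)
    return r_dr, delta

# Hypothetical predictions (kcal/mol) from a predictor that leans towards
# one sign: each reverse value is not the exact negative of the forward one.
forward = [1.5, 0.4, 2.1, 0.9]
reverse = [-0.9, 0.2, -1.4, -0.1]

r_dr, delta = antisymmetry_metrics(forward, reverse)
print("r_dr  =", round(r_dr, 3))
print("delta =", round(delta, 3), "kcal/mol")
```

As the abstract notes, a method can score well on such symmetry checks and still predict stabilizing mutations poorly, so these metrics complement rather than replace accuracy measures.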
Affiliation(s)
- Feifan Zheng
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yang Liu
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yan Yang
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yuhao Wen
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Minghui Li
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
9. Rana MM, Nguyen DD. Geometric Graph Learning to Predict Changes in Binding Free Energy and Protein Thermodynamic Stability upon Mutation. J Phys Chem Lett 2023; 14:10870-10879. [PMID: 38032742 DOI: 10.1021/acs.jpclett.3c02679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2023]
Abstract
Accurate prediction of binding free energy changes upon mutations is vital for optimizing drugs, designing proteins, understanding genetic diseases, and cost-effective virtual screening. While machine learning methods show promise in this domain, achieving accuracy and generalization across diverse data sets remains a challenge. This study introduces Geometric Graph Learning for Protein-Protein Interactions (GGL-PPI), a novel approach integrating geometric graph representation and machine learning to forecast mutation-induced binding free energy changes. GGL-PPI leverages atom-level graph coloring and multiscale weighted colored geometric subgraphs to capture structural features of biomolecules, demonstrating superior performance on three standard data sets, namely, AB-Bind, SKEMPI 1.0, and SKEMPI 2.0 data sets. The model's efficacy extends to predicting protein thermodynamic stability in a blind test set, providing unbiased predictions for both direct and reverse mutations and showcasing notable generalization. GGL-PPI's precision in predicting changes in binding free energy and stability due to mutations enhances our comprehension of protein complexes, offering valuable insights for drug design endeavors.
Affiliation(s)
- Md Masud Rana
- Department of Mathematics, University of Kentucky, Lexington, Kentucky 40506, United States
- Duc Duy Nguyen
- Department of Mathematics, University of Kentucky, Lexington, Kentucky 40506, United States
10. Musil M, Jezik A, Horackova J, Borko S, Kabourek P, Damborsky J, Bednar D. FireProt 2.0: web-based platform for the fully automated design of thermostable proteins. Brief Bioinform 2023; 25:bbad425. [PMID: 38018911 PMCID: PMC10685400 DOI: 10.1093/bib/bbad425] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/31/2023] [Revised: 10/25/2023] [Accepted: 11/01/2023] [Indexed: 11/30/2023] [Open Access]
Abstract
Thermostable proteins find their use in numerous biomedical and biotechnological applications. However, the computational design of stable proteins often results in single-point mutations with a limited effect on protein stability, and the construction of stable multiple-point mutants can prove difficult due to the possibility of antagonistic effects between individual mutations. The FireProt protocol enables the automated computational design of highly stable multiple-point mutants. FireProt 2.0 builds on top of the previously published FireProt web, retaining the original functionality and expanding it with several new stabilization strategies. FireProt 2.0 integrates the AlphaFold database and homology modeling for structure prediction, enabling calculations starting from a sequence. Multiple-point designs are constructed using the Bron-Kerbosch algorithm, minimizing the antagonistic effect between the individual mutations. Users can now limit the FireProt calculation to a set of user-defined mutations, run a saturation mutagenesis of the whole protein, or select rigidifying mutations based on B-factors. The evolution-based back-to-consensus strategy is complemented by ancestral sequence reconstruction. FireProt 2.0 is significantly faster, and a reworked graphical user interface broadens the tool's availability even to users with older hardware. FireProt 2.0 is freely available at http://loschmidt.chemi.muni.cz/fireprotweb.
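The Bron-Kerbosch algorithm named in this abstract enumerates the maximal cliques of a graph. The sketch below shows the basic (non-pivoting) form applied to a compatibility graph in which nodes are single-point mutations and an edge means two mutations are assumed not to be antagonistic. How FireProt 2.0 actually builds its graph is not described in the abstract, so the graph, the mutation names, and the helper `maximal_cliques` are hypothetical.

```python
def bron_kerbosch(r, p, x, adj, out):
    """Basic Bron-Kerbosch recursion.

    r: current clique, p: candidate vertices, x: already processed vertices,
    adj: dict mapping each vertex to the set of its neighbours,
    out: list collecting every maximal clique found.
    """
    if not p and not x:
        out.append(sorted(r))  # r cannot be extended: it is maximal
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}  # v has been fully explored
        x = x | {v}

def maximal_cliques(adj):
    out = []
    bron_kerbosch(set(), set(adj), set(), adj, out)
    return sorted(out)

# Hypothetical compatibility graph between four candidate mutations.
compatible = {
    "A12V": {"L45F", "K78R"},
    "L45F": {"A12V", "K78R"},
    "K78R": {"A12V", "L45F", "G90A"},
    "G90A": {"K78R"},
}

for clique in maximal_cliques(compatible):
    print(clique)
```

Each maximal clique is a set of mutations that are pairwise compatible and cannot be enlarged, which is why clique enumeration is a natural fit for assembling multiple-point designs.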
Affiliation(s)
- Milos Musil
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
- Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
- International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
- Andrej Jezik
- Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
- Jana Horackova
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
- Simeon Borko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
- Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
- International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
- Petr Kabourek
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
- International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
- International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
- David Bednar
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
- International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
|
11
|
Tsishyn M, Pucci F, Rooman M. Quantification of biases in predictions of protein-protein binding affinity changes upon mutations. Brief Bioinform 2023; 25:bbad491. [PMID: 38197311 PMCID: PMC10777193 DOI: 10.1093/bib/bbad491] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/30/2023] [Revised: 10/02/2023] [Accepted: 12/05/2023] [Indexed: 01/11/2024]
Abstract
Understanding the impact of mutations on protein-protein binding affinity is a key objective for a wide range of biotechnological applications and for shedding light on disease-causing mutations, which are often located at protein-protein interfaces. Over the past decade, many computational methods using physics-based and/or machine learning approaches have been developed to predict how protein binding affinity changes upon mutations. They all claim to achieve astonishing accuracy on both training and test sets, with performances on standard benchmarks such as SKEMPI 2.0 that seem overly optimistic. Here we benchmarked eight well-known and well-used predictors and identified their biases and dataset dependencies, using not only SKEMPI 2.0 as a test set but also deep mutagenesis data on the severe acute respiratory syndrome coronavirus 2 spike protein in complex with the human angiotensin-converting enzyme 2. We showed that, even though most of the tested methods reach a significant degree of robustness and accuracy, they suffer from limited generalizability properties and struggle to predict unseen mutations. Interestingly, the generalizability problems are more severe for pure machine learning approaches, while physics-based methods are less affected by this issue. Moreover, undesirable prediction biases toward specific mutation properties, the most marked being toward destabilizing mutations, are also observed and should be carefully considered by method developers. We conclude from our analyses that there is room for improvement in the prediction models and suggest ways to check, assess and improve their generalizability and robustness.
Affiliation(s)
- Matsvei Tsishyn
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Roosevelt Ave, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium
- Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Roosevelt Ave, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium
- Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Roosevelt Ave, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium
|
12
|
Sieg J, Rarey M. Searching similar local 3D micro-environments in protein structure databases with MicroMiner. Brief Bioinform 2023; 24:bbad357. [PMID: 37833838 DOI: 10.1093/bib/bbad357] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/09/2023] [Revised: 08/28/2023] [Accepted: 09/18/2023] [Indexed: 10/15/2023]
Abstract
The available protein structure data are rapidly increasing. Within these structures, numerous local structural sites depict the details characterizing structure and function. However, searching and analyzing these sites extensively and at scale poses a challenge. We present a new method to search local sites in protein structure databases using residue-defined local 3D micro-environments. We implemented the method in a new tool called MicroMiner and demonstrate the capabilities of residue micro-environment search on the example of structural mutation analysis. Usually, experimental structures for both the wild-type and the mutant are unavailable for comparison. With MicroMiner, we extracted $>255 \times 10^{6}$ amino acid pairs in protein structures from the PDB, exemplifying single mutations' local structural changes for single chains and $>45 \times 10^{6}$ pairs for protein-protein interfaces. We further annotate existing data sets of experimentally measured mutation effects, like $\Delta \Delta G$ measurements, with the extracted structure pairs to combine the mutation effect measurement with the structural change upon mutation. In addition, we show how MicroMiner can bridge the gap between mutation analysis and structure-based drug design tools. MicroMiner is available as a command line tool and interactively on the https://proteins.plus/ webserver.
Affiliation(s)
- Jochen Sieg
- Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
- Matthias Rarey
- Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
|
13
|
Pandey P, Panday SK, Rimal P, Ancona N, Alexov E. Predicting the Effect of Single Mutations on Protein Stability and Binding with Respect to Types of Mutations. Int J Mol Sci 2023; 24:12073. [PMID: 37569449 PMCID: PMC10418460 DOI: 10.3390/ijms241512073] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/11/2023] [Revised: 07/24/2023] [Accepted: 07/26/2023] [Indexed: 08/13/2023]
Abstract
The development of methods and algorithms to predict the effect of mutations on protein stability, protein-protein interaction, and protein-DNA/RNA binding is necessitated by the needs of protein engineering and for understanding the molecular mechanism of disease-causing variants. The vast majority of the leading methods require a database of experimentally measured folding and binding free energy changes for training. These databases are collections of experimental data taken from scientific investigations typically aimed at probing the role of particular residues on the above-mentioned thermodynamic characteristics, i.e., the mutations are not introduced at random and do not necessarily represent mutations originating from single nucleotide variants (SNV). Thus, the reported performance of the leading algorithms assessed on these databases or other limited cases may not be applicable for predicting the effect of SNVs seen in the human population. Indeed, we demonstrate that the SNVs and non-SNVs are not equally presented in the corresponding databases, and the distribution of the free energy changes is not the same. It is shown that the Pearson correlation coefficients (PCCs) of folding and binding free energy changes obtained in cases involving SNVs are smaller than for non-SNVs, indicating that caution should be used in applying them to reveal the effect of human SNVs. Furthermore, it is demonstrated that some methods are sensitive to the chemical nature of the mutations, resulting in PCCs that differ by a factor of four across chemically different mutations. All methods are found to underestimate the energy changes by roughly a factor of 2.
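The two quantities discussed in this abstract, the Pearson correlation coefficient and the systematic underestimation of energy changes, measure different things. The sketch below illustrates this on made-up ΔΔG values (not data from the paper); the function names and the choice of a least-squares slope through the origin as the "scaling factor" are assumptions made for illustration only.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def slope_through_origin(experimental: Sequence[float],
                         predicted: Sequence[float]) -> float:
    """Least-squares slope k of predicted ~ k * experimental (no intercept)."""
    numerator = sum(e * p for e, p in zip(experimental, predicted))
    denominator = sum(e * e for e in experimental)
    return numerator / denominator


# Hypothetical ΔΔG values in kcal/mol; predictions are exactly half the experiment.
experimental = [-2.0, -1.0, 0.5, 1.0, 3.0]
predicted = [value / 2 for value in experimental]

pcc = pearson(experimental, predicted)
scale = slope_through_origin(experimental, predicted)
```

Here the PCC is essentially 1 while the slope is 0.5, so a predictor can rank mutations perfectly and still underestimate magnitudes by a factor of 2. This is why correlation and magnitude are worth reporting separately.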
Affiliation(s)
- Preeti Pandey
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Shailesh Kumar Panday
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Prawin Rimal
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Nicolas Ancona
- Department of Biological Sciences, Clemson University, Clemson, SC 29634, USA
- Emil Alexov
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
|
14
|
Gerasimavicius L, Livesey BJ, Marsh JA. Correspondence between functional scores from deep mutational scans and predicted effects on protein stability. Protein Sci 2023; 32:e4688. [PMID: 37243972 PMCID: PMC10273344 DOI: 10.1002/pro.4688] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/03/2023] [Revised: 04/19/2023] [Accepted: 05/24/2023] [Indexed: 05/29/2023]
Abstract
Many methodologically diverse computational methods have been applied to the growing challenge of predicting and interpreting the effects of protein variants. As many pathogenic mutations have a perturbing effect on protein stability or intermolecular interactions, one highly interpretable approach is to use protein structural information to model the physical impacts of variants and predict their likely effects on protein stability and interactions. Previous efforts have assessed the accuracy of stability predictors in reproducing thermodynamically accurate values and evaluated their ability to distinguish between known pathogenic and benign mutations. Here, we take an alternate approach, and explore how well stability predictor scores correlate with functional impacts derived from deep mutational scanning (DMS) experiments. In this work, we compare the predictions of 9 protein stability-based tools against mutant protein fitness values from 49 independent DMS datasets, covering 170,940 unique single amino acid variants. We find that FoldX and Rosetta show the strongest correlations with DMS-based functional scores, similar to their previous top performance in distinguishing between pathogenic and benign variants. For both methods, performance is considerably improved when considering intermolecular interactions from protein complex structures, when available. Furthermore, using these two predictors, we derive a "Foldetta" consensus score, which improves upon the performance of both, and manages to match dedicated variant effect predictors in reflecting variant functional impacts. Finally, we also highlight that predicted stability effects show consistently higher correlations with certain DMS experimental phenotypes, particularly those based upon protein abundance, and, in certain cases, can significantly outcompete sequence-based variant effect prediction methodologies for predicting functional scores from DMS experiments.
Affiliation(s)
- Lukas Gerasimavicius
- MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, UK
- Benjamin J. Livesey
- MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, UK
- Joseph A. Marsh
- MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, UK
|
15
|
Blaabjerg LM, Kassem MM, Good LL, Jonsson N, Cagiada M, Johansson KE, Boomsma W, Stein A, Lindorff-Larsen K. Rapid protein stability prediction using deep learning representations. eLife 2023; 12:e82593. [PMID: 37184062 PMCID: PMC10266766 DOI: 10.7554/elife.82593] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Received: 08/10/2022] [Accepted: 05/12/2023] [Indexed: 05/16/2023]
Abstract
Predicting the thermodynamic stability of proteins is a common and widely used step in protein engineering, and when elucidating the molecular mechanisms behind evolution and disease. Here, we present RaSP, a method for making rapid and accurate predictions of changes in protein stability by leveraging deep learning representations. RaSP performs on par with biophysics-based methods and enables saturation mutagenesis stability predictions in less than a second per residue. We use RaSP to calculate ~230 million stability changes for nearly all single amino acid changes in the human proteome, and examine variants observed in the human population. We find that variants that are common in the population are substantially depleted for severe destabilization, and that there are substantial differences between benign and pathogenic variants, highlighting the role of protein stability in genetic diseases. RaSP is freely available, including via a Web interface, and enables large-scale analyses of stability in experimental and predicted protein structures.
Affiliation(s)
- Lasse M Blaabjerg
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Maher M Kassem
- Center for Basic Machine Learning Research in Life Science, Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Lydia L Good
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Nicolas Jonsson
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Matteo Cagiada
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kristoffer E Johansson
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
- Center for Basic Machine Learning Research in Life Science, Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Amelie Stein
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
|
16
|
Dou Z, Sun Y, Jiang X, Wu X, Li Y, Gong B, Wang L. Data-driven strategies for the computational design of enzyme thermal stability: trends, perspectives, and prospects. Acta Biochim Biophys Sin (Shanghai) 2023; 55:343-355. [PMID: 37143326 PMCID: PMC10160227 DOI: 10.3724/abbs.2023033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Accepted: 11/23/2022] [Indexed: 03/05/2023]
Abstract
Thermal stability is one of the most important properties of enzymes, which sustains life and determines the potential for the industrial application of biocatalysts. Although traditional methods such as directed evolution and classical rational design contribute greatly to this field, the enormous sequence space of proteins implies costly and arduous experiments. The development of enzyme engineering focuses on automated and efficient strategies because of the breakthrough of high-throughput DNA sequencing and machine learning models. In this review, we propose a data-driven architecture for enzyme thermostability engineering and summarize some widely adopted datasets, as well as machine learning-driven approaches for designing the thermal stability of enzymes. In addition, we present a series of existing challenges while applying machine learning in enzyme thermostability design, such as the data dilemma, model training, and use of the proposed models. Additionally, a few promising directions for enhancing the performance of the models are discussed. We anticipate that the efficient incorporation of machine learning can provide more insights and solutions for the design of enzyme thermostability in the coming years.
Affiliation(s)
- Zhixin Dou
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, China
- Yuqing Sun
- School of Software, Shandong University, Jinan 250101, China
- Xukai Jiang
- National Glycoengineering Research Center, Shandong University, Qingdao 266237, China
- Xiuyun Wu
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, China
- Yingjie Li
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, China
- Bin Gong
- School of Software, Shandong University, Jinan 250101, China
- Lushan Wang
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, China
|
17
|
Cisneros AF, Gagnon-Arsenault I, Dubé AK, Després PC, Kumar P, Lafontaine K, Pelletier JN, Landry CR. Epistasis between promoter activity and coding mutations shapes gene evolvability. Sci Adv 2023; 9:eadd9109. [PMID: 36735790 PMCID: PMC9897669 DOI: 10.1126/sciadv.add9109] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 07/12/2022] [Accepted: 12/22/2022] [Indexed: 06/01/2023]
Abstract
The evolution of protein-coding genes proceeds as mutations act on two main dimensions: regulation of transcription level and the coding sequence. The extent and impact of the connection between these two dimensions are largely unknown because they have generally been studied independently. By measuring the fitness effects of all possible mutations on a protein complex at various levels of promoter activity, we show that promoter activity at the optimal level for the wild-type protein masks the effects of both deleterious and beneficial coding mutations. Mutations that are deleterious at low activity but masked at optimal activity are slightly destabilizing for individual subunits and binding interfaces. Coding mutations that increase protein abundance are beneficial at low expression but could potentially incur a cost at high promoter activity. We thereby demonstrate that promoter activity in interaction with protein properties can dictate which coding mutations are beneficial, neutral, or deleterious.
Affiliation(s)
- Angel F. Cisneros
- Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, G1V 0A6, Québec, Canada
- Institut de biologie intégrative et des systèmes, Université Laval, G1V 0A6, Québec, Canada
- PROTEO, Le regroupement québécois de recherche sur la fonction, l’ingénierie et les applications des protéines, Université Laval, G1V 0A6, Québec, Canada
- Centre de recherche sur les données massives, Université Laval, G1V 0A6, Québec, Canada
- Isabelle Gagnon-Arsenault
- Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, G1V 0A6, Québec, Canada
- Institut de biologie intégrative et des systèmes, Université Laval, G1V 0A6, Québec, Canada
- PROTEO, Le regroupement québécois de recherche sur la fonction, l’ingénierie et les applications des protéines, Université Laval, G1V 0A6, Québec, Canada
- Centre de recherche sur les données massives, Université Laval, G1V 0A6, Québec, Canada
- Département de biologie, Faculté des sciences et de génie, Université Laval, G1V 0A6, Québec, Canada
- Alexandre K. Dubé
- Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, G1V 0A6, Québec, Canada
- Institut de biologie intégrative et des systèmes, Université Laval, G1V 0A6, Québec, Canada
- PROTEO, Le regroupement québécois de recherche sur la fonction, l’ingénierie et les applications des protéines, Université Laval, G1V 0A6, Québec, Canada
- Centre de recherche sur les données massives, Université Laval, G1V 0A6, Québec, Canada
- Département de biologie, Faculté des sciences et de génie, Université Laval, G1V 0A6, Québec, Canada
- Philippe C. Després
- Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, G1V 0A6, Québec, Canada
- Institut de biologie intégrative et des systèmes, Université Laval, G1V 0A6, Québec, Canada
- PROTEO, Le regroupement québécois de recherche sur la fonction, l’ingénierie et les applications des protéines, Université Laval, G1V 0A6, Québec, Canada
- Centre de recherche sur les données massives, Université Laval, G1V 0A6, Québec, Canada
- Pradum Kumar
- Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, G1V 0A6, Québec, Canada
- Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee, 247667, India
- Kiana Lafontaine
- PROTEO, Le regroupement québécois de recherche sur la fonction, l’ingénierie et les applications des protéines, Université Laval, G1V 0A6, Québec, Canada
- Département de biochimie et de médecine moléculaire, Faculté de médecine, Université de Montréal, H3C 3J7, Montréal, Canada
- Joelle N. Pelletier
- PROTEO, Le regroupement québécois de recherche sur la fonction, l’ingénierie et les applications des protéines, Université Laval, G1V 0A6, Québec, Canada
- Département de biochimie et de médecine moléculaire, Faculté de médecine, Université de Montréal, H3C 3J7, Montréal, Canada
- Département de chimie, Faculté des arts et des sciences, Université de Montréal, H3C 3J7, Montréal, Canada
- Christian R. Landry
- Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, G1V 0A6, Québec, Canada
- Institut de biologie intégrative et des systèmes, Université Laval, G1V 0A6, Québec, Canada
- PROTEO, Le regroupement québécois de recherche sur la fonction, l’ingénierie et les applications des protéines, Université Laval, G1V 0A6, Québec, Canada
- Centre de recherche sur les données massives, Université Laval, G1V 0A6, Québec, Canada
- Département de biologie, Faculté des sciences et de génie, Université Laval, G1V 0A6, Québec, Canada
|
18
|
Lihan M, Lupyan D, Oehme D. Target-template relationships in protein structure prediction and their effect on the accuracy of thermostability calculations. Protein Sci 2023; 32:e4557. [PMID: 36573828 PMCID: PMC9878467 DOI: 10.1002/pro.4557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Revised: 12/22/2022] [Accepted: 12/23/2022] [Indexed: 12/28/2022]
Abstract
Improving protein thermostability has been a labor- and time-consuming process in industrial applications of protein engineering. Advances in computational approaches have facilitated the development of more efficient strategies to allow the prioritization of stabilizing mutants. Among these is FEP+, a free energy perturbation implementation that uses a thoroughly tested physics-based method to achieve unparalleled accuracy in predicting changes in protein thermostability. To gauge the applicability of FEP+ to situations where crystal structures are unavailable, here we have applied the FEP+ approach to homology models of 12 different proteins covering 316 mutations. By comparing predictions obtained with homology models to those obtained using crystal structures, we have identified that local rather than global sequence conservation between target and template sequence is a determining factor in the accuracy of predictions. By excluding mutation sites with low local sequence identity (<40%) to a template structure, we have obtained predictions with comparable performance to crystal structures (R² of 0.67 and 0.63 and an RMSE of 1.20 and 1.16 kcal/mol for crystal structure and homology model predictions, respectively) for identifying stabilizing mutations when incorporating residue scanning into a cascade screening strategy. Additionally, we identify and discuss inherent limitations in sequence alignments and homology modeling protocols that translate into the poor FEP+ performance of a few select examples. Overall, our retrospective study provides detailed guidelines for the application of the FEP+ approach using homology models for protein thermostability predictions, which will greatly extend this approach to studies that were previously limited by structure availability.
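The filtering and scoring steps mentioned in this abstract can be sketched as follows. This is an illustration on made-up numbers, not the authors' protocol. The abstract does not specify how local sequence identity is computed or which definition of R² is used, so the sketch assumes a precomputed identity fraction per mutation site and takes R² as the squared Pearson correlation. The `Site` record and all values are hypothetical.

```python
import math
from typing import List, NamedTuple, Sequence


class Site(NamedTuple):
    local_identity: float  # assumed precomputed target-template identity, 0..1
    experimental: float    # measured stability change, kcal/mol
    predicted: float       # predicted stability change, kcal/mol


def rmse(experimental: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between predictions and measurements."""
    squared = [(p - e) ** 2 for e, p in zip(experimental, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(experimental: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation (one common definition of R²)."""
    n = len(experimental)
    mean_e = sum(experimental) / n
    mean_p = sum(predicted) / n
    cov = sum((e - mean_e) * (p - mean_p) for e, p in zip(experimental, predicted))
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_e * var_p)


# Hypothetical mutation sites; the last one falls below the 40% identity cutoff.
sites: List[Site] = [
    Site(0.85, 1.0, 1.5),
    Site(0.60, 2.0, 2.5),
    Site(0.45, 3.0, 3.5),
    Site(0.25, 0.0, 4.0),
]

IDENTITY_CUTOFF = 0.40  # threshold quoted in the abstract
kept = [s for s in sites if s.local_identity >= IDENTITY_CUTOFF]

exp_values = [s.experimental for s in kept]
pred_values = [s.predicted for s in kept]
error = rmse(exp_values, pred_values)
fit = r_squared(exp_values, pred_values)
```

With the low-identity site excluded, the remaining predictions are offset from experiment by a constant 0.5 kcal/mol, giving an RMSE of 0.5 and an R² of essentially 1. The example also shows that the two metrics capture different aspects of agreement.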
Affiliation(s)
- Muyun Lihan
- NIH Center for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, and Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Schrödinger Inc., Cambridge, Massachusetts, USA
|
19
|
Sora V, Laspiur AO, Degn K, Arnaudi M, Utichi M, Beltrame L, De Menezes D, Orlandi M, Stoltze UK, Rigina O, Sackett PW, Wadt K, Schmiegelow K, Tiberti M, Papaleo E. RosettaDDGPrediction for high-throughput mutational scans: From stability to binding. Protein Sci 2023; 32:e4527. [PMID: 36461907 PMCID: PMC9795540 DOI: 10.1002/pro.4527] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 09/02/2022] [Revised: 11/25/2022] [Accepted: 11/25/2022] [Indexed: 12/05/2022]
Abstract
Reliable prediction of free energy changes upon amino acid substitutions (ΔΔGs) is crucial to investigate their impact on protein stability and protein-protein interaction. Advances in experimental mutational scans allow high-throughput studies thanks to multiplex techniques. On the other hand, genomics initiatives provide a large amount of data on disease-related variants that can benefit from analyses with structure-based methods. Therefore, the computational field should keep the same pace and provide new tools for fast and accurate high-throughput ΔΔG calculations. In this context, the Rosetta modeling suite implements effective approaches to predict folding/unfolding ΔΔGs in a protein monomer upon amino acid substitutions and calculate the changes in binding free energy in protein complexes. However, their application can be challenging to users without extensive experience with Rosetta. Furthermore, Rosetta protocols for ΔΔG prediction are designed considering one variant at a time, making the setup of high-throughput screenings cumbersome. For these reasons, we devised RosettaDDGPrediction, a customizable Python wrapper designed to run free energy calculations on a set of amino acid substitutions using Rosetta protocols with little intervention from the user. Moreover, RosettaDDGPrediction assists with checking completed runs and aggregates raw data for multiple variants, as well as generates publication-ready graphics. We showed the potential of the tool in four case studies, including variants of uncertain significance in childhood cancer, proteins with known experimental unfolding ΔΔGs values, interactions between target proteins and disordered motifs, and phosphomimetics. RosettaDDGPrediction is available, free of charge and under GNU General Public License v3.0, at https://github.com/ELELAB/RosettaDDGPrediction.
Affiliation(s)
- Valentina Sora
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Adrian Otamendi Laspiur
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Kristine Degn
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Matteo Arnaudi
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Mattia Utichi
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Ludovica Beltrame
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Dayana De Menezes
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Matteo Orlandi
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Ulrik Kristoffer Stoltze
- Department of Clinical Genetics, Copenhagen University Hospital Rigshospitalet, Copenhagen, Denmark
- Department of Pediatrics and Adolescent Medicine, University Hospital Rigshospitalet, Copenhagen, Denmark
- Institute of Clinical Medicine, Faculty of Medicine, University of Copenhagen, Copenhagen, Denmark
- Olga Rigina
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Peter Wad Sackett
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Karin Wadt
- Department of Clinical Genetics, Copenhagen University Hospital Rigshospitalet, Copenhagen, Denmark
- Institute of Clinical Medicine, Faculty of Medicine, University of Copenhagen, Copenhagen, Denmark
- Kjeld Schmiegelow
- Department of Pediatrics and Adolescent Medicine, University Hospital Rigshospitalet, Copenhagen, Denmark
- Institute of Clinical Medicine, Faculty of Medicine, University of Copenhagen, Copenhagen, Denmark
- Matteo Tiberti
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Elena Papaleo
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
|
20
|
Després PC, Cisneros AF, Alexander EMM, Sonigara R, Gagné-Thivierge C, Dubé AK, Landry CR. Asymmetrical dose responses shape the evolutionary trade-off between antifungal resistance and nutrient use. Nat Ecol Evol 2022; 6:1501-1515. [PMID: 36050399 DOI: 10.1038/s41559-022-01846-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 12/09/2021] [Accepted: 07/07/2022] [Indexed: 12/22/2022]
Abstract
Antimicrobial resistance is an emerging threat for public health. The success of resistance mutations depends on the trade-off between the benefits and costs they incur. This trade-off is largely unknown and uncharacterized for antifungals. Here, we systematically measure the effect of all amino acid substitutions in the yeast cytosine deaminase Fcy1, the target of the antifungal 5-fluorocytosine (5-FC, flucytosine). We identify over 900 missense mutations granting resistance to 5-FC, a large fraction of which appear to act through destabilization of the protein. The relationship between 5-FC resistance and growth sustained by cytosine deamination is characterized by a sharp trade-off, such that small gains in resistance universally lead to large losses in canonical enzyme function. We show that this steep relationship can be explained by differences in the dose-response functions of 5-FC and cytosine. Finally, we observe the same trade-off shape for the orthologue of FCY1 in Cryptococcus neoformans, a human pathogen. Our results provide a powerful resource and platform for interpreting drug target variants in fungal pathogens as well as unprecedented insights into resistance-function trade-offs.
Affiliation(s)
- Philippe C Després
- Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, Québec, Canada.
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, Canada.
- PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, Québec, Canada.
- Centre de Recherche sur les Données Massives, Université Laval, Québec, Canada.
| | - Angel F Cisneros
- Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, Québec, Canada
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, Canada
- PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, Québec, Canada
- Centre de Recherche sur les Données Massives, Université Laval, Québec, Canada
| | - Emilie M M Alexander
- Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, Québec, Canada
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, Canada
- PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, Québec, Canada
- Centre de Recherche sur les Données Massives, Université Laval, Québec, Canada
| | - Ria Sonigara
- Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, Québec, Canada
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, Canada
- Centre de Recherche sur les Données Massives, Université Laval, Québec, Canada
- Département de Biologie, Faculté des Sciences et de Génie, Université Laval, Québec, Canada
| | - Cynthia Gagné-Thivierge
- Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, Québec, Canada
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, Canada
- PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, Québec, Canada
- Centre de Recherche sur les Données Massives, Université Laval, Québec, Canada
- Département de Biologie, Faculté des Sciences et de Génie, Université Laval, Québec, Canada
| | - Alexandre K Dubé
- Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, Québec, Canada
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, Canada
- PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, Québec, Canada
- Centre de Recherche sur les Données Massives, Université Laval, Québec, Canada
- Département de Biologie, Faculté des Sciences et de Génie, Université Laval, Québec, Canada
| | - Christian R Landry
- Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, Québec, Canada.
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, Canada.
- PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, Québec, Canada.
- Centre de Recherche sur les Données Massives, Université Laval, Québec, Canada.
- Département de Biologie, Faculté des Sciences et de Génie, Université Laval, Québec, Canada.
| |
|
21
|
Shah AA, Alturise F, Alkhalifah T, Khan YD. Deep Learning Approaches for Detection of Breast Adenocarcinoma Causing Carcinogenic Mutations. Int J Mol Sci 2022; 23:ijms231911539. [PMID: 36232840 PMCID: PMC9570286 DOI: 10.3390/ijms231911539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Revised: 09/19/2022] [Accepted: 09/23/2022] [Indexed: 11/16/2022] Open
Abstract
Genes are composed of DNA and each gene has a specific sequence. Recombination or replication within a gene can produce a permanent change in its nucleotide sequence, called a mutation, and some mutations can lead to cancer. Breast adenocarcinoma starts in secretory cells. Breast adenocarcinoma is the most common of all cancers that occur in women. According to a survey in the United States of America, more than 282,000 breast adenocarcinoma patients are registered each year, and most of them are women. Recognition of cancer in its early stages saves many lives. A framework is proposed for the early detection of breast adenocarcinoma using an ensemble learning technique with multiple deep learning algorithms, specifically Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and bi-directional LSTM. There are 99 types of driver genes involved in breast adenocarcinoma. This study uses a dataset of 4127 samples, including men and women, taken from more than 12 cohorts of cancer detection institutes. The dataset encompasses a total of 6170 mutations that occur in 99 genes. Different algorithms are applied to these gene sequences for feature extraction. Three testing techniques, namely independent set testing, self-consistency testing, and 10-fold cross-validation, are applied to validate and test the learning approaches. Subsequently, multiple deep learning approaches such as LSTM, GRU, and bi-directional LSTM algorithms are applied. Several evaluation metrics are reported for the validation of results, including accuracy, sensitivity, specificity, Matthews correlation coefficient, area under the curve, training loss, precision, recall, F1 score, and Cohen's kappa; the values obtained are 99.57, 99.50, 99.63, 0.99, 1.0, 0.2027, 99.57, 99.57, 99.57, and 99.14, respectively.
Affiliation(s)
- Asghar Ali Shah
- Department of Computer Science, University of Management and Technology, Lahore 54770, Pakistan
| | - Fahad Alturise
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass 58892, Qassim, Saudi Arabia
| | - Tamim Alkhalifah
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass 58892, Qassim, Saudi Arabia
| | - Yaser Daanial Khan
- Department of Computer Science, University of Management and Technology, Lahore 54770, Pakistan
| |
|
22
|
Pak MA, Ivankov DN. Best templates outperform homology models in predicting the impact of mutations on protein stability. Bioinformatics 2022; 38:4312-4320. [PMID: 35894930 DOI: 10.1093/bioinformatics/btac515] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/14/2021] [Revised: 05/31/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Prediction of protein stability change upon mutation (ΔΔG) is crucial for facilitating protein engineering and understanding of protein folding principles. Robust prediction of protein folding free energy change requires knowledge of the protein three-dimensional (3D) structure. If the protein 3D structure is not available, one can predict the structure from the protein sequence; however, the prospects of ΔΔG prediction for predicted protein structures are unknown. The accuracy of using 3D structures of the best templates for ΔΔG prediction is also unclear. RESULTS To investigate these questions, we used a representative set of seven diverse and accurate publicly available tools (FoldX, Eris, Rosetta, DDGun, ACDC-NN, ThermoNet and DynaMut) for stability change prediction combined with AlphaFold or I-Tasser for protein 3D structure prediction. We found that best templates perform consistently better than (or similar to) homology models for all ΔΔG predictors. Our findings suggest using the best template structure for the prediction of protein stability change upon mutation if the protein 3D structure is not available. AVAILABILITY AND IMPLEMENTATION The data are available at https://github.com/ivankovlab/template-vs-model. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Marina A Pak
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow 121205, Russia
| | - Dmitry N Ivankov
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow 121205, Russia
| |
|
23
|
PSP-GNM: Predicting Protein Stability Changes upon Point Mutations with a Gaussian Network Model. Int J Mol Sci 2022; 23:ijms231810711. [PMID: 36142614 PMCID: PMC9505940 DOI: 10.3390/ijms231810711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Revised: 09/05/2022] [Accepted: 09/09/2022] [Indexed: 11/26/2022] Open
Abstract
Understanding the effects of missense mutations on protein stability is a widely acknowledged significant biological problem. Genomic missense mutations may alter one or more amino acids, leading to increased or decreased stability of the encoded proteins. In this study, we describe a novel approach—Protein Stability Prediction with a Gaussian Network Model (PSP-GNM)—to measure the unfolding Gibbs free energy change (ΔΔG) and evaluate the effects of single amino acid substitutions on protein stability. Specifically, PSP-GNM employs a coarse-grained Gaussian Network Model (GNM) that has interactions between amino acids weighted by the Miyazawa–Jernigan statistical potential. We used PSP-GNM to simulate partial unfolding of the wildtype and mutant protein structures, and then used the difference in the energies and entropies of the unfolded wildtype and mutant proteins to calculate ΔΔG. The extent of the agreement between the ΔΔG calculated by PSP-GNM and the experimental ΔΔG was evaluated on three benchmark datasets: 350 forward mutations (S350 dataset), 669 forward and reverse mutations (S669 dataset) and 611 forward and reverse mutations (S611 dataset). We observed a Pearson correlation coefficient as high as 0.61, which is comparable to many of the existing state-of-the-art methods. The agreement with experimental ΔΔG further increased when we considered only those measurements made close to 25 °C and neutral pH, suggesting dependence on experimental conditions. We also assessed for the antisymmetry (ΔΔGreverse = −ΔΔGforward) between the forward and reverse mutations on the Ssym+ dataset, which has 352 forward and reverse mutations. While most available methods do not display significant antisymmetry, PSP-GNM demonstrated near-perfect antisymmetry, with a Pearson correlation of −0.97. PSP-GNM is written in Python and can be downloaded as a stand-alone code.
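The antisymmetry check described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the PSP-GNM code: the function names `pearson` and `antisymmetry`, the bias measure (mean of half the forward plus reverse sum), and the example values are all assumptions introduced here.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def antisymmetry(forward, reverse):
    """Return (r, bias) for paired forward/reverse ddG predictions.

    r    : Pearson correlation between forward and reverse values
           (-1 for a perfectly antisymmetric predictor).
    bias : mean of (forward + reverse) / 2 (0 when antisymmetric).
    """
    r = pearson(forward, reverse)
    bias = mean((f + v) / 2 for f, v in zip(forward, reverse))
    return r, bias


# Hypothetical predictions in kcal/mol, invented for illustration only.
fwd = [1.2, -0.5, 2.0, 0.3]
rev = [-1.2, 0.5, -2.0, -0.3]
r, bias = antisymmetry(fwd, rev)
```

A predictor that satisfies ΔΔGreverse = −ΔΔGforward exactly gives a correlation of −1 and a bias of 0; the −0.97 reported above is close to that limit.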
|
24
|
Cancer-related Mutations with Local or Long-range Effects on an Allosteric Loop of p53. J Mol Biol 2022; 434:167663. [PMID: 35659507 DOI: 10.1016/j.jmb.2022.167663] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/01/2022] [Revised: 05/19/2022] [Accepted: 05/25/2022] [Indexed: 12/31/2022]
Abstract
The tumor protein 53 (p53) is involved in transcription-dependent and independent processes. Several p53 variants related to cancer have been found to impact protein stability. Other variants, on the contrary, might have little impact on structural stability and have local or long-range effects on the p53 interactome. Our group previously identified a loop in the DNA binding domain (DBD) of p53 (residues 207-213) which can recruit different interactors. Experimental structures of p53 in complex with other proteins strengthen the importance of this interface for protein-protein interactions. We here characterized with structure-based approaches somatic and germline variants of p53 which could have a marginal effect in terms of stability and act locally or allosterically on the region 207-213 with consequences on the cytosolic functions of this protein. To this goal, we studied 1132 variants in the p53 DBD with structure-based approaches, accounting also for protein dynamics. We focused on variants predicted with marginal effects on structural stability. We then investigated each of these variants for their impact on DNA binding, dimerization of the p53 DBD, and intramolecular contacts with the 207-213 region. Furthermore, we identified variants that could modulate long-range the conformation of the region 207-213 using a coarse-grain model for allostery and all-atom molecular dynamics simulations. Our predictions have been further validated using enhanced sampling methods for 15 variants. The methodologies used in this study could be more broadly applied to other p53 variants or cases where conformational changes of loop regions are essential in the function of disease-related proteins.
|
25
|
Casadio R, Savojardo C, Fariselli P, Capriotti E, Martelli PL. Turning Failures into Applications: The Problem of Protein ΔΔG Prediction. Methods in Molecular Biology (Clifton, N.J.) 2022; 2449:169-185. [PMID: 35507262 DOI: 10.1007/978-1-0716-2095-3_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/17/2023]
Abstract
After nearly two decades of research in the field of computational methods based on machine learning and knowledge-based potentials for ΔG and ΔΔG prediction upon variations, we now realize that all the approaches perform poorly when tested on specific cases and that there is ample room for improvement. Why is this so? Is the underlying assumption wrong that experimental protein thermodynamics in solution reflects the thermodynamics of a single protein? Both machine learning and knowledge-based computational methods are rigorous, and the theory behind them is solid. We are now in a critical situation, which suggests that predictions of protein instability upon variation should be considered with care. In the following, we will show how to cope with the problem of understanding which protein positions may be of interest for biotechnological and biomedical purposes. By applying a consensus procedure, we indicate possible strategies for interpreting the results.
Affiliation(s)
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy.
| | - Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Piero Fariselli
- Department of Medical Sciences, University of Torino, Turin, Italy
| | - Emidio Capriotti
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| |
|
26
|
García-Cebollada H, López A, Sancho J. Protposer: the web server that readily proposes protein stabilizing mutations with high PPV. Comput Struct Biotechnol J 2022; 20:2415-2433. [PMID: 35664235 PMCID: PMC9133766 DOI: 10.1016/j.csbj.2022.05.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/16/2022] [Revised: 05/05/2022] [Accepted: 05/05/2022] [Indexed: 01/23/2023] Open
Abstract
Protein stability is a requisite for most biotechnological and medical applications of proteins. As natural proteins tend to suffer from a low conformational stability ex vivo, great efforts have been devoted toward increasing their stability through rational design and engineering of appropriate mutations. Unfortunately, even the best currently used predictors fail to compute the stability of protein variants with sufficient accuracy and their usefulness as tools to guide the rational stabilisation of proteins is limited. We present here Protposer, a protein stabilising tool based on a different approach. Instead of quantifying changes in stability, Protposer uses structure- and sequence-based screening modules to nominate candidate mutations for subsequent evaluation by a logistic regression model, carefully trained to avoid overfitting. Thus, Protposer analyses PDB files in search for stabilization opportunities and provides a ranked list of promising mutations with their estimated success rates (eSR), their probabilities of being stabilising by at least 0.5 kcal/mol. The agreement between eSRs and actual positive predictive values (PPV) on external datasets of mutations is excellent. When Protposer is used with its Optimal kappa selection threshold, its PPV is above 0.7. Even with less stringent thresholds, Protposer largely outperforms FoldX, Rosetta and PoPMusiC. Indicating the PDB file of the protein suffices to obtain a ranked list of mutations, their eSRs and hints on the likely source of the stabilization expected. Protposer is a distinct, straightforward and highly successful tool to design protein stabilising mutations, and it is freely available for academic use at http://webapps.bifi.es/the-protposer.
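The positive predictive value quoted in this abstract is the fraction of proposed mutations that turn out to be stabilising by at least 0.5 kcal/mol. A minimal sketch of that calculation follows; it is not Protposer's code, the function name and example values are hypothetical, and the sign convention (positive ΔΔG means stabilising) is an assumption, since the abstract does not state one.

```python
def positive_predictive_value(measured_ddg, threshold=0.5):
    """Fraction of proposed mutations whose measured ddG meets the threshold.

    Assumes positive ddG (kcal/mol) means stabilising; flip the sign of the
    inputs first if the data set uses the opposite convention.
    """
    if not measured_ddg:
        raise ValueError("no proposed mutations to evaluate")
    hits = sum(1 for ddg in measured_ddg if ddg >= threshold)
    return hits / len(measured_ddg)


# Hypothetical measured ddG values for five proposed mutations.
measured = [1.1, 0.7, 0.2, -0.4, 0.9]
ppv = positive_predictive_value(measured)
```

Here three of the five hypothetical proposals reach the 0.5 kcal/mol threshold, so the PPV is 0.6.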
|
27
|
Tiberti M, Terkelsen T, Degn K, Beltrame L, Cremers TC, da Piedade I, Di Marco M, Maiani E, Papaleo E. MutateX: an automated pipeline for in silico saturation mutagenesis of protein structures and structural ensembles. Brief Bioinform 2022; 23:6552273. [PMID: 35323860 DOI: 10.1093/bib/bbac074] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 11/30/2021] [Revised: 01/28/2022] [Accepted: 02/16/2022] [Indexed: 12/26/2022] Open
Abstract
Mutations that result in amino acid substitutions influence the stability of proteins and their binding to biomolecules. A molecular understanding of the effects of protein mutations is of both biotechnological and medical relevance. Empirical free energy functions that quickly estimate the free energy change upon mutation (ΔΔG) can be exploited for systematic screenings of proteins and protein complexes. In silico saturation mutagenesis can guide the design of new experiments or rationalize the consequences of known mutations. Software such as FoldX, while fast and reliable, often lacks the automation features needed to apply it in a high-throughput manner. We introduce MutateX, software that automates the prediction of ΔΔGs associated with the systematic mutation of each residue within a protein or protein complex to all other possible residue types, using the FoldX energy function. MutateX also supports ΔΔG calculations over protein ensembles, upon post-translational modifications and in multimeric assemblies. At the heart of MutateX lies an automated pipeline engine that handles input preparation and parallelization and outputs publication-ready figures. We illustrate the MutateX protocol applied to different case studies. The results of the high-throughput scan provided by our tools can help in different applications, such as analysing disease-associated mutations, complementing experimental deep mutational scans, or assisting the design of variants for industrial applications. MutateX is a collection of Python tools that relies on open-source libraries. It is available free of charge under the GNU General Public License from https://github.com/ELELAB/mutatex.
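To make the idea of a saturation scan concrete, the sketch below enumerates every single substitution of each residue to the 19 other standard residue types. It is an illustration only and does not use the MutateX or FoldX interfaces; the function name, the mutation label format, and the example peptide are assumptions, and the energy evaluation step is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residue types


def saturation_mutations(sequence, first_position=1):
    """List every single substitution as wild type + position + mutant, e.g. 'M1A'."""
    mutations = []
    for offset, wild_type in enumerate(sequence):
        position = first_position + offset
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:  # skip the self-substitution
                mutations.append(f"{wild_type}{position}{mutant}")
    return mutations


# Hypothetical three-residue peptide: 3 positions x 19 substitutions = 57.
scan = saturation_mutations("MKT")
```

The list grows linearly with protein length (19 entries per position), which is why the abstract stresses automation and parallelization for whole-protein scans.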
Affiliation(s)
- Matteo Tiberti
- Cancer Structural Biology, Danish Cancer Society Research Center, 2100, Copenhagen, Denmark
| | - Thilde Terkelsen
- Cancer Structural Biology, Danish Cancer Society Research Center, 2100, Copenhagen, Denmark
| | - Kristine Degn
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, 2800, Lyngby, Denmark
| | - Ludovica Beltrame
- Cancer Structural Biology, Danish Cancer Society Research Center, 2100, Copenhagen, Denmark
| | - Tycho Canter Cremers
- Cancer Structural Biology, Danish Cancer Society Research Center, 2100, Copenhagen, Denmark
| | - Isabelle da Piedade
- Cancer Structural Biology, Danish Cancer Society Research Center, 2100, Copenhagen, Denmark
| | - Miriam Di Marco
- Cancer Structural Biology, Danish Cancer Society Research Center, 2100, Copenhagen, Denmark
| | - Emiliano Maiani
- Cancer Structural Biology, Danish Cancer Society Research Center, 2100, Copenhagen, Denmark
| | - Elena Papaleo
- Cancer Structural Biology, Danish Cancer Society Research Center, 2100, Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, 2800, Lyngby, Denmark
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| |
|
28
|
Kebabci N, Timucin AC, Timucin E. Toward Compilation of Balanced Protein Stability Data Sets: Flattening the ΔΔG Curve through Systematic Enrichment. J Chem Inf Model 2022; 62:1345-1355. [PMID: 35201762 DOI: 10.1021/acs.jcim.2c00054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Studies analyzing stability data sets and/or predictors often ignore neutral mutations and use a binary classification scheme labeling only destabilizing and stabilizing mutations. Recognizing that highly concentrated neutral mutations interfere with data set quality, we have explored three protein stability data sets that differ in size and quality: S2648, PON-tstab, and the symmetric Ssym. A characteristic leptokurtic shape, due to concentrated neutral mutations, was found in the ΔΔG distributions of all three data sets, including the curated and symmetric ones. To further investigate the impact of neutral mutations on ΔΔG predictions, we have comprehensively assessed the performance of 11 predictors on the PON-tstab data set. Correlation and error analyses showed that all of the predictors performed best on the neutral mutations, while their performance became gradually worse as the ΔΔG of the mutations departed further from the neutral zone regardless of the direction, implying a bias toward dense mutations. To this end, after unraveling the role of concentrated neutral mutations in biases of stability data sets, we described a systematic enrichment approach to balance the ΔΔG distributions. Before enrichment, mutations were clustered based on their biochemical and/or structural features, and then three mutations were selected from every 2 kcal/mol of each cluster. Upon implementation of this approach by distinct clustering schemes, we generated five subsets varying in size and ΔΔG distributions. All subsets showed improved ΔΔG and frequency distributions. We ultimately reported that the errors toward enriched subsets were higher than those toward the parent data sets, confirming the enrichment of difficult-to-predict mutations in the subsets. In summary, we elaborated the prediction bias toward a concentrated neutral zone and also implemented a rational strategy to tackle this and other forms of biases. Ultimately, this study, which equips us with an extended view of the shortcomings of stability data sets, is a step toward the development of an unbiased predictor.
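The enrichment step (three mutations from every 2 kcal/mol of each cluster) can be sketched as a binned subsampling routine. This is a rough illustration under stated assumptions rather than the authors' procedure: cluster labels are taken as precomputed, bins are aligned at multiples of 2 kcal/mol, and the selection within a bin is random; the abstract specifies none of these details. The function name `enrich` and the example data are hypothetical.

```python
import math
import random
from collections import defaultdict


def enrich(mutations, bin_width=2.0, per_bin=3, seed=0):
    """Keep at most `per_bin` mutations from each ddG bin of each cluster.

    `mutations` is an iterable of (mutation_id, cluster_label, ddg) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    bins = defaultdict(list)
    for mutation_id, cluster, ddg in mutations:
        bin_index = math.floor(ddg / bin_width)  # [0, 2) -> 0, [-2, 0) -> -1
        bins[(cluster, bin_index)].append((mutation_id, cluster, ddg))
    selected = []
    for key in sorted(bins):
        members = bins[key]
        selected.extend(rng.sample(members, min(per_bin, len(members))))
    return selected


# Hypothetical data: ten near-neutral mutations crowd one bin of one cluster.
data = [(f"m{i}", "buried", 0.1 * i) for i in range(10)]
data += [("m10", "buried", 3.5), ("m11", "exposed", -2.4)]
subset = enrich(data)
```

In this toy input the crowded near-neutral bin is cut from ten mutations to three, while the two sparsely populated bins keep their single members, which flattens the ΔΔG distribution in the way the abstract describes.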
Affiliation(s)
- Narod Kebabci
- Department of Biostatistics and Bioinformatics, Institute of Health Sciences, Acibadem University, Istanbul 34752, Turkey
| | - Ahmet Can Timucin
- Department of Molecular Biology and Genetics, Faculty of Arts and Sciences, Acibadem University, Istanbul 34752, Turkey
| | - Emel Timucin
- Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem University, Istanbul 34752, Turkey
| |
|
29
|
Baek KT, Kepp KP. Data set and fitting dependencies when estimating protein mutant stability: Toward simple, balanced, and interpretable models. J Comput Chem 2022; 43:504-518. [PMID: 35040492 DOI: 10.1002/jcc.26810] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/06/2021] [Revised: 12/13/2021] [Accepted: 01/03/2022] [Indexed: 12/27/2022]
Abstract
Accurate prediction of protein stability changes upon mutation (ΔΔG) is increasingly important to evolution studies, protein engineering, and screening of disease-causing gene variants but is challenged by biases in training data. We investigated 45 linear regression models trained on data sets that account systematically for destabilization bias and mutation-type bias BM. The models were externally validated on three test data sets probing different pathologies and for internal consistency (symmetry and neutrality). Model structure and performance substantially depended on training data and even fitting method. We developed two final models: SimBa-IB for typical natural mutations and SimBa-SYM for situations where stabilizing and destabilizing mutations occur to a similar extent. SimBa-SYM, despite its simplicity, is essentially non-biased (vs. the Ssym data set) while still performing well for all data sets (R ~ 0.46-0.54, MAE = 1.16-1.24 kcal/mol). The simple models provide an advantage in terms of interpretability, use and future improvement, and are freely available on GitHub.
Affiliation(s)
| | - Kasper P Kepp
- DTU Chemistry, Technical University of Denmark, Lyngby, Denmark
| |
|
30
|
Pancotti C, Benevenuta S, Birolo G, Alberini V, Repetto V, Sanavia T, Capriotti E, Fariselli P. Predicting protein stability changes upon single-point mutation: a thorough comparison of the available tools on a new dataset. Brief Bioinform 2022; 23:6502552. [PMID: 35021190 PMCID: PMC8921618 DOI: 10.1093/bib/bbab555] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Received: 07/28/2021] [Revised: 11/29/2021] [Accepted: 12/05/2021] [Indexed: 12/13/2022] Open
Abstract
Predicting the difference in thermodynamic stability between protein variants is crucial for protein design and understanding the genotype-phenotype relationships. So far, several computational tools have been created to address this task. Nevertheless, most of them have been trained or optimized on the same and ‘all’ available data, making a fair comparison unfeasible. Here, we introduce a novel dataset, collected and manually cleaned from the latest version of the ThermoMutDB database, consisting of 669 variants not included in the most widely used training datasets. The prediction performance and the ability to satisfy the antisymmetry property by considering both direct and reverse variants were evaluated across 21 different tools. The Pearson correlations of the tested tools were in the ranges of 0.21–0.5 and 0–0.45 for the direct and reverse variants, respectively. When both direct and reverse variants are considered, the antisymmetric methods perform better, achieving a Pearson correlation in the range of 0.51–0.62. The tested methods seem relatively insensitive to the physiological conditions, performing well also on the variants measured with more extreme pH and temperature values. A common issue with all the tested methods is the compression of the ΔΔG predictions toward zero. Furthermore, the thermodynamic stability of the most significantly stabilizing variants was found to be more challenging to predict. This study is the most extensive comparison of prediction methods using an entirely novel set of variants never tested before.
Affiliation(s)
- Corrado Pancotti
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
| | - Silvia Benevenuta
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
| | - Giovanni Birolo
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
| | - Virginia Alberini
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
| | - Valeria Repetto
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
| | - Tiziana Sanavia
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
| | - Emidio Capriotti
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
| | - Piero Fariselli
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
| |
|
31
|
Artificial intelligence challenges for predicting the impact of mutations on protein stability. Curr Opin Struct Biol 2021; 72:161-168. [PMID: 34922207 DOI: 10.1016/j.sbi.2021.11.001] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 06/30/2021] [Revised: 09/15/2021] [Accepted: 11/08/2021] [Indexed: 01/17/2023]
Abstract
Stability is a key ingredient of protein fitness, and its modification through targeted mutations has applications in various fields, such as protein engineering, drug design, and deleterious variant interpretation. Many studies have been devoted over the past decades to build new, more effective methods for predicting the impact of mutations on protein stability based on the latest developments in artificial intelligence. We discuss their features, algorithms, computational efficiency, and accuracy estimated on an independent test set. We focus on a critical analysis of their limitations, the recurrent biases toward the training set, their generalizability, and interpretability. We found that the accuracy of the predictors has stagnated at around 1 kcal/mol for over 15 years. We conclude by discussing the challenges that need to be addressed to reach improved performance.
|
32
|
Samaga YBL, Raghunathan S, Priyakumar UD. SCONES: Self-Consistent Neural Network for Protein Stability Prediction Upon Mutation. J Phys Chem B 2021; 125:10657-10671. [PMID: 34546056 DOI: 10.1021/acs.jpcb.1c04913] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/17/2023]
Abstract
Engineering proteins to have desired properties by mutating amino acids at specific sites is commonplace. Such engineered proteins must be stable to function. Experimental methods used to determine stability at throughputs required to scan the protein sequence space thoroughly are laborious. To this end, many machine learning based methods have been developed to predict thermodynamic stability changes upon mutation. These methods have been evaluated for symmetric consistency by testing with hypothetical reverse mutations. In this work, we propose transitive data augmentation, evaluating transitive consistency with our new Stransitive data set, and a new machine learning based method, the first of its kind, that incorporates both symmetric and transitive properties into the architecture. Our method, called SCONES, is an interpretable neural network that predicts small relative protein stability changes for missense mutations that do not significantly alter the structure. It estimates a residue's contributions toward protein stability (ΔG) in its local structural environment, and the difference between independently predicted contributions of the reference and mutant residues is reported as ΔΔG. We show that this self-consistent machine learning architecture is immune to many common biases in data sets, relies less on data than existing methods, is robust to overfitting, and can explain a substantial portion of the variance in experimental data.
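The self-consistency described in this abstract can be illustrated with a minimal sketch: when ΔΔG is computed as the difference between two independently estimated residue contributions, symmetric and transitive consistency follow by construction. The `CONTRIBUTION` table, its values, and the `ddg` helper are hypothetical placeholders for illustration, not SCONES outputs or its API.

```python
# Hypothetical per-residue stability contributions (kcal/mol) at one site.
# The numbers are illustrative placeholders, not SCONES outputs.
CONTRIBUTION = {"A": -1.2, "G": -0.4, "V": -1.9}


def ddg(reference: str, mutant: str) -> float:
    """Return ΔΔG as the difference of two independently estimated contributions."""
    return CONTRIBUTION[mutant] - CONTRIBUTION[reference]


# Symmetric consistency: the reverse mutation has the opposite sign.
forward = ddg("A", "G")
reverse = ddg("G", "A")

# Transitive consistency: A→G followed by G→V equals A→V directly.
two_step = ddg("A", "G") + ddg("G", "V")
direct = ddg("A", "V")
```

Because each residue's contribution is estimated on its own, no extra training constraint is needed to obtain either property in this toy setting.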
Affiliation(s)
- Yashas B L Samaga
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
| | - Shampa Raghunathan
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
| | - U Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
| |
|
33
|
Islam MKB, Rahman J, Hasan MAM, Ahmad S. predForm-Site: Formylation site prediction by incorporating multiple features and resolving data imbalance. Comput Biol Chem 2021; 94:107553. [PMID: 34384997 DOI: 10.1016/j.compbiolchem.2021.107553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2020] [Revised: 06/22/2021] [Accepted: 07/28/2021] [Indexed: 10/20/2022]
Abstract
Formylation is one of the newly discovered post-translational modifications of lysine residues and is responsible for different kinds of diseases. In this work, a novel predictor, named predForm-Site, has been developed to predict formylation sites with higher accuracy. We have integrated multiple sequence features to develop a more informative representation of formylation sites. Moreover, the decision function of the underlying classifier has been optimized on the skewed formylation dataset during prediction model training to improve prediction quality. On the dataset used by the LFPred and Formator predictors, predForm-Site achieved 99.5% sensitivity, 99.8% specificity and 99.8% overall accuracy with an AUC of 0.999 in the jackknife test. In the independent test, it also achieved more than 97% sensitivity and 99% specificity. Similarly, in benchmarking against the recent method CKSAAP_FormSite, the proposed predictor performed significantly better on all measures, particularly sensitivity by around 20%, specificity by nearly 30% and overall accuracy by more than 22%. These experimental results show that the proposed predForm-Site can be used as a complementary tool for the fast exploration of formylation sites. For the convenience of the scientific community, predForm-Site has been deployed as an online tool, accessible at http://103.99.176.239:8080/predForm-Site.
Affiliation(s)
- Md Khaled Ben Islam
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia; Department of Computer Science & Engineering, Pabna University of Science and Technology, Pabna, Bangladesh.
| | - Julia Rahman
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia; Department of Computer Science & Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh.
| | - Md Al Mehedi Hasan
- Department of Computer Science & Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
| | - Shamim Ahmad
- Department of Computer Science & Engineering, Rajshahi University, Rajshahi, Bangladesh
| |
|
34
|
Marabotti A, Del Prete E, Scafuri B, Facchiano A. Performance of Web tools for predicting changes in protein stability caused by mutations. BMC Bioinformatics 2021; 22:345. [PMID: 34225665 PMCID: PMC8256537 DOI: 10.1186/s12859-021-04238-w] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 11/03/2020] [Accepted: 05/18/2021] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Despite decades of developing dedicated Web tools, it is still difficult to predict correctly the changes in the thermodynamic stability of proteins caused by mutations. Here, we assessed the reliability of five recently developed Web tools, in order to evaluate the progress in the field. RESULTS The results show that, although there are improvements in the field, the assessed predictors are still far from ideal. Prevailing problems include the bias towards destabilizing mutations, and, in general, the results are unreliable when the mutation causes a ΔΔG within the interval ± 0.5 kcal/mol. We found that using several predictors and combining their results into a consensus is a rough but effective way to increase the reliability of the predictions. CONCLUSIONS We suggest that all developers consider using balanced data sets to train predictors in their future tools, and that all users combine the results of multiple tools to increase the chances of obtaining correct predictions about the effect of mutations on the thermodynamic stability of a protein.
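The consensus idea in this abstract can be sketched as follows. The abstract does not state how the predictions were combined, so the median used here is an assumption, as are the function name `consensus_ddg`, the tool names, and the example values; only the ± 0.5 kcal/mol band comes from the text.

```python
from statistics import median


def consensus_ddg(predictions, uncertain_band=0.5):
    """Combine per-tool ΔΔG predictions (kcal/mol) into one consensus value.

    Returns the median prediction and a flag that is False when the
    consensus falls inside the band where predictions are reported to be
    unreliable (|ΔΔG| <= uncertain_band).
    """
    consensus = median(predictions.values())
    reliable = abs(consensus) > uncertain_band
    return consensus, reliable


# Hypothetical tool names and values, for illustration only.
clear_case = consensus_ddg({"tool_a": -1.4, "tool_b": -0.9, "tool_c": -2.0})
borderline = consensus_ddg({"tool_a": 0.2, "tool_b": -0.3, "tool_c": 0.1})
```

The median is chosen in this sketch because it is less sensitive than the mean to a single tool producing an outlying value.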
Affiliation(s)
- Anna Marabotti
- Department of Chemistry and Biology "A. Zambelli", University of Salerno, Fisciano, SA, Italy.
| | - Eugenio Del Prete
- CNR-IAC, National Research Council, Institute for Applied Mathematics "Mauro Picone", Naples, Italy
| | - Bernardina Scafuri
- Department of Chemistry and Biology "A. Zambelli", University of Salerno, Fisciano, SA, Italy
| | - Angelo Facchiano
- CNR-ISA, National Research Council, Institute of Food Science, Avellino, Italy.
| |
|
35
|
A Deep-Learning Sequence-Based Method to Predict Protein Stability Changes Upon Genetic Variations. Genes (Basel) 2021; 12:genes12060911. [PMID: 34204764 PMCID: PMC8231498 DOI: 10.3390/genes12060911] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 05/14/2021] [Revised: 06/08/2021] [Accepted: 06/09/2021] [Indexed: 01/17/2023] Open
Abstract
Several studies have linked disruptions of protein stability and its normal functions to disease. Therefore, during the last few decades, many tools have been developed to predict the free energy changes upon protein residue variations. Most of these methods require both sequence and structure information to obtain reliable predictions. However, the lower number of protein structures available with respect to their sequences, due to experimental issues, drastically limits the application of these tools. In addition, current methodologies ignore the antisymmetric property characterizing the thermodynamics of the protein stability: a variation from wild-type to a mutated form of the protein structure (X_W → X_M) and its reverse process (X_M → X_W) must have opposite values of the free energy difference (ΔΔG_WM = −ΔΔG_MW). Here we propose ACDC-NN-Seq, a deep neural network system that exploits the sequence information and is able to incorporate into its architecture the antisymmetry property. To our knowledge, this is the first convolutional neural network to predict protein stability changes relying solely on the protein sequence. We show that ACDC-NN-Seq compares favorably with the existing sequence-based methods.
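One generic way to enforce the antisymmetry stated in this abstract is to wrap any scoring function so that the forward and reverse predictions cancel exactly. The sketch below shows that construction only; it is not the ACDC-NN-Seq architecture, and `antisymmetrize`, `raw_score`, and the example sequences are hypothetical.

```python
def antisymmetrize(score):
    """Wrap a scoring function so that f(W, M) == -f(M, W) by construction."""

    def predictor(wild_type, mutant):
        return 0.5 * (score(wild_type, mutant) - score(mutant, wild_type))

    return predictor


def raw_score(wild_type, mutant):
    """Toy, deliberately asymmetric score over two equal-length sequences."""
    mismatches = sum(a != b for a, b in zip(wild_type, mutant))
    return 1.0 * mismatches + 0.01 * sum(ord(c) for c in mutant)


predict = antisymmetrize(raw_score)

# Hypothetical wild-type and single-point mutant sequences.
wt, mt = "MKTAYIAK", "MKTGYIAK"
forward = predict(wt, mt)
reverse = predict(mt, wt)
```

A side effect of this construction is that a protein compared with itself always receives a prediction of zero.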
|
36
|
Louis BBV, Abriata LA. Reviewing Challenges of Predicting Protein Melting Temperature Change Upon Mutation Through the Full Analysis of a Highly Detailed Dataset with High-Resolution Structures. Mol Biotechnol 2021; 63:863-884. [PMID: 34101125 PMCID: PMC8443528 DOI: 10.1007/s12033-021-00349-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/07/2021] [Accepted: 06/01/2021] [Indexed: 11/26/2022]
Abstract
Predicting the effects of mutations on protein stability is a key problem in fundamental and applied biology, still unsolved even for the relatively simple case of small, soluble, globular, monomeric, two-state-folder proteins. Many articles discuss the limitations of prediction methods and of the datasets used to train them, which result in low reliability for actual applications despite globally capturing trends. Here, we review these and other issues by analyzing one of the most detailed, carefully curated datasets of melting temperature change (ΔTm) upon mutation for proteins with high-resolution structures. After examining the composition of this dataset to discuss imbalances and biases, we inspect several of its entries assisted by an online app for data navigation and structure display and aided by a neural network that predicts ΔTm with accuracy close to that of programs available to this end. We pose that the ΔTm predictions of our network, and also likely those of other programs, account only for a baseline-like general effect of each type of amino acid substitution which then requires substantial corrections to reproduce the actual stability changes. The corrections are very different for each specific case and arise from fine structural details which are not well represented in the dataset and which, despite appearing reasonable upon visual inspection of the structures, are hard to encode and parametrize. Based on these observations, additional analyses, and a review of recent literature, we propose recommendations for developers of stability prediction methods and for efforts aimed at improving the datasets used for training. 
We leave our interactive interface for analysis available online at http://lucianoabriata.altervista.org/papersdata/proteinstability2021/s1626navigation.html so that users can further explore the dataset and baseline predictions, possibly serving as a tool useful in the context of structural biology and protein biotechnology research and as material for education in protein biophysics.
Affiliation(s)
- Benjamin B V Louis
- Master of Life Sciences Engineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, CH-1015, Lausanne, Switzerland
| | - Luciano A Abriata
- Laboratory for Biomolecular Modeling, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, CH-1015, Lausanne, Switzerland.
- Protein Production and Structure Core Facility, School of Life Sciences, École Polytechnique Fédérale de Lausanne, CH-1015, Lausanne, Switzerland.
| |
|
37
|
Wilson CJ, Chang M, Karttunen M, Choy WY. KEAP1 Cancer Mutants: A Large-Scale Molecular Dynamics Study of Protein Stability. Int J Mol Sci 2021; 22:5408. [PMID: 34065616 PMCID: PMC8161161 DOI: 10.3390/ijms22105408] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/21/2021] [Revised: 05/11/2021] [Accepted: 05/13/2021] [Indexed: 12/30/2022] Open
Abstract
We have performed 280 μs of unbiased molecular dynamics (MD) simulations to investigate the effects of 12 different cancer mutations on Kelch-like ECH-associated protein 1 (KEAP1) (G333C, G350S, G364C, G379D, R413L, R415G, A427V, G430C, R470C, R470H, R470S and G476R), one of the frequently mutated proteins in lung cancer. The aim was to provide structural insight into the effects of these mutants, including a new class of ANCHOR (additionally NRF2-complexed hypomorph) mutant variants. Our work provides additional insight into the structural dynamics of mutants that could not be analyzed experimentally, painting a more complete picture of their mutagenic effects. Notably, blade-wise analysis of the Kelch domain points to stability as a possible target of cancer in KEAP1. Interestingly, structural analysis of the R470C ANCHOR mutant, the most prevalent missense mutation in KEAP1, revealed no significant change in structural stability or NRF2 binding site dynamics, possibly indicating a covalent modification as this mutant's mode of action.
Affiliation(s)
- Carter J. Wilson
- Department of Biochemistry, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 5C1, Canada; (C.J.W.); (M.C.)
- Department of Applied Mathematics, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 5B7, Canada
| | - Megan Chang
- Department of Biochemistry, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 5C1, Canada; (C.J.W.); (M.C.)
| | - Mikko Karttunen
- Department of Applied Mathematics, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 5B7, Canada
- Department of Chemistry, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 3K7, Canada
- Centre for Advanced Materials and Biomaterials Research, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 5B7, Canada
| | - Wing-Yiu Choy
- Department of Biochemistry, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 5C1, Canada; (C.J.W.); (M.C.)
| |
|
38
|
SAAFEC-SEQ: A Sequence-Based Method for Predicting the Effect of Single Point Mutations on Protein Thermodynamic Stability. Int J Mol Sci 2021; 22:ijms22020606. [PMID: 33435356 PMCID: PMC7827184 DOI: 10.3390/ijms22020606] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 12/05/2020] [Revised: 12/23/2020] [Accepted: 01/06/2021] [Indexed: 01/04/2023] Open
Abstract
Modeling the effect of mutations on protein thermodynamic stability is useful for protein engineering and understanding molecular mechanisms of disease-causing variants. Here, we report a new development of the SAAFEC method, SAAFEC-SEQ, which is a gradient boosting decision tree machine learning method to predict the change of the folding free energy caused by amino acid substitutions. The method does not require the 3D structure of the corresponding protein, but only its sequence and, thus, can be applied to genome-scale investigations where structural information is very sparse. SAAFEC-SEQ uses physicochemical properties, sequence features, and evolutionary information features to make the predictions. It is shown to consistently outperform all existing state-of-the-art sequence-based methods in both the Pearson correlation coefficient and root-mean-squared-error parameters as benchmarked on several independent datasets. SAAFEC-SEQ has been implemented as a web server and is available as stand-alone code that can be downloaded and embedded into other researchers’ code.
|
39
|
Castellana S, Biagini T, Petrizzelli F, Parca L, Panzironi N, Caputo V, Vescovi AL, Carella M, Mazza T. MitImpact 3: modeling the residue interaction network of the Respiratory Chain subunits. Nucleic Acids Res 2021; 49:D1282-D1288. [PMID: 33300029 PMCID: PMC7779045 DOI: 10.1093/nar/gkaa1032] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 08/08/2020] [Revised: 10/14/2020] [Accepted: 12/08/2020] [Indexed: 12/26/2022] Open
Abstract
Numerous lines of evidence have shown that the interaction between the nuclear and mitochondrial genomes ensures the efficient functioning of the OXPHOS complexes, with substantial implications in bioenergetics, adaptation, and disease. Their interaction is a fascinating and complex trait of the eukaryotic cell that MitImpact explores with its third major release. MitImpact expands its collection of genomic, clinical, and functional annotations of all non-synonymous substitutions of the human mitochondrial genome with new information on putative Compensated Pathogenic Deviations and co-varying amino acid sites of the Respiratory Chain subunits. It further provides evidence of energetic and structural residue compensation by techniques of molecular dynamics simulation. MitImpact is freely accessible at http://mitimpact.css-mendel.it.
Affiliation(s)
- Stefano Castellana
- Laboratory of Bioinformatics, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), 71013, Italy
| | - Tommaso Biagini
- Laboratory of Bioinformatics, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), 71013, Italy
| | - Francesco Petrizzelli
- Laboratory of Bioinformatics, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), 71013, Italy
- Department of Experimental Medicine, Sapienza University of Rome, Rome 00161, Italy
| | - Luca Parca
- Laboratory of Bioinformatics, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), 71013, Italy
| | - Noemi Panzironi
- Department of Experimental Medicine, Sapienza University of Rome, Rome 00161, Italy
| | - Viviana Caputo
- Department of Experimental Medicine, Sapienza University of Rome, Rome 00161, Italy
| | - Angelo Luigi Vescovi
- ISBReMIT Institute for Stem Cell Biology, Regenerative Medicine and Innovative Therapies, IRCSS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), 71013, Italy
| | - Massimo Carella
- Laboratory of Medical Genetics, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG) 71013, Italy
| | - Tommaso Mazza
- Laboratory of Bioinformatics, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), 71013, Italy
| |
|
40
|
Chen Y, Lu H, Zhang N, Zhu Z, Wang S, Li M. PremPS: Predicting the impact of missense mutations on protein stability. PLoS Comput Biol 2020; 16:e1008543. [PMID: 33378330 PMCID: PMC7802934 DOI: 10.1371/journal.pcbi.1008543] [Citation(s) in RCA: 136] [Impact Index Per Article: 27.2] [Received: 07/23/2020] [Revised: 01/12/2021] [Accepted: 11/16/2020] [Indexed: 12/12/2022] Open
Abstract
Computational methods that predict protein stability changes induced by missense mutations have made a lot of progress over the past decades. Most of the available methods, however, have very limited accuracy in predicting stabilizing mutations because existing experimental sets are dominated by mutations reducing protein stability. Moreover, few approaches could consistently perform well across different test cases. To address these issues, we developed a new computational method, PremPS, to more accurately evaluate the effects of missense mutations on protein stability. The PremPS method is composed of only ten evolutionary- and structure-based features and parameterized on a balanced dataset with an equal number of stabilizing and destabilizing mutations. A comprehensive comparison of the predictive performance of PremPS with other available methods on nine benchmark datasets confirms that our approach consistently outperforms other methods and shows considerable improvement in estimating the impacts of stabilizing mutations. A protein could have multiple structures available, and if another structure of the same protein is used, the predicted change in stability for structure-based methods might be different. Thus, we further estimated the impact of using different structures on prediction accuracy, and demonstrate that our method performs well across different types of structures except for low-resolution structures and models built based on templates with low sequence identity. PremPS can be used for finding functionally important variants, revealing the molecular mechanisms of functional influences, and protein design. PremPS is freely available at https://lilab.jysw.suda.edu.cn/research/PremPS/. It allows large-scale mutational scanning, takes about four minutes to perform calculations for a single mutation per protein with ~ 300 residues, and requires ~ 0.4 seconds for each additional mutation.
Affiliation(s)
- Yuting Chen
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Haoyu Lu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Ning Zhang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Zefeng Zhu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Shuqin Wang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Minghui Li
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| |
|
41
|
Li B, Yang YT, Capra JA, Gerstein MB. Predicting changes in protein thermodynamic stability upon point mutation with deep 3D convolutional neural networks. PLoS Comput Biol 2020; 16:e1008291. [PMID: 33253214 PMCID: PMC7728386 DOI: 10.1371/journal.pcbi.1008291] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 03/02/2020] [Revised: 12/10/2020] [Accepted: 08/26/2020] [Indexed: 12/22/2022] Open
Abstract
Predicting mutation-induced changes in protein thermodynamic stability (ΔΔG) is of great interest in protein engineering, variant interpretation, and protein biophysics. We introduce ThermoNet, a deep, 3D-convolutional neural network (3D-CNN) designed for structure-based prediction of ΔΔGs upon point mutation. To leverage the image-processing power inherent in CNNs, we treat protein structures as if they were multi-channel 3D images. In particular, the inputs to ThermoNet are uniformly constructed as multi-channel voxel grids based on biophysical properties derived from raw atom coordinates. We train and evaluate ThermoNet with a curated data set that accounts for protein homology and is balanced with direct and reverse mutations; this provides a framework for addressing biases that have likely influenced many previous ΔΔG prediction methods. ThermoNet demonstrates performance comparable to the best available methods on the widely used Ssym test set. In addition, ThermoNet accurately predicts the effects of both stabilizing and destabilizing mutations, while most other methods exhibit a strong bias towards predicting destabilization. We further show that homology between Ssym and widely used training sets like S2648 and VariBench has likely led to overestimated performance in previous studies. Finally, we demonstrate the practical utility of ThermoNet in predicting the ΔΔGs for two clinically relevant proteins, p53 and myoglobin, and for pathogenic and benign missense variants from ClinVar. Overall, our results suggest that 3D-CNNs can model the complex, non-linear interactions perturbed by mutations, directly from biophysical properties of atoms.
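The idea of treating a structure as a multi-channel 3D image can be sketched with a simple occupancy grid built from atom coordinates. This is a minimal illustration under stated assumptions, not ThermoNet's featurization: the `voxelize` function, the 16 Å box, the 1 Å voxels, and the two integer channels are all hypothetical, and real channels would encode biophysical properties rather than raw counts.

```python
def voxelize(atoms, center, box=16.0, n=16, n_channels=2):
    """Bin atoms into a multi-channel cubic grid centred on `center`.

    atoms: iterable of (x, y, z, channel) tuples, coordinates in angstroms.
    Returns nested lists indexed as grid[channel][i][j][k]; atoms that fall
    outside the box are ignored.
    """
    size = box / n  # edge length of one voxel
    grid = [
        [[[0.0] * n for _ in range(n)] for _ in range(n)]
        for _ in range(n_channels)
    ]
    for x, y, z, channel in atoms:
        # Shift so the box spans [0, box) along each axis, then floor to a voxel index.
        idx = [int((c - c0 + box / 2) // size) for c, c0 in zip((x, y, z), center)]
        if all(0 <= i < n for i in idx):
            grid[channel][idx[0]][idx[1]][idx[2]] += 1.0
    return grid


# Hypothetical coordinates; the third atom lies outside the box and is dropped.
atoms = [(0.0, 0.0, 0.0, 0), (0.5, 0.5, 0.5, 0), (100.0, 0.0, 0.0, 1), (-7.9, 0.0, 0.0, 1)]
grid = voxelize(atoms, center=(0.0, 0.0, 0.0))
total = sum(v for ch in grid for plane in ch for row in plane for v in row)
```

A grid of this shape is the kind of input a 3D convolutional layer expects, with the channel axis playing the role of colour channels in a 2D image.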
Affiliation(s)
- Bian Li
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Biological Sciences and Vanderbilt Genetics Institute, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Yucheng T. Yang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
| | - John A. Capra
- Department of Biological Sciences and Vanderbilt Genetics Institute, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Mark B. Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Computer Science, Yale University, New Haven, Connecticut, United States of America
| |
|
42
|
Gerasimavicius L, Liu X, Marsh JA. Identification of pathogenic missense mutations using protein stability predictors. Sci Rep 2020; 10:15387. [PMID: 32958805 PMCID: PMC7506547 DOI: 10.1038/s41598-020-72404-w] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Received: 06/11/2020] [Accepted: 08/31/2020] [Indexed: 12/17/2022] Open
Abstract
Attempts at using protein structures to identify disease-causing mutations have been dominated by the idea that most pathogenic mutations are disruptive at a structural level. Therefore, computational stability predictors, which assess whether a mutation is likely to be stabilising or destabilising to protein structure, have been commonly used when evaluating new candidate disease variants, despite not having been developed specifically for this purpose. We therefore tested 13 different stability predictors for their ability to discriminate between pathogenic and putatively benign missense variants. We find that one method, FoldX, significantly outperforms all other predictors in the identification of disease variants. Moreover, we demonstrate that employing predicted absolute energy change scores improves performance of nearly all predictors in distinguishing pathogenic from benign variants. Importantly, however, we observe that the utility of computational stability predictors is highly heterogeneous across different proteins, and that they are all inferior to the best performing variant effect predictors for identifying pathogenic mutations. We suggest that this is largely due to alternate molecular mechanisms other than protein destabilisation underlying many pathogenic mutations. Thus, better ways of incorporating protein structural information and molecular mechanisms into computational variant effect predictors will be required for improved disease variant prioritisation.
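The effect of using absolute energy change scores can be illustrated with a small ranking experiment. The sketch below computes the area under the ROC curve (AUC) from pairwise comparisons, first on signed ΔΔG values and then on their magnitudes. The values and labels are invented for illustration and are not taken from the study; the `auc` helper is a generic implementation, not the authors' evaluation code.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # A tie between a positive and a negative counts as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Illustrative predicted ΔΔG values (kcal/mol); 1 = pathogenic, 0 = benign.
ddg = [2.5, -1.8, 3.1, 0.2, -0.3, 0.4]
labels = [1, 1, 1, 0, 0, 0]

auc_signed = auc(ddg, labels)
auc_absolute = auc([abs(v) for v in ddg], labels)
```

In this toy set one pathogenic variant is predicted to be strongly stabilising, so the signed score ranks it below every benign variant while the magnitude ranks it above them.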
Affiliation(s)
- Lukas Gerasimavicius
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK
| | - Xin Liu
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK
| | - Joseph A Marsh
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK.
| |
|
43
|
Caldararu O, Mehra R, Blundell TL, Kepp KP. Systematic Investigation of the Data Set Dependency of Protein Stability Predictors. J Chem Inf Model 2020; 60:4772-4784. [DOI: 10.1021/acs.jcim.0c00591] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 02/06/2023]
Affiliation(s)
- Octav Caldararu
- DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kgs. Lyngby, Denmark
| | - Rukmankesh Mehra
- DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kgs. Lyngby, Denmark
| | - Tom L. Blundell
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
| | - Kasper P. Kepp
- DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kgs. Lyngby, Denmark
| |
|
44
|
Sanavia T, Birolo G, Montanucci L, Turina P, Capriotti E, Fariselli P. Limitations and challenges in protein stability prediction upon genome variations: towards future applications in precision medicine. Comput Struct Biotechnol J 2020; 18:1968-1979. [PMID: 32774791 PMCID: PMC7397395 DOI: 10.1016/j.csbj.2020.07.011] [Citation(s) in RCA: 85] [Impact Index Per Article: 17.0] [Received: 03/15/2020] [Revised: 07/10/2020] [Accepted: 07/14/2020] [Indexed: 12/13/2022] Open
Abstract
Protein stability predictions are becoming essential in medicine to develop novel immunotherapeutic agents and for drug discovery. Despite the large number of computational approaches for predicting the protein stability upon mutation, there are still critical unsolved problems: 1) the limited number of thermodynamic measurements for proteins provided by current databases; 2) the large intrinsic variability of ΔΔG values due to different experimental conditions; 3) biases in the development of predictive methods caused by ignoring the anti-symmetry of ΔΔG values between mutant and native protein forms; 4) over-optimistic prediction performance, due to sequence similarity between proteins used in training and test datasets. Here, we review these issues, highlighting new challenges required to improve current tools and to achieve more reliable predictions. In addition, we provide a perspective of how these methods will be beneficial for designing novel precision medicine approaches for several genetic disorders caused by mutations, such as cancer and neurodegenerative diseases.
Affiliation(s)
- Tiziana Sanavia
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
| | - Giovanni Birolo
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
| | - Ludovica Montanucci
- Department of Comparative Biomedicine and Food Science (BCA), University of Padova, Viale dell'Università 16, 35020 Legnaro, Italy
| | - Paola Turina
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via F. Selmi 3, 40126 Bologna, Italy
| | - Emidio Capriotti
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via F. Selmi 3, 40126 Bologna, Italy
| | - Piero Fariselli
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
| |
|
45
|
Marabotti A, Scafuri B, Facchiano A. Predicting the stability of mutant proteins by computational approaches: an overview. Brief Bioinform 2020; 22:5850907. [PMID: 32496523 DOI: 10.1093/bib/bbaa074] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 02/12/2020] [Revised: 04/07/2020] [Accepted: 04/10/2020] [Indexed: 01/06/2023] Open
Abstract
A very large number of computational methods to predict the change in thermodynamic stability of proteins due to mutations have been developed during the last 30 years, and many different web servers are currently available. Nevertheless, most of them suffer from severe drawbacks that decrease their general reliability and, consequently, their applicability to different goals such as protein engineering or the prediction of the effects of mutations in genetic diseases. In this review, we have summarized all the main approaches used to develop these tools, with a survey of the web servers currently available. Moreover, we have also reviewed the different assessments made over the years, in order to allow readers to check directly the different performances of these tools, to select the one that best fits their needs, and to help inexperienced users find the best option.
|
46
|
Lv X, Chen J, Lu Y, Chen Z, Xiao N, Yang Y. Accurately Predicting Mutation-Caused Stability Changes from Protein Sequences Using Extreme Gradient Boosting. J Chem Inf Model 2020; 60:2388-2395. [PMID: 32203653 DOI: 10.1021/acs.jcim.0c00064] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/14/2022]
Abstract
Accurately predicting the impact of point mutations on protein stability plays a crucial role in protein design and engineering. In this study, we proposed a novel method (BoostDDG) to predict stability changes upon point mutations from protein sequences based on extreme gradient boosting. We extracted features comprehensively from evolutionary information and predicted structures and performed feature selection by a strategy of sequential forward selection. The features and parameters were optimized by homologue-based cross-validation to avoid overfitting. Finally, we found that 14 features from six groups led to the highest Pearson correlation coefficient (PCC) of 0.535, which is consistent with the 0.540 on an independent test. Our method consistently outperformed other sequence-based methods on three precompiled test sets, and on 7363 variants of two proteins (PTEN and TPMT). These results highlighted that BoostDDG is a powerful tool for predicting stability changes upon point mutations from protein sequences.
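The sequential forward selection strategy mentioned in the abstract can be sketched generically as below. This is an illustrative greedy loop, not the BoostDDG implementation. The `score` callback is an assumption standing in for whatever cross-validated metric is used (the abstract reports a Pearson correlation coefficient), and all function and parameter names are hypothetical.

```python
def sequential_forward_selection(features, score, max_features=None):
    """Greedily add the feature that most improves `score`.

    `features` is a list of feature names; `score` is a caller-supplied
    function mapping a list of features to a number (higher is better),
    for example a cross-validated correlation coefficient.
    """
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    limit = len(remaining) if max_features is None else max_features

    while remaining and len(selected) < limit:
        # Score every one-feature extension of the current subset.
        candidate_scores = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidate_scores)
        if top_score <= best_score:
            break  # no candidate improves the score, so stop
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score

    return selected, best_score
```

The loop stops as soon as no remaining feature improves the score, which is one common stopping rule; a fixed feature budget via `max_features` is another.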
Affiliation(s)
- Xuan Lv
- State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China
| | - Jianwen Chen
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
| | - Yutong Lu
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
| | - Zhiguang Chen
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
| | - Nong Xiao
- State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China.,School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
| | - Yuedong Yang
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China.,Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-sen University, Ministry of Education, Guangzhou, Guangdong 510275, China
| |
|
47
|
Zhang N, Chen Y, Lu H, Zhao F, Alvarez RV, Goncearenco A, Panchenko AR, Li M. MutaBind2: Predicting the Impacts of Single and Multiple Mutations on Protein-Protein Interactions. iScience 2020; 23:100939. [PMID: 32169820 PMCID: PMC7068639 DOI: 10.1016/j.isci.2020.100939] [Citation(s) in RCA: 100] [Impact Index Per Article: 20.0] [Received: 08/15/2019] [Revised: 11/21/2019] [Accepted: 02/20/2020] [Indexed: 01/17/2023] Open
Abstract
Missense mutations may affect proteostasis by destabilizing or over-stabilizing protein complexes and changing the pathway flux. Predicting the effects of stabilizing mutations on protein-protein interactions is notoriously difficult because existing experimental sets are skewed toward mutations reducing protein-protein binding affinity and many computational methods fail to correctly evaluate their effects. To address this issue, we developed a method, MutaBind2, which estimates the impacts of single as well as multiple mutations on protein-protein interactions. MutaBind2 employs only seven features, and the most important of them describe interactions of proteins with the solvent, evolutionary conservation of the site, and thermodynamic stability of the complex and each monomer. This approach shows a distinct improvement, especially in evaluating the effects of mutations that increase binding affinity. MutaBind2 can be used for finding disease driver mutations, designing stable protein complexes, and discovering new protein-protein interaction inhibitors.
Affiliation(s)
- Ning Zhang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
| | - Yuting Chen
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
| | - Haoyu Lu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
| | - Feiyang Zhao
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
| | - Roberto Vera Alvarez
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
| | - Alexander Goncearenco
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
| | - Anna R Panchenko
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA.
| | - Minghui Li
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China.
| |
|
48
|
Vedithi SC, Rodrigues CHM, Portelli S, Skwark MJ, Das M, Ascher DB, Blundell TL, Malhotra S. Computational saturation mutagenesis to predict structural consequences of systematic mutations in the beta subunit of RNA polymerase in Mycobacterium leprae. Comput Struct Biotechnol J 2020; 18:271-286. [PMID: 32042379 PMCID: PMC7000446 DOI: 10.1016/j.csbj.2020.01.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 07/18/2019] [Revised: 01/03/2020] [Accepted: 01/07/2020] [Indexed: 11/26/2022] Open
Abstract
Rifampin resistance in leprosy may remain undetected due to the lack of rapid and effective diagnostic tools. A quick and reliable method is essential to determine the impacts of emerging detrimental mutations in the drug targets. The functional consequences of missense mutations in the β-subunit of RNA polymerase (RNAP) in Mycobacterium leprae (M. leprae) contribute to phenotypic resistance to rifampin in leprosy. Here, we report in-silico saturation mutagenesis of all residues in the β-subunit of RNAP to all other 19 amino acid types (generating 21,394 mutations for 1126 residues) and predict their impacts on overall thermodynamic stability, on interactions at subunit interfaces, and on β-subunit-RNA and rifampin affinities (only for the rifampin binding site) using state-of-the-art structure, sequence and normal mode analysis-based methods. Mutations in the conserved residues that line the active-site cleft show largely destabilizing effects, resulting in increased relative solvent accessibility and a concomitant decrease in residue-depth (the extent to which a residue is buried in the protein structure space) of the mutant residues. The mutations at residue positions S437, G459, H451, P489, K884 and H1035 are identified as extremely detrimental as they induce highly destabilizing effects on the overall protein stability, and nucleic acid and rifampin affinities. Destabilizing effects were predicted for all the clinically/experimentally identified rifampin-resistant mutations in M. leprae indicating that this model can be used as a surveillance tool to monitor emerging detrimental mutations that destabilise RNAP-rifampin interactions and confer rifampin resistance in leprosy.
Author summary
The emergence of primary and secondary drug resistance to rifampin in leprosy is a growing concern and poses a threat to the leprosy control and elimination measures globally. In the absence of an effective in-vitro system to detect and monitor phenotypic resistance to rifampin in leprosy, diagnosis mainly relies on the presence of mutations in drug resistance determining regions of the rpoB gene that encodes the β-subunit of RNAP in M. leprae. Few labs in the world perform mouse foot pad propagation of M. leprae in the presence of drugs (rifampin) to determine growth patterns and confirm resistance; however, these methods take 8 to 12 months, making them impractical for diagnosis. Understanding molecular mechanisms of drug resistance is vital to associating mutations with clinically detected drug resistance in leprosy. Here we propose an in-silico saturation mutagenesis approach to comprehensively elucidate the structural implications of any mutations that exist or that can arise in the β-subunit of RNAP in M. leprae. Most of the predicted mutations may not occur in M. leprae due to fitness costs, but the information generated by this approach helps decipher the impacts of mutations across the structure and conversely enables identification of stable regions in the protein that are least impacted by mutations (mutation coolspots), which can be a potential choice for small molecule binding and structure guided drug discovery.
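The enumeration step of in-silico saturation mutagenesis, and the mutation count quoted in the abstract, can be illustrated with a short sketch. This only generates the list of substitutions; it does not reproduce any of the stability or affinity predictors used in the study. The function name and the toy sequence are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def saturation_mutations(sequence, first_position=1):
    """Yield every single substitution as (wild_type, position, mutant)."""
    for offset, wild_type in enumerate(sequence):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield wild_type, first_position + offset, mutant


# Toy sequence only; the real beta-subunit sequence is not reproduced here.
toy = "MSHA"
mutations = list(saturation_mutations(toy))
print(len(mutations))  # 4 residues x 19 substitutions = 76
print(1126 * 19)       # 21394, matching the count given in the abstract
```

Each residue contributes 19 substitutions, so the total grows linearly with sequence length.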
Affiliation(s)
| | - Carlos H M Rodrigues
- Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Parkville, VIC 3052, Australia.,Structural Biology and Bioinformatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
| | - Stephanie Portelli
- Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Parkville, VIC 3052, Australia.,Structural Biology and Bioinformatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
| | - Marcin J Skwark
- Department of Biochemistry, University of Cambridge, Tennis Court Rd., CB2 1GA, UK
| | - Madhusmita Das
- Molecular Biology Laboratory, Schieffelin Institute of Health-Research and Leprosy Center, Karigiri, Vellore, Tamil Nadu 632106, India
| | - David B Ascher
- Department of Biochemistry, University of Cambridge, Tennis Court Rd., CB2 1GA, UK.,Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Parkville, VIC 3052, Australia.,Structural Biology and Bioinformatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
| | - Tom L Blundell
- Department of Biochemistry, University of Cambridge, Tennis Court Rd., CB2 1GA, UK
| | - Sony Malhotra
- Department of Biochemistry, University of Cambridge, Tennis Court Rd., CB2 1GA, UK
| |
|
49
|
Nutschel C, Fulton A, Zimmermann O, Schwaneberg U, Jaeger KE, Gohlke H. Systematically Scrutinizing the Impact of Substitution Sites on Thermostability and Detergent Tolerance for Bacillus subtilis Lipase A. J Chem Inf Model 2020; 60:1568-1584. [PMID: 31905288 DOI: 10.1021/acs.jcim.9b00954] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 01/17/2023]
Abstract
Improving an enzyme's (thermo-)stability or tolerance against solvents and detergents is highly relevant in protein engineering and biotechnology. Recent developments have tended toward data-driven approaches, where available knowledge about the protein is used to identify substitution sites with high potential to yield protein variants with improved stability, and subsequently, substitutions are engineered by site-directed or site-saturation (SSM) mutagenesis. However, the development and validation of algorithms for data-driven approaches have been hampered by the lack of large-scale data that are measured in a uniform way and unbiased with respect to substitution types and locations. Here, we extend our knowledge on guidelines for protein engineering following a data-driven approach by scrutinizing the impact of substitution sites on thermostability and/or detergent tolerance for Bacillus subtilis lipase A (BsLipA) at very large scale. We systematically analyze a complete experimental SSM library of BsLipA containing all 3439 possible single variants, which was evaluated for thermostability and tolerance against four detergents, each under uniform conditions. Our results provide systematic and unbiased reference data at unprecedented scale for a biotechnologically important protein, identify consistently defined hot spot types for evaluating the performance of data-driven protein-engineering approaches, and show that the rigidity theory- and ensemble-based approach Constraint Network Analysis yields hot spot predictions with an up to ninefold gain in precision over random classification.
Affiliation(s)
- Christina Nutschel
- John von Neumann Institute for Computing (NIC) and Institute for Complex Systems-Structural Biochemistry (ICS-6), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany.,Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
| | - Alexander Fulton
- Institute of Molecular Enzyme Technology, Heinrich Heine University Düsseldorf, 52425 Jülich, Germany
| | - Olav Zimmermann
- Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
| | - Ulrich Schwaneberg
- Institute of Biotechnology, RWTH Aachen University, 52074 Aachen, Germany.,DWI-Leibniz-Institute for Interactive Materials, 52056 Aachen, Germany
| | - Karl-Erich Jaeger
- Institute of Molecular Enzyme Technology, Heinrich Heine University Düsseldorf, 52425 Jülich, Germany.,Institute of Bio- and Geosciences IBG-1: Biotechnology, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
| | - Holger Gohlke
- John von Neumann Institute for Computing (NIC) and Institute for Complex Systems-Structural Biochemistry (ICS-6), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany.,Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany.,Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
| |
|
50
|
Savojardo C, Martelli PL, Casadio R, Fariselli P. On the critical review of five machine learning-based algorithms for predicting protein stability changes upon mutation. Brief Bioinform 2019; 22:601-603. [PMID: 31885042 DOI: 10.1093/bib/bbz168] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/25/2019] [Revised: 11/26/2019] [Accepted: 12/05/2019] [Indexed: 01/17/2023] Open
Abstract
A review, recently published in this journal by Fang (2019), showed that methods trained for the prediction of protein stability changes upon mutation have a very critical bias: they neglect that a protein variation (A → B) and its reverse (B → A) must have opposite values of the free energy difference (ΔΔG_AB = −ΔΔG_BA). In this letter, we complement Fang's paper by presenting a more general view of the problem. In particular, a machine learning-based method published in 2015 (INPS) addressed the bias issue directly. We include the analysis of the missing method, showing that INPS is nearly insensitive to the addressed problem.
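The anti-symmetry requirement described above (the ΔΔG of A → B must equal minus the ΔΔG of B → A) suggests a simple check on any predictor: compare its outputs for paired forward and reverse variations. The sketch below computes two diagnostics sometimes used for this purpose, the correlation between forward and reverse predictions and the mean of half their sum. It is an illustration under those assumptions, not the analysis performed in the letter, and the function names are hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def antisymmetry_report(forward, reverse):
    """Compare predictions for A->B (`forward`) and B->A (`reverse`).

    A perfectly antisymmetric predictor gives correlation -1 and bias 0.
    """
    correlation = pearson(forward, reverse)
    bias = mean(f + r for f, r in zip(forward, reverse)) / 2
    return correlation, bias
```

A predictor trained only on forward variations tends to move away from these ideal values, which is the bias the letter discusses.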
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Piero Fariselli
- Department of Medical Sciences, University of Torino, Torino, Italy
| |
|