1
Mondal S, Shrivastava P, Mehra R. Computing pathogenicity of mutations in human cytochrome P450 superfamily. Biochimica et Biophysica Acta. Proteins and Proteomics 2025; 1873:141078. [PMID: 40349948 DOI: 10.1016/j.bbapap.2025.141078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2025] [Revised: 04/22/2025] [Accepted: 05/08/2025] [Indexed: 05/14/2025]
Abstract
Cytochrome P450s (CYPs) are crucial heme-containing enzymes that metabolize drugs and endogenous compounds. In humans, 57 CYP isoforms have been identified, with over 200 mutations linked to severe disorders. Our comprehensive computational study assessed the reason for the pathogenicity of mutations by comparing pathogenic and non-pathogenic variants. We analyzed 2,594,151 mutations across 26 CYP structures using structure- and sequence-based methods, revealing a meaningful stability pattern: non-pathogenic > all > pathogenic mutation datasets. Notably, pathogenic mutations were predominantly buried within CYP structures, indicating a higher potential for pathogenesis. We identified three key amino acid properties affected by mutations: Gibbs free energy, isoelectric point, and volume. Furthermore, disease-associated mutations significantly reduced positive residue content, particularly due to arginine mutations, which directly influenced the isoelectric point. Our findings indicate a greater likelihood of pathogenic mutations occurring at conserved sites, disrupting CYP function. A higher frequency of pathogenic mutations was observed in heme sites, primarily involving arginine, which may interfere with arginine-heme interactions. Molecular docking revealed a differential binding of heme in wild-type and pathogenic CYPs. This study provides a foundational analysis of mutation effects across multiple CYPs. It models the chemical basis of CYP-related pathogenicity, facilitating the development of a semi-quantitative disease prediction model.
Affiliation(s)
- Somnath Mondal
  - Department of Chemistry, Indian Institute of Technology Bhilai, Durg 491002, Chhattisgarh, India
- Pranchal Shrivastava
  - Department of Chemistry, Indian Institute of Technology Bhilai, Durg 491002, Chhattisgarh, India
- Rukmankesh Mehra
  - Department of Chemistry, Indian Institute of Technology Bhilai, Durg 491002, Chhattisgarh, India; Department of Bioscience and Biomedical Engineering, Indian Institute of Technology Bhilai, Durg 491002, Chhattisgarh, India.
2
Xu K, Fu H, Chen Q, Sun R, Li R, Zhao X, Zhou J, Wang X. Engineering thermostability of industrial enzymes for enhanced application performance. Int J Biol Macromol 2025; 291:139067. [PMID: 39730046 DOI: 10.1016/j.ijbiomac.2024.139067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2024] [Revised: 12/17/2024] [Accepted: 12/19/2024] [Indexed: 12/29/2024]
Abstract
Thermostability is a key factor for the industrial application of enzymes. This review categorizes enzymes by their applications and discusses the importance of engineering thermostability for practical use. It summarizes fundamental theories and recent advancements in enzyme thermostability modification, including directed evolution, semi-rational design, and rational design. Directed evolution uses high-throughput screening to generate random mutations, while semi-rational design combines hotspot identification with screening. Rational design focuses on key residues to enhance stability by improving rigidity, foldability, and reducing aggregation. The review also covers rational strategies like engineering folding energy, surface charge, machine learning methods, and consensus design, along with tools that support these approaches. Practical examples are critically assessed to highlight the benefits and limitations of these strategies. Finally, the challenges and potential contributions of artificial intelligence in enzyme thermostability engineering are discussed.
Affiliation(s)
- Kangjie Xu
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China; Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Haoran Fu
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China; Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Qiming Chen
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China; Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Ruoxi Sun
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China; Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Ruosong Li
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China; Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Xinyi Zhao
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China; Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Jingwen Zhou
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China; Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China; Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China; School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi 214122, China.
- Xinglong Wang
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China; Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China.
3
Francisco S, Lamacchia L, Turco A, Ermondi G, Caron G, Rossi Sebastiano M. Restoring adapter protein complex 4 function with small molecules: an in silico approach to spastic paraplegia 50. Protein Sci 2025; 34:e70006. [PMID: 39723768 DOI: 10.1002/pro.70006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2024] [Revised: 11/22/2024] [Accepted: 12/06/2024] [Indexed: 12/28/2024]
Abstract
This study focuses on spastic paraplegia type 50 (SPG50), an adapter protein complex 4 deficiency syndrome caused by mutations in the adapter protein complex 4 subunit mu-1 (AP4M1) gene, and on the downstream alterations of the AP4M1 protein. We applied a battery of heterogeneous computational resources, encompassing two in-house tools described here for the first time, to (a) assess the druggability potential of AP4M1, (b) characterize SPG50-associated mutations and their 3D scenario, (c) identify mutation-tailored drug candidates for SPG50, and (d) elucidate their mechanisms of action by means of structural considerations on homology models of the adapter protein complex 4 core. Altogether, the collected results indicate R367Q as the mutation with the most promising potential of being corrected by small-molecule drugs, and the flavonoid rutin as best candidate for this purpose. Rutin shows promise in rescuing the interaction between the AP4M1 and adapter protein complex subunit beta-1 (AP4B1) subunits by means of a glue-like mode of action. Overall, this approach offers a framework that could be systematically applied to the investigation of mutation-wise molecular mechanisms in different hereditary spastic paraplegias, too.
Affiliation(s)
- Serena Francisco
  - Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino, Italy
- Lorenzo Lamacchia
  - Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino, Italy
- Attilio Turco
  - Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino, Italy
- Giuseppe Ermondi
  - Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino, Italy
- Giulia Caron
  - Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino, Italy
- Matteo Rossi Sebastiano
  - Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino, Italy
4
Sebastiano MR, Hadano S, Cesca F, Ermondi G. Preclinical alternative drug discovery programs for monogenic rare diseases. Should small molecules or gene therapy be used? The case of hereditary spastic paraplegias. Drug Discov Today 2024; 29:104138. [PMID: 39154774 DOI: 10.1016/j.drudis.2024.104138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Revised: 06/28/2024] [Accepted: 08/13/2024] [Indexed: 08/20/2024]
Abstract
Patients diagnosed with rare diseases and their families search desperately to organize drug discovery campaigns. Alternative models that differ from default paradigms offer real opportunities. There are, however, no clear guidelines for the development of such models, which reduces success rates and raises costs. We address the main challenges in making the discovery of new preclinical treatments more accessible, using rare hereditary paraplegia as a paradigmatic case. First, we discuss the necessary expertise, and the patients' clinical and genetic data. Then, we revisit gene therapy, de novo drug development, and drug repurposing, discussing their applicability. Moreover, we explore a pool of recommended in silico tools for pathogenic variant and protein structure prediction, virtual screening, and experimental validation methods, discussing their strengths and weaknesses. Finally, we focus on successful case applications.
Affiliation(s)
- Matteo Rossi Sebastiano
  - University of Torino, Molecular Biotechnology and Health Sciences Department, CASSMedChem, Piazza Nizza, 10138 Torino, Italy
- Shinji Hadano
  - Molecular Neuropathobiology Laboratory, Department of Physiology, Tokai University School of Medicine, Isehara, Japan
- Fabrizia Cesca
  - Department of Life Sciences, University of Trieste, 34127 Trieste, Italy
- Giuseppe Ermondi
  - University of Torino, Molecular Biotechnology and Health Sciences Department, CASSMedChem, Piazza Nizza, 10138 Torino, Italy.
5
Vila JA. Analysis of proteins in the light of mutations. European Biophysics Journal: EBJ 2024; 53:255-265. [PMID: 38955858 DOI: 10.1007/s00249-024-01714-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 05/23/2024] [Accepted: 06/18/2024] [Indexed: 07/04/2024]
Abstract
Proteins have evolved through mutations (amino acid substitutions) since life appeared on Earth, some 10^9 years ago. The study of these phenomena has been of particular significance because of their impact on protein stability, function, and structure. This study offers a new viewpoint on how the most recent findings in these areas can be used to explore the impact of mutations on protein sequence, stability, and evolvability. Preliminary results indicate that: (1) mutations can be viewed as sensitive probes to identify 'typos' in the amino acid sequence, and also to assess the resistance of naturally occurring proteins to unwanted sequence alterations; (2) the presence of 'typos' in the amino acid sequence, rather than being an evolutionary obstacle, could promote faster evolvability and, in turn, increase the likelihood of higher protein stability; (3) the mutation site is far more important than the substituted amino acid in terms of the marginal stability changes of the protein; and (4) the unpredictability of protein evolution at the molecular level (by mutations) exists even in the absence of epistasis effects. Finally, the Darwinian concept of evolution "descent with modification" and experimental evidence endorse one of the results of this study, which suggests that some regions of any protein sequence are susceptible to mutations while others are not. This work contributes to our general understanding of protein responses to mutations and may spur significant progress in our efforts to develop methods to accurately forecast changes in protein stability, their propensity for metamorphism, and their ability to evolve.
Affiliation(s)
- Jorge A Vila
  - IMASL-CONICET, Universidad Nacional de San Luis, Ejército de los Andes 950, 5700, San Luis, Argentina.
6
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Affiliation(s)
- Petr Kouba
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
- Pavel Kohout
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Faraneh Haddadi
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Anton Bushuiev
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Raman Samusevich
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Jiri Sedlar
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Tomas Pluskal
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Josef Sivic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
7
Umerenkov D, Nikolaev F, Shashkova TI, Strashnov PV, Sindeeva M, Shevtsov A, Ivanisenko NV, Kardymon OL. PROSTATA: a framework for protein stability assessment using transformers. Bioinformatics 2023; 39:btad671. [PMID: 37935419 PMCID: PMC10651431 DOI: 10.1093/bioinformatics/btad671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 10/25/2023] [Accepted: 11/02/2023] [Indexed: 11/09/2023]
Abstract
MOTIVATION: Accurate prediction of change in protein stability due to point mutations is an attractive goal that remains unachieved. Despite the high interest in this area, little consideration has been given to the transformer architecture, which is dominant in many fields of machine learning. RESULTS: In this work, we introduce PROSTATA, a predictive model built in a knowledge-transfer fashion on a new curated dataset. PROSTATA demonstrates advantage over existing solutions based on neural networks. We show that the large improvement margin is due to both the architecture of the model and the quality of the new training dataset. This work opens up opportunities to develop new lightweight and accurate models for protein stability assessment. AVAILABILITY AND IMPLEMENTATION: PROSTATA is available at https://github.com/AIRI-Institute/PROSTATA and https://prostata.airi.net.
Affiliation(s)
- Pavel V Strashnov
  - Bioinformatics Group, AIRI, Moscow 121170, Russia
  - Department of Computer Design and Technology, Bauman Moscow State Technical University, Moscow 105005, Russia
- Andrey Shevtsov
  - Bioinformatics Group, AIRI, Moscow 121170, Russia
  - Regulatory Transcriptomics and Epigenomics Group, Institute of Bioengineering, Research Center of Biotechnology RAS, Moscow 117036, Russia
- Nikita V Ivanisenko
  - Bioinformatics Group, AIRI, Moscow 121170, Russia
  - Laboratory of Computational Proteomics, Institute of Cytology and Genetics SB RAS, Novosibirsk 630090, Russia
8
Chen Z, Wang X, Chen X, Huang J, Wang C, Wang J, Wang Z. Accelerating therapeutic protein design with computational approaches toward the clinical stage. Comput Struct Biotechnol J 2023; 21:2909-2926. [PMID: 38213894 PMCID: PMC10781723 DOI: 10.1016/j.csbj.2023.04.027] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 02/15/2023] [Revised: 04/11/2023] [Accepted: 04/27/2023] [Indexed: 01/13/2024]
Abstract
Therapeutic protein, represented by antibodies, is of increasing interest in human medicine. However, clinical translation of therapeutic protein is still largely hindered by different aspects of developability, including affinity and selectivity, stability and aggregation prevention, solubility and viscosity reduction, and deimmunization. Conventional optimization of the developability with widely used methods, like display technologies and library screening approaches, is a time and cost-intensive endeavor, and the efficiency in finding suitable solutions is still not enough to meet clinical needs. In recent years, the accelerated advancement of computational methodologies has ushered in a transformative era in the field of therapeutic protein design. Owing to their remarkable capabilities in feature extraction and modeling, the integration of cutting-edge computational strategies with conventional techniques presents a promising avenue to accelerate the progression of therapeutic protein design and optimization toward clinical implementation. Here, we compared the differences between therapeutic protein and small molecules in developability and provided an overview of the computational approaches applicable to the design or optimization of therapeutic protein in several developability issues.
Affiliation(s)
- Zhidong Chen
  - Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Xinpei Wang
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Xu Chen
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Juyang Huang
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Chenglin Wang
  - Shenzhen Qiyu Biotechnology Co., Ltd, Shenzhen 518107, China
- Junqing Wang
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Zhe Wang
  - Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China
9
Thakur S, Verma RK, Kepp KP, Mehra R. Modelling SARS-CoV-2 spike-protein mutation effects on ACE2 binding. J Mol Graph Model 2023; 119:108379. [PMID: 36481587 PMCID: PMC9690204 DOI: 10.1016/j.jmgm.2022.108379] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/12/2022] [Revised: 11/04/2022] [Accepted: 11/21/2022] [Indexed: 11/26/2022]
Abstract
The binding affinity of the SARS-CoV-2 spike (S)-protein to the human membrane protein ACE2 is critical for virus function. Computational structure-based screening of new S-protein mutations for ACE2 binding lends promise to rationalize virus function directly from protein structure and ideally aid early detection of potentially concerning variants. We used a computational protocol based on cryo-electron microscopy structures of the S-protein to estimate the change in ACE2-affinity due to S-protein mutation (ΔΔG_bind) in good trend agreement with experimental ACE2 affinities. We then expanded predictions to all possible S-protein mutations in 21 different S-protein-ACE2 complexes (400,000 ΔΔG_bind data points in total), using mutation group comparisons to reduce systematic errors. The results suggest that mutations that have arisen in major variants as a group maintain ACE2 affinity significantly more than random mutations in the total protein, at the interface, and at evolvable sites. Omicron mutations as a group had a modest change in binding affinity compared to mutations in other major variants. The single-mutation effects seem consistent with ACE2 binding being optimized and maintained in omicron, despite increased importance of other selection pressures (antigenic drift); however, epistasis, glycosylation and in vivo conditions will modulate these effects. Computational prediction of SARS-CoV-2 evolution remains far from achieved, but the feasibility of large-scale computation is substantially aided by using many structures and mutation groups rather than single mutation effects, which are very uncertain. Our results demonstrate substantial challenges but indicate ways forward to improve the quality of computer models for assessing SARS-CoV-2 mutation effects.
Affiliation(s)
- Shivani Thakur
  - Department of Chemistry, Indian Institute of Technology Bhilai, Sejbahar, Raipur, 492015, Chhattisgarh, India
- Rajaneesh Kumar Verma
  - Department of Chemistry, Indian Institute of Technology Bhilai, Sejbahar, Raipur, 492015, Chhattisgarh, India
- Kasper Planeta Kepp
  - DTU Chemistry, Technical University of Denmark, Building 206, 2800, Kongens Lyngby, Denmark.
- Rukmankesh Mehra
  - Department of Chemistry, Indian Institute of Technology Bhilai, Sejbahar, Raipur, 492015, Chhattisgarh, India.
10
Agosta F, Kellogg GE, Cozzini P. From oncoproteins to spike proteins: the evaluation of intramolecular stability using hydropathic force field. J Comput Aided Mol Des 2022; 36:797-804. [PMID: 36315295 PMCID: PMC9628575 DOI: 10.1007/s10822-022-00477-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Accepted: 09/15/2022] [Indexed: 11/09/2022]
Abstract
Evaluation of the intramolecular stability of proteins plays a key role in the comprehension of their biological behavior and mechanism of action. Small structural alterations such as mutations induced by single nucleotide polymorphism can impact biological activity and pharmacological modulation. Covid-19 mutations, which affect viral replication, the susceptibility to antibody neutralization, and the action of antiviral drugs, are just one example. In this work, the intramolecular stability of mutated proteins, like Spike glycoprotein and its complexes with the human target, is evaluated through hydropathic intramolecular energy scoring originally conceived by Abraham and Kellogg based on the “Extension of the fragment method to calculate amino acid zwitterion and side-chain partition coefficients” by Abraham and Leo in Proteins: Struct. Funct. Genet. 1987, 2:130-52. HINT is proposed as a fast and reliable tool for the stability evaluation of any mutated system. This work has been written in honor of Prof. Donald J. Abraham (1936–2021).
Affiliation(s)
- Federica Agosta
  - Molecular Modeling Laboratory, Food and Drug Department, University of Parma, Parco Area delle Scienze 17/A, 43124, Parma, Italy
- Glen E Kellogg
  - Department of Medicinal Chemistry and Institute for Structural Biology, Drug Discovery and Development, Virginia Commonwealth University, 3298-0133, Richmond, VG, USA
- Pietro Cozzini
  - Molecular Modeling Laboratory, Food and Drug Department, University of Parma, Parco Area delle Scienze 17/A, 43124, Parma, Italy.
11
Stability and expression of SARS-CoV-2 spike-protein mutations. Mol Cell Biochem 2022; 478:1269-1280. [PMID: 36302994 PMCID: PMC9612610 DOI: 10.1007/s11010-022-04588-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/21/2022] [Accepted: 10/12/2022] [Indexed: 12/02/2022]
Abstract
Protein fold stability likely plays a role in SARS-CoV-2 S-protein evolution, together with ACE2 binding and antibody evasion. While few thermodynamic stability data are available for S-protein mutants, many systematic experimental data exist for their expression. In this paper, we explore whether such expression levels relate to the thermodynamic stability of the mutants. We studied mutation-induced SARS-CoV-2 S-protein fold stability, as computed by three very distinct methods and eight different protein structures to account for method- and structure-dependencies. For all methods and structures used (24 comparisons), computed stability changes correlate significantly (99% confidence level) with experimental yeast expression from the literature, such that higher expression is associated with relatively higher fold stability. Also significant, albeit weaker, correlations were seen between stability and ACE2 binding effects. The effect of thermodynamic fold stability may be direct or a correlate of amino acid or site properties, notably the solvent exposure of the site. Correlation between computed stability and experimental expression and ACE2 binding suggests that functional properties of the SARS-CoV-2 S-protein mutant space are largely determined by a few simple features, due to underlying correlations. Our study lends promise to the development of computational tools that may ideally aid in understanding and predicting SARS-CoV-2 S-protein evolution.
12
Structural heterogeneity and precision of implications drawn from cryo-electron microscopy structures: SARS-CoV-2 spike-protein mutations as a test case. European Biophysics Journal 2022; 51:555-568. [PMID: 36167828 PMCID: PMC9514682 DOI: 10.1007/s00249-022-01619-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/31/2022] [Accepted: 09/19/2022] [Indexed: 11/18/2022]
Abstract
Protein structures may be used to draw functional implications at the residue level, but how sensitive are these implications to the exact structure used? Calculations of the effects of SARS-CoV-2 S-protein mutations based on experimental cryo-electron microscopy structures have been abundant during the pandemic. To understand the precision of such estimates, we studied three distinct methods to estimate stability changes for all possible mutations in 23 different S-protein structures (3.69 million ΔΔG values in total) and explored how random and systematic errors can be remedied by structure-averaged mutation group comparisons. We show that computational estimates have low precision, because method and structure heterogeneity make results for single mutations uninformative. However, structure-averaged differences in mean effects for groups of substitutions can yield significant results. Illustrating this protocol, functionally important natural mutations, despite individual variations, average to a smaller stability impact compared to other possible mutations, independent of conformational state (open, closed). In summary, we document substantial issues with precision in structure-based protein modeling and recommend sensitivity tests to quantify these effects, but also suggest partial solutions to the problem in the form of structure-averaged “ensemble” estimates for groups of residues when multiple structures are available.
|
13
|
Bæk KT, Kepp KP. Assessment of AlphaFold2 for Human Proteins via Residue Solvent Exposure. J Chem Inf Model 2022; 62:3391-3400. [PMID: 35785970 DOI: 10.1021/acs.jcim.2c00243] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/29/2022]
Abstract
As only 35% of human proteins feature (often partial) PDB structures, the protein structure prediction tool AlphaFold2 (AF2) could have a massive impact on human biology and medicine, making independent benchmarks of interest. We studied AF2's ability to describe the backbone solvent exposure as a functionally important and easily interpretable "natural coordinate" of protein conformation, using human proteins as a test case. After screening for appropriate comparative sets, we matched 1818 human proteins predicted by AF2 against 7585 unique experimental PDBs, and after curation for sequence overlap, we assessed 1264 comparative pairs comprising 115 unique AF2 structures and 652 unique experimental structures. AF2 performed markedly worse for multimers, whereas ligands, cofactors, and experimental resolution were interestingly not very important for performance. AF2 performed excellently for monomer proteins. Challenges relating to specific groups of residues and multimers were analyzed. We identified larger deviations for lower-confidence scores (pLDDT), with exposed residues and polar residues (e.g., Asp, Glu, Asn) being less accurately described than hydrophobic residues. Proline conformations were the hardest to predict, probably due to a common location in dynamic solvent-accessible parts. In summary, using solvent exposure as a metric, we quantified the performance of AF2 for human proteins and provided estimates of the expected agreement as a function of ligand presence, multimer/monomer status, local residue solvent exposure, pLDDT, and amino acid type. Overall performance was found to be excellent.
Affiliation(s)
- Kristoffer T Bæk
- DTU Chemistry, Technical University of Denmark, Building 206, Kgs. Lyngby 2800, Denmark
| | - Kasper P Kepp
- DTU Chemistry, Technical University of Denmark, Building 206, Kgs. Lyngby 2800, Denmark
| |
|
14
|
Ferla MP, Pagnamenta AT, Koukouflis L, Taylor JC, Marsden BD. Venus: Elucidating the Impact of Amino Acid Variants on Protein Function Beyond Structure Destabilisation. J Mol Biol 2022; 434:167567. [PMID: 35662467 PMCID: PMC9742853 DOI: 10.1016/j.jmb.2022.167567] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/27/2021] [Revised: 03/11/2022] [Accepted: 03/22/2022] [Indexed: 12/15/2022]
Abstract
Exploring the functional effect of a non-synonymous coding variant at the protein level requires multiple pieces of information to be interpreted appropriately. This is particularly important when embarking on the study of a potentially pathogenic variant linked to a rare or monogenic disease. Whereas accurate protein stability predictions alone are generally informative, other effects, such as disruption of post-translational modifications or weakened ligand binding, may also contribute to the disease phenotype. Furthermore, consideration of nearby variants that are found in the healthy population may strengthen or refute a given mechanistic hypothesis. Whilst there are several bioinformatics tools available that score a genetic variant in terms of deleteriousness, there is no single tool that assembles multiple effects of a variant on the encoded protein, beyond structural stability, and presents them on the structure for inspection. Venus is a web application which, given a protein substitution, rapidly estimates the predicted effect on protein stability of the variant, flags if the variant affects a post-translational modification site, a predicted linear motif or known annotation, and determines the effect on protein stability of variants which affect nearby residues and have been identified in healthy populations. Venus is built upon Michelanglo and the results can be exported to it, allowing them to be annotated and shared with other researchers. Venus is freely accessible at https://venus.cmd.ox.ac.uk and its source code is openly available at https://github.com/CMD-Oxford/Michelanglo-and-Venus.
Affiliation(s)
- Matteo P Ferla
- Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK; Oxford NIHR Biomedical Research Centre, Oxford, UK.
| | - Alistair T Pagnamenta
- Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK; Oxford NIHR Biomedical Research Centre, Oxford, UK. https://twitter.com/@alistairp2011
| | - Leonidas Koukouflis
- Centre for Medicines Discovery, University of Oxford, Old Road Campus Research Building, Oxford OX3 7DQ, UK
| | - Jenny C Taylor
- Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK; Oxford NIHR Biomedical Research Centre, Oxford, UK
| | - Brian D Marsden
- Centre for Medicines Discovery, University of Oxford, Old Road Campus Research Building, Oxford OX3 7DQ, UK; Kennedy Institute of Rheumatology, University of Oxford, Oxford OX3 7FY, UK. https://twitter.com/@bmarsden19
| |
|
15
|
Horne J, Shukla D. Recent Advances in Machine Learning Variant Effect Prediction Tools for Protein Engineering. Ind Eng Chem Res 2022; 61:6235-6245. [PMID: 36051311 PMCID: PMC9432854 DOI: 10.1021/acs.iecr.1c04943] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Indexed: 01/28/2023]
Abstract
Proteins are Nature's molecular machinery and comprise diverse roles while consisting of chemically similar building blocks. In recent years, protein engineering and design have become important research areas, with many applications in the pharmaceutical, energy, and biocatalysis fields, among others, where the aim is to ultimately create a protein given desired structural and functional properties. It is often critical to model the relationship between a protein's sequence, folded structure, and biological function to assist in such protein engineering pursuits. However, significant challenges remain in concretely mapping an amino acid sequence to specific protein properties and biological activities. Mutations may enhance or diminish molecular protein function, and the epistatic interactions between mutations result in an inherently complex mapping between genetic modifications and protein function. Therefore, estimating the quantitative effects of mutations on protein function(s) remains a grand challenge of biology, bioinformatics, and many related fields and would rapidly accelerate protein engineering tasks when successful. Such estimation is often known as variant effect prediction (VEP). However, progress has been demonstrated in recent years with the development of machine learning (ML) methods in modeling the relationship between mutations and protein function. In this Review, recent advances in VEP are discussed as tools for protein engineering, focusing on techniques incorporating gains from the broader ML community and challenges in estimating biomolecular functional differences. Primary developments highlighted include convolutional neural networks, graph neural networks, and natural language embeddings for protein sequences.
Affiliation(s)
- Jesse Horne
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States
| | - Diwakar Shukla
- Department of Chemical and Biomolecular Engineering and Department of Bioengineering, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States; Department of Plant Biology, Cancer Center at Illinois, and Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States
| |
|
16
|
García-Cebollada H, López A, Sancho J. Protposer: the web server that readily proposes protein stabilizing mutations with high PPV. Comput Struct Biotechnol J 2022; 20:2415-2433. [PMID: 35664235 PMCID: PMC9133766 DOI: 10.1016/j.csbj.2022.05.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/16/2022] [Revised: 05/05/2022] [Accepted: 05/05/2022] [Indexed: 01/23/2023] Open
Abstract
Protein stability is a requisite for most biotechnological and medical applications of proteins. As natural proteins tend to suffer from a low conformational stability ex vivo, great efforts have been devoted toward increasing their stability through rational design and engineering of appropriate mutations. Unfortunately, even the best currently used predictors fail to compute the stability of protein variants with sufficient accuracy and their usefulness as tools to guide the rational stabilisation of proteins is limited. We present here Protposer, a protein stabilising tool based on a different approach. Instead of quantifying changes in stability, Protposer uses structure- and sequence-based screening modules to nominate candidate mutations for subsequent evaluation by a logistic regression model, carefully trained to avoid overfitting. Thus, Protposer analyses PDB files in search of stabilization opportunities and provides a ranked list of promising mutations with their estimated success rates (eSR), i.e., their probabilities of being stabilising by at least 0.5 kcal/mol. The agreement between eSRs and actual positive predictive values (PPV) on external datasets of mutations is excellent. When Protposer is used with its Optimal kappa selection threshold, its PPV is above 0.7. Even with less stringent thresholds, Protposer largely outperforms FoldX, Rosetta and PoPMusiC. Indicating the PDB file of the protein suffices to obtain a ranked list of mutations, their eSRs and hints on the likely source of the stabilization expected. Protposer is a distinct, straightforward and highly successful tool to design protein stabilising mutations, and it is freely available for academic use at http://webapps.bifi.es/the-protposer.
|
17
|
Baek KT, Kepp KP. Data set and fitting dependencies when estimating protein mutant stability: Toward simple, balanced, and interpretable models. J Comput Chem 2022; 43:504-518. [PMID: 35040492 DOI: 10.1002/jcc.26810] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/06/2021] [Revised: 12/13/2021] [Accepted: 01/03/2022] [Indexed: 12/27/2022]
Abstract
Accurate prediction of protein stability changes upon mutation (ΔΔG) is increasingly important to evolution studies, protein engineering, and screening of disease-causing gene variants but is challenged by biases in training data. We investigated 45 linear regression models trained on data sets that account systematically for destabilization bias and mutation-type bias (BM). The models were externally validated on three test data sets probing different pathologies and for internal consistency (symmetry and neutrality). Model structure and performance substantially depended on training data and even fitting method. We developed two final models: SimBa-IB for typical natural mutations and SimBa-SYM for situations where stabilizing and destabilizing mutations occur to a similar extent. SimBa-SYM, despite its simplicity, is essentially non-biased (vs. the Ssym data set) while still performing well for all data sets (R ~ 0.46-0.54, MAE = 1.16-1.24 kcal/mol). The simple models provide advantage in terms of interpretability, use and future improvement, and are freely available on GitHub.
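The symmetry check mentioned in this abstract can be sketched as follows. The sketch assumes the usual convention that a reverse mutation (B→A) should have the opposite ΔΔG of the forward mutation (A→B). The function name, the bias measure (mean of the forward and reverse sums) and all numbers are illustrative assumptions, not taken from the paper or from the SimBa code.

```python
from statistics import mean


def symmetry_bias(forward_ddg, reverse_ddg):
    """Mean of ΔΔG(A→B) + ΔΔG(B→A) over paired mutations.

    The result is 0 for a perfectly antisymmetric predictor; a positive value
    means the predictor leans toward one sign for both directions.
    """
    if len(forward_ddg) != len(reverse_ddg):
        raise ValueError("forward and reverse lists must be paired")
    return mean(f + r for f, r in zip(forward_ddg, reverse_ddg))


# Hypothetical predictions (kcal/mol) for three forward/reverse mutation pairs.
forward = [1.0, 0.5, 2.0]
unbiased_reverse = [-1.0, -0.5, -2.0]  # exact sign flip of the forward values
biased_reverse = [-0.5, 0.0, -1.0]     # reverse values shifted in one direction

print(symmetry_bias(forward, unbiased_reverse))
print(symmetry_bias(forward, biased_reverse))
```

The same pairing also gives a neutrality check: a mutation of a residue to itself should yield ΔΔG = 0.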
Affiliation(s)
| | - Kasper P Kepp
- DTU Chemistry, Technical University of Denmark, Lyngby, Denmark
| |
|
18
|
Abstract
The spike protein (S-protein) of SARS-CoV-2, the protein that enables the virus to infect human cells, is the basis for many vaccines and a hotspot of concerning virus evolution. Here, we discuss the outstanding progress in structural characterization of the S-protein and how these structures facilitate analysis of virus function and evolution. We emphasize the differences in reported structures and that analysis of structure-function relationships is sensitive to the structure used. We show that the average residue solvent exposure in nearly complete structures is a good descriptor of open vs closed conformation states. Because of structural heterogeneity of functionally important surface-exposed residues, we recommend using averages of a group of high-quality protein structures rather than a single structure before reaching conclusions on specific structure-function relationships. To illustrate these points, we analyze some significant chemical tendencies of prominent S-protein mutations in the context of the available structures. In the discussion of new variants, we emphasize the selectivity of binding to ACE2 vs prominent antibodies rather than simply the antibody escape or ACE2 affinity separately. We note that larger chemical changes, in particular increased electrostatic charge or side-chain volume of exposed surface residues, are recurring in mutations of concern, plausibly related to adaptation to the negative surface potential of human ACE2. We also find indications that the fixated mutations of the S-protein in the main variants are less destabilizing than would be expected on average, possibly pointing toward a selection pressure on the S-protein. The richness of available structures for all of these situations provides an enormously valuable basis for future research into these structure-function relationships.
Affiliation(s)
- Rukmankesh Mehra
- Department of Chemistry, Indian Institute of Technology Bhilai, Sejbahar, Raipur 492015, Chhattisgarh, India
| | - Kasper P. Kepp
- DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kongens Lyngby, Denmark
| |
|
19
|
El Harrar T, Davari MD, Jaeger KE, Schwaneberg U, Gohlke H. Critical assessment of structure-based approaches to improve protein resistance in aqueous ionic liquids by enzyme-wide saturation mutagenesis. Comput Struct Biotechnol J 2022; 20:399-409. [PMID: 35070165 PMCID: PMC8752993 DOI: 10.1016/j.csbj.2021.12.018] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/02/2021] [Revised: 12/10/2021] [Accepted: 12/11/2021] [Indexed: 12/12/2022] Open
Abstract
Ionic liquids (IL) and aqueous ionic liquids (aIL) are attractive (co-)solvents for green industrial processes involving biocatalysts, but often reduce enzyme activity. Experimental and computational methods are applied to predict favorable substitution sites and, most often, subsequent site-directed surface charge modifications are introduced to enhance enzyme resistance towards aIL. However, almost no studies evaluate the prediction precision with random mutagenesis or the application of simple data-driven filtering processes. Here, we systematically and rigorously evaluated the performance of 22 previously described structure-based approaches to increase enzyme resistance to aIL based on an experimental complete site-saturation mutagenesis library of Bacillus subtilis Lipase A (BsLipA) screened against four aIL. We show that, surprisingly, most of the approaches yield low gain-in-precision (GiP) values, particularly for predicting relevant positions: 14 approaches perform worse than random mutagenesis. Encouragingly, exploiting experimental information on the thermostability of BsLipA or structural weak spots of BsLipA predicted by rigidity theory yields GiP = 3.03 and 2.39 for relevant variants and GiP = 1.61 and 1.41 for relevant positions. Combining five simple-to-compute physicochemical and evolutionary properties substantially increases the precision of predicting relevant variants and positions, yielding GiP = 3.35 and 1.29. Finally, combining these properties with predictions of structural weak spots identified by rigidity theory additionally improves GiP for relevant variants up to 4-fold to ∼10 and sustains or increases GiP for relevant positions, resulting in a prediction precision of ∼90% compared to ∼9% in random mutagenesis. This combination should be applicable to other enzyme systems for guiding protein engineering approaches towards improved aIL resistance.
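The abstract's closing figures suggest that gain-in-precision (GiP) is the ratio of an approach's prediction precision to the precision of random mutagenesis; the sketch below assumes that reading. The function names are hypothetical, and the only inputs used are the rounded values quoted in the abstract (about 90% vs about 9%).

```python
def precision(true_positives, predicted_positives):
    """Fraction of predicted variants (or positions) that are actually relevant."""
    if predicted_positives <= 0:
        raise ValueError("predicted_positives must be positive")
    return true_positives / predicted_positives


def gain_in_precision(approach_precision, random_precision):
    """GiP > 1: better than random mutagenesis; GiP < 1: worse than random."""
    if random_precision <= 0:
        raise ValueError("random_precision must be positive")
    return approach_precision / random_precision


# Rounded figures from the abstract: ~90% precision vs ~9% for random mutagenesis.
gip = gain_in_precision(0.90, 0.09)
print(f"GiP is about {gip:.1f}")
```

Under this reading, the 14 approaches reported to perform worse than random mutagenesis would be those with GiP below 1.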
Affiliation(s)
- Till El Harrar
- Institute of Biotechnology, RWTH Aachen University, 52074 Aachen, Germany
- John-von-Neumann-Institute for Computing (NIC), Jülich Supercomputing Centre (JSC), Institute of Biological Information Processing (IBI-7: Structural Biochemistry), and Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
| | - Mehdi D. Davari
- Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, 06120 Halle, Germany
| | - Karl-Erich Jaeger
- Institute of Molecular Enzyme Technology, Heinrich Heine University Düsseldorf, 52428 Jülich, Germany
- Institute of Bio- and Geosciences IBG-1: Biotechnology, Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
| | - Ulrich Schwaneberg
- Institute of Biotechnology, RWTH Aachen University, 52074 Aachen, Germany
- DWI – Leibniz Institute for Interactive Materials e.V., 52074 Aachen, Germany
| | - Holger Gohlke
- John-von-Neumann-Institute for Computing (NIC), Jülich Supercomputing Centre (JSC), Institute of Biological Information Processing (IBI-7: Structural Biochemistry), and Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Corresponding author at: John-von-Neumann-Institute for Computing (NIC), Jülich Supercomputing Centre (JSC), Institute of Biological Information Processing (IBI-7: Structural Biochemistry), and Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, Wilhelm-Johnen-Str., 52428 Jülich, Germany.
| |
|
20
|
Artificial intelligence challenges for predicting the impact of mutations on protein stability. Curr Opin Struct Biol 2021; 72:161-168. [PMID: 34922207 DOI: 10.1016/j.sbi.2021.11.001] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 06/30/2021] [Revised: 09/15/2021] [Accepted: 11/08/2021] [Indexed: 01/17/2023]
Abstract
Stability is a key ingredient of protein fitness, and its modification through targeted mutations has applications in various fields, such as protein engineering, drug design, and deleterious variant interpretation. Many studies have been devoted over the past decades to build new, more effective methods for predicting the impact of mutations on protein stability based on the latest developments in artificial intelligence. We discuss their features, algorithms, computational efficiency, and accuracy estimated on an independent test set. We focus on a critical analysis of their limitations, the recurrent biases toward the training set, their generalizability, and interpretability. We found that the accuracy of the predictors has stagnated at around 1 kcal/mol for over 15 years. We conclude by discussing the challenges that need to be addressed to reach improved performance.
|
21
|
Louis BBV, Abriata LA. Reviewing Challenges of Predicting Protein Melting Temperature Change Upon Mutation Through the Full Analysis of a Highly Detailed Dataset with High-Resolution Structures. Mol Biotechnol 2021; 63:863-884. [PMID: 34101125 PMCID: PMC8443528 DOI: 10.1007/s12033-021-00349-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/07/2021] [Accepted: 06/01/2021] [Indexed: 11/26/2022]
Abstract
Predicting the effects of mutations on protein stability is a key problem in fundamental and applied biology, still unsolved even for the relatively simple case of small, soluble, globular, monomeric, two-state-folder proteins. Many articles discuss the limitations of prediction methods and of the datasets used to train them, which result in low reliability for actual applications despite globally capturing trends. Here, we review these and other issues by analyzing one of the most detailed, carefully curated datasets of melting temperature change (ΔTm) upon mutation for proteins with high-resolution structures. After examining the composition of this dataset to discuss imbalances and biases, we inspect several of its entries assisted by an online app for data navigation and structure display and aided by a neural network that predicts ΔTm with accuracy close to that of programs available to this end. We pose that the ΔTm predictions of our network, and also likely those of other programs, account only for a baseline-like general effect of each type of amino acid substitution which then requires substantial corrections to reproduce the actual stability changes. The corrections are very different for each specific case and arise from fine structural details which are not well represented in the dataset and which, despite appearing reasonable upon visual inspection of the structures, are hard to encode and parametrize. Based on these observations, additional analyses, and a review of recent literature, we propose recommendations for developers of stability prediction methods and for efforts aimed at improving the datasets used for training. 
We leave our interactive interface for analysis available online at http://lucianoabriata.altervista.org/papersdata/proteinstability2021/s1626navigation.html so that users can further explore the dataset and baseline predictions, possibly serving as a tool useful in the context of structural biology and protein biotechnology research and as material for education in protein biophysics.
Affiliation(s)
- Benjamin B V Louis
- Master of Life Sciences Engineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, CH-1015, Lausanne, Switzerland
| | - Luciano A Abriata
- Laboratory for Biomolecular Modeling, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, CH-1015, Lausanne, Switzerland.
- Protein Production and Structure Core Facility, School of Life Sciences, École Polytechnique Fédérale de Lausanne, CH-1015, Lausanne, Switzerland.
| |
|