1
Bernett J, Blumenthal DB, Grimm DG, Haselbeck F, Joeres R, Kalinina OV, List M. Guiding questions to avoid data leakage in biological machine learning applications. Nat Methods 2024; 21:1444-1453. [PMID: 39122953 DOI: 10.1038/s41592-024-02362-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/10/2024] [Accepted: 06/26/2024] [Indexed: 08/12/2024]
Abstract
Machine learning methods for extracting patterns from high-dimensional data are very important in the biological sciences. However, in certain cases, real-world applications cannot confirm the reported prediction performance. One of the main reasons for this is data leakage, which can be seen as the illicit sharing of information between the training data and the test data, resulting in performance estimates that are far better than the performance observed in the intended application scenario. Data leakage can be difficult to detect in biological datasets due to their complex dependencies. With this in mind, we present seven questions that should be asked to prevent data leakage when constructing machine learning models in biological domains. We illustrate the usefulness of our questions by applying them to nontrivial examples. Our goal is to raise awareness of potential data leakage problems and to promote robust and reproducible machine learning-based research in biology.
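As a minimal sketch of one safeguard in the spirit of these questions (not code from the paper), the function below splits data by group, so that all samples sharing a key, such as variants of the same protein, land entirely in either the training set or the test set. The function name, the grouping key, and the deterministic choice of the first sorted groups as the test set are illustrative assumptions.

```python
from collections import defaultdict


def group_split(samples, group_of, test_fraction=0.2):
    """Split samples so that every group falls entirely in train or in test.

    samples: list of items.
    group_of: function mapping an item to its group key (hypothetical, e.g. a
    protein ID or a sequence-similarity cluster ID).
    """
    groups = defaultdict(list)
    for s in samples:
        groups[group_of(s)].append(s)
    # Sorted keys give a deterministic split; a real pipeline would shuffle
    # the group keys with a fixed seed instead.
    keys = sorted(groups)
    n_test_groups = max(1, round(len(keys) * test_fraction))
    test_keys = set(keys[:n_test_groups])
    train = [s for k in keys if k not in test_keys for s in groups[k]]
    test = [s for k in keys if k in test_keys for s in groups[k]]
    return train, test
```

For example, with variants `("P1", "A10G")`, `("P1", "L20P")`, `("P2", "K5E")`, `("P3", "G7D")`, `("P3", "V9A")` grouped by protein and `test_fraction=0.34`, both P1 variants go to the test set and no protein appears on both sides. A random per-sample split of the same data could place one P1 variant in each set and inflate the test score.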
Affiliation(s)
- Judith Bernett
- TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- David B Blumenthal
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
- Dominik G Grimm
- TUM Campus Straubing for Biotechnology and Sustainability, Technical University of Munich, Straubing, Germany.
- Bioinformatics, Weihenstephan-Triesdorf University of Applied Sciences, Straubing, Germany.
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- Florian Haselbeck
- TUM Campus Straubing for Biotechnology and Sustainability, Technical University of Munich, Straubing, Germany
- Bioinformatics, Weihenstephan-Triesdorf University of Applied Sciences, Straubing, Germany
- Smart Farming, Weihenstephan-Triesdorf University of Applied Sciences, Freising, Germany
- Roman Joeres
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Olga V Kalinina
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany.
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany.
- Medical Faculty, Saarland University, Homburg, Germany.
- Markus List
- TUM School of Life Sciences, Technical University of Munich, Freising, Germany.
- Munich Data Science Institute (MDSI), Technical University of Munich, Garching, Germany.
2
Xu J, Gong J, Bo X, Tong Y, Ren Z, Ni M. A benchmark for evaluation of structure-based online tools for antibody-antigen binding affinity. Biophys Chem 2024; 311:107253. [PMID: 38768531 DOI: 10.1016/j.bpc.2024.107253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 04/08/2024] [Accepted: 04/28/2024] [Indexed: 05/22/2024]
Abstract
The prediction of binding affinity changes caused by missense mutations can elucidate antigen-antibody interactions. Several accessible structure-based online computational tools have been proposed. However, selecting suitable software for a particular research question is challenging, especially for research on the SARS-CoV-2 spike protein and its antibodies. Benchmarking on mutation-diverse SARS-CoV-2 datasets is therefore critical. Here, we collected datasets comprising 1216 variants with measured changes in antigen binding affinity, drawn from 22 complexes of SARS-CoV-2 S proteins with 22 monoclonal antibodies, and applied them to evaluate the performance of seven binding affinity prediction tools. Across the tested tools, Pearson correlations between predicted and measured changes in binding affinity ranged from -0.158 to 0.657, while accuracy in classifying mutations as increasing or decreasing affinity ranged from 0.444 to 0.834. The tools performed relatively better on single mutations, especially at epitope sites, but performed poorly on mutations that strongly decrease affinity. The tested tools were relatively insensitive to the experimental techniques used to determine the complex structures. In summary, we constructed a collection of datasets and evaluated a range of structure-based online prediction tools, which should help clarify antigen-antibody interactions and support the computational design of therapeutic monoclonal antibodies.
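The two headline metrics above can be sketched in plain Python. This is an illustration, not the benchmark's code: `pearson_r` is the textbook sample correlation, and `direction_accuracy` assumes a prediction counts as correct when its sign matches the measured sign (a value of exactly zero is treated as "not increasing"), since the abstract does not state how ties or sign conventions were handled.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between predicted (x) and measured (y) changes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def direction_accuracy(pred, meas):
    """Fraction of variants whose predicted direction matches the measured one.

    Assumption: positive means "increasing" and anything <= 0 means
    "not increasing"; real benchmarks must fix their own sign convention.
    """
    hits = sum((p > 0) == (m > 0) for p, m in zip(pred, meas))
    return hits / len(pred)
```

On hypothetical toy values, `direction_accuracy([0.5, -0.2, 1.0, -1.0], [0.3, 0.4, 2.0, -0.5])` gives 0.75 because only the second variant is predicted in the wrong direction. Since Python 3.10, `statistics.correlation` computes the same Pearson coefficient.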
Affiliation(s)
- Jiayi Xu
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Jianting Gong
- Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaochen Bo
- Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Yigang Tong
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China; Beijing Advanced Innovation Center for Soft Matter Science and Engineering, Beijing University of Chemical Technology, Beijing 100029, China.
- Zilin Ren
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China; Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Changchun 130122, China.
- Ming Ni
- Institute of Health Service and Transfusion Medicine, Beijing 100850, China.
3
Diaz DJ, Gong C, Ouyang-Zhang J, Loy JM, Wells J, Yang D, Ellington AD, Dimakis AG, Klivans AR. Stability Oracle: a structure-based graph-transformer framework for identifying stabilizing mutations. Nat Commun 2024; 15:6170. [PMID: 39043654 PMCID: PMC11266546 DOI: 10.1038/s41467-024-49780-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Accepted: 06/14/2024] [Indexed: 07/25/2024]
Abstract
Engineering stabilized proteins is a fundamental challenge in the development of industrial and pharmaceutical biotechnologies. We present Stability Oracle: a structure-based graph-transformer framework that achieves state-of-the-art (SOTA) performance in identifying thermodynamically stabilizing mutations. Our framework introduces several innovations to overcome well-known challenges in data scarcity and bias, generalization, and computation time, including Thermodynamic Permutations for data augmentation, structural amino acid embeddings that model a mutation with a single structure, and a protein structure-specific attention-bias mechanism that makes transformers a viable alternative to graph neural networks. We provide training/test splits that mitigate data leakage and ensure proper model evaluation. Furthermore, to examine our data engineering contributions, we fine-tune ESM2 representations (Prostata-IFML) and achieve SOTA for sequence-based models. Notably, Stability Oracle outperforms Prostata-IFML even though it was pretrained on 2000× fewer proteins and has 548× fewer parameters. Our framework establishes a path for fine-tuning structure-based transformers to virtually any phenotype, a necessary task for accelerating the development of protein-based biotechnologies.
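The abstract names Thermodynamic Permutations without defining them. A minimal sketch of the underlying idea, assuming it rests on Gibbs free energy being a state function: if ΔΔG is measured for several substitutions from the same wild-type residue at one site, the ΔΔG between any two of those residues is the difference of their values relative to wild type. The function name and input layout are hypothetical, and this is not the authors' implementation.

```python
from itertools import permutations


def thermodynamic_permutations(wt, ddg):
    """Derive ΔΔG for every ordered pair of residues observed at one site.

    wt: wild-type residue, e.g. "A".
    ddg: dict mapping mutant residue -> measured ΔΔG(wt -> mutant).
    Returns a dict mapping (from_residue, to_residue) -> ΔΔG, using
    ΔΔG(a -> b) = ΔΔG(wt -> b) - ΔΔG(wt -> a).
    """
    # Free energy of each residue relative to wild type (wild type itself is 0).
    rel = {wt: 0.0, **ddg}
    return {(a, b): rel[b] - rel[a] for a, b in permutations(rel, 2)}
```

With wild type `"A"` and hypothetical measurements `{"G": 1.0, "V": -0.5}`, the function returns six ordered pairs instead of two, among them `("G", "V") -> -1.5` and the reverse mutation `("G", "A") -> -1.0`. With n measured substitutions at a site it yields n(n + 1) labeled pairs; because derived pairs share measurements, they should be kept on the same side of a training/test split.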
Affiliation(s)
- Daniel J Diaz
- UT Austin, Department of Computer Science, Austin, TX, 78712, USA.
- Intelligent Proteins, LLC, Austin, TX, 78712, USA.
- UT Austin, Department of Chemistry, Austin, TX, 78712, USA.
- Chengyue Gong
- UT Austin, Department of Computer Science, Austin, TX, 78712, USA
- James M Loy
- Intelligent Proteins, LLC, Austin, TX, 78712, USA
- UT Austin, Department of Molecular Biosciences, Austin, TX, 78712, USA
- Jordan Wells
- UT Austin, McKetta Department of Chemical Engineering, Austin, TX, 78712, USA
- David Yang
- UT Austin, Department of Molecular Biosciences, Austin, TX, 78712, USA
- Alexandros G Dimakis
- UT Austin, Chandra Family Department of Electrical and Computer Engineering, Austin, TX, 78712, USA
- Adam R Klivans
- UT Austin, Department of Computer Science, Austin, TX, 78712, USA
4
Cheng Z, Bi H, Liu S, Chen J, Misquitta AJ, Yu K. Developing a Differentiable Long-Range Force Field for Proteins with E(3) Neural Network-Predicted Asymptotic Parameters. J Chem Theory Comput 2024; 20:5598-5608. [PMID: 38888427 DOI: 10.1021/acs.jctc.4c00337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2024]
Abstract
Accurately describing long-range interactions is a significant challenge in molecular dynamics (MD) simulations of proteins. A high-quality long-range potential is also an important component of range-separated machine learning force fields. This study introduces a comprehensive database of asymptotic parameters, comprising atomic multipole moments, polarizabilities, and dispersion coefficients. Built with active learning, the database comprehensively represents protein fragments with up to 8 heavy atoms and captures their conformational diversity with only 78,000 data points. Additionally, the E(3) neural network (E3NN) is employed to predict the asymptotic parameters directly from the local geometry. The E3NN models demonstrate exceptional accuracy and transferability across all asymptotic parameters, achieving an R² of 0.999 on both the protein fragment and the 20 amino acid dipeptide test sets. Long-range electrostatic and dispersion energies computed from the E3NN-predicted parameters have errors of 0.07 and 0.02 kcal/mol, respectively, relative to symmetry-adapted perturbation theory (SAPT). Our force fields can therefore accurately describe long-range interactions in proteins, paving the way for next-generation protein force fields.
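To illustrate how per-atom asymptotic parameters turn into long-range energies, the sketch below evaluates the two leading undamped terms between two fragments: charge-charge electrostatics and C6 dispersion. It is a deliberately simplified, hypothetical model, not the paper's force field: it keeps only atomic monopoles (no dipoles, quadrupoles, or polarization), applies no short-range damping, and combines atomic C6 coefficients with a geometric-mean rule, which is an assumption. Units are charges in e, distances in Å, and C6 in kcal·Å⁶/mol.

```python
import math

# Approximate Coulomb constant in kcal·Å/(mol·e²); force fields use slightly
# different values (about 332.05 to 332.07).
COULOMB_KCAL = 332.0637


def long_range_energy(frag_a, frag_b):
    """Undamped monopole electrostatics and C6 dispersion between two fragments.

    Each fragment is a list of (x, y, z, charge, c6) tuples.
    Returns (electrostatic, dispersion) energies in kcal/mol.
    """
    e_elec = 0.0
    e_disp = 0.0
    for xa, ya, za, qa, c6a in frag_a:
        for xb, yb, zb, qb, c6b in frag_b:
            r = math.dist((xa, ya, za), (xb, yb, zb))
            e_elec += COULOMB_KCAL * qa * qb / r
            # Geometric-mean combining rule for the pair C6 (assumption).
            e_disp -= math.sqrt(c6a * c6b) / r**6
    return e_elec, e_disp
```

For two neutral atoms with C6 = 4 kcal·Å⁶/mol each at 2 Å, the dispersion term is -4/2⁶ = -0.0625 kcal/mol. A production long-range model adds higher multipoles, induction from the polarizabilities, and damping functions so the asymptotic expressions stay finite at short range.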
Affiliation(s)
- Zheng Cheng
- School of Mathematical Sciences, Peking University, Beijing 100871, China
- AI for Science Institute, Beijing 100084, P. R. China
- Hangrui Bi
- School of Mathematical Sciences, Peking University, Beijing 100871, China
- DP Technology, Beijing 100080, P. R. China
- Siyuan Liu
- DP Technology, Beijing 100080, P. R. China
- Junmin Chen
- Tsinghua-Berkeley Shenzhen Institute, Shenzhen 518055, Guangdong, P. R. China
- Tsinghua Shenzhen International Graduate School, Shenzhen 518055, Guangdong, P. R. China
- Alston J Misquitta
- School of Physics and Astronomy, Queen Mary, University of London, London E1 4NS, U.K
- Kuang Yu
- Tsinghua-Berkeley Shenzhen Institute, Shenzhen 518055, Guangdong, P. R. China
- Tsinghua Shenzhen International Graduate School, Shenzhen 518055, Guangdong, P. R. China
5
Nguyen ATN, Nguyen DTN, Koh HY, Toskov J, MacLean W, Xu A, Zhang D, Webb GI, May LT, Halls ML. The application of artificial intelligence to accelerate G protein-coupled receptor drug discovery. Br J Pharmacol 2024; 181:2371-2384. [PMID: 37161878 DOI: 10.1111/bph.16140] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/26/2022] [Revised: 04/14/2023] [Accepted: 04/27/2023] [Indexed: 05/11/2023]
Abstract
The application of artificial intelligence (AI) approaches to drug discovery for G protein-coupled receptors (GPCRs) is a rapidly expanding area. Artificial intelligence can be used at multiple stages during the drug discovery process, from aiding our understanding of the fundamental actions of GPCRs to the discovery of new ligand-GPCR interactions or the prediction of clinical responses. Here, we provide an overview of the concepts behind artificial intelligence, including the subfields of machine learning and deep learning. We summarise the published applications of artificial intelligence to different stages of the GPCR drug discovery process. Finally, we reflect on the benefits and limitations of artificial intelligence and share our vision for the exciting potential for further development of applications to aid GPCR drug discovery. In addition to making the drug discovery process "faster, smarter and cheaper," we anticipate that the application of artificial intelligence will create exciting new opportunities for GPCR drug discovery. LINKED ARTICLES: This article is part of a themed issue Therapeutic Targeting of G Protein-Coupled Receptors: hot topics from the Australasian Society of Clinical and Experimental Pharmacologists and Toxicologists 2021 Virtual Annual Scientific Meeting. To view the other articles in this section visit http://onlinelibrary.wiley.com/doi/10.1111/bph.v181.14/issuetoc.
Affiliation(s)
- Anh T N Nguyen
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Diep T N Nguyen
- Department of Information Technology, Faculty of Engineering and Technology, Vietnam National University, Cau Giay, Hanoi, Vietnam
- Huan Yee Koh
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
- Jason Toskov
- Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
- William MacLean
- Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
- Andrew Xu
- Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
- Daokun Zhang
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
- Geoffrey I Webb
- Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
- Lauren T May
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Michelle L Halls
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
6
Potera K, Tomala K. Using yeasts for the studies of nonfunctional factors in protein evolution. Yeast 2024. [PMID: 38895906 DOI: 10.1002/yea.3970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 05/08/2024] [Accepted: 06/06/2024] [Indexed: 06/21/2024]
Abstract
The evolution of protein sequence is driven not only by factors directly related to protein function and shape but also by nonfunctional factors. Such factors in protein evolution might be categorized as those connected to energetic costs, synthesis efficiency, and avoidance of misfolding and toxicity. A common approach to studying them is correlational analysis that contrasts them with characteristics of the protein, such as amino acid composition, but these features are interdependent. To avoid possible bias, empirical studies are needed, and too little such work has been done to date. In this review, we describe the role of nonfunctional factors in protein evolution and present an experimental approach using yeast as a suitable model organism. The proposed approach focuses on the potential negative fitness effects of mutations that change protein properties unrelated to function, and on the frequency of such mutations. Experimental results testing the misfolding avoidance hypothesis as an explanation for why highly expressed proteins evolve slowly are inconsistent with the results of correlational research. Therefore, more effort should be made to empirically test the effects of nonfunctional factors in protein evolution and to contrast these results with those of the correlational approach.
Affiliation(s)
- Katarzyna Potera
- Faculty of Biology, Institute of Environmental Sciences, Jagiellonian University, Krakow, Poland
- Doctoral School of Exact and Natural Sciences, Jagiellonian University, Krakow, Poland
- Katarzyna Tomala
- Faculty of Biology, Institute of Environmental Sciences, Jagiellonian University, Krakow, Poland
7
Sun X, Yang S, Wu Z, Su J, Hu F, Chang F, Li C. PMSPcnn: Predicting protein stability changes upon single point mutations with convolutional neural network. Structure 2024; 32:838-848.e3. [PMID: 38508191 DOI: 10.1016/j.str.2024.02.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 12/19/2023] [Accepted: 02/22/2024] [Indexed: 03/22/2024]
Abstract
Protein missense mutations and the resulting changes in protein stability are important causes of many human genetic diseases. However, accurately predicting mutation-induced stability changes remains a challenging problem. To address this problem, we have developed PMSPcnn, an unbiased and effective model based on a convolutional neural network. We used an anti-symmetry property to build a balanced training dataset, which improves prediction, particularly for stabilizing mutations. Topological features are obtained with persistent homology, an effective approach for characterizing protein structures. Additionally, we propose a regression stratification cross-validation scheme to improve prediction for mutations with extreme ΔΔG. On three test datasets (Ssym, p53, and myoglobin), PMSPcnn performs better than currently existing predictors. PMSPcnn also outperforms currently available methods for membrane proteins. Overall, PMSPcnn is a promising method for predicting protein stability changes caused by single point mutations.
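The abstract does not spell out either technique, so the following is a minimal sketch of two common interpretations, not PMSPcnn's code. Anti-symmetry is taken to mean that the reverse mutation has the negated ΔΔG, so each record contributes its reverse (a structure-based model would also need the mutant structure, which this sketch ignores). Stratification for a continuous target is approximated by sorting on ΔΔG and dealing samples round-robin into folds, so every fold spans the full range, extremes included. The record layout and function names are hypothetical.

```python
def add_reverse_mutations(records):
    """Balance a dataset via anti-symmetry: ΔΔG(mut -> wt) = -ΔΔG(wt -> mut).

    records: list of (wt_residue, position, mut_residue, ddg) tuples.
    Returns the original records followed by their reverse mutations.
    """
    out = list(records)
    for wt, pos, mut, ddg in records:
        out.append((mut, pos, wt, -ddg))
    return out


def stratified_regression_folds(y, k=5):
    """Assign sample indices to k folds that each span the range of y.

    Indices are sorted by target value and dealt round-robin, so each fold
    receives some of the most negative and most positive ΔΔG values.
    """
    order = sorted(range(len(y)), key=lambda i: y[i])
    folds = [[] for _ in range(k)]
    for rank, i in enumerate(order):
        folds[rank % k].append(i)
    return folds
```

For example, `add_reverse_mutations([("L", 10, "P", 1.2)])` adds `("P", 10, "L", -1.2)`, and `stratified_regression_folds([5.0, -1.0, 3.0, 0.0, 2.0, -2.0], k=2)` returns `[[5, 3, 2], [1, 4, 0]]`. In practice, a mutation and its synthetic reverse should be assigned to the same fold; otherwise the test fold contains the mirror image of a training example.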
Affiliation(s)
- Xiaohan Sun
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Shuang Yang
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Zhixiang Wu
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Jingjie Su
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Fangrui Hu
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Fubin Chang
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China.
8
Qiu Y, Huang T, Cai YD. Review of predicting protein stability changes upon variations. Proteomics 2024; 24:e2300371. [PMID: 38643379 DOI: 10.1002/pmic.202300371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 04/07/2024] [Accepted: 04/08/2024] [Indexed: 04/22/2024]
Abstract
Forecasting alterations in protein stability caused by variations holds immense importance. Improving the thermal stability of proteins is important for biomedical and industrial applications. This review discusses the latest methods for predicting the effects of mutations on protein stability, databases containing protein mutations and thermodynamic parameters, and experimental techniques for efficiently assessing protein stability in high-throughput settings. Various publicly available databases for protein stability prediction are introduced. Furthermore, state-of-the-art computational approaches for anticipating protein stability changes due to variants are reviewed. Each method's types of features, base algorithm, and prediction results are also detailed. Additionally, some experimental approaches for verifying the prediction results of computational methods are introduced. Finally, the review summarizes the progress and challenges of protein stability prediction and discusses potential models for future research directions.
Affiliation(s)
- Yiling Qiu
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
9
Da Conceição LMA, Cabral LM, Pereira GRC, De Mesquita JF. An In Silico Analysis of Genetic Variants and Structural Modeling of the Human Frataxin Protein in Friedreich's Ataxia. Int J Mol Sci 2024; 25:5796. [PMID: 38891993 PMCID: PMC11172458 DOI: 10.3390/ijms25115796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Revised: 05/15/2024] [Accepted: 05/20/2024] [Indexed: 06/21/2024]
Abstract
Friedreich's Ataxia (FRDA) stands out as the most prevalent form of hereditary ataxias, marked by progressive movement ataxia, loss of vibratory sensitivity, and skeletal deformities, severely affecting daily functioning. To date, the only medication available for treating FRDA is Omaveloxolone (Skyclarys®), recently approved by the FDA. Missense mutations within the human frataxin (FXN) gene, responsible for intracellular iron homeostasis regulation, are linked to FRDA development. These mutations induce FXN dysfunction, fostering mitochondrial iron accumulation and heightened oxidative stress, ultimately triggering neuronal cell death pathways. This study amalgamated 226 FXN genetic variants from the literature and database searches, with only 18 previously characterized. Predictive analyses revealed a notable prevalence of detrimental and destabilizing predictions for FXN mutations, predominantly impacting conserved residues crucial for protein function. Additionally, an accurate, comprehensive three-dimensional model of human FXN was constructed, serving as the basis for generating genetic variants I154F and W155R. These variants, selected for their severe clinical implications, underwent molecular dynamics (MD) simulations, unveiling flexibility and essential dynamic alterations in their N-terminal segments, encompassing FXN42, FXN56, and FXN78 domains pivotal for protein maturation. Thus, our findings indicate potential interaction profile disturbances in the FXN42, FXN56, and FXN78 domains induced by I154F and W155R mutations, aligning with the existing literature.
Affiliation(s)
- Loiane Mendonça Abrantes Da Conceição
- Laboratory of Bioinformatics and Computational Biology, Federal University of the State of Rio de Janeiro (UNIRIO), Avenida Pasteur, 296, Urca, Rio de Janeiro 22290-250, Brazil (J.F.D.M.)
- Lucio Mendes Cabral
- Pharmaceutical Industrial Technology Laboratory, Federal University of Rio de Janeiro (UFRJ), Avenida Carlos Chagas Filho, 373, Cidade Universitária, Rio de Janeiro 21941-590, Brazil
- Gabriel Rodrigues Coutinho Pereira
- Pharmaceutical Industrial Technology Laboratory, Federal University of Rio de Janeiro (UFRJ), Avenida Carlos Chagas Filho, 373, Cidade Universitária, Rio de Janeiro 21941-590, Brazil
- Laboratory of Molecular Modeling & QSAR, Federal University of Rio de Janeiro (UFRJ), Avenida Carlos Chagas Filho, 373, Cidade Universitária, Rio de Janeiro 21941-590, Brazil
- Joelma Freire De Mesquita
- Laboratory of Bioinformatics and Computational Biology, Federal University of the State of Rio de Janeiro (UNIRIO), Avenida Pasteur, 296, Urca, Rio de Janeiro 22290-250, Brazil (J.F.D.M.)
10
Zhang HL. Paradigm shifts of life science research in China: Challenges and coping strategies. Sci Bull (Beijing) 2024:S2095-9273(24)00359-1. [PMID: 38824121 DOI: 10.1016/j.scib.2024.05.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2024]
Affiliation(s)
- Hong-Liang Zhang
- Department of Life Sciences, National Natural Science Foundation of China, Beijing 100085, China.
11
Grønbæk-Thygesen M, Voutsinos V, Johansson KE, Schulze TK, Cagiada M, Pedersen L, Clausen L, Nariya S, Powell RL, Stein A, Fowler DM, Lindorff-Larsen K, Hartmann-Petersen R. Deep mutational scanning reveals a correlation between degradation and toxicity of thousands of aspartoacylase variants. Nat Commun 2024; 15:4026. [PMID: 38740822 DOI: 10.1038/s41467-024-48481-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 05/02/2024] [Indexed: 05/16/2024]
Abstract
Unstable proteins are prone to form non-native interactions with other proteins and thereby may become toxic. To mitigate this, destabilized proteins are targeted by the protein quality control network. Here we present systematic studies of the cytosolic aspartoacylase, ASPA, where variants are linked to Canavan disease, a lethal neurological disorder. We determine the abundance of 6152 of the 6260 (~98%) possible single amino acid substitutions and nonsense ASPA variants in human cells. Most low-abundance variants are degraded through the ubiquitin-proteasome pathway and become toxic upon prolonged expression. The data correlate with predicted changes in thermodynamic stability and with evolutionary conservation, and they separate disease-linked variants from benign ones. Mapping of degradation signals (degrons) shows that these are often buried and that the C-terminal region functions as a degron. The data can be used to interpret Canavan disease variants and provide insight into the relationship between protein stability, degradation, and cell fitness.
Affiliation(s)
- Martin Grønbæk-Thygesen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Vasileios Voutsinos
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kristoffer E Johansson
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Thea K Schulze
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Matteo Cagiada
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Line Pedersen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Lene Clausen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Snehal Nariya
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Rachel L Powell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Amelie Stein
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Department of Bioengineering, University of Washington, Seattle, WA, USA.
- Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
- Rasmus Hartmann-Petersen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
12
Korasick DA, Buckley DP, Palpacelli A, Cursio I, Cesaroni E, Cheng J, Tanner JJ. Biochemical, structural, and computational analyses of two new clinically identified missense mutations of ALDH7A1. Chem Biol Interact 2024; 394:110993. [PMID: 38604394 PMCID: PMC11073572 DOI: 10.1016/j.cbi.2024.110993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 03/30/2024] [Accepted: 04/04/2024] [Indexed: 04/13/2024]
Abstract
Aldehyde dehydrogenase 7A1 (ALDH7A1) catalyzes a step of lysine catabolism. Certain missense mutations in the ALDH7A1 gene cause pyridoxine dependent epilepsy (PDE), a rare autosomal neurometabolic disorder with recessive inheritance that affects almost 1:65,000 live births and is classically characterized by recurrent seizures from the neonatal period. We report a biochemical, structural, and computational study of two novel ALDH7A1 missense mutations that were identified in a child with rare recurrent seizures from the third month of life. The mutations affect two residues in the oligomer interfaces of ALDH7A1, Arg134 and Arg441 (Arg162 and Arg469 in the HGVS nomenclature). The corresponding enzyme variants R134S and R441C (p.Arg162Ser and p.Arg469Cys in the HGVS nomenclature) were expressed in Escherichia coli and purified. R134S and R441C have 10,000- and 50-fold lower catalytic efficiency than wild-type ALDH7A1, respectively. Sedimentation velocity analytical ultracentrifugation shows that R134S is defective in tetramerization, remaining locked in a dimeric state even in the presence of the tetramer-inducing coenzyme NAD+. Because the tetramer is the active form of ALDH7A1, the defect in oligomerization explains the very low catalytic activity of R134S. In contrast, R441C exhibits wild-type oligomerization behavior, and the 2.0 Å resolution crystal structure of R441C complexed with NAD+ revealed no obvious structural perturbations when compared to the wild-type enzyme structure. Molecular dynamics simulations suggest that the mutation of Arg441 to Cys may increase intersubunit ion pairs and alter the dynamics of the active site gate. Our biochemical, structural, and computational data on two novel clinical variants of ALDH7A1 add to the complexity of the molecular determinants underlying pyridoxine dependent epilepsy.
Affiliation(s)
- David A Korasick
- Department of Biochemistry, University of Missouri, Columbia, MO, 65211, United States
- David P Buckley
- Department of Biochemistry, University of Missouri, Columbia, MO, 65211, United States
- Ida Cursio
- Child Neurology and Psychiatric Unit, Pediatric Hospital G. Salesi, United Hospitals of Marche, Ancona, Italy
- Elisabetta Cesaroni
- Child Neurology and Psychiatric Unit, Pediatric Hospital G. Salesi, United Hospitals of Marche, Ancona, Italy
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, United States
- John J Tanner
- Department of Biochemistry, University of Missouri, Columbia, MO, 65211, United States; Department of Chemistry, University of Missouri, Columbia, MO, 65211, United States.
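The abstract gives each variant in two numbering schemes (R134S = p.Arg162Ser, R441C = p.Arg469Cys). A minimal Python sketch of converting between them follows. The constant offset of 28 is an assumption inferred only from these two pairs, and `legacy_to_hgvs` is a hypothetical helper, not part of any tool used in the study.

```python
import re

# One-letter to three-letter amino acid codes (standard 20 residues).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# ASSUMPTION: constant offset inferred from 134 -> 162 and 441 -> 469 only;
# verify against the reference transcript before relying on it.
HGVS_OFFSET = 28


def legacy_to_hgvs(variant: str, offset: int = HGVS_OFFSET) -> str:
    """Hypothetical helper: convert e.g. 'R134S' to 'p.Arg162Ser'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant)
    if match is None:
        raise ValueError(f"unrecognized variant: {variant!r}")
    ref, pos, alt = match.groups()
    return f"p.{THREE_LETTER[ref]}{int(pos) + offset}{THREE_LETTER[alt]}"


for v in ("R134S", "R441C"):
    print(v, "->", legacy_to_hgvs(v))
# R134S -> p.Arg162Ser
# R441C -> p.Arg469Cys
```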
13
da Silva ANR, Pereira GRC, Bonet LFS, Outeiro TF, De Mesquita JF. In silico analysis of alpha-synuclein protein variants and posttranslational modifications related to Parkinson's disease. J Cell Biochem 2024; 125:e30523. [PMID: 38239037 DOI: 10.1002/jcb.30523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 12/11/2023] [Accepted: 12/29/2023] [Indexed: 03/12/2024]
Abstract
Parkinson's disease (PD) is among the most prevalent neurodegenerative disorders, affecting over 10 million people worldwide. The protein encoded by the SNCA gene, alpha-synuclein (ASYN), is the major component of Lewy body (LB) aggregates, a histopathological hallmark of PD. Mutations and posttranslational modifications (PTMs) in ASYN are known to influence protein aggregation and LB formation, possibly playing a crucial role in PD pathogenesis. In this work, we applied computational methods to characterize the effects of missense mutations and PTMs on the structure and function of ASYN. Missense mutations in ASYN were compiled from the literature and databases and underwent a comprehensive predictive analysis. Phosphorylation and SUMOylation sites of ASYN were retrieved from databases and predicted by algorithms. ConSurf was used to estimate the evolutionary conservation of ASYN amino acids. Molecular dynamics (MD) simulations of ASYN wild-type and variants A30G, A30P, A53T, and G51D were performed using the GROMACS package. Seventy-seven missense mutations in ASYN were compiled. Although most mutations were not predicted to affect ASYN stability, aggregation propensity, amyloid formation, and chaperone binding, the analyzed mutations received relatively high rates of deleterious predictions and predominantly occurred at evolutionarily conserved sites within the protein. Moreover, our predictive analyses suggested that the following mutations may be harmful to ASYN and are, consequently, potential targets for future investigation: K6N, T22I, K34E, G36R, G36S, V37F, L38P, G41D, and K102E. The MD analyses pointed to remarkable flexibility and essential dynamics alterations in nearly all domains of the studied variants, which could lead to impaired contact between NAC and the C-terminal domain, triggering protein aggregation. These alterations may have functional implications for ASYN and provide important insight into the molecular mechanism of PD, supporting the design of future biomedical research and improvements in existing therapies for the disease.
Affiliation(s)
- Aloma N R da Silva
- Bioinformatics and Computational Biology Laboratory, Department of Genetics and Molecular Biology, Federal University of the State of Rio de Janeiro (UNIRIO), Rio de Janeiro, Rio de Janeiro, Brazil
- Gabriel R C Pereira
- Bioinformatics and Computational Biology Laboratory, Department of Genetics and Molecular Biology, Federal University of the State of Rio de Janeiro (UNIRIO), Rio de Janeiro, Rio de Janeiro, Brazil
- Luiz Felippe Sarmento Bonet
- Bioinformatics and Computational Biology Laboratory, Department of Genetics and Molecular Biology, Federal University of the State of Rio de Janeiro (UNIRIO), Rio de Janeiro, Rio de Janeiro, Brazil
- Tiago Fleming Outeiro
- Department of Experimental Neurodegeneration, Center for Biostructural Imaging of Neurodegeneration, University Medical Center Göttingen, Göttingen, Germany
- Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- Max Planck Institute for Experimental Medicine, Göttingen, Germany
- Joelma F De Mesquita
- Bioinformatics and Computational Biology Laboratory, Department of Genetics and Molecular Biology, Federal University of the State of Rio de Janeiro (UNIRIO), Rio de Janeiro, Rio de Janeiro, Brazil
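The MD results above concern residue flexibility. One common way to quantify per-residue flexibility from a trajectory is the root-mean-square fluctuation (RMSF). The sketch below is a minimal pure-Python illustration on hypothetical coordinates, assuming the frames are already superimposed on a reference and each residue is represented by a single atom; it is not the authors' GROMACS workflow.

```python
import math


def rmsf(trajectory):
    """Per-residue RMSF.

    trajectory[t][i] is the (x, y, z) position of residue i in frame t.
    ASSUMPTION: frames are already superimposed (no fitting is done here).
    """
    n_frames = len(trajectory)
    n_residues = len(trajectory[0])
    result = []
    for i in range(n_residues):
        # Time-averaged position of residue i.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared deviation from that average over all frames.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Hypothetical two-frame, two-residue trajectory.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)],
]
print(rmsf(toy))  # [0.0, 1.0]
```

Residue 0 never moves, so its RMSF is zero; residue 1 oscillates by one unit around its mean position.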
14
Jain S, Bakolitsa C, Brenner SE, Radivojac P, Moult J, Repo S, Hoskins RA, Andreoletti G, Barsky D, Chellapan A, Chu H, Dabbiru N, Kollipara NK, Ly M, Neumann AJ, Pal LR, Odell E, Pandey G, Peters-Petrulewicz RC, Srinivasan R, Yee SF, Yeleswarapu SJ, Zuhl M, Adebali O, Patra A, Beer MA, Hosur R, Peng J, Bernard BM, Berry M, Dong S, Boyle AP, Adhikari A, Chen J, Hu Z, Wang R, Wang Y, Miller M, Wang Y, Bromberg Y, Turina P, Capriotti E, Han JJ, Ozturk K, Carter H, Babbi G, Bovo S, Di Lena P, Martelli PL, Savojardo C, Casadio R, Cline MS, De Baets G, Bonache S, Díez O, Gutiérrez-Enríquez S, Fernández A, Montalban G, Ootes L, Özkan S, Padilla N, Riera C, De la Cruz X, Diekhans M, Huwe PJ, Wei Q, Xu Q, Dunbrack RL, Gotea V, Elnitski L, Margolin G, Fariselli P, Kulakovskiy IV, Makeev VJ, Penzar DD, Vorontsov IE, Favorov AV, Forman JR, Hasenahuer M, Fornasari MS, Parisi G, Avsec Z, Çelik MH, Nguyen TYD, Gagneur J, Shi FY, Edwards MD, Guo Y, Tian K, Zeng H, Gifford DK, Göke J, Zaucha J, Gough J, Ritchie GRS, Frankish A, Mudge JM, Harrow J, Young EL, Yu Y, Huff CD, Murakami K, Nagai Y, Imanishi T, Mungall CJ, Jacobsen JOB, Kim D, Jeong CS, Jones DT, Li MJ, Guthrie VB, Bhattacharya R, Chen YC, Douville C, Fan J, Kim D, Masica D, Niknafs N, Sengupta S, Tokheim C, Turner TN, Yeo HTG, Karchin R, Shin S, Welch R, Keles S, Li Y, Kellis M, Corbi-Verge C, Strokach AV, Kim PM, Klein TE, Mohan R, Sinnott-Armstrong NA, Wainberg M, Kundaje A, Gonzaludo N, Mak ACY, Chhibber A, Lam HYK, Dahary D, Fishilevich S, Lancet D, Lee I, Bachman B, Katsonis P, Lua RC, Wilson SJ, Lichtarge O, Bhat RR, Sundaram L, Viswanath V, Bellazzi R, Nicora G, Rizzo E, Limongelli I, Mezlini AM, Chang R, Kim S, Lai C, O’Connor R, Topper S, van den Akker J, Zhou AY, Zimmer AD, Mishne G, Bergquist TR, Breese MR, Guerrero RF, Jiang Y, Kiga N, Li B, Mort M, Pagel KA, Pejaver V, Stamboulian MH, Thusberg J, Mooney SD, Teerakulkittipong N, Cao C, Kundu K, Yin Y, Yu CH, Kleyman M, Lin CF, Stackpole M, Mount SM, Eraslan G, Mueller NS, Naito T, Rao AR, Azaria JR, Brodie A, Ofran Y, Garg A, Pal D, Hawkins-Hooker A, Kenlay H, Reid J, Mucaki EJ, Rogan PK, Schwarz JM, Searls DB, Lee GR, Seok C, Krämer A, Shah S, Huang CV, Kirsch JF, Shatsky M, Cao Y, Chen H, Karimi M, Moronfoye O, Sun Y, Shen Y, Shigeta R, Ford CT, Nodzak C, Uppal A, Shi X, Joseph T, Kotte S, Rana S, Rao A, Saipradeep VG, Sivadasan N, Sunderam U, Stanke M, Su A, Adzhubey I, Jordan DM, Sunyaev S, Rousseau F, Schymkowitz J, Van Durme J, Tavtigian SV, Carraro M, Giollo M, Tosatto SCE, Adato O, Carmel L, Cohen NE, Fenesh T, Holtzer T, Juven-Gershon T, Unger R, Niroula A, Olatubosun A, Väliaho J, Yang Y, Vihinen M, Wahl ME, Chang B, Chong KC, Hu I, Sun R, Wu WKK, Xia X, Zee BC, Wang MH, Wang M, Wu C, Lu Y, Chen K, Yang Y, Yates CM, Kreimer A, Yan Z, Yosef N, Zhao H, Wei Z, Yao Z, Zhou F, Folkman L, Zhou Y, Daneshjou R, Altman RB, Inoue F, Ahituv N, Arkin AP, Lovisa F, Bonvini P, Bowdin S, Gianni S, Mantuano E, Minicozzi V, Novak L, Pasquo A, Pastore A, Petrosino M, Puglisi R, Toto A, Veneziano L, Chiaraluce R, Ball MP, Bobe JR, Church GM, Consalvi V, Cooper DN, Buckley BA, Sheridan MB, Cutting GR, Scaini MC, Cygan KJ, Fredericks AM, Glidden DT, Neil C, Rhine CL, Fairbrother WG, Alontaga AY, Fenton AW, Matreyek KA, Starita LM, Fowler DM, Löscher BS, Franke A, Adamson SI, Graveley BR, Gray JW, Malloy MJ, Kane JP, Kousi M, Katsanis N, Schubach M, Kircher M, Mak ACY, Tang PLF, Kwok PY, Lathrop RH, Clark WT, Yu GK, LeBowitz JH, Benedicenti F, Bettella E, Bigoni S, Cesca F, Mammi I, Marino-Buslje C, Milani D, Peron A, Polli R, Sartori S, Stanzial F, Toldo I, Turolla L, Aspromonte MC, Bellini M, Leonardi E, Liu X, Marshall C, McCombie WR, Elefanti L, Menin C, Meyn MS, Murgia A, Nadeau KCY, Neuhausen SL, Nussbaum RL, Pirooznia M, Potash JB, Dimster-Denk DF, Rine JD, Sanford JR, Snyder M, Cote AG, Sun S, Verby MW, Weile J, Roth FP, Tewhey R, Sabeti PC, Campagna J, Refaat MM, Wojciak J, Grubb S, Schmitt N, Shendure J, Spurdle AB, Stavropoulos DJ, Walton NA, Zandi PP, Ziv E, Burke W, Chen F, Carr LR, Martinez S, Paik J, Harris-Wai J, Yarborough M, Fullerton SM, Koenig BA, McInnes G, Shigaki D, Chandonia JM, Furutsuki M, Kasak L, Yu C, Chen R, Friedberg I, Getz GA, Cong Q, Kinch LN, Zhang J, Grishin NV, Voskanian A, Kann MG, Tran E, Ioannidis NM, Hunter JM, Udani R, Cai B, Morgan AA, Sokolov A, Stuart JM, Minervini G, Monzon AM, Batzoglou S, Butte AJ, Greenblatt MS, Hart RK, Hernandez R, Hubbard TJP, Kahn S, O’Donnell-Luria A, Ng PC, Shon J, Veltman J, Zook JM. CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods. Genome Biol 2024; 25:53. [PMID: 38389099 PMCID: PMC10882881 DOI: 10.1186/s13059-023-03113-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2023] [Accepted: 11/17/2023] [Indexed: 02/24/2024]
Abstract
BACKGROUND The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state of the art in computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors. RESULTS Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic. CONCLUSIONS Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.
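The abstract does not specify the assessment metrics used across CAGI challenges. As one common example for binary tasks such as pathogenic versus benign classification, the area under the ROC curve can be computed from pairwise comparisons, as in this minimal sketch; the scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    labels: 1 = positive (e.g. pathogenic), 0 = negative (e.g. benign).
    Ties between a positive and a negative count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
print(roc_auc(scores, labels))  # 5 of 6 pairs ordered correctly, about 0.833
```

The quadratic pairwise loop keeps the definition explicit; for large assessments a rank-based formulation gives the same value more efficiently.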
15
Ruf M, Cunningham S, Wandersee A, Brox R, Achenbach S, Strobel J, Hackstein H, Schneider S. SERPINC1 c.1247dupC: a novel SERPINC1 gene mutation associated with familial thrombosis results in a secretion defect and quantitative antithrombin deficiency. Thromb J 2024; 22:19. [PMID: 38347553 PMCID: PMC10860291 DOI: 10.1186/s12959-024-00589-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 02/01/2024] [Indexed: 02/16/2024]
Abstract
BACKGROUND Antithrombin (AT) is an important anticoagulant in hemostasis. We describe here the characterization of a novel AT mutation associated with clinically relevant thrombosis. A pair of sisters with confirmed type I AT protein deficiency was genetically analyzed on suspicion of an inherited SERPINC1 mutation. A frameshift mutation, c.1247dupC, was identified, and its effect was examined at the cellular and molecular levels. METHODS Plasmids for the expression of wild-type (WT) and mutated SERPINC1 coding sequence (CDS) fused to green fluorescent protein (GFP) or a hemagglutinin (HA) tag were transfected into HEK293T cells. Subcellular localization and secretion of the respective fusion proteins were analyzed by confocal laser scanning microscopy and Western blot. RESULTS The c.1247dupC mutation results in a frameshift in the CDS of the SERPINC1 gene and a subsequently altered amino acid sequence (p.Ser417LysfsTer48). This alteration affects the C-terminus of the AT antigen and results in impaired secretion, as confirmed by GFP- and HA-tagged mutant AT analyzed in HEK293T cells. CONCLUSION The p.Ser417LysfsTer48 mutation leads to impaired secretion, thus resulting in a quantitative AT deficiency. This is in line with the type I AT deficiency observed in the patients.
Affiliation(s)
- Maximilian Ruf
- Department of Transfusion Medicine and Hemostaseology, Friedrich-Alexander-University Erlangen-Nürnberg (FAU), University Hospital Erlangen, Krankenhausstr. 12, 91054, Erlangen, Germany
- Sarah Cunningham
- Department of Transfusion Medicine and Hemostaseology, Friedrich-Alexander-University Erlangen-Nürnberg (FAU), University Hospital Erlangen, Krankenhausstr. 12, 91054, Erlangen, Germany
- Alexandra Wandersee
- Department of Transfusion Medicine and Hemostaseology, Friedrich-Alexander-University Erlangen-Nürnberg (FAU), University Hospital Erlangen, Krankenhausstr. 12, 91054, Erlangen, Germany
- Regine Brox
- Department of Transfusion Medicine and Hemostaseology, Friedrich-Alexander-University Erlangen-Nürnberg (FAU), University Hospital Erlangen, Krankenhausstr. 12, 91054, Erlangen, Germany
- Susanne Achenbach
- Department of Transfusion Medicine and Hemostaseology, Friedrich-Alexander-University Erlangen-Nürnberg (FAU), University Hospital Erlangen, Krankenhausstr. 12, 91054, Erlangen, Germany
- Julian Strobel
- Department of Transfusion Medicine and Hemostaseology, Friedrich-Alexander-University Erlangen-Nürnberg (FAU), University Hospital Erlangen, Krankenhausstr. 12, 91054, Erlangen, Germany
- Holger Hackstein
- Department of Transfusion Medicine and Hemostaseology, Friedrich-Alexander-University Erlangen-Nürnberg (FAU), University Hospital Erlangen, Krankenhausstr. 12, 91054, Erlangen, Germany
- Sabine Schneider
- Department of Transfusion Medicine and Hemostaseology, Friedrich-Alexander-University Erlangen-Nürnberg (FAU), University Hospital Erlangen, Krankenhausstr. 12, 91054, Erlangen, Germany.
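The protein change p.Ser417LysfsTer48 follows HGVS frameshift notation. The sketch below parses such descriptions; `parse_frameshift` is a hypothetical helper, and it assumes the convention that the Ter count starts at the first changed residue (Lys417 = position 1), which places the premature stop codon at position 417 + 48 - 1 = 464.

```python
import re

# Matches descriptions of the form p.<Ref><Pos><Alt>fsTer<N>, e.g. "p.Ser417LysfsTer48".
FS_PATTERN = re.compile(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})fsTer(\d+)")


def parse_frameshift(hgvs: str) -> dict:
    """Hypothetical parser for HGVS protein frameshift descriptions.

    ASSUMPTION: Ter<N> counts the first changed residue as position 1 of the
    new reading frame, so the stop codon lies at protein position Pos + N - 1.
    """
    match = FS_PATTERN.fullmatch(hgvs)
    if match is None:
        raise ValueError(f"not a frameshift description: {hgvs!r}")
    ref, pos, alt, ter = match.groups()
    pos, ter = int(pos), int(ter)
    return {
        "ref": ref,
        "position": pos,
        "first_new_residue": alt,
        "stop_codon_position": pos + ter - 1,
    }


print(parse_frameshift("p.Ser417LysfsTer48"))  # stop_codon_position: 464
```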
16
Liu B, Jiang Y, Yang Y, Chen JX. OmeDDG: Improved Protein Mutation Stability Prediction Based on Predicted 3D Structures. J Phys Chem B 2024; 128:67-76. [PMID: 38130113 DOI: 10.1021/acs.jpcb.3c05601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Determining changes in a protein's thermal stability following mutations is critical in protein engineering and for understanding pathogenic missense mutations. Despite the development of various computational methods to predict the effects of single-point mutations, their accuracy remains limited. In this study, we propose a new computational method, OmeDDG, that more accurately predicts mutation-induced Gibbs free energy changes in protein folding (ΔΔG). OmeDDG takes the sequences of wild-type and mutant proteins as input, uses OmegaFold to obtain their 3D structures, employs a convolutional neural network to extract structural features, and combines them with protein mutation features and pretraining features to predict the stability of single-point mutations in proteins. We performed a comprehensive comparison between OmeDDG and other available prediction methods on four blind test datasets, confirming that OmeDDG can effectively enhance protein mutation prediction performance. Notably, on the antisymmetric dataset Ssym, OmeDDG achieves the best performance, demonstrating favorable antisymmetry with PCC = 0.79 and RMSE = 0.96 for forward mutations and PCC = 0.77 and RMSE = 0.97 for reverse mutations.
Affiliation(s)
- Baoying Liu
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
- Yongquan Jiang
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
- Artificial Intelligence Research Institute, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
- Yan Yang
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
- Artificial Intelligence Research Institute, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
- Jim X Chen
- Department of Computer Science, George Mason University, Fairfax, Virginia 22030-4444, United States
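The Ssym results above are reported as PCC and RMSE for forward and reverse mutations. A minimal sketch of these metrics follows, together with two antisymmetry diagnostics of the kind used in ΔΔG benchmarking (the correlation between forward and reverse predictions, ideally near -1, and the mean bias, ideally near 0). All values are hypothetical, and the diagnostics are not taken from the OmeDDG paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (PCC)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(pred, true):
    """Root-mean-square error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))


def antisymmetry_bias(forward_pred, reverse_pred):
    """Mean of (ddG_fwd + ddG_rev) / 2; zero for a perfectly antisymmetric predictor."""
    return sum(f + r for f, r in zip(forward_pred, reverse_pred)) / (2 * len(forward_pred))


# Hypothetical experimental forward values and predictions.
true_fwd = [-1.2, -0.5, 0.3, -2.0]
pred_fwd = [-1.0, -0.7, 0.1, -1.6]
pred_rev = [0.9, 0.6, -0.2, 1.7]
true_rev = [-t for t in true_fwd]  # reverse mutation: ddG changes sign

print("PCC fwd:", pearson(pred_fwd, true_fwd), "RMSE fwd:", rmse(pred_fwd, true_fwd))
print("PCC rev:", pearson(pred_rev, true_rev), "RMSE rev:", rmse(pred_rev, true_rev))
print("r(fwd, rev):", pearson(pred_fwd, pred_rev))  # ideally close to -1
print("bias:", antisymmetry_bias(pred_fwd, pred_rev))  # about -0.025 here
```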
17
Zheng F, Liu Y, Yang Y, Wen Y, Li M. Assessing computational tools for predicting protein stability changes upon missense mutations using a new dataset. Protein Sci 2024; 33:e4861. [PMID: 38084013 PMCID: PMC10751734 DOI: 10.1002/pro.4861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 11/14/2023] [Accepted: 12/06/2023] [Indexed: 12/28/2023]
Abstract
Insight into how mutations affect protein stability is crucial for protein engineering, understanding genetic diseases, and exploring protein evolution. Numerous computational methods have been developed to predict the impact of amino acid substitutions on protein stability. Nevertheless, comparing these methods poses challenges due to variations in their training data. Moreover, it is observed that they tend to perform better at predicting destabilizing mutations than stabilizing ones. Here, we meticulously compiled a new dataset from three recently published databases: ThermoMutDB, FireProtDB, and ProThermDB. This dataset, which does not overlap with the well-established S2648 dataset, consists of 4038 single-point mutations, including over 1000 stabilizing mutations. We assessed these mutations using 27 computational methods, including the latest ones utilizing mega-scale stability datasets and transfer learning. We excluded entries with overlap or similarity to training datasets to ensure fairness. Pearson correlation coefficients for the tested tools ranged from 0.20 to 0.53 on unseen data, and none of the methods could accurately predict stabilizing mutations, even those performing well in anti-symmetric property analysis. While most methods present consistent trends for predicting destabilizing mutations across various properties such as solvent exposure and secondary conformation, stabilizing mutations do not exhibit a clear pattern. Our study also suggests that solely addressing training dataset bias may not significantly enhance accuracy of predicting stabilizing mutations. These findings emphasize the importance of developing precise predictive methods for stabilizing mutations.
Affiliation(s)
- Feifan Zheng
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yang Liu
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yan Yang
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yuhao Wen
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Minghui Li
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
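The abstract states that entries overlapping or similar to training data were excluded. A hypothetical sketch of the simplest form of such filtering follows: exact mutation overlap, optionally extended to whole proteins. The entry format and protein IDs are assumptions, and the sequence-similarity step that a stricter filter would need is not shown.

```python
def filter_overlaps(test, train, protein_level=False):
    """Hypothetical filter: drop test entries that overlap the training set.

    ASSUMPTION: entries are dicts with "protein" and "mutation" keys.
    No sequence-similarity comparison is performed.
    """
    train_mutations = {(e["protein"], e["mutation"]) for e in train}
    train_proteins = {e["protein"] for e in train}
    kept = []
    for entry in test:
        if (entry["protein"], entry["mutation"]) in train_mutations:
            continue  # exact duplicate of a training mutation
        if protein_level and entry["protein"] in train_proteins:
            continue  # stricter: protein already seen during training
        kept.append(entry)
    return kept


# Hypothetical protein IDs and mutations.
train = [{"protein": "P1", "mutation": "A10G"}, {"protein": "P2", "mutation": "L5P"}]
test = [
    {"protein": "P1", "mutation": "A10G"},
    {"protein": "P1", "mutation": "K20E"},
    {"protein": "P3", "mutation": "V7I"},
]
print(len(filter_overlaps(test, train)))                      # 2
print(len(filter_overlaps(test, train, protein_level=True)))  # 1
```

The protein-level option is stricter because other mutations in a training protein can still share structural context with the test mutation.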
18
Wang S, Tang H, Shan P, Wu Z, Zuo L. ProS-GNN: Predicting effects of mutations on protein stability using graph neural networks. Comput Biol Chem 2023; 107:107952. [PMID: 37643501 DOI: 10.1016/j.compbiolchem.2023.107952] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/13/2022] [Revised: 08/18/2023] [Accepted: 08/25/2023] [Indexed: 08/31/2023]
Abstract
Predicting protein stability change upon variation through a computational approach is a valuable tool to unveil the mechanisms of mutation-induced drug failure and develop immunotherapy strategies. Some previous machine learning-based techniques exhibit anti-symmetric bias toward destabilizing situations, whereas others struggle with generalization to unseen examples. To address these issues, we propose a gated graph neural network-based approach to predict changes in protein stability upon mutation. The model uses message passing to encode the links between the molecular structure and property after eliminating the non-mutant structure and creating input feature vectors. While doing so, it also incorporates the coordinates of the raw atoms to provide spatial insights into the chemical systems. We test the model on the Ssym, Myoglobin, Broom, and p53 datasets to demonstrate the generalization performance. Compared to existing approaches, our proposed method achieves improved linearity with symmetry in less time. The code for this study is available at: https://github.com/HongzhouTang/Pros-GNN.
Affiliation(s)
- Shuyu Wang
- Department of Control Engineering, Northeastern University, Qinhuangdao Campus, Qinhuangdao 066001, China.
- Hongzhou Tang
- Department of Control Engineering, Northeastern University, Qinhuangdao Campus, Qinhuangdao 066001, China
- Peng Shan
- Department of Control Engineering, Northeastern University, Qinhuangdao Campus, Qinhuangdao 066001, China
- Zhaoxia Wu
- Department of Control Engineering, Northeastern University, Qinhuangdao Campus, Qinhuangdao 066001, China
- Lei Zuo
- Department of Marine Engineering, University of Michigan, Ann Arbor 48109, USA
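The abstract describes a gated graph neural network that updates node states by message passing. The following is a deliberately simplified sketch of one gated update step with scalar node features; it is not the ProS-GNN architecture, and the weights, the toy graph, and the sigmoid/tanh gate are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_step(h, edges, w_msg=1.0, w_gate=1.0, b_gate=0.0):
    """One simplified gated message-passing step.

    h: list of scalar node features; edges: undirected (u, v) index pairs.
    Each node sums weighted neighbour features (its message); a sigmoid gate
    then interpolates between the old state and tanh(message).
    """
    messages = [0.0] * len(h)
    for u, v in edges:
        messages[u] += w_msg * h[v]
        messages[v] += w_msg * h[u]
    new_h = []
    for hv, mv in zip(h, messages):
        z = sigmoid(w_gate * mv + b_gate)  # update gate in (0, 1)
        candidate = math.tanh(mv)  # proposed state derived from the message
        new_h.append((1.0 - z) * hv + z * candidate)
    return new_h


# Hypothetical 3-atom chain 0-1-2.
h = [1.0, 0.0, -1.0]
edges = [(0, 1), (1, 2)]
for _ in range(2):
    h = gated_step(h, edges)
print(h)  # [0.25, 0.0, -0.25]
```

In a trained model the scalar weights would be learned matrices and node features would be vectors; the gate is what lets each node decide how much neighbour information to absorb per step.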
19
Skiadopoulou D, Vašíček J, Kuznetsova K, Bouyssié D, Käll L, Vaudel M. Retention Time and Fragmentation Predictors Increase Confidence in Identification of Common Variant Peptides. J Proteome Res 2023; 22:3190-3199. [PMID: 37656829 PMCID: PMC10563157 DOI: 10.1021/acs.jproteome.3c00243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Indexed: 09/03/2023]
Abstract
Precision medicine focuses on adapting care to the individual profile of patients, for example, accounting for their unique genetic makeup. Being able to account for the effect of genetic variation on the proteome holds great promise toward this goal. However, identifying the protein products of genetic variation using mass spectrometry has proven very challenging. Here we show that the identification of variant peptides can be improved by the integration of retention time and fragmentation predictors into a unified proteogenomic pipeline. By combining these intrinsic peptide characteristics using the search-engine post-processor Percolator, we demonstrate improved discrimination power between correct and incorrect peptide-spectrum matches. Our results demonstrate that the drop in performance that is induced when expanding a protein sequence database can be compensated, hence enabling efficient identification of genetic variation products in proteomics data. We anticipate that this enhancement of proteogenomic pipelines can provide a more refined picture of the unique proteome of patients and thereby contribute to improving patient care.
Affiliation(s)
- Dafni Skiadopoulou
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
- Jakub Vašíček
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
- Ksenia Kuznetsova
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
- David Bouyssié
- Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III—Paul Sabatier (UT3), 31000 Toulouse, France
- Lukas Käll
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden
- Marc Vaudel
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
- Department of Genetics and Bioinformatics, Health Data and Digitalization, Norwegian Institute of Public Health, N-0213 Oslo, Norway
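Percolator combines peptide-spectrum match (PSM) features, here including retention-time and fragmentation predictions, into a single score and reports target-decoy q-values. The sketch below shows only the q-value step, assuming each PSM already has one combined score and a decoy flag; the simple decoys/targets FDR estimate and the example scores are assumptions, and tied scores are not treated specially.

```python
def q_values(psms):
    """Target-decoy q-values for PSMs.

    psms: list of (score, is_decoy) with higher scores being better.
    Returns (score, is_decoy, q_value) tuples in descending-score order.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among all PSMs scoring at least this well.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the smallest FDR threshold at which this PSM is still accepted.
    q = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        q[i] = running_min
    return [(score, is_decoy, qv) for (score, is_decoy), qv in zip(ranked, q)]


# Hypothetical combined scores; True marks a decoy match.
psms = [(9.1, False), (8.7, False), (7.5, True), (7.2, False), (6.0, False), (5.5, True)]
print([qv for _, _, qv in q_values(psms)])  # [0.0, 0.0, 0.25, 0.25, 0.25, 0.5]
```

A better combined score (for example one that penalizes large retention-time prediction errors) pushes decoys further down the ranking, which lowers q-values for true variant peptides.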
20
Bhonde SB, Wagh SK, Prasad JR. Identification of cancer types from gene expressions using learning techniques. Comput Methods Biomech Biomed Engin 2023; 26:1951-1965. [PMID: 36562388 DOI: 10.1080/10255842.2022.2160243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2022] [Revised: 10/15/2022] [Accepted: 11/15/2022] [Indexed: 12/24/2022]
Abstract
Cancer is a major cause of death worldwide. Early detection and prediction of cancer type are important for a patient's well-being. Functional genomic data have recently been used for the effective and early detection of cancer. According to previous research, the use of microarray data in cancer prediction faces two main problems: high dimensionality and limited sample size. Several researchers have used numerous statistical and machine learning-based methods to classify cancer types, but limitations remain that make cancer classification a difficult task. Deep Learning (DL) and Convolutional Neural Networks (CNN) have proven effective for analyzing unstructured data, including gene expression data. In the proposed method, gene expression data for five types of cancer were collected from The Cancer Genome Atlas (TCGA). Prominent features are selected using a hybrid Particle Swarm Optimization (PSO) and Random Forest (RF) algorithm, followed by Principal Component Analysis (PCA) for dimensionality reduction. Finally, a blend of a Convolutional Neural Network (CNN) and a Bi-directional Long Short-Term Memory (Bi-LSTM) network is used to predict the cancer type. Experimental results show that the proposed method achieves an accuracy of 96.89%, outperforming existing work.
Affiliation(s)
- Swati B Bhonde
- Smt. Kashibai Navale College of Engineering, Pune, India
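The pipeline above applies PCA for dimensionality reduction after feature selection. A minimal pure-Python sketch of extracting the first principal component by power iteration follows; the toy matrix is hypothetical, and the PSO/RF selection and CNN/Bi-LSTM stages are not shown. The centering and component are fitted on training rows only and then applied to held-out rows, so test samples do not influence the projection.

```python
import math
import random


def fit_first_pc(X, n_iter=200, seed=0):
    """Return (column means, unit-length first principal component) of rows X."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    # Power iteration converges to the dominant eigenvector of cov.
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(X, means, pc):
    """Project rows onto the component using the training means."""
    return [sum((row[j] - means[j]) * pc[j] for j in range(len(pc))) for row in X]


# Hypothetical samples x features matrices.
train = [[2.0, 4.1], [1.0, 2.0], [3.0, 6.2], [4.0, 7.9]]
test = [[2.5, 5.0]]
means, pc = fit_first_pc(train)  # fitted on training rows only
print(pc, project(test, means, pc))
```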
21
Berber I, Erten C, Kazan H. Predator: Predicting the Impact of Cancer Somatic Mutations on Protein-Protein Interactions. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:3163-3172. [PMID: 37030791 DOI: 10.1109/tcbb.2023.3262119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Since many biological processes are governed by protein-protein interactions, understanding which mutations disrupt these interactions is profoundly important for cancer research. Most existing methods focus on the stability of the protein without considering the specific effects of a mutation on its interactions with other proteins. Here, we focus on somatic mutations that occur in the interface regions of the protein and predict the interactions that would be affected by a mutation of interest. We build an ensemble model, Predator, that classifies interface mutations as disruptive or nondisruptive based on the predicted effects of mutations on specific protein-protein interactions. We show that Predator outperforms existing approaches in the literature in terms of prediction accuracy. We then apply Predator to various TCGA cancer cohorts and perform comprehensive analyses at the cohort, patient, and gene levels to determine the genes whose interface mutations tend to disrupt their interactions. The predictions obtained by Predator shed light on interesting patterns in several genes for each cohort regarding their potential as cancer drivers. Our analyses further reveal that the identified genes and their frequently disrupted partners exhibit patterns of mutual exclusivity across the cancer cohorts under study.
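Predator is described as an ensemble that labels interface mutations as disruptive or nondisruptive. The abstract does not specify how its members are combined, so the sketch below shows generic soft voting (averaging member probabilities and thresholding) with stand-in member models and hypothetical feature names; it is not Predator's implementation.

```python
from statistics import mean
from typing import Callable, Dict, List

# A member model maps a feature dict to P(disruptive).
Model = Callable[[Dict[str, float]], float]


def ensemble_predict(models: List[Model], features: Dict[str, float],
                     threshold: float = 0.5) -> str:
    """Soft voting: average member probabilities, then apply a threshold."""
    p = mean(m(features) for m in models)
    return "disruptive" if p >= threshold else "nondisruptive"


# Stand-in members using hypothetical features of an interface mutation.
models: List[Model] = [
    lambda f: 0.8 if f["ddg_interface"] > 1.0 else 0.2,
    lambda f: min(1.0, f["conservation"]),
    lambda f: 0.6,  # constant member, e.g. a prior
]
print(ensemble_predict(models, {"ddg_interface": 1.5, "conservation": 0.9}))  # disruptive
print(ensemble_predict(models, {"ddg_interface": 0.2, "conservation": 0.1}))  # nondisruptive
```

Averaging probabilities rather than counting hard votes lets a confident member outweigh uncertain ones.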
22
Gerasimavicius L, Livesey BJ, Marsh JA. Correspondence between functional scores from deep mutational scans and predicted effects on protein stability. Protein Sci 2023; 32:e4688. [PMID: 37243972 PMCID: PMC10273344 DOI: 10.1002/pro.4688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 04/19/2023] [Accepted: 05/24/2023] [Indexed: 05/29/2023]
Abstract
Many methodologically diverse computational methods have been applied to the growing challenge of predicting and interpreting the effects of protein variants. As many pathogenic mutations have a perturbing effect on protein stability or intermolecular interactions, one highly interpretable approach is to use protein structural information to model the physical impacts of variants and predict their likely effects on protein stability and interactions. Previous efforts have assessed the accuracy of stability predictors in reproducing thermodynamically accurate values and evaluated their ability to distinguish between known pathogenic and benign mutations. Here, we take an alternative approach and explore how well stability predictor scores correlate with functional impacts derived from deep mutational scanning (DMS) experiments. In this work, we compare the predictions of 9 protein stability-based tools against mutant protein fitness values from 49 independent DMS datasets, covering 170,940 unique single amino acid variants. We find that FoldX and Rosetta show the strongest correlations with DMS-based functional scores, similar to their previous top performance in distinguishing between pathogenic and benign variants. For both methods, performance is considerably improved when considering intermolecular interactions from protein complex structures, when available. Furthermore, using these two predictors, we derive a "Foldetta" consensus score, which improves upon the performance of both and manages to match dedicated variant effect predictors in reflecting variant functional impacts. Finally, we also highlight that predicted stability effects show consistently higher correlations with certain DMS experimental phenotypes, particularly those based upon protein abundance, and, in certain cases, can significantly outcompete sequence-based variant effect prediction methodologies for predicting functional scores from DMS experiments.
Affiliation(s)
- Lukas Gerasimavicius
- MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, UK
- Benjamin J. Livesey
- MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, UK
- Joseph A. Marsh
- MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, UK
23
Rakib TM, Islam MS, Uddin MM, Rahman MM, Yabuki A, Yamagami T, Morozumi M, Uchida K, Maki S, Faruq AA, Yamato O. Novel Mutation in the Feline NPC2 Gene in Cats with Niemann-Pick Disease. Animals (Basel) 2023; 13:1744. [PMID: 37458497 PMCID: PMC10252137 DOI: 10.3390/ani13111744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 05/17/2023] [Accepted: 05/22/2023] [Indexed: 07/20/2023]
Abstract
Niemann-Pick disease (NP) type C is an inherited, autosomal recessive neurovisceral disorder. It is characterized by the accumulation of unesterified cholesterol and glycolipids in lysosomes and late endosomes, with a wide spectrum of clinical phenotypes. This study aimed to determine the molecular genetic alterations in two cats with NP in Japan: a Siamese cat in 1989 and a Japanese domestic (JD) cat in 1998. Sanger sequencing was performed on 25 exons of the feline NPC1 gene and 4 exons of the feline NPC2 gene, using genomic DNA extracted from paraffin-embedded tissue specimens. The sequenced exons were compared with reference sequences retrieved from GenBank, and the identified mutations and alterations were analyzed with several prediction algorithms. No pathogenic mutations were found in feline NPC1; however, c.376G>A (p.V126M) was identified as a pathogenic mutation in NPC2. The Siamese cat was homozygous for this mutation. The JD cat was heterozygous for the same mutation, and no other exonic NPC2 mutation was found. The JD cat also carried a homozygous splice variant (c.364-4C>T) in NPC2, which is not known to be associated with this disease. NPC2:c.376G>A (p.V126M) is the second pathogenic mutation reported in the feline NPC2 gene and may be present in the Japanese cat population.
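The two HGVS names above, c.376G>A and p.V126M, are linked by simple codon arithmetic: coding position N falls in codon (N-1)//3 + 1, at base (N-1) % 3 + 1. The sketch below assumes a wild-type codon of GTG. The abstract gives only the protein change, but GTG is the only valine codon that a G>A change at the first base turns into methionine (ATG). The helper names are illustrative. The arithmetic does not apply to intronic names such as c.364-4C>T, which denotes a position 4 nt upstream of coding position 364.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_position(c_pos):
    """Map a coding-DNA position c.N (N >= 1) to (codon number, base within codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def apply_substitution(codon, base_in_codon, ref, alt):
    """Return the codon after a single-base substitution, checking the reference base."""
    i = base_in_codon - 1
    if codon[i] != ref:
        raise ValueError(f"reference base mismatch: expected {ref}, found {codon[i]}")
    return codon[:i] + alt + codon[i + 1:]


codon_no, base_no = codon_position(376)   # c.376G>A
wt_codon = "GTG"                           # assumed wild-type codon (Val), see lead-in
mut_codon = apply_substitution(wt_codon, base_no, "G", "A")
print(f"p.{CODON_TABLE[wt_codon]}{codon_no}{CODON_TABLE[mut_codon]}")  # prints p.V126M
```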
Affiliation(s)
- Tofazzal Md Rakib
- Laboratory of Clinical Pathology, Joint Faculty of Veterinary Medicine, Kagoshima University, Korimoto, Kagoshima 890-0065, Japan
- Faculty of Veterinary Medicine, Chattogram Veterinary and Animal Sciences University, Khulshi, Chattogram 4225, Bangladesh
- Md Shafiqul Islam
- Laboratory of Clinical Pathology, Joint Faculty of Veterinary Medicine, Kagoshima University, Korimoto, Kagoshima 890-0065, Japan
- Faculty of Veterinary Medicine, Chattogram Veterinary and Animal Sciences University, Khulshi, Chattogram 4225, Bangladesh
- Mohammad Mejbah Uddin
- Laboratory of Clinical Pathology, Joint Faculty of Veterinary Medicine, Kagoshima University, Korimoto, Kagoshima 890-0065, Japan
- Faculty of Veterinary Medicine, Chattogram Veterinary and Animal Sciences University, Khulshi, Chattogram 4225, Bangladesh
- Mohammad Mahbubur Rahman
- Laboratory of Clinical Pathology, Joint Faculty of Veterinary Medicine, Kagoshima University, Korimoto, Kagoshima 890-0065, Japan
- Faculty of Veterinary Medicine, Chattogram Veterinary and Animal Sciences University, Khulshi, Chattogram 4225, Bangladesh
- Akira Yabuki
- Laboratory of Clinical Pathology, Joint Faculty of Veterinary Medicine, Kagoshima University, Korimoto, Kagoshima 890-0065, Japan
- Tetsushi Yamagami
- Japan Small Animal Medical Center, Saitama, Tokorozawa 359-0023, Japan
- Kazuyuki Uchida
- Laboratory of Veterinary Pathology, Graduate School of Agricultural and Life Sciences, University of Tokyo, Bunkyō, Tokyo 113-8657, Japan
- Shinichiro Maki
- Laboratory of Clinical Pathology, Joint Faculty of Veterinary Medicine, Kagoshima University, Korimoto, Kagoshima 890-0065, Japan
- Abdullah Al Faruq
- Laboratory of Clinical Pathology, Joint Faculty of Veterinary Medicine, Kagoshima University, Korimoto, Kagoshima 890-0065, Japan
- Faculty of Veterinary Medicine, Chattogram Veterinary and Animal Sciences University, Khulshi, Chattogram 4225, Bangladesh
- Osamu Yamato
- Laboratory of Clinical Pathology, Joint Faculty of Veterinary Medicine, Kagoshima University, Korimoto, Kagoshima 890-0065, Japan
24
Dristy TT, Noor AR, Dey P, Saha A. Structural analysis and conformational dynamics of SOCS1 gene mutations involved in diffuse large B-cell lymphoma. Gene 2023; 864:147293. [PMID: 36813059 DOI: 10.1016/j.gene.2023.147293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Revised: 01/28/2023] [Accepted: 02/15/2023] [Indexed: 02/22/2023]
Abstract
OBJECTIVES The SOCS1 gene is frequently mutated in patients with primary diffuse large B-cell lymphoma (DLBCL), and these mutations are associated with reduced survival. Using various computational techniques, this study aims to identify single nucleotide polymorphisms (SNPs) in the SOCS1 gene that are associated with the mortality rate of DLBCL patients. It also evaluates how these SNPs affect the structural stability of the SOCS1 protein in DLBCL patients. METHODS Mutations were retrieved from the cBioPortal webserver. Their effects on the SOCS1 protein were assessed with various algorithms (PolyPhen-2.0, Provean, PhD-SNPg, SNPs&GO, SIFT, FATHMM, PredictSNP and SNAP). Five webservers (I-Mutant 2.0, MUpro, mCSM, DUET and SDM) were used to predict protein instability. Conservation status and secondary structure were predicted with other tools (ConSurf, Expasy, SOPMA). Lastly, MD simulations of the two selected mutations (S116N and V128G) were run with GROMACS 5.0.1 to study how the mutations change the structure of SOCS1. RESULTS Of the 93 SOCS1 mutations detected in DLBCL patients, nine were found to have a detrimental (damaging/deleterious/pathogenic/altered) effect on the SOCS1 protein. All nine lie in conserved regions. In the secondary structure, four are in extended strands, four in random coils and one in an alpha helix. After the structural effects of these nine mutations were predicted, two (S116N and V128G) were chosen. The selection was based on mutational frequency, location within the protein, effects on primary, secondary and tertiary structure and on stability, and conservation status within SOCS1. The 50 ns simulations revealed that the Rg value of S116N (2.17 nm) is higher than that of the wild type (1.98 nm), indicating a loss of structural compactness.
For RMSD, the V128G mutant shows lower deviation (1.54 nm) than the wild type (2.14 nm) and the S116N mutant (2.12 nm). The average RMSF values of the wild type and the V128G and S116N mutants were 0.88 nm, 0.49 nm and 0.93 nm, respectively. This indicates that the V128G structure is more stable than both the wild-type and S116N structures. CONCLUSION Based on these computational predictions, this study finds that certain mutations, particularly S116N, have a robust destabilizing effect on the SOCS1 protein. These results may help clarify the importance of SOCS1 mutations in DLBCL patients and support the development of new treatments for DLBCL.
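The MD comparison above relies on three standard trajectory metrics:

- Radius of gyration (Rg): compactness.
- RMSD: deviation from a reference structure.
- RMSF: per-atom fluctuation around the mean position.

A minimal pure-Python sketch of their definitions follows. It assumes the frames have already been superposed onto a common reference. In practice, GROMACS tools such as `gmx gyrate`, `gmx rms` and `gmx rmsf` compute these and also handle the fitting. The toy coordinates are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized coordinate sets that are already superposed."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same number of atoms")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted Rg; unit masses if none are given."""
    masses = masses or [1.0] * len(coords)
    total = sum(masses)
    center = tuple(sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3))
    sq = sum(m * math.dist(c, center) ** 2 for m, c in zip(masses, coords))
    return math.sqrt(sq / total)


def rmsf(frames):
    """Per-atom RMSF over a trajectory given as a list of superposed frames."""
    n_frames = len(frames)
    values = []
    for atom in range(len(frames[0])):
        positions = [frame[atom] for frame in frames]
        mean_pos = tuple(sum(p[k] for p in positions) / n_frames for k in range(3))
        sq = sum(math.dist(p, mean_pos) ** 2 for p in positions)
        values.append(math.sqrt(sq / n_frames))
    return values


# Toy two-atom "structure" (coordinates in nm, invented).
ref = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(radius_of_gyration(ref), rmsd(ref, shifted), rmsf([ref, shifted]))
```

For all three metrics, a lower value means a more compact structure (Rg), less deviation from the reference (RMSD) or smaller fluctuations (RMSF).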
Affiliation(s)
- Tamanna Tasnim Dristy
- Department of Genetic Engineering and Biotechnology, East West University (EWU), Bangladesh
- Al-Rownoka Noor
- Department of Genetic Engineering and Biotechnology, East West University (EWU), Bangladesh
- Puja Dey
- Faculty of Medicine, Shimane University, Japan
- Ayan Saha
- Department of Bioinformatics and Biotechnology, Asian University for Women, Bangladesh
25
Feng R, Yin Y, Wei Y, Li Y, Li L, Zhu R, Yu X, Liu Y, Zhao Y, Liu Z. Mutant p53 activates hnRNPA2B1-AGAP1-mediated exosome formation to promote esophageal squamous cell carcinoma progression. Cancer Lett 2023; 562:216154. [PMID: 37030635 DOI: 10.1016/j.canlet.2023.216154] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 01/04/2023] [Revised: 03/28/2023] [Accepted: 03/28/2023] [Indexed: 04/10/2023]
Abstract
p53 mutations predispose cells to cancer development, promote cancer cell survival and metastasis, and lead to ineffective therapeutic responses and unfavorable prognosis. No drug that abrogates the oncogenic functions of mutant p53 has been approved for cancer treatment. Here, we performed whole-genome sequencing of 663 esophageal squamous cell carcinoma (ESCC) tumor tissues and paired normal tissues. ESCC samples from our cohort had a more dispersed distribution of TP53 mutations and a higher proportion of nonsense mutations than European and American ESCC samples in the International Agency for Research on Cancer (IARC) database. The most frequent p53 mutations disrupt the inhibition of proliferation, migration and invasion mediated by wild-type p53 in ESCC. Furthermore, p53 mutations alter the nucleoplasmic localization and stability of the p53 protein. The p53 mutant G245S (p53-G245S) interacts with heterogeneous nuclear ribonucleoprotein A2B1 (hnRNPA2B1). This interaction increases translation of phosphatidylinositol-dependent Arf GAP (AGAP1) by stabilizing AGAP1 mRNA. AGAP1 promotes cancer cell proliferation and metastasis by enhancing exosome formation. We also explored whether combining the HSP90 inhibitor HSP90i with the AGAP1 inhibitor QS11 could inhibit ESCC cell proliferation and metastasis. Thus, the p53-G245S/hnRNPA2B1/AGAP1 axis promotes ESCC progression by enhancing exosome formation. Combining an HSP90 inhibitor with an AGAP1 inhibitor may serve as a potential therapeutic strategy.
Affiliation(s)
- Riyue Feng
- State Key Laboratory of Molecular Oncology, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Yin Yin
- State Key Laboratory of Molecular Oncology, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Yuge Wei
- State Key Laboratory of Molecular Oncology, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Yang Li
- State Key Laboratory of Molecular Oncology, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Lei Li
- State Key Laboratory of Molecular Oncology, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Rui Zhu
- State Key Laboratory of Molecular Oncology, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Xiao Yu
- State Key Laboratory of Molecular Oncology, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Yuhao Liu
- State Key Laboratory of Molecular Oncology, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Yahui Zhao
- State Key Laboratory of Molecular Oncology, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Zhihua Liu
- State Key Laboratory of Molecular Oncology, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
26
Haseeb M, Amir A, Ikram A. In Silico Analysis of SARS-CoV-2 Spike Proteins of Different Field Variants. Vaccines (Basel) 2023; 11:vaccines11040736. [PMID: 37112648 PMCID: PMC10145761 DOI: 10.3390/vaccines11040736] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/17/2023] [Revised: 03/04/2023] [Accepted: 03/07/2023] [Indexed: 03/29/2023]
Abstract
Coronaviruses are a family of RNA viruses that cause diseases, including respiratory tract infections, in birds and mammals, humans among them. The COVID-19 pandemic has severely affected every part of the world. Our study aimed to explore the genome of SARS-CoV-2 and to analyze its proteins in silico. Nucleotide and protein sequences of different SARS-CoV-2 variants were retrieved from NCBI. Contigs and consensus sequences were then built with SnapGene to identify these variants. Variants that differed significantly from each other were analyzed with the PredictProtein software to understand the resulting changes in protein structure. The SOPMA web server was used to predict the secondary structure of the proteins, and tertiary structures of selected proteins were analyzed with the SWISS-MODEL web server. Sequencing results showed numerous single nucleotide polymorphisms (SNPs) in the surface glycoprotein, nucleocapsid, ORF1a and ORF1ab polyprotein. In contrast, the envelope, membrane, ORF3a, ORF6, ORF7a, ORF8 and ORF10 genes had few or no SNPs. Contigs were used to identify differences between the Alpha and Delta variants of SARS-CoV-2 and the reference strain (Wuhan). Secondary structures of some SARS-CoV-2 proteins were predicted with SOPMA and compared with those of the reference strain (Wuhan). Tertiary structures were analyzed only for the spike protein, using SWISS-MODEL and Ramachandran plots. The resulting SWISS-MODEL tertiary structure models of the Alpha and Delta spike proteins were compared with that of the reference strain (Wuhan). Alpha and Delta isolates from Pakistan submitted to GISAID with changes in structural and nonstructural proteins were compared with the reference strain. Amino acid mutations were then mapped onto the 3D structure of the spike glycoprotein.
The unexpectedly rapid transmission of SARS-CoV-2 forced numerous countries to impose total lockdowns. In this research, we used in silico computational tools to analyze SARS-CoV-2 genomes worldwide. The aim was to detect important variations in structural proteins and the dynamic changes that many mutations produce in SARS-CoV-2 proteins, mainly the spike protein. Our analysis revealed substantial functional, immunological, physicochemical and structural differences among the SARS-CoV-2 isolates. However, the real impact of these SNPs can only be determined experimentally, and our results may aid future in vivo and in vitro experiments.
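Identifying variant-specific changes, as described above, ultimately reduces to listing the positions where an aligned variant sequence differs from the reference. This is a minimal sketch, assuming the two protein sequences are already aligned to equal length with "-" for gaps. Positions are numbered along the reference, and gapped columns are skipped. The function name and the toy sequences are hypothetical and do not come from the study.

```python
def point_substitutions(reference, variant, offset=1):
    """List substitutions (e.g. 'A4G') between two aligned, equal-length protein sequences.

    Gap columns ('-') are skipped; positions are counted along the reference,
    starting at `offset`.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to equal length")
    changes = []
    pos = offset - 1
    for ref_aa, var_aa in zip(reference, variant):
        if ref_aa != "-":
            pos += 1  # only reference residues advance the numbering
        if ref_aa != var_aa and "-" not in (ref_aa, var_aa):
            changes.append(f"{ref_aa}{pos}{var_aa}")
    return changes


# Toy sequences (invented), not real SARS-CoV-2 spike segments.
print(point_substitutions("MKTAYIA", "MKTGYIV"))
```

Insertions and deletions would need their own notation. Skipping them here keeps the sketch focused on the single-residue changes that the SNP analysis reports.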
27
Xu G, Wang Q, Ma J. OPUS-Mut: Studying the Effect of Protein Mutation through Side-Chain Modeling. J Chem Theory Comput 2023; 19:1629-1640. [PMID: 36813264 PMCID: PMC10018731 DOI: 10.1021/acs.jctc.2c00847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2023]
Abstract
Predicting the effect of protein mutation is crucial in many applications, such as protein design, protein evolution and genetic disease analysis. Structurally, a substitution mutation essentially replaces the side chain of a particular residue, so accurate side-chain modeling is useful for studying the effect of mutation. Here, we propose a computational method, OPUS-Mut, which significantly outperforms other backbone-dependent side-chain modeling methods, including our previous method OPUS-Rota4. We evaluate OPUS-Mut in four case studies on myoglobin, p53, HIV-1 protease and T4 lysozyme. The predicted side-chain structures of the different mutants agree well with the experimentally determined structures. For residues with significant structural shifts upon mutation, the extent of the predicted shift correlates reasonably well with the experimentally measured functional changes of the mutant. OPUS-Mut can also help identify harmful and benign mutations. It may thus guide the construction of a protein with relatively low sequence homology but a similar structure.
Affiliation(s)
- Gang Xu
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
- Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China
- Shanghai AI Laboratory, Shanghai 200030, China
- Qinghua Wang
- Center for Biomolecular Innovation, Harcam Biomedicines, Shanghai 200131, China
- Jianpeng Ma
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
- Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China
- Shanghai AI Laboratory, Shanghai 200030, China
28
Zaji HD, Seyedalipour B, Hanun HM, Baziyar P, Hosseinkhani S, Akhlaghi M. Computational insight into in silico analysis and molecular dynamics simulation of the dimer interface residues of ALS-linked hSOD1 forms in apo/holo states: a combined experimental and bioinformatic perspective. 3 Biotech 2023; 13:92. [PMID: 36845075 PMCID: PMC9944573 DOI: 10.1007/s13205-023-03514-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/30/2022] [Accepted: 02/03/2023] [Indexed: 02/23/2023]
Abstract
Aggregation of misfolded SOD1 protein is a key pathological hallmark of the neurodegenerative disease amyotrophic lateral sclerosis (ALS). SOD1 is stabilized and enzymatically activated by binding Cu/Zn and forming an intramolecular disulfide bond. Dissociation of Cu and/or Zn ions triggers SOD1 aggregation/oligomerization. We therefore compared the effects of ALS-associated point mutations located at the dimer interface on the holo and apo forms of WT, I149T and V148G SOD1. Their structures were characterized with spectroscopic methods, computational approaches and molecular dynamics (MD) simulations. Computational analysis of single-nucleotide polymorphisms (SNPs) predicted that the SOD1 mutants have deleterious effects on activity and destabilize the structure. MD analysis indicated that changes in flexibility, stability and hydrophobicity, as well as increased intramolecular interactions, were greater in apo-SOD1 than in holo-SOD1. Enzymatic activity was also lower in apo-SOD1 than in holo-SOD1. Comparative intrinsic and ANS fluorescence of holo/apo WT hSOD1 and the mutants indicated structural alterations in the local environment of the tryptophan residue and in hydrophobic patches, respectively. Experimental and MD data indicate that the substitutions and metal deficiency of the mutants (apo forms) at the dimer interface may promote misfolding and aggregation. This would disrupt the dimer-monomer equilibrium and increase the propensity of the dimer to dissociate into SOD1 monomers, ultimately leading to loss of stability and function. Overall, these computational and experimental analyses of apo/holo SOD1 structure and function will contribute to a better understanding of ALS pathogenicity.
Affiliation(s)
- Hamza Dakhil Zaji
- Department of Molecular and Cell Biology, Faculty of Basic Science, University of Mazandaran, Babolsar, Iran
- Bagher Seyedalipour
- Department of Molecular and Cell Biology, Faculty of Basic Science, University of Mazandaran, Babolsar, Iran
- Haider Munzer Hanun
- Department of Molecular and Cell Biology, Faculty of Basic Science, University of Mazandaran, Babolsar, Iran
- Payam Baziyar
- Department of Molecular and Cell Biology, Faculty of Basic Science, University of Mazandaran, Babolsar, Iran
- Saman Hosseinkhani
- Department of Biochemistry, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Mona Akhlaghi
- Department of Molecular and Cell Biology, Faculty of Basic Science, University of Mazandaran, Babolsar, Iran
29
Keskin Karakoyun H, Yüksel ŞK, Amanoglu I, Naserikhojasteh L, Yeşilyurt A, Yakıcıer C, Timuçin E, Akyerli CB. Evaluation of AlphaFold structure-based protein stability prediction on missense variations in cancer. Front Genet 2023; 14:1052383. [PMID: 36896237 PMCID: PMC9988940 DOI: 10.3389/fgene.2023.1052383] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 09/26/2022] [Accepted: 02/08/2023] [Indexed: 02/23/2023]
Abstract
Identifying pathogenic missense variants in hereditary cancer is critical for patient surveillance and risk-reduction strategies. Many gene panels with different numbers and/or sets of genes are available for this purpose. We focus on a panel of 26 genes with varying degrees of hereditary cancer risk: ABRAXAS1, ATM, BARD1, BLM, BRCA1, BRCA2, BRIP1, CDH1, CHEK2, EPCAM, MEN1, MLH1, MRE11, MSH2, MSH6, MUTYH, NBN, PALB2, PMS2, PTEN, RAD50, RAD51C, RAD51D, STK11, TP53 and XRCC2. We compiled the missense variations reported in any of these 26 genes. More than a thousand missense variants were collected from ClinVar and from a targeted screen of a breast cancer cohort of 355 patients, which contributed 160 novel missense variations. We analyzed the impact of the missense variations on protein stability with five predictors. Two are sequence-based (SAAF2EC and MUpro) and three are structure-based (Maestro, mCSM, CUPSAT). For the structure-based tools, we used AlphaFold (AF2) protein structures, providing the first structural analysis of these hereditary cancer proteins. Our results agree with recent benchmarks that assessed the ability of stability predictors to discriminate pathogenic variants. Overall, the stability predictors showed low-to-medium performance in discriminating pathogenic variants. MUpro performed worst, with an AUROC of only 0.534 (95% CI [0.499-0.570]). The AUROC values of the other predictors ranged from 0.614 to 0.719 for the total set and from 0.596 to 0.682 for the set restricted to high-confidence AF2 regions. Furthermore, the AF2 confidence score of a given variant alone predicted pathogenicity more robustly than any of the tested stability predictors, with an AUROC of 0.852.
Altogether, this study presents the first structural analysis of the 26 hereditary cancer genes. It underscores 1) thermodynamic stability predicted from AF2 structures as a moderate descriptor and 2) the AF2 confidence score as a strong descriptor of variant pathogenicity.
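The AUROC values above can be read as the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one. An AUROC of 0.5 corresponds to chance, which is why 0.534 is close to random. A minimal sketch of that pairwise (Mann-Whitney) formulation follows, counting ties as one half. It assumes label 1 = pathogenic and that higher scores indicate pathogenicity; for a score where lower values point to pathogenicity, negate it first. The function name and toy data are illustrative.

```python
def auroc(scores, labels):
    """AUROC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy predictor scores (invented); label 1 = pathogenic, 0 = benign.
scores = [0.9, 0.8, 0.4, 0.3, 0.8]
labels = [1, 1, 0, 0, 0]
print(f"AUROC = {auroc(scores, labels):.3f}")
```

The quadratic pairwise loop is fine for a few thousand variants. For larger sets, a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` is the usual choice.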
Affiliation(s)
- Hilal Keskin Karakoyun
- Department of Biochemistry and Molecular Biology, Institute of Health Sciences, Acibadem Mehmet Ali Aydinlar University, Istanbul, Türkiye
- Şirin K. Yüksel
- Department of Biochemistry and Molecular Biology, Institute of Health Sciences, Acibadem Mehmet Ali Aydinlar University, Istanbul, Türkiye
- Ilayda Amanoglu
- Department of Biostatistics and Bioinformatics, Institute of Health Sciences, Acibadem Mehmet Ali Aydinlar University, Istanbul, Türkiye
- Lara Naserikhojasteh
- Department of Biostatistics and Bioinformatics, Institute of Health Sciences, Acibadem Mehmet Ali Aydinlar University, Istanbul, Türkiye
- Ahmet Yeşilyurt
- Acibadem Labgen Genetic Diagnosis Centre, Acibadem Health Group, Istanbul, Türkiye
- Cengiz Yakıcıer
- Acibadem Pathology Laboratories, Acibadem Health Group, Istanbul, Türkiye
- Emel Timuçin
- Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Türkiye
- Cemaliye B. Akyerli
- Department of Medical Biology, School of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Türkiye
30
Lihan M, Lupyan D, Oehme D. Target-template relationships in protein structure prediction and their effect on the accuracy of thermostability calculations. Protein Sci 2023; 32:e4557. [PMID: 36573828 PMCID: PMC9878467 DOI: 10.1002/pro.4557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Revised: 12/22/2022] [Accepted: 12/23/2022] [Indexed: 12/28/2022]
Abstract
Improving protein thermostability has been a labor- and time-consuming process in industrial applications of protein engineering. Advances in computational approaches have facilitated the development of more efficient strategies to allow the prioritization of stabilizing mutants. Among these is FEP+, a free energy perturbation implementation that uses a thoroughly tested physics-based method to achieve unparalleled accuracy in predicting changes in protein thermostability. To gauge the applicability of FEP+ to situations where crystal structures are unavailable, here we have applied the FEP+ approach to homology models of 12 different proteins covering 316 mutations. By comparing predictions obtained with homology models to those obtained using crystal structures, we have identified that local rather than global sequence conservation between target and template sequence is a determining factor in the accuracy of predictions. By excluding mutation sites with low local sequence identity (<40%) to a template structure, we have obtained predictions with comparable performance to crystal structures (R2 of 0.67 and 0.63 and an RMSE of 1.20 and 1.16 kcal/mol for crystal structure and homology model predictions, respectively) for identifying stabilizing mutations when incorporating residue scanning into a cascade screening strategy. Additionally, we identify and discuss inherent limitations in sequence alignments and homology modeling protocols that translate into the poor FEP+ performance of a few select examples. Overall, our retrospective study provides detailed guidelines for the application of the FEP+ approach using homology models for protein thermostability predictions, which will greatly extend this approach to studies that were previously limited by structure availability.
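The filtering step described above amounts to keeping only mutation sites whose local target-template identity reaches 40% and then scoring the remaining predictions. The sketch below rests on several assumptions:

- "R2" is taken to be the squared Pearson correlation. The study may instead use the coefficient of determination.
- How local identity is computed (window size, alignment) is not specified here. The `local_identity` field is hypothetical.
- All values are invented.

```python
import math
from dataclasses import dataclass


@dataclass
class Mutation:
    predicted_ddg: float     # kcal/mol
    experimental_ddg: float  # kcal/mol
    local_identity: float    # hypothetical: target-template identity near the site, 0-1


def rmse(pred, expt):
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / len(pred))


def r_squared(pred, expt):
    """Squared Pearson correlation between predictions and experiment (assumed definition)."""
    n = len(pred)
    mp, me = sum(pred) / n, sum(expt) / n
    cov = sum((p - mp) * (e - me) for p, e in zip(pred, expt))
    vp = sum((p - mp) ** 2 for p in pred)
    ve = sum((e - me) ** 2 for e in expt)
    return cov * cov / (vp * ve)


def evaluate(mutations, min_identity=0.40):
    """Drop low-identity sites, then return (n_kept, R2, RMSE) on the rest."""
    kept = [m for m in mutations if m.local_identity >= min_identity]
    pred = [m.predicted_ddg for m in kept]
    expt = [m.experimental_ddg for m in kept]
    return len(kept), r_squared(pred, expt), rmse(pred, expt)


toy = [
    Mutation(1.0, 1.2, 0.80),
    Mutation(-0.5, -0.3, 0.60),
    Mutation(2.0, 1.7, 0.50),
    Mutation(3.0, -1.0, 0.20),  # low local identity: excluded by the filter
]
n, r2, err = evaluate(toy)
print(f"n={n}  R2={r2:.2f}  RMSE={err:.2f} kcal/mol")
```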
Affiliation(s)
- Muyun Lihan
- NIH Center for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, and Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Schrödinger Inc., Cambridge, Massachusetts, USA
31
Hernández IM, Dehouck Y, Bastolla U, López-Blanco JR, Chacón P. Predicting protein stability changes upon mutation using a simple orientational potential. Bioinformatics 2023; 39:6984713. [PMID: 36629451 PMCID: PMC9850275 DOI: 10.1093/bioinformatics/btad011] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/26/2022] [Revised: 11/17/2022] [Accepted: 01/10/2023] [Indexed: 01/12/2023]
Abstract
MOTIVATION Structure-based prediction of stability changes upon mutation is crucial for protein engineering and design, and for understanding genetic diseases and drug resistance. For this task, we adopted a simple residue-based orientational potential, previously applied in protein modeling, that considers only three backbone atoms. Applying it to stability prediction requires parametrizing only 12 amino acid-dependent weights. We fitted these with cross-validation on a curated dataset. In curating it, we tried to reduce mutations at protein-protein or protein-ligand interfaces, measurements under extreme conditions, and the over-representation of mutations to alanine. RESULTS Our method, KORPM, accurately predicts mutational effects on an independent benchmark dataset, whether the wild-type or the mutated structure is used as the starting point. Compared with state-of-the-art methods on this balanced dataset, our approach obtained the lowest root mean square error (RMSE) and the highest correlation between predicted and experimental ΔΔG values. It also yielded better receiver operating characteristic and precision-recall curves. Our method is almost anti-symmetric by construction. It therefore performs similarly for direct and reverse mutations with the corresponding wild-type and mutated structures. The available experimental mutation data are strongly limited in size, variability and heterogeneity. Despite these limitations, we show competitive results with a simple sum of energy terms, which is more efficient and less prone to overfitting. AVAILABILITY AND IMPLEMENTATION https://github.com/chaconlab/korpm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
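Anti-symmetry means that a reverse mutation (mutant back to wild type) should receive the negative of the direct ΔΔG. Two summary statistics are commonly reported on paired direct/reverse sets such as Ssym. The first is the Pearson correlation r between direct and reverse predictions, ideally -1. The second is the mean bias of δ = ΔΔG_direct + ΔΔG_reverse, ideally 0. The sketch below uses that definition of δ; some papers halve it. The function names and toy values are illustrative.

```python
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def antisymmetry_metrics(direct, reverse):
    """Return (r_dir_rev, mean delta) for paired direct/reverse ddG predictions."""
    if len(direct) != len(reverse):
        raise ValueError("direct and reverse predictions must be paired")
    r_dir_rev = pearson(direct, reverse)
    # delta = ddG_direct + ddG_reverse; zero for a perfectly anti-symmetric predictor.
    bias = mean(d + rev for d, rev in zip(direct, reverse))
    return r_dir_rev, bias


# Toy predictions (invented), in kcal/mol.
direct = [1.0, -0.5, 2.0]
perfect_reverse = [-1.0, 0.5, -2.0]
biased_reverse = [-0.6, 0.9, -1.5]
print(antisymmetry_metrics(direct, perfect_reverse))
print(antisymmetry_metrics(direct, biased_reverse))
```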
Affiliation(s)
- Iván Martín Hernández
- Department of Biological Physical Chemistry, Rocasolano Institute of Physical Chemistry, CSIC, 28006 Madrid, Spain
- Yves Dehouck
- Bioinformatic Unit, Centro de Biología Molecular “Severo Ochoa,” CSIC-UAM Cantoblanco, Madrid 28049, Spain
- Ugo Bastolla
- Bioinformatic Unit, Centro de Biología Molecular “Severo Ochoa,” CSIC-UAM Cantoblanco, Madrid 28049, Spain
- José Ramón López-Blanco
- Department of Biological Physical Chemistry, Rocasolano Institute of Physical Chemistry, CSIC, 28006 Madrid, Spain
32
Benevenuta S, Birolo G, Sanavia T, Capriotti E, Fariselli P. Challenges in predicting stabilizing variations: An exploration. Front Mol Biosci 2023; 9:1075570. [PMID: 36685278 PMCID: PMC9849384 DOI: 10.3389/fmolb.2022.1075570] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/20/2022] [Accepted: 12/15/2022] [Indexed: 01/06/2023]
Abstract
An open challenge in computational and experimental biology is understanding the impact of non-synonymous DNA variations on protein function and, in turn, on human health. The effects of these variants on protein stability can be measured as the difference in the free energy of unfolding (ΔΔG) between the mutated protein and its wild-type form. Over the years, bioinformaticians have developed a wide variety of tools and approaches to predict ΔΔG. The performance of these tools is highly variable, but overall they predict ΔΔG less accurately for stabilizing variations than for destabilizing ones. Here, we analyze possible reasons for this difference. We focus on the relationship between experimentally measured ΔΔG and seven protein properties on three widely used datasets (S2648, VariBench, Ssym) and a recently introduced one (S669). These properties include protein structural information, different physical properties and statistical potentials. Two widely used input features, hydrophobicity and the BLOSUM62 substitution matrix, perform close to random when separating stabilizing variants from either neutral or destabilizing ones. We speculate that this stems from class imbalance: destabilizing variations are the most abundant class in the available datasets. Methods therefore achieve higher overall performance when they include features that improve prediction of destabilizing variants, at the expense of stabilizing ones. These findings highlight the need to design predictive methods that can also exploit input features highly correlated with stabilizing variants. New tools should also be tested on datasets that are not artificially balanced. Their performance should be reported for all three classes (stabilizing, neutral and destabilizing variants), not only as overall results.
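Reporting performance on all three classes, as recommended above, can expose a predictor that looks acceptable overall but never recognizes stabilizing variants. The following is a minimal sketch, assuming ΔΔG = ΔG_unfolding(mutant) - ΔG_unfolding(wild type), so that positive values are stabilizing. Sign conventions differ between tools and datasets. It also assumes a hypothetical neutral band of ±0.5 kcal/mol. The function names and toy values are illustrative.

```python
from collections import Counter


def classify(ddg, threshold=0.5):
    """Hypothetical three-way split; assumes ddG > 0 means stabilizing."""
    if ddg > threshold:
        return "stabilizing"
    if ddg < -threshold:
        return "destabilizing"
    return "neutral"


def per_class_recall(experimental, predicted, threshold=0.5):
    """Fraction of each experimental class that the predictor assigns to the same class."""
    true = [classify(x, threshold) for x in experimental]
    pred = [classify(x, threshold) for x in predicted]
    totals = Counter(true)
    hits = Counter(t for t, p in zip(true, pred) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Toy data (invented), in kcal/mol: most variants are destabilizing, as in real datasets.
experimental = [-2.0, -1.5, -1.0, 0.1, 1.2]
predicted = [-1.8, -1.0, -0.2, 0.0, -0.3]
print(per_class_recall(experimental, predicted))
```

In this toy case, 3 of 5 variants are classified correctly, yet the only stabilizing variant is missed. That is the kind of gap an overall score hides.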
Affiliation(s)
- Giovanni Birolo
- Department of Medical Sciences, University of Torino, Torino, Italy
- Tiziana Sanavia
- Department of Medical Sciences, University of Torino, Torino, Italy
- Emidio Capriotti
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Torino, Italy. *Correspondence: Piero Fariselli
33
Gong J, Wang J, Zong X, Ma Z, Xu D. Prediction of protein stability changes upon single-point variant using 3D structure profile. Comput Struct Biotechnol J 2022; 21:354-364. [PMID: 36582438 PMCID: PMC9791599 DOI: 10.1016/j.csbj.2022.12.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/01/2022] [Revised: 12/04/2022] [Accepted: 12/05/2022] [Indexed: 12/13/2022]
Abstract
Identifying protein thermodynamic stability changes upon single-point variants is crucial for studying mutation-induced alterations in protein biophysics, genomic variants, and mutation-related diseases. In the last decade, various computational methods have been developed to predict the effects of single-point variants, but the prediction accuracy is still far from satisfactory for practical applications. Herein, we review approaches and tools for predicting stability changes upon single-point variants. Most of these methods require tertiary protein structure as input to achieve reliable predictions. However, the availability of protein structures limits the immediate application of these tools. To improve the performance of a computational prediction from a protein sequence without experimental structural information, we introduce a new computational framework: MU3DSP. This method assesses the effects of single-point variants on protein thermodynamic stability based on the 3D structure profile of the point-mutated protein. Given a protein sequence with a single variant as input, MU3DSP integrates both sequence-level features and averaged features of 3D structures obtained from sequence alignment to PDB to assess the change of thermodynamic stability induced by the substitution. MU3DSP outperforms existing methods on various benchmarks, making it a reliable tool to assess both somatic and germline substitution variants and assist in protein design. MU3DSP is available as an open-source tool at https://github.com/hurraygong/MU3DSP.
Affiliation(s)
- Jianting Gong
- School of Information Science and Technology, and Institution of Computational Biology, Northeast Normal University, Changchun 130117, China
- Department of Electrical Engineering and Computer Science, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Juexin Wang
- Department of Electrical Engineering and Computer Science, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis, Indianapolis, IN, USA
- Xizeng Zong
- School of Computer Science and Engineering, Changchun University of Technology, Changchun 130117, China
- Zhiqiang Ma
- School of Information Science and Technology, and Institution of Computational Biology, Northeast Normal University, Changchun 130117, China
- Department of Computer Science, College of Humanities & Sciences of Northeast Normal University, Changchun 130117, China
- Corresponding authors.
- Dong Xu
- Department of Electrical Engineering and Computer Science, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Corresponding authors.
34
Agapova YK, Petrenko DE, Timofeev VI, Rakitina TV. Comparative Analysis of the Interfaces between Monomers in the Dimers of Bacterial Histone-Like HU Proteins by the MM-GBSA Method. CRYSTALLOGR REP+ 2022. [DOI: 10.1134/s1063774522060025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/08/2023]
35
Beheshti Shirazi SS, Sakhaee F, Sotoodehnejadnematalahi F, Zamani MS, Ahmadi I, Anvari E, Fateh A. rs12329760 Polymorphism in Transmembrane Serine Protease 2 Gene and Risk of Coronavirus Disease 2019 Mortality. BIOMED RESEARCH INTERNATIONAL 2022; 2022:7841969. [PMID: 36457338 PMCID: PMC9708353 DOI: 10.1155/2022/7841969] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/02/2022] [Revised: 09/04/2022] [Accepted: 11/12/2022] [Indexed: 08/29/2023]
Abstract
The protease encoded by the transmembrane serine protease 2 (TMPRSS2) gene enhances viral infections and has been linked to severe acute respiratory syndrome coronavirus 2 pathogenesis. Therefore, this study evaluated the association between TMPRSS2 and coronavirus disease 2019 (COVID-19) mortality. The TMPRSS2 rs12329760 polymorphism was genotyped using the tetra-primer amplification refractory mutation system-polymerase chain reaction method in 592 deceased and 693 improved patients. In the current study, the frequency of the TMPRSS2 rs12329760 CC genotype, relative to the TT genotype, was significantly lower in improved patients than in deceased patients. According to multivariate logistic regression, higher mean age, creatinine, erythrocyte sedimentation rate, C-reactive protein and aspartate aminotransferase; lower 25-hydroxyvitamin D, uric acid and real-time PCR Ct values; and the TMPRSS2 rs12329760 CC genotype were associated with increased COVID-19 mortality. In conclusion, the TMPRSS2 rs12329760 CC genotype was linked to a significantly higher incidence of severe COVID-19. Further studies are required to corroborate these findings.
Affiliation(s)
- Fatemeh Sakhaee
- Department of Mycobacteriology and Pulmonary Research, Pasteur Institute of Iran, Tehran, Iran
- Iraj Ahmadi
- Department of Physiology, School of Medicine, Ilam University of Medical Science, Ilam, Iran
- Enayat Anvari
- Department of Physiology, School of Medicine, Ilam University of Medical Science, Ilam, Iran
- Abolfazl Fateh
- Department of Mycobacteriology and Pulmonary Research, Pasteur Institute of Iran, Tehran, Iran
- Microbiology Research Center (MRC), Pasteur Institute of Iran, Tehran, Iran
36
Benamri I, Azzouzi M, Moussa A, Radouani F. An in silico analysis of rpoB mutations to affect Chlamydia trachomatis sensitivity to rifamycin. J Genet Eng Biotechnol 2022; 20:146. [DOI: 10.1186/s43141-022-00428-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 10/08/2022] [Indexed: 11/10/2022]
Abstract
Background
Chlamydia trachomatis is an obligate intracellular gram-negative pathogen responsible for diverse conditions, mainly trachoma and sexually transmitted diseases. Antibiotics are the drugs commonly used to treat chlamydial infections. However, their overuse or misuse may lead to antibiotic-resistant strains, a real health problem worldwide. Numerous studies have associated Chlamydia trachomatis resistance with mutations in different genes; these mutations can have deleterious or neutral impacts on the encoded proteins. The aim of this study is to perform an in silico analysis of C. trachomatis rpoB-encoded proteins using numerous bioinformatics tools, to identify the functional and structural effects of the mutations and, consequently, their impact on the bacterium's sensitivity to antibiotics.
Results
The prediction of the damaging impact of mutations in rpoB-encoded proteins identified eight mutations with strongly deleterious effects: V136F, Q458K, V466A, A467T, H471N, H471Y, H471L, and I517M. Among them, six mutations (V136F, Q458K, V466A, A467T, H471N, and I517M) are located in highly conserved regions and decrease the protein's stability. Furthermore, structural analysis showed that the A467T, H471N, I517M, and V136F models deviated strongly from the wild type. Moreover, protein-protein network prediction indicated that wild-type rpoB interacts strongly with 10 proteins of C. trachomatis, which play different roles at different levels.
Conclusion
In conclusion, the present study revealed that the changes observed in the encoded proteins can affect their functions and structures, as well as their interactions with other proteins, which impacts the bacterium's sensitivity to antibiotics. Consequently, the information revealed through this in silico analysis would be useful for deeper exploration of the mechanisms of C. trachomatis resistance and for managing the infection to avoid its complications. We recommend further investigation and deeper experimental analysis through collaboration between bioinformaticians, physicians, biologists, pharmacists, chemists and biochemists.
37
Liu Y, Yeung WSB, Chiu PCN, Cao D. Computational approaches for predicting variant impact: An overview from resources, principles to applications. Front Genet 2022; 13:981005. [PMID: 36246661 PMCID: PMC9559863 DOI: 10.3389/fgene.2022.981005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 08/08/2022] [Indexed: 11/13/2022]
Abstract
One objective of human genetics is to uncover the variants that contribute to human diseases. With the rapid development and wide use of next-generation sequencing (NGS), massive amounts of genomic sequence data have been generated, making personal genetic information available. Conventional experimental evidence is critical for establishing the relationship between sequence variants and phenotype, but it is obtained with low efficiency. Because comprehensive databases and resources presenting clinical and experimental evidence on genotype-phenotype relationships are lacking, and because variants found by NGS keep accumulating, a range of computational tools that predict the impact of variants on phenotype has been developed to bridge the gap. In this review, we present a brief introduction to and discussion of computational approaches for variant impact prediction. Taking a novel approach, we focus mainly on methods for predicting the impact of non-synonymous variants (nsSNVs) and categorize them into six classes. Their underlying rationale and constraints, together with the concerns and remedies raised in comparative studies, are discussed. We also present how these predictive approaches have been employed in different research settings. Although diverse constraints exist, computational predictive approaches are indispensable for exploring genotype-phenotype relationships.
Affiliation(s)
- Ye Liu
- Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- William S. B. Yeung
- Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Department of Obstetrics and Gynaecology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Philip C. N. Chiu
- Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Department of Obstetrics and Gynaecology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Dandan Cao
- Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
38
Structural heterogeneity and precision of implications drawn from cryo-electron microscopy structures: SARS-CoV-2 spike-protein mutations as a test case. EUROPEAN BIOPHYSICS JOURNAL 2022; 51:555-568. [PMID: 36167828 PMCID: PMC9514682 DOI: 10.1007/s00249-022-01619-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 09/19/2022] [Indexed: 11/18/2022]
Abstract
Protein structures may be used to draw functional implications at the residue level, but how sensitive are these implications to the exact structure used? Calculations of the effects of SARS-CoV-2 S-protein mutations based on experimental cryo-electron microscopy structures have been abundant during the pandemic. To understand the precision of such estimates, we studied three distinct methods to estimate stability changes for all possible mutations in 23 different S-protein structures (3.69 million ΔΔG values in total) and explored how random and systematic errors can be remedied by structure-averaged mutation group comparisons. We show that computational estimates have low precision because of method and structure heterogeneity, which makes results for single mutations uninformative. However, structure-averaged differences in mean effects for groups of substitutions can yield significant results. Illustrating this protocol, functionally important natural mutations, despite individual variations, have a smaller average stability impact than other possible mutations, independent of conformational state (open, closed). In summary, we document substantial issues with precision in structure-based protein modeling and recommend sensitivity tests to quantify these effects, but also suggest partial solutions to the problem in the form of structure-averaged “ensemble” estimates for groups of residues when multiple structures are available.
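The idea of a structure-averaged group comparison can be sketched generically as follows. This is not the authors' protocol: the data layout (one dictionary of ΔΔG values per structure), the mutation labels and all numbers are hypothetical. The point is only that the group difference is computed within each structure and then averaged across structures, with its spread across structures reported alongside it.

```python
from statistics import mean, stdev


def group_mean_difference(ddg_by_mutation, group_a, group_b):
    """Mean ΔΔG of group_a minus mean ΔΔG of group_b within one structure.

    Mutations missing from this structure's table are skipped; an empty
    group raises statistics.StatisticsError.
    """
    a = [ddg_by_mutation[m] for m in group_a if m in ddg_by_mutation]
    b = [ddg_by_mutation[m] for m in group_b if m in ddg_by_mutation]
    return mean(a) - mean(b)


def structure_averaged_difference(ddg_by_structure, group_a, group_b):
    """Average the per-structure group difference over all structures.

    Returns (mean difference, sample standard deviation across structures).
    """
    diffs = [group_mean_difference(table, group_a, group_b)
             for table in ddg_by_structure.values()]
    spread = stdev(diffs) if len(diffs) > 1 else 0.0
    return mean(diffs), spread


# Hypothetical ΔΔG estimates (kcal/mol) for four mutations in two structures.
ddg_by_structure = {
    "structure_1": {"mut1": 0.5, "mut2": 0.7, "mut3": 2.0, "mut4": 2.4},
    "structure_2": {"mut1": 1.1, "mut2": 0.5, "mut3": 1.6, "mut4": 3.0},
}
natural = ["mut1", "mut2"]
others = ["mut3", "mut4"]

print(structure_averaged_difference(ddg_by_structure, natural, others))
```

Individual values for the same mutation differ between the two toy structures, yet the group difference is similar in both, which is the kind of robustness the averaging is meant to expose.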
39
PSP-GNM: Predicting Protein Stability Changes upon Point Mutations with a Gaussian Network Model. Int J Mol Sci 2022; 23:ijms231810711. [PMID: 36142614 PMCID: PMC9505940 DOI: 10.3390/ijms231810711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Revised: 09/05/2022] [Accepted: 09/09/2022] [Indexed: 11/26/2022]
Abstract
Understanding the effects of missense mutations on protein stability is a widely acknowledged, significant biological problem. Genomic missense mutations may alter one or more amino acids, leading to increased or decreased stability of the encoded proteins. In this study, we describe a novel approach, Protein Stability Prediction with a Gaussian Network Model (PSP-GNM), to measure the unfolding Gibbs free energy change (ΔΔG) and evaluate the effects of single amino acid substitutions on protein stability. Specifically, PSP-GNM employs a coarse-grained Gaussian Network Model (GNM) in which interactions between amino acids are weighted by the Miyazawa–Jernigan statistical potential. We used PSP-GNM to simulate partial unfolding of the wild-type and mutant protein structures, and then used the difference in the energies and entropies of the unfolded wild-type and mutant proteins to calculate ΔΔG. The agreement between the ΔΔG calculated by PSP-GNM and the experimental ΔΔG was evaluated on three benchmark datasets: 350 forward mutations (S350 dataset), 669 forward and reverse mutations (S669 dataset) and 611 forward and reverse mutations (S611 dataset). We observed a Pearson correlation coefficient as high as 0.61, which is comparable to many existing state-of-the-art methods. The agreement with experimental ΔΔG increased further when we considered only measurements made close to 25 °C and neutral pH, suggesting a dependence on experimental conditions. We also assessed the antisymmetry (ΔΔG(reverse) = −ΔΔG(forward)) between the forward and reverse mutations on the Ssym+ dataset, which has 352 forward and reverse mutations. While most available methods do not display significant antisymmetry, PSP-GNM demonstrated near-perfect antisymmetry, with a Pearson correlation of −0.97. PSP-GNM is written in Python and can be downloaded as stand-alone code.
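The antisymmetry test can be sketched as follows: a perfectly antisymmetric predictor satisfies ΔΔG(reverse) = −ΔΔG(forward) for every pair, so the Pearson correlation between forward and reverse predictions is −1 and each forward + reverse sum is zero. The function names and toy values are hypothetical, and the mean of (forward + reverse)/2 is included only as one simple way to summarize a systematic offset, not as a metric taken from the source.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def antisymmetry_report(forward, reverse):
    """Return (Pearson r between forward and reverse ΔΔG, mean of (fwd + rev) / 2).

    A perfectly antisymmetric predictor gives r = -1 and an offset of 0.
    """
    r = pearson(forward, reverse)
    offset = sum(f + rv for f, rv in zip(forward, reverse)) / (2 * len(forward))
    return r, offset


# Hypothetical predictions for four forward mutations and their reverses.
forward = [-1.0, 0.5, 2.0, -0.3]
antisymmetric = [1.0, -0.5, -2.0, 0.3]  # ideal case: reverse = -forward
biased = [-0.6, 0.1, 0.8, -0.5]         # reverse predictions that track the forward ones

print(antisymmetry_report(forward, antisymmetric))
print(antisymmetry_report(forward, biased))
```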
40
Iqbal S, Ge F, Li F, Akutsu T, Zheng Y, Gasser RB, Yu DJ, Webb GI, Song J. PROST: AlphaFold2-aware Sequence-Based Predictor to Estimate Protein Stability Changes upon Missense Mutations. J Chem Inf Model 2022; 62:4270-4282. [PMID: 35973091 DOI: 10.1021/acs.jcim.2c00799] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 01/17/2023]
Abstract
An essential step in engineering proteins and understanding disease-causing missense mutations is to accurately model protein stability changes when such mutations occur. Here, we developed a new sequence-based predictor for the protein stability (PROST) change (Gibbs free energy change, ΔΔG) upon a single-point missense mutation. PROST extracts multiple descriptors from the most promising sequence-based predictors, such as BoostDDG, SAAFEC-SEQ, and DDGun. PROST also extracts descriptors from iFeature and AlphaFold2. The extracted descriptors include sequence-based features, physicochemical properties, evolutionary information, evolutionary-based physicochemical properties, and predicted structural features. The PROST predictor is a weighted average ensemble model based on extreme gradient boosting (XGBoost) decision trees and an extra-trees regressor; PROST is trained on both direct and hypothetical reverse mutations using the S5294 dataset (S2647 direct mutations + S2647 inverse mutations). The parameters of the PROST model are optimized using grid search with 5-fold cross-validation, and feature importance analysis reveals the most relevant features. The performance of PROST is evaluated in a blinded manner, employing nine distinct data sets and existing state-of-the-art sequence-based and structure-based predictors. The method consistently performs well on the frataxin, S217, S349, Ssym, S669, Myoglobin, and CAGI5 data sets in blind tests and similarly to the state-of-the-art predictors on the p53 and S276 data sets. When compared with the latest predictors such as BoostDDG, SAAFEC-SEQ, ACDC-NN-seq, and DDGun, PROST outperforms them. A case study of mutation scanning of the frataxin protein for nine wild-type residues demonstrates the utility of PROST. Taken together, these findings indicate that PROST is a well-suited predictor when no protein structural information is available.
The source code of PROST, data sets, examples, and pretrained models along with how to use PROST are available at https://github.com/ShahidIqb/PROST and https://prost.erc.monash.edu/seq.
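Training on direct plus hypothetical reverse mutations, as described for S5294, amounts to adding, for every measured variant, a record with the wild-type and mutant residues swapped and the sign of ΔΔG flipped. The record layout below is a hypothetical simplification and not PROST's code: a real reverse mutation sits on the mutant sequence or structure, which sequence-based features would have to reflect.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal record for a single-point mutation."""
    protein: str
    position: int
    wild_type: str
    mutant: str
    ddg: float


def reverse_of(v: Variant) -> Variant:
    """Hypothetical reverse mutation: swap residues and flip the sign of ΔΔG."""
    return Variant(v.protein, v.position, v.mutant, v.wild_type, -v.ddg)


def with_reverse_mutations(direct: List[Variant]) -> List[Variant]:
    """Return the direct variants followed by their reverse counterparts."""
    return list(direct) + [reverse_of(v) for v in direct]


direct = [
    Variant("protein_A", 10, "A", "G", -1.2),
    Variant("protein_A", 25, "L", "F", 0.4),
]
augmented = with_reverse_mutations(direct)
print(len(augmented))  # 4: the set doubles, as S2647 + S2647 gives S5294
print(augmented[2])
```

Because a direct record and its reverse carry nearly the same information, keeping each pair in the same cross-validation fold is a sensible precaution when evaluating a model trained on such an augmented set.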
Affiliation(s)
- Shahid Iqbal
- Department of Data Science and AI, Faculty of IT, Monash University, Clayton, Victoria 3800, Australia
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia
- Monash Data Futures Institute, Monash University, Clayton, Victoria 3800, Australia
- Fang Ge
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia
- Monash Data Futures Institute, Monash University, Clayton, Victoria 3800, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Yuanting Zheng
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Robin B Gasser
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Geoffrey I Webb
- Department of Data Science and AI, Faculty of IT, Monash University, Clayton, Victoria 3800, Australia
- Monash Data Futures Institute, Monash University, Clayton, Victoria 3800, Australia
- Jiangning Song
- Department of Data Science and AI, Faculty of IT, Monash University, Clayton, Victoria 3800, Australia
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3800, Australia
- Monash Data Futures Institute, Monash University, Clayton, Victoria 3800, Australia
41
Rahban M, Zolghadri S, Salehi N, Ahmad F, Haertlé T, Rezaei-Ghaleh N, Sawyer L, Saboury AA. Thermal stability enhancement: Fundamental concepts of protein engineering strategies to manipulate the flexible structure. Int J Biol Macromol 2022; 214:642-654. [DOI: 10.1016/j.ijbiomac.2022.06.154] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/17/2022] [Revised: 06/22/2022] [Accepted: 06/23/2022] [Indexed: 01/28/2023]
42
Loss-of-function, gain-of-function and dominant-negative mutations have profoundly different effects on protein structure. Nat Commun 2022; 13:3895. [PMID: 35794153 PMCID: PMC9259657 DOI: 10.1038/s41467-022-31686-6] [Citation(s) in RCA: 74] [Impact Index Per Article: 37.0] [Received: 11/02/2021] [Accepted: 06/29/2022] [Indexed: 12/12/2022]
Abstract
Most known pathogenic mutations occur in protein-coding regions of DNA and change the way proteins are made. Taking protein structure into account has therefore provided great insight into the molecular mechanisms underlying human genetic disease. While there has been much focus on how mutations can disrupt protein structure and thus cause a loss of function (LOF), alternative mechanisms, specifically dominant-negative (DN) and gain-of-function (GOF) effects, are less understood. Here, we investigate the protein-level effects of pathogenic missense mutations associated with different molecular mechanisms. We observe striking differences between recessive vs dominant, and LOF vs non-LOF mutations, with dominant, non-LOF disease mutations having much milder effects on protein structure, and DN mutations being highly enriched at protein interfaces. We also find that nearly all computational variant effect predictors, even those based solely on sequence conservation, underperform on non-LOF mutations. However, we do show that non-LOF mutations could potentially be identified by their tendency to cluster in three-dimensional space. Overall, our work suggests that many pathogenic mutations that act via DN and GOF mechanisms are likely being missed by current variant prioritisation strategies, but that there is considerable scope to improve computational predictions through consideration of molecular disease mechanisms. Here the authors analyse the locations of thousands of human disease mutations and their predicted effects on protein structure and show that, while loss-of-function mutations tend to be highly disruptive, non-loss-of-function mutations are in general much milder at a protein structural level.
43
Abdullaev A, Abdurakhimov A, Mirakbarova Z, Ibragimova S, Tsoy V, Nuriddinov S, Dalimova D, Turdikulova S, Abdurakhmonov I. Genome sequence diversity of SARS-CoV-2 obtained from clinical samples in Uzbekistan. PLoS One 2022; 17:e0270314. [PMID: 35759503 PMCID: PMC9236271 DOI: 10.1371/journal.pone.0270314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Accepted: 06/07/2022] [Indexed: 11/27/2022]
Abstract
Tracking temporal and spatial genomic changes and the evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is among the most urgent research topics worldwide, helping to elucidate coronavirus disease 2019 (COVID-19) pathogenesis and the effect of deleterious variants. Our current study focuses on the genetic diversity of SARS-CoV-2 variants in Uzbekistan and their associations with COVID-19 severity. Thirty-nine whole genome sequences (WGS) of SARS-CoV-2 isolated from PCR-positive patients in Tashkent, Uzbekistan, during July-August 2021 were generated and subjected to further genomic analysis. Genome-wide annotation of the clinical isolates revealed a total of 223 nucleotide-level variations, including SNPs and 34 deletions, at different positions throughout the entire SARS-CoV-2 genome. These changes included two previously unreported mutations, Nonstructural protein (Nsp) 13: A85P and Nsp12: Y479N. There were two groups of co-occurring substitution patterns: the missense mutations in the Spike (S): D614G, Open Reading Frame (ORF) 1b: P314L, Nsp3: F924, 5′UTR: C241T; Nsp3: P2046L and Nsp3: P2287S, and the synonymous mutations in the Nsp4: D2907 (C8986T), Nsp6: T3646A and Nsp14: A1918V regions, respectively. Nextstrain clustered the largest number of SARS-CoV-2 strains into the Delta clade (n = 32; 82%), followed by two clades, Alpha-originated (n = 4; 10.3%) and 20A (n = 3; 7.7%). Geographically, the Delta clade sequences grouped into several clusters with the SARS-CoV genotypes from Russia, Denmark, USA, Egypt and Bangladesh. Phylogenetically, the Delta isolates in our study belong to two main subclades, 21A (56%) and 21J (44%). We found that females were more affected by the 21A variant, whereas males were more affected by the 21J variant (χ2 = 4.57; p ≤ 0.05, n = 32). The amino acid substitution ORF7a: P45L in the Delta isolates was found to be significantly associated with disease severity.
In conclusion, this study showed that the newly identified substitutions Nsp13: A85P and Nsp12: Y479N have a destabilizing effect, while the missense substitution ORF7a: P45L is significantly associated with disease severity.
Affiliation(s)
- Vladimir Tsoy
- Center for Advanced Technologies, Tashkent, Uzbekistan
- Ibrokhim Abdurakhmonov
- Center for Advanced Technologies, Tashkent, Uzbekistan
- Center of Genomics and Bioinformatics, Academy of Sciences of Uzbekistan, Qibray Region, Tashkent, Republic of Uzbekistan
44
Ferla MP, Pagnamenta AT, Koukouflis L, Taylor JC, Marsden BD. Venus: Elucidating the Impact of Amino Acid Variants on Protein Function Beyond Structure Destabilisation. J Mol Biol 2022; 434:167567. [PMID: 35662467 PMCID: PMC9742853 DOI: 10.1016/j.jmb.2022.167567] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/27/2021] [Revised: 03/11/2022] [Accepted: 03/22/2022] [Indexed: 12/15/2022]
Abstract
Exploring the functional effect of a non-synonymous coding variant at the protein level requires multiple pieces of information to be interpreted appropriately. This is particularly important when embarking on the study of a potentially pathogenic variant linked to a rare or monogenic disease. Whereas accurate protein stability predictions alone are generally informative, other effects, such as disruption of post-translational modifications or weakened ligand binding, may also contribute to the disease phenotype. Furthermore, consideration of nearby variants that are found in the healthy population may strengthen or refute a given mechanistic hypothesis. Whilst there are several bioinformatics tools available that score a genetic variant in terms of deleteriousness, there is no single tool that assembles multiple effects of a variant on the encoded protein, beyond structural stability, and presents them on the structure for inspection. Venus is a web application which, given a protein substitution, rapidly estimates the predicted effect on protein stability of the variant, flags if the variant affects a post-translational modification site, a predicted linear motif or known annotation, and determines the effect on protein stability of variants which affect nearby residues and have been identified in healthy populations. Venus is built upon Michelanglo and the results can be exported to it, allowing them to be annotated and shared with other researchers. Venus is freely accessible at https://venus.cmd.ox.ac.uk and its source code is openly available at https://github.com/CMD-Oxford/Michelanglo-and-Venus.
Affiliation(s)
- Matteo P Ferla
- Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK; Oxford NIHR Biomedical Research Centre, Oxford, UK.
- Alistair T Pagnamenta
- Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK; Oxford NIHR Biomedical Research Centre, Oxford, UK. https://twitter.com/@alistairp2011
- Leonidas Koukouflis
- Centre for Medicines Discovery, University of Oxford, Old Road Campus Research Building, Oxford OX3 7DQ, UK
- Jenny C Taylor
- Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK; Oxford NIHR Biomedical Research Centre, Oxford, UK
- Brian D Marsden
- Centre for Medicines Discovery, University of Oxford, Old Road Campus Research Building, Oxford OX3 7DQ, UK; Kennedy Institute of Rheumatology, University of Oxford, Oxford OX3 7FY, UK. https://twitter.com/@bmarsden19
45
Scafuri B, Verdino A, D'Arminio N, Marabotti A. Computational methods to assist in the discovery of pharmacological chaperones for rare diseases. Brief Bioinform 2022; 23:6590149. [PMID: 35595532 DOI: 10.1093/bib/bbac198] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/22/2022] [Revised: 04/13/2022] [Accepted: 04/28/2022] [Indexed: 12/21/2022]
Abstract
Pharmacological chaperones are chemical compounds able to bind proteins and stabilize them against denaturation and subsequent degradation. Some pharmacological chaperones have been approved, or are under investigation, for the treatment of rare inborn errors of metabolism caused by genetic mutations that often destabilize the structure of the wild-type proteins expressed by that gene. Given the general lack of pharmacological treatments for rare diseases, high expectations are placed on this type of compound. However, their discovery is not straightforward. In this review, we focus on the computational methods that can assist and accelerate the search for these compounds, also showing examples in which these methods were successfully applied to discover promising molecules belonging to this new category of pharmacologically active compounds.
Affiliation(s)
- Bernardina Scafuri
- Department of Chemistry and Biology "A. Zambelli", University of Salerno, via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy
- Anna Verdino
- Department of Chemistry and Biology "A. Zambelli", University of Salerno, via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy
- Nancy D'Arminio
- Department of Chemistry and Biology "A. Zambelli", University of Salerno, via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy
- Anna Marabotti
- Department of Chemistry and Biology "A. Zambelli", University of Salerno, via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy
46
Bogomolovas J, Gravenhorst P, Mayans O. Production and analysis of titin kinase: Exploiting active/inactive kinase homologs in pseudokinase validation. Methods Enzymol 2022; 667:147-181. [PMID: 35525541 DOI: 10.1016/bs.mie.2022.03.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
Protein pseudokinases are key regulators of the eukaryotic cell. Understanding their unconventional molecular mechanisms relies on deciphering their potential to perform phosphotransfer, their scaffolding properties and the nature of their regulation. Titin pseudokinase (TK) is the defining member of a family of poorly characterized muscle-specific kinases thought to act as sensors and transducers of mechanical signals in the sarcomere. The functional mechanisms of TK remain obscure due to the challenges posed by its production and analysis. Here, we provide guidelines and tailored research approaches for the study of TK, including exploiting its close structure-function relationship to the catalytically active homolog twitchin kinase (TwcK) from C. elegans. We describe a methodological pipeline to produce recombinant TK and TwcK samples; design, prioritize and validate mutated and truncated variants; assess sample stability; and perform activity assays. The strategy is transferable to other pseudokinase members of the TK-like kinase family.
Affiliation(s)
- Julius Bogomolovas
- School of Medicine, University of California, San Diego, La Jolla, CA, United States
- Olga Mayans
- Department of Biology, University of Konstanz, Konstanz, Germany.
47
Casadio R, Savojardo C, Fariselli P, Capriotti E, Martelli PL. Turning Failures into Applications: The Problem of Protein ΔΔG Prediction. Methods Mol Biol 2022; 2449:169-185. [PMID: 35507262 DOI: 10.1007/978-1-0716-2095-3_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/17/2023]
Abstract
After nearly two decades of research on computational methods based on machine learning and knowledge-based potentials for predicting ΔG and ΔΔG upon variation, we now realize that all approaches perform poorly when tested on specific cases and that there is ample room for improvement. Why is this so? Is the underlying assumption wrong that experimental protein thermodynamics in solution reflects the thermodynamics of a single protein? Both machine learning and knowledge-based computational methods are rigorous, and the theory behind them is solid. We are now in a critical situation, which suggests that predictions of protein instability upon variation should be treated with care. In the following, we show how to address the problem of identifying which protein positions may be of interest for biotechnological and biomedical purposes. By applying a consensus procedure, we indicate possible strategies for interpreting the results.
Affiliation(s)
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy.
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Turin, Italy
- Emidio Capriotti
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
48
Montanucci L, Capriotti E, Birolo G, Benevenuta S, Pancotti C, Lal D, Fariselli P. DDGun: an untrained predictor of protein stability changes upon amino acid variants. Nucleic Acids Res 2022; 50:W222-W227. [PMID: 35524565 PMCID: PMC9252764 DOI: 10.1093/nar/gkac325] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 03/24/2022] [Revised: 04/15/2022] [Accepted: 05/04/2022] [Indexed: 01/22/2023]
Abstract
Estimating the functional effect of single amino acid variants in proteins is fundamental for predicting the change in thermodynamic stability, measured as the difference in the Gibbs free energy of unfolding between the wild-type and the variant protein (ΔΔG). Here, we present the web server of DDGun, a method previously developed for ΔΔG prediction upon amino acid variants. DDGun is an untrained method based on basic features derived from evolutionary information. It is antisymmetric: it predicts opposite ΔΔG values for direct (A → B) and reverse (B → A) single and multiple site variants. DDGun is available in two versions, one based only on sequence information and the other based on both sequence and structure information. Despite being untrained, DDGun achieves prediction performance comparable to that of trained methods. For the web server version, we updated the protein sequence database used to compute the evolutionary features, and we compiled two new data sets of protein variants to blind-test its performance. On these blind data sets of single and multiple site variants, DDGun confirms its prediction performance, reaching an average correlation coefficient between experimental and predicted ΔΔG of 0.45 and 0.49 for the sequence-based and structure-based versions, respectively. Beyond ΔΔG prediction, we suggest that DDGun be adopted as a benchmark method to assess the predictive capabilities of newly developed methods. Releasing DDGun as a web server, stand-alone program and Docker image will facilitate the method comparisons needed to improve ΔΔG prediction.
Affiliation(s)
- Ludovica Montanucci
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Avenue, Cleveland, OH 44195, USA
- Emidio Capriotti
- BioFolD Unit, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via F. Selmi 3, 40126 Bologna, Italy
- Giovanni Birolo
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126, Torino, Italy
- Silvia Benevenuta
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126, Torino, Italy
- Corrado Pancotti
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126, Torino, Italy
- Dennis Lal
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Avenue, Cleveland, OH 44195, USA
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126, Torino, Italy
49
Vila JA. Proteins' Evolution upon Point Mutations. ACS Omega 2022; 7:14371-14376. [PMID: 35573218 PMCID: PMC9089682 DOI: 10.1021/acsomega.2c01407] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/09/2022] [Accepted: 04/05/2022] [Indexed: 05/03/2023]
Abstract
As the reader is likely aware, state-of-the-art protein folding prediction methods have achieved remarkable success in accurately determining the three-dimensional structures of proteins. Yet seemingly simple problems remain unsolved, such as predicting the effects of point mutations on a protein's (i) native conformation; (ii) marginal stability; (iii) ensemble of high-energy native-like conformations; and (iv) metamorphism propensity and, hence, evolvability. As a plausible solution to the latter, some properties of amide hydrogen-deuterium exchange, a highly sensitive probe of protein structure, stability, and folding, are assessed from a new perspective. The preliminary results indicate that the change in protein marginal stability upon point mutation provides the necessary and sufficient information to estimate, through a Boltzmann factor, the evolution of the amide hydrogen exchange protection factors and, consequently, that of the ensemble of folded conformations coexisting with the native state. This work contributes to our general understanding of the effects of point mutations on proteins and may spur significant progress in developing methods to accurately determine the appearance of new folds and functions.
50
Rokni M, Heidari Nia M, Sarhadi M, Mirinejad S, Sargazi S, Moudi M, Saravani R, Rahdar S, Kargar M. Association of TMPRSS2 Gene Polymorphisms with COVID-19 Severity and Mortality: a Case-Control Study with Computational Analyses. Appl Biochem Biotechnol 2022; 194:3507-3526. [PMID: 35386063 PMCID: PMC8986508 DOI: 10.1007/s12010-022-03885-w] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 01/11/2022] [Accepted: 03/14/2022] [Indexed: 12/12/2022]
Abstract
Coronavirus disease 2019 (COVID-19) is a severe disease caused by a novel betacoronavirus that first appeared in China. Human genetic factors, including polymorphisms, play pivotal roles in the high transmission of SARS-CoV-2 and in the persistently progressive illness seen in a small but significant percentage of infected people; however, these factors remain ill-defined. A total of 288 COVID-19 patients and 288 controls were genotyped for TMPRSS2 polymorphisms using both restriction fragment length polymorphism polymerase chain reaction (RFLP-PCR) and amplification refractory mutation system (ARMS)-PCR techniques. Different genotypes of TMPRSS2 polymorphisms were compared in terms of disease susceptibility and mortality. The statistical analysis showed that the minor alleles of all studied variants significantly increased the risk of COVID-19, except for the rs75603675 C > A variant. The T allele of rs12329760 conferred an increased risk of COVID-19. Moreover, the AG/AC/TT/AG genotype combination significantly increased the risk of COVID-19 in our population. Several haplotypes of the rs17854725/rs75603675/rs12329760/rs4303795 polymorphisms, including GACA, GACG, GATG, GATA, AATA, ACCG, ACTG, ACTA, GCCA, and GCTG, were associated with an increased risk of the disease (odds ratio > 1). Regarding clinical and paraclinical characteristics, statistically significant differences were found between the non-severe and severe forms for all variables except gender, platelet, C-reactive protein (CRP), erythrocyte sedimentation rate (ESR), and underlying diseases. In addition, the genotypes of TMPRSS2 rs17854725 A > G, rs12329760 C > T, and rs4303795 A > G among cases differed significantly between the severe and non-severe forms of the disease (P-value < 0.001). Specifically, death was more frequent in carriers of the AG genotype of rs17854725 A > G (P-value = 0.022).
Patients carrying the minor alleles of the four studied TMPRSS2 variants were more vulnerable to COVID-19 infection. Our findings indicate that the rs17854725 A > G (AA vs. AG and AA vs. GG), rs12329760 C > T (CC vs. CT and CC vs. TT), and rs4303795 A > G (AA vs. AG) genotypes of TMPRSS2 are associated with a more invasive disease pattern. Studies in larger populations are needed to confirm these results.
Affiliation(s)
- Mohsen Rokni
- Department of Immunology, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran; Department of Immunology, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Milad Heidari Nia
- Cellular and Molecular Research Center, Research Institute of Cellular and Molecular Sciences in Infectious Diseases, Zahedan University of Medical Sciences, Zahedan, 9816743463, Iran
- Mohammad Sarhadi
- Cellular and Molecular Research Center, Research Institute of Cellular and Molecular Sciences in Infectious Diseases, Zahedan University of Medical Sciences, Zahedan, 9816743463, Iran
- Shekoufeh Mirinejad
- Cellular and Molecular Research Center, Research Institute of Cellular and Molecular Sciences in Infectious Diseases, Zahedan University of Medical Sciences, Zahedan, 9816743463, Iran
- Saman Sargazi
- Cellular and Molecular Research Center, Research Institute of Cellular and Molecular Sciences in Infectious Diseases, Zahedan University of Medical Sciences, Zahedan, 9816743463, Iran
- Mahdiyeh Moudi
- Genetics of Non-Communicable Disease Research Center, Zahedan University of Medical Sciences, Zahedan, Iran
- Ramin Saravani
- Cellular and Molecular Research Center, Research Institute of Cellular and Molecular Sciences in Infectious Diseases, Zahedan University of Medical Sciences, Zahedan, 9816743463, Iran; Department of Clinical Biochemistry, School of Medicine, Zahedan University of Medical Sciences, Zahedan, Iran
- Sara Rahdar
- Cellular and Molecular Research Center, Research Institute of Cellular and Molecular Sciences in Infectious Diseases, Zahedan University of Medical Sciences, Zahedan, 9816743463, Iran
- Maryam Kargar
- Department of Laboratory Hematology and Blood Bank, School of Allied Medical Science, Shahid Beheshti University of Medical Sciences, Tehran, Iran