1. Xu W, Li A, Zhao Y, Peng Y. Decoding the effects of mutation on protein interactions using machine learning. Biophysics Reviews 2025; 6:011307. [PMID: 40013003 PMCID: PMC11857871 DOI: 10.1063/5.0249920] [Received: 11/21/2024] [Accepted: 01/14/2025] [Indexed: 02/28/2025]
Abstract
Accurately predicting mutation-caused binding free energy changes (ΔΔGs) on protein interactions is crucial for understanding how genetic variations affect interactions between proteins and other biomolecules, such as proteins, DNA/RNA, and ligands, which are vital for regulating numerous biological processes. Developing computational approaches with high accuracy and efficiency is critical for elucidating the mechanisms underlying various diseases, identifying potential biomarkers for early diagnosis, and developing targeted therapies. This review provides a comprehensive overview of recent advancements in predicting the impact of mutations on protein interactions across different interaction types, which are central to understanding biological processes and disease mechanisms, including cancer. We summarize recent progress in predictive approaches, including physicochemical-based, machine learning, and deep learning methods, evaluating the strengths and limitations of each. Additionally, we discuss the challenges related to the limitations of mutational data, including biases, data quality, and dataset size, and explore the difficulties in developing accurate prediction tools for mutation-induced effects on protein interactions. Finally, we discuss future directions for advancing these computational tools, highlighting the potential of emerging technologies, such as artificial intelligence, to drive significant improvements in mutational effect prediction.
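As background for the ΔΔG values discussed throughout this list, one widely used convention (assumed here; some datasets and tools use the opposite sign) relates the binding free energy to the dissociation constant K_D, expressed in molar units relative to a 1 M standard state, and defines ΔΔG as mutant minus wild type, so that positive values indicate weakened binding:

```latex
\Delta G_{\mathrm{bind}} = RT \ln K_D, \qquad
\Delta\Delta G = \Delta G_{\mathrm{bind}}^{\mathrm{mut}} - \Delta G_{\mathrm{bind}}^{\mathrm{wt}}
  = RT \ln \frac{K_D^{\mathrm{mut}}}{K_D^{\mathrm{wt}}}
```

At 298 K, RT ≈ 0.592 kcal/mol, so a tenfold increase in K_D corresponds to ΔΔG ≈ 1.36 kcal/mol.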
Affiliation(s)
- Wang Xu
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
- Anbang Li
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
- Yunjie Zhao
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
- Yunhui Peng
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
2. Gromiha MM, Harini K. Protein-nucleic acid complexes: Docking and binding affinity. Curr Opin Struct Biol 2025; 90:102955. [PMID: 39616716 DOI: 10.1016/j.sbi.2024.102955] [Received: 07/08/2024] [Revised: 10/02/2024] [Accepted: 11/04/2024] [Indexed: 02/05/2025]
Abstract
Protein-nucleic acid interactions play essential roles in several biological processes, such as gene regulation, replication, transcription, repair and packaging. Knowledge of the three-dimensional structures of protein-nucleic acid complexes and their binding affinities helps to understand these functions. In this review, we focus on two major aspects, namely (i) deciphering the three-dimensional structures of protein-nucleic acid complexes and (ii) predicting their binding affinities. The first part is devoted to state-of-the-art methods for predicting native structures and their performance, including on recent CASP targets. The second part focuses on different aspects of investigating the binding affinity of protein-nucleic acid complexes: (i) databases of thermodynamic parameters for understanding binding affinity, (ii) important features determining protein-nucleic acid binding affinity, (iii) predicting the binding affinity of protein-nucleic acid complexes using sequence- and structure-based parameters and (iv) changes in binding affinity upon mutation. The review covers the latest developments in protein-nucleic acid docking algorithms and binding affinity prediction, along with a list of computational resources for understanding protein-DNA and protein-RNA interactions.
Affiliation(s)
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, Tamil Nadu, India
- K Harini
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, Tamil Nadu, India
3. Rimal P, Paul SK, Panday SK, Alexov E. Further Development of SAMPDI-3D: A Machine Learning Method for Predicting Binding Free Energy Changes Caused by Mutations in Either Protein or DNA. Genes (Basel) 2025; 16:101. [PMID: 39858648 PMCID: PMC11764785 DOI: 10.3390/genes16010101] [Received: 12/27/2024] [Revised: 01/15/2025] [Accepted: 01/16/2025] [Indexed: 01/27/2025]
Abstract
BACKGROUND/OBJECTIVES Predicting the effects of protein and DNA mutations on the binding free energy of protein-DNA complexes is crucial for understanding how DNA variants impact wild-type cellular function. As many cellular interactions involve protein-DNA binding, accurately predicting changes in binding free energy (ΔΔG) is valuable for distinguishing pathogenic mutations from benign ones. METHODS This study describes the development and optimization of the SAMPDI-3Dv2 machine learning method, which is trained on an expanded database of experimentally measured ΔΔGs. This enhanced model incorporates new features, including the 3D structure of the mutant protein, features of the mutant structure, and a position-specific scoring matrix (PSSM). Benchmarking was conducted using 5-fold cross-validation. RESULTS The updated SAMPDI-3D model (SAMPDI-3Dv2) achieved Pearson correlation coefficients (PCCs) of 0.68 for protein and 0.80 for DNA mutations. These results represent significant improvements over existing tools. Additionally, the method's rapid execution time enables genome-scale predictions. CONCLUSIONS The improved SAMPDI-3Dv2 shows enhanced predictive performance for analyzing mutations in protein-DNA complexes. By leveraging structural information and an expanded training dataset, SAMPDI-3Dv2 provides researchers with a more accurate and efficient tool for mutation analysis, contributing to identifying pathogenic variants and improving our understanding of cellular function.
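The 5-fold cross-validation and Pearson correlation coefficient (PCC) used to benchmark SAMPDI-3Dv2 can be illustrated with a minimal standard-library sketch. It assumes a single numeric feature and an ordinary least-squares line as a stand-in model; SAMPDI-3Dv2's actual features and learner are not reproduced, and the helper names and toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept (stand-in for a real model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def kfold_pcc(x: List[float], y: List[float], k: int = 5, seed: int = 0) -> float:
    """Shuffle indices, split them into k folds, predict each held-out fold with a
    model fitted on the other k-1 folds, and return the PCC of the pooled
    out-of-fold predictions against the measured values."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [0.0] * len(x)
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = slope * x[i] + intercept
    return pearson(preds, y)


# Toy data: a hypothetical single feature and noisy "measured" ddG values.
rng = random.Random(42)
feature = [rng.uniform(-2.0, 2.0) for _ in range(50)]
ddg = [0.8 * v + rng.gauss(0.0, 0.3) for v in feature]
print(f"5-fold PCC: {kfold_pcc(feature, ddg):.2f}")
```

Pooling out-of-fold predictions before computing a single PCC is one common reporting choice; averaging per-fold PCCs is another, and the abstract does not say which was used.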
Affiliation(s)
- Emil Alexov
- Department of Physics and Astronomy, College of Science, Clemson University, Clemson, SC 29634, USA
4. Harini K, Sekijima M, Gromiha MM. Bioinformatics Approaches for Understanding the Binding Affinity of Protein-Nucleic Acid Complexes. Methods Mol Biol 2025; 2867:315-330. [PMID: 39576589 DOI: 10.1007/978-1-0716-4196-5_18] [Indexed: 11/24/2024]
Abstract
Protein-nucleic acid interactions are involved in various biological processes such as gene expression, replication, transcription, translation, and packaging. The recognition mechanism of protein-nucleic acid complexes has been investigated from different perspectives, including the binding affinities of protein-DNA and protein-RNA complexes. Experimentally, protein-nucleic acid interactions are analyzed using X-ray crystallography, Isothermal Titration Calorimetry (ITC), DNA/RNA pull-down assays, DNA/RNA footprinting, and systematic evolution of ligands by exponential enrichment (SELEX). Computationally, numerous databases and tools have been developed to study protein-nucleic acid complexes based on their binding sites, the specific interactions between them, and their binding affinity. In this chapter, we discuss various databases for protein-nucleic acid complex structures and the tools available to extract features from them. Further, we provide details on databases and prediction methods reported for exploring the binding affinity of protein-nucleic acid complexes, along with important structure-based parameters that govern binding affinity.
Affiliation(s)
- K Harini
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
- Masakazu Sekijima
- Department of Computer Science, Tokyo Institute of Technology, Yokohama, Japan
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
- International Research Frontiers Initiative, School of Computing, Tokyo Institute of Technology, Yokohama, Japan
5. Xiao SR, Zhang YK, Liu KY, Huang YX, Liu R. PNBACE: an ensemble algorithm to predict the effects of mutations on protein-nucleic acid binding affinity. BMC Biol 2024; 22:203. [PMID: 39256728 PMCID: PMC11389284 DOI: 10.1186/s12915-024-02006-9] [Received: 12/24/2023] [Accepted: 09/03/2024] [Indexed: 09/12/2024]
Abstract
BACKGROUND Mutations occurring in nucleic acids or proteins may affect the binding affinities of protein-nucleic acid interactions. Although many efforts have been devoted to studying the impact of protein mutations, few computational studies have addressed the effect of nucleic acid mutations or explored whether the same methodology could be applied to predicting binding affinity changes caused by these two mutation types. RESULTS Here, we developed a generalized algorithm named PNBACE for both DNA and protein mutations. We first demonstrated from multiple perspectives that DNA mutations could induce varying degrees of change in binding affinity. We then designed a group of energy-based topological features based on different energy networks, which were combined with our previous partition-based energy features to construct individual prediction models through feature selection. Furthermore, we created an ensemble model by integrating the outputs of individual models using a differential evolution algorithm. In addition to predicting the impact of single-point mutations, PNBACE could predict the influence of multiple-point mutations and identify mutations significantly reducing binding affinities. Extensive comparisons indicated that PNBACE largely performed better than existing methods on both regression and classification tasks. CONCLUSIONS PNBACE is an effective method for estimating the binding affinity changes of protein-nucleic acid complexes induced by DNA or protein mutations, and can therefore improve our understanding of the interactions between proteins and DNA/RNA.
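PNBACE integrates the outputs of its individual models with a differential evolution algorithm. The sketch below illustrates only the general idea: the classic DE/rand/1/bin scheme searches for ensemble weights in [0, 1] that minimize the mean squared error against measured values. The objective, weight bounds, population size, F and CR settings, and the toy model outputs are all assumptions, not PNBACE's published configuration.

```python
import random
from typing import List, Sequence


def ensemble_mse(weights: Sequence[float], preds: Sequence[Sequence[float]],
                 target: Sequence[float]) -> float:
    """Mean squared error of the weighted sum of the individual model outputs."""
    total = 0.0
    for j, y in enumerate(target):
        combined = sum(w * p[j] for w, p in zip(weights, preds))
        total += (combined - y) ** 2
    return total / len(target)


def differential_evolution(preds: Sequence[Sequence[float]], target: Sequence[float],
                           pop_size: int = 20, f: float = 0.6, cr: float = 0.9,
                           generations: int = 200, seed: int = 0) -> List[float]:
    """DE/rand/1/bin search for ensemble weights, each clipped to [0, 1]."""
    rng = random.Random(seed)
    dim = len(preds)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    cost = [ensemble_mse(ind, preds, target) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct individuals, none equal to the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one coordinate comes from the mutant
            trial = []
            for d in range(dim):
                if d == j_rand or rng.random() < cr:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(min(1.0, max(0.0, v)))
                else:
                    trial.append(pop[i][d])
            # Selection: keep the trial vector if it is at least as good.
            trial_cost = ensemble_mse(trial, preds, target)
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=lambda k: cost[k])
    return pop[best]


# Toy outputs of two hypothetical base models; the "measured" ddG is 0.7*m1 + 0.3*m2.
m1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
m2 = [6.0, 1.0, 5.0, 2.0, 4.0, 3.0]
measured = [0.7 * a + 0.3 * b for a, b in zip(m1, m2)]
weights = differential_evolution([m1, m2], measured)
print([round(w, 3) for w in weights])
```

Differential evolution is derivative-free, which is one reason to prefer it over gradient methods when the ensemble objective is non-smooth (for example a correlation-based score); for a pure squared-error objective like this toy one, least squares would also work.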
Affiliation(s)
- Si-Rui Xiao
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
- Yao-Kun Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
- Kai-Yu Liu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
- Yu-Xiang Huang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
- Rong Liu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
6. Fang Z, Li Z, Li M, Yue Z, Li K. Prediction of Protein-DNA Interface Hot Spots Based on Empirical Mode Decomposition and Machine Learning. Genes (Basel) 2024; 15:676. [PMID: 38927611 PMCID: PMC11202800 DOI: 10.3390/genes15060676] [Received: 04/27/2024] [Revised: 05/20/2024] [Accepted: 05/22/2024] [Indexed: 06/28/2024]
Abstract
Protein-DNA interactions play a crucial role in biological activities such as gene expression, modification, replication and transcription. Both understanding the physiological significance of hot spots at protein-DNA binding interfaces and advancing computational biology depend on the precise identification of these regions. In this paper, a hot spot prediction method called EC-PDH is proposed. First, we extracted solvent-accessible surface area (ASA) and secondary structure features of these hot spots, then applied empirical mode decomposition (EMD) to these conventional features and extracted the mean, variance, energy and autocorrelation function values of the first three intrinsic mode functions (IMFs) as new features. In total, 218 features were obtained. For feature selection, we used maximum relevance minimum redundancy with sequential forward selection (mRMR-SFS) to obtain an optimal 11-feature subset. To address data imbalance, we used the SMOTE-Tomek algorithm to balance positive and negative samples, and finally used categorical boosting (CatBoost) to construct our hot spot prediction model for protein-DNA binding interfaces. Our method performs well on the test set, with AUC, MCC and F1 score values of 0.847, 0.543 and 0.772, respectively. In a comparative evaluation, EC-PDH outperformed existing state-of-the-art methods in identifying hot spots.
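EC-PDH summarizes each of the first three IMFs by its mean, variance, energy and autocorrelation. A minimal sketch of those summary statistics for one component signal is shown below; the component is assumed to have been extracted by EMD already (the sifting procedure itself is not implemented here). Defining energy as the sum of squared values and using a normalized lag-1 autocorrelation are common choices assumed for illustration, the paper may define them differently, and `imf_features` is a hypothetical helper.

```python
from typing import Dict, Sequence


def imf_features(imf: Sequence[float], lag: int = 1) -> Dict[str, float]:
    """Mean, variance, energy and lag-k autocorrelation of one IMF."""
    n = len(imf)
    mean = sum(imf) / n
    var = sum((v - mean) ** 2 for v in imf) / n  # population variance
    energy = sum(v * v for v in imf)             # sum of squared values
    # Normalized autocorrelation at the given lag (0 for a constant signal).
    num = sum((imf[t] - mean) * (imf[t + lag] - mean) for t in range(n - lag))
    den = sum((v - mean) ** 2 for v in imf)
    acf = num / den if den else 0.0
    return {"mean": mean, "variance": var, "energy": energy, "autocorr": acf}


# Hypothetical oscillatory component of an ASA profile along a protein chain.
component = [0.5, -0.2, 0.8, -0.6, 0.3, -0.1, 0.4, -0.5]
features = imf_features(component)
print(features)
```

Applying the same function to each of the first three IMFs of each conventional feature yields a fixed-length vector per residue, which is the kind of expanded feature set that mRMR-SFS then prunes.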
Affiliation(s)
- Zirui Fang
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei 230036, China
- Zixuan Li
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei 230036, China
- Ming Li
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei 230036, China
- Zhenyu Yue
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei 230036, China
- Ke Li
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei 230036, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China
7. Tao L, Zhou T, Wu Z, Hu F, Yang S, Kong X, Li C. ESPDHot: An Effective Machine Learning-Based Approach for Predicting Protein-DNA Interaction Hotspots. J Chem Inf Model 2024; 64:3548-3557. [PMID: 38587997 DOI: 10.1021/acs.jcim.3c02011] [Indexed: 04/10/2024]
Abstract
Protein-DNA interactions are pivotal to various cellular processes. Precise identification of the hotspot residues for protein-DNA interactions holds great significance for revealing the intricate mechanisms in protein-DNA recognition and for providing essential guidance for protein engineering. Aiming at protein-DNA interaction hotspots, this work introduces an effective prediction method, ESPDHot based on a stacked ensemble machine learning framework. Here, the interface residue whose mutation leads to a binding free energy change (ΔΔG) exceeding 2 kcal/mol is defined as a hotspot. To tackle the imbalanced data set issue, the adaptive synthetic sampling (ADASYN), an oversampling technique, is adopted to synthetically generate new minority samples, thereby rectifying data imbalance. As for molecular characteristics, besides traditional features, we introduce three new characteristic types including residue interface preference proposed by us, residue fluctuation dynamics characteristics, and coevolutionary features. Combining the Boruta method with our previously developed Random Grouping strategy, we obtained an optimal set of features. Finally, a stacking classifier is constructed to output prediction results, which integrates three classical predictors, Support Vector Machine (SVM), XGBoost, and Artificial Neural Network (ANN) as the first layer, and Logistic Regression (LR) algorithm as the second one. Notably, ESPDHot outperforms the current state-of-the-art predictors, achieving superior performance on the independent test data set, with F1, MCC, and AUC reaching 0.571, 0.516, and 0.870, respectively.
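ESPDHot labels an interface residue as a hotspot when its mutation gives ΔΔG above 2 kcal/mol, and, like EC-PDH and WTL-PDH in this list, reports F1, MCC and AUC. A minimal sketch of that labeling rule and of the three metrics follows. The residue ΔΔG values and classifier scores are made up, the 0.5 decision threshold is an assumption, and AUC is computed with the rank-based (Mann-Whitney) formulation, counting ties as one half.

```python
import math
from typing import List, Sequence, Tuple


def label_hotspots(ddg: Sequence[float], cutoff: float = 2.0) -> List[int]:
    """1 if the mutation raises ddG above the cutoff (kcal/mol), else 0."""
    return [1 if v > cutoff else 0 for v in ddg]


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Counts of (TP, TN, FP, FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, _, fp, fn = confusion(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient (0 when any margin is empty)."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random hotspot outscores a random non-hotspot."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up experimental ddG values and classifier scores for six interface residues.
ddg = [2.8, 0.4, 1.1, 3.5, 2.1, 0.9]
scores = [0.9, 0.2, 0.6, 0.7, 0.4, 0.1]
y_true = label_hotspots(ddg)                      # [1, 0, 0, 1, 1, 0]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # [1, 0, 1, 1, 0, 0]
print(f1_score(y_true, y_pred), mcc(y_true, y_pred), auc(y_true, scores))
```

F1 and MCC depend on the chosen threshold, whereas AUC uses only the ranking of scores; MCC is often preferred on imbalanced hotspot data because it accounts for all four confusion-matrix cells.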
Affiliation(s)
- Lianci Tao
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Tong Zhou
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Zhixiang Wu
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Fangrui Hu
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Shuang Yang
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Xiaotian Kong
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
8. Pandey U, Behara SM, Sharma S, Patil RS, Nambiar S, Koner D, Bhukya H. DeePNAP: A Deep Learning Method to Predict Protein-Nucleic Acid Binding Affinity from Their Sequences. J Chem Inf Model 2024; 64:1806-1815. [PMID: 38458968 DOI: 10.1021/acs.jcim.3c01151] [Indexed: 03/10/2024]
Abstract
Predicting protein-nucleic acid (PNA) binding affinity solely from sequence is of paramount importance for the experimental design and analysis of PNA interactions (PNAIs). Many currently available binding affinity prediction models are limited to specific PNAIs and rely on both sequence and structural information of PNA complexes as inputs for training and testing. Because available PNA complex structures are scarce, the resulting small training data sets significantly limit the diversity and generalizability of such models. Additionally, most tools predict a single parameter, such as binding affinity or free energy changes upon mutation, which makes them less versatile. Hence, we propose DeePNAP, a machine learning-based model built from a vast and heterogeneous data set of 14,401 entries (from both eukaryotes and prokaryotes) from the ProNAB database, consisting of wild-type and mutant PNA complex binding parameters. Our model precisely predicts the binding affinity and free energy changes due to the mutation(s) of PNAIs exclusively from their sequences. While other similar tools extract features from both sequence and structure information, DeePNAP employs sequence-based features to yield high correlation coefficients between predicted and experimental values, with low root mean squared errors, for PNA complexes in predicting KD and ΔΔG, implying the generalizability of DeePNAP. Additionally, we have developed a web interface hosting DeePNAP that can serve as a powerful tool to rapidly predict binding affinities for a myriad of PNAIs with high precision, toward developing a deeper understanding of their implications in various biological systems. Web interface: http://14.139.174.41:8080/.
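DeePNAP predicts both the dissociation constant K_D and the mutation-induced ΔΔG. Under the common convention ΔG = RT ln K_D (1 M standard state), with ΔΔG taken as mutant minus wild type, the two quantities are linked as in the sketch below. The 298.15 K default temperature and the sign convention are assumptions, and `ddg_from_kd` is a hypothetical helper, not part of DeePNAP.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_kd(kd_wt: float, kd_mut: float, temp_k: float = 298.15) -> float:
    """ddG (kcal/mol) = RT ln(Kd_mut / Kd_wt); positive means weaker binding."""
    if kd_wt <= 0 or kd_mut <= 0:
        raise ValueError("dissociation constants must be positive (molar)")
    return R_KCAL * temp_k * math.log(kd_mut / kd_wt)


# A mutation that raises Kd tenfold, from 10 nM to 100 nM:
print(round(ddg_from_kd(10e-9, 100e-9), 2))  # 1.36
```

Because only the ratio of the two constants enters, any consistent concentration unit works for K_D here.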
Affiliation(s)
- Uddeshya Pandey
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Sasi M Behara
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Siddhant Sharma
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Rachit S Patil
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Souparnika Nambiar
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Debasish Koner
- Department of Chemistry, Indian Institute of Technology Hyderabad, Kandi 502284, India
- Hussain Bhukya
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
9. Xu W, Zhang H, Guo W, Jiang L, Zhao Y, Peng Y. Deciphering principles of nucleosome interactions and impact of cancer-associated mutations from comprehensive interaction network analysis. Brief Bioinform 2024; 25:bbad532. [PMID: 38329268 PMCID: PMC10851104 DOI: 10.1093/bib/bbad532] [Received: 10/26/2023] [Revised: 11/30/2023] [Accepted: 12/23/2023] [Indexed: 02/09/2024]
Abstract
Nucleosomes represent hubs in chromatin organization and gene regulation and interact with a plethora of chromatin factors through different modes. In addition, alterations in histone proteins such as cancer mutations and post-translational modifications have profound effects on histone/nucleosome interactions. To elucidate the principles of histone interactions and the effects of those alterations, we developed histone interactomes for comprehensive mapping of histone-histone interactions (HHIs), histone-DNA interactions (HDIs), histone-partner interactions (HPIs) and DNA-partner interactions (DPIs) across 37 organisms, comprising a total of 3808 HPIs from 2544 binding proteins and 339 HHIs, 100 HDIs and 142 DPIs across 110 histone variants. With the developed networks, we explored histone interactions at different levels of granularity (protein-, domain- and residue-level) and performed a large-scale systematic analysis of histone interactions. Our analyses have characterized the preferred binding hotspots on both nucleosomal/linker DNA and the histone octamer and unraveled diverse binding modes between the nucleosome and different classes of binding partners. Last, to understand the impact of histone cancer-associated mutations on histone/nucleosome interactions, we compiled a comprehensive cancer mutation dataset including 7940 cancer-associated histone mutations and further mapped those mutations onto 419,125 histone interactions at the residue level. Our quantitative analyses point to strongly disruptive effects of histone cancer-associated mutations on HHIs, HDIs and HPIs. We have further predicted 57 recurrent histone cancer mutations that have large effects on histone/nucleosome interactions and may have driver status in oncogenesis.
Affiliation(s)
- Wang Xu
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
- Houfang Zhang
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
- Wenhan Guo
- Computational Science Program, University of Texas at El Paso, El Paso, TX 79902, USA
- Lijun Jiang
- Hubei Key Laboratory of Genetic Regulation & Integrative Biology, School of Life Sciences, Central China Normal University, Wuhan 430079, China
- Yunjie Zhao
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
- Yunhui Peng
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
10. Shirvanizadeh N, Vihinen M. VariBench, new variation benchmark categories and data sets. Front Bioinform 2023; 3:1248732. [PMID: 37795169 PMCID: PMC10546188 DOI: 10.3389/fbinf.2023.1248732] [Received: 06/27/2023] [Accepted: 09/08/2023] [Indexed: 10/06/2023]
Affiliation(s)
- Mauno Vihinen
- Department of Experimental Medical Science, Lund University, Lund, Sweden
11. Zhang X, Mei LC, Gao YY, Hao GF, Song BA. Web tools support predicting protein-nucleic acid complexes stability with affinity changes. Wiley Interdiscip Rev RNA 2023; 14:e1781. [PMID: 36693636 DOI: 10.1002/wrna.1781] [Received: 09/14/2022] [Revised: 11/10/2022] [Accepted: 11/28/2022] [Indexed: 01/26/2023]
Abstract
Numerous biological processes, such as transcription, replication, and translation, rely on protein-nucleic acid interactions (PNIs). Characterizing the binding stability of protein-nucleic acid complexes is vital to deciphering the code for PNIs. Numerous web-based tools have been developed to assess protein-nucleic acid stability, facilitating rapid prediction of PNI characteristics. However, the data and tools are dispersed and lack the comprehensive integration needed to better understand the stability of PNIs. In this review, we first summarize existing databases for evaluating the stability of protein-nucleic acid binding. Then, we compare and evaluate the pros and cons of web tools for forecasting the interaction energies of protein-nucleic acid complexes. Finally, we discuss the application of combining models and capabilities of PNIs. We hope these web-based tools will facilitate the discovery of recognition mechanisms for protein-nucleic acid binding stability. This article is categorized under: RNA Interactions with Proteins and Other Molecules > Protein-RNA Recognition; RNA Interactions with Proteins and Other Molecules > RNA-Protein Complexes; RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications.
Affiliation(s)
- Xiao Zhang
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang, China
- Long-Can Mei
- National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan, China
- Yang-Yang Gao
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang, China
- Ge-Fei Hao
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang, China
- National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan, China
- Bao-An Song
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang, China
12. Pandey P, Panday SK, Rimal P, Ancona N, Alexov E. Predicting the Effect of Single Mutations on Protein Stability and Binding with Respect to Types of Mutations. Int J Mol Sci 2023; 24:12073. [PMID: 37569449 PMCID: PMC10418460 DOI: 10.3390/ijms241512073] [Received: 07/11/2023] [Revised: 07/24/2023] [Accepted: 07/26/2023] [Indexed: 08/13/2023]
Abstract
The development of methods and algorithms to predict the effect of mutations on protein stability, protein-protein interaction, and protein-DNA/RNA binding is necessitated by the needs of protein engineering and of understanding the molecular mechanisms of disease-causing variants. The vast majority of the leading methods require a database of experimentally measured folding and binding free energy changes for training. These databases are collections of experimental data taken from scientific investigations typically aimed at probing the role of particular residues on the above-mentioned thermodynamic characteristics; i.e., the mutations are not introduced at random and do not necessarily represent mutations originating from single nucleotide variants (SNVs). Thus, the reported performance of the leading algorithms assessed on these databases or other limited cases may not be applicable to predicting the effect of SNVs seen in the human population. Indeed, we demonstrate that SNVs and non-SNVs are not equally represented in the corresponding databases and that their free energy change distributions differ. It is shown that the Pearson correlation coefficients (PCCs) of folding and binding free energy changes obtained in cases involving SNVs are smaller than those for non-SNVs, indicating that caution should be used when applying these methods to reveal the effect of human SNVs. Furthermore, it is demonstrated that some methods are sensitive to the chemical nature of the mutations, resulting in PCCs that differ by a factor of four across chemically different mutations. All methods are found to underestimate the energy changes by roughly a factor of 2.
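The SNV versus non-SNV distinction can be made computationally: an amino acid substitution is reachable by a single nucleotide variant if some codon of the wild-type residue differs by exactly one base from some codon of the mutant residue. The sketch below applies that codon-level rule with the standard genetic code (NCBI translation table 1). Whether the authors used exactly this criterion, rather than, for example, the actual codon in the gene, is an assumption, and the helper names are hypothetical.

```python
from itertools import product
from typing import Dict, List

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def codons_for(aa: str) -> List[str]:
    """All codons encoding a one-letter amino acid code."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def is_snv_accessible(wt: str, mt: str) -> bool:
    """True if some wt codon and some mt codon differ at exactly one position."""
    for c1 in codons_for(wt):
        for c2 in codons_for(mt):
            if sum(x != y for x, y in zip(c1, c2)) == 1:
                return True
    return False


print(is_snv_accessible("A", "V"))  # True: GCN -> GTN
print(is_snv_accessible("W", "F"))  # False: TGG needs two changes to reach TTT/TTC
```

This "any codon" rule marks a substitution as SNV-accessible if at least one codon pair allows it; a stricter, gene-aware check would use the observed wild-type codon instead.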
Affiliation(s)
- Preeti Pandey
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Shailesh Kumar Panday
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Prawin Rimal
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Nicolas Ancona
- Department of Biological Sciences, Clemson University, Clemson, SC 29634, USA
- Emil Alexov
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
13. Pandey P, Ghimire S, Wu B, Alexov E. On the linkage of thermodynamics and pathogenicity. Curr Opin Struct Biol 2023; 80:102572. [PMID: 36965249 PMCID: PMC10239362 DOI: 10.1016/j.sbi.2023.102572] [Received: 01/25/2023] [Revised: 02/16/2023] [Accepted: 02/21/2023] [Indexed: 03/27/2023]
Abstract
This review outlines the effect of disease-causing mutations on protein thermodynamics. Two major thermodynamic quantities essential for structural integrity are considered: the folding and binding free energy changes caused by missense mutations. It is emphasized that complex diseases may originate from several mutations across several genes, while monogenic diseases are caused by mutations in a single gene. Nevertheless, in both cases it is shown that pathogenic mutations cause larger perturbations of these thermodynamic quantities than benign mutations do. Recent works demonstrating the effect of pathogenic mutations on these thermodynamic quantities, as well as on structural dynamics and allosteric pathways, are reviewed.
Affiliation(s)
- Preeti Pandey
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Sanjeev Ghimire
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Bohua Wu
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Emil Alexov
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
14
Sun Y, Wu H, Xu Z, Yue Z, Li K. Prediction of hot spots in protein-DNA binding interfaces based on discrete wavelet transform and wavelet packet transform. BMC Bioinformatics 2023; 24:129. [PMID: 37016308 PMCID: PMC10074722 DOI: 10.1186/s12859-023-05263-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Accepted: 03/30/2023] [Indexed: 04/06/2023]
Abstract
BACKGROUND Identification of hot spots in protein-DNA binding interfaces is extremely important for understanding the mechanisms underlying protein-DNA interactions and for drug design. Experimental methods for identifying hot spots are time-consuming and expensive, and most existing computational methods predict hot spots from traditional protein-DNA features without making full use of the information those features contain. RESULTS In this work, a method named WTL-PDH is proposed for hot spot prediction. To deal with the imbalanced dataset, we used the Synthetic Minority Over-sampling Technique (SMOTE) to generate minority-class samples and balance the dataset. First, we extracted solvent accessible surface area features and structural features; we then processed these traditional features using the discrete wavelet transform and the wavelet packet transform to extract wavelet energy and wavelet entropy information, obtaining a 175-dimensional feature set in total. To obtain the best feature subset, we systematically evaluated these features under various feature selection strategies. Finally, a light gradient boosting machine (LightGBM) was used to build the model. CONCLUSIONS Our method achieved good results on an independent test set, with AUC, MCC, and F1 scores of 0.838, 0.533, and 0.750, respectively. WTL-PDH generally achieves better performance than state-of-the-art methods in predicting hot spots. The dataset and source code are available at https://github.com/chase2555/WTL-PDH .
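The wavelet energy and wavelet entropy features named in this abstract can be illustrated with the sketch below. It is not the WTL-PDH code: the Haar wavelet, the decomposition depth, the padding rule for odd-length inputs, the function names, and treating one numeric feature vector as the input signal are all assumptions, since the abstract does not say which wavelet or which input series was used. Energy is taken as the sum of squared coefficients in each sub-band, and entropy as the Shannon entropy of the relative sub-band energies (a common definition, assumed here).

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_step(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform (wavelet choice is an assumption).

    Odd-length input is padded by repeating its last value (also an assumption)."""
    x = list(x)
    if len(x) % 2:
        x.append(x[-1])
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def dwt_bands(x: Sequence[float], levels: int) -> List[List[float]]:
    """Discrete wavelet transform: only the approximation is split again.

    Returns [cD1, cD2, ..., cDL, cAL]."""
    bands: List[List[float]] = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands


def wpt_leaves(x: Sequence[float], levels: int) -> List[List[float]]:
    """Wavelet packet transform: every node is split again.

    Returns 2**levels leaf nodes in natural (not frequency-sorted) order,
    which does not affect the energy or entropy values below."""
    nodes = [list(x)]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_step(node)]
    return nodes


def energy_entropy(bands: List[List[float]]) -> Tuple[List[float], float]:
    """Per-band energy and Shannon entropy of the relative band energies."""
    energies = [sum(c * c for c in band) for band in bands]
    total = sum(energies)
    if total == 0:
        return energies, 0.0
    probs = [e / total for e in energies]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return energies, entropy


if __name__ == "__main__":
    # Hypothetical per-residue feature values, NOT data from the study.
    signal = [0.12, 0.45, 0.33, 0.80, 0.05, 0.61, 0.27, 0.90]
    for name, bands in (("DWT", dwt_bands(signal, 2)), ("WPT", wpt_leaves(signal, 2))):
        energies, entropy = energy_entropy(bands)
        print(name, [round(e, 3) for e in energies], round(entropy, 3))
```

The DWT re-splits only the low-frequency approximation, whereas the wavelet packet transform re-splits both halves, giving 2^L equal-width sub-bands at depth L and therefore more energy and entropy features per input vector. Because the Haar transform is orthonormal, the sub-band energies of an even-length input sum to the input's total energy.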
Affiliation(s)
- Yu Sun
- School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Hongwei Wu
- School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Zhengrong Xu
- School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Zhenyu Yue
- School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Ke Li
- School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei, 230601, Anhui, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China