1
Harini K, Sekijima M, Gromiha MM. Bioinformatics Approaches for Understanding the Binding Affinity of Protein-Nucleic Acid Complexes. Methods Mol Biol 2025; 2867:315-330. [PMID: 39576589 DOI: 10.1007/978-1-0716-4196-5_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2024]
Abstract
Protein-nucleic acid interactions are involved in various biological processes such as gene expression, replication, transcription, translation, and packaging. The recognition mechanism of protein-nucleic acid complexes has been investigated from different perspectives, including the binding affinities of protein-DNA and protein-RNA complexes. Experimentally, protein-nucleic acid interactions are analyzed using X-ray crystallography, isothermal titration calorimetry (ITC), DNA/RNA pull-down assays, DNA/RNA footprinting, and systematic evolution of ligands by exponential enrichment (SELEX). Computationally, numerous databases and tools have been developed to study protein-nucleic acid complexes on the basis of their binding sites, the specific interactions between the partners, and their binding affinity. In this chapter, we discuss various databases of protein-nucleic acid complex structures and the tools available to extract features from them. Further, we describe the databases and prediction methods reported for exploring the binding affinity of protein-nucleic acid complexes, along with the important structure-based parameters that govern binding affinity.
Affiliation(s)
- K Harini
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
- Masakazu Sekijima
- Department of Computer Science, Tokyo Institute of Technology, Yokohama, Japan
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India.
- International Research Frontiers Initiative, School of Computing, Tokyo Institute of Technology, Yokohama, Japan.
2
Fang Z, Li Z, Li M, Yue Z, Li K. Prediction of Protein-DNA Interface Hot Spots Based on Empirical Mode Decomposition and Machine Learning. Genes (Basel) 2024; 15:676. [PMID: 38927611 PMCID: PMC11202800 DOI: 10.3390/genes15060676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Revised: 05/20/2024] [Accepted: 05/22/2024] [Indexed: 06/28/2024]
Abstract
Protein-DNA interactions play a crucial role in biological activities such as gene expression, modification, replication and transcription. Understanding the physiological significance of hot spots at protein-DNA binding interfaces, as well as advancing computational biology, depends on the precise identification of these regions. In this paper, a hot spot prediction method called EC-PDH is proposed. First, we extracted solvent-accessible surface area (ASA) and secondary structure features for these hot spots; then, using the empirical mode decomposition (EMD) algorithm, we extracted the mean, variance, energy and autocorrelation function values of the first three intrinsic mode functions (IMFs) of these conventional features as new features. In total, 218 features were obtained. For feature selection, we used maximum relevance minimum redundancy with sequential forward selection (mRMR-SFS) to obtain an optimal 11-dimensional feature subset. To address data imbalance, we used the SMOTE-Tomek algorithm to balance positive and negative samples, and finally used categorical boosting (CatBoost) to construct our hot spot prediction model for protein-DNA binding interfaces. Our method performs well on the test set, with AUC, MCC and F1 score values of 0.847, 0.543 and 0.772, respectively. In a comparative evaluation, EC-PDH outperforms existing state-of-the-art methods in identifying hot spots.
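A minimal sketch of the IMF summary statistics described above, assuming the IMFs of a feature profile have already been produced by some EMD implementation (EMD itself is not shown). The function names `imf_statistics` and `emd_features` are hypothetical, and the exact statistic definitions used by EC-PDH are assumptions here: energy as the sum of squares, population variance, and the autocorrelation function evaluated at a single lag of 1.

```python
from typing import Dict, List, Sequence


def imf_statistics(imf: Sequence[float], lag: int = 1) -> Dict[str, float]:
    """Summary statistics for one intrinsic mode function (IMF).

    Assumed definitions (not taken from EC-PDH): energy is the sum of
    squares, variance is the population variance, and the autocorrelation
    is the normalized value at a single lag (default 1).
    """
    n = len(imf)
    if n <= lag:
        raise ValueError("IMF must be longer than the autocorrelation lag")
    mean = sum(imf) / n
    centered = [x - mean for x in imf]
    sum_sq_centered = sum(c * c for c in centered)
    variance = sum_sq_centered / n
    energy = sum(x * x for x in imf)
    # Normalized autocorrelation at `lag`; defined as 0.0 for a constant IMF.
    if sum_sq_centered > 0:
        acf = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / sum_sq_centered
    else:
        acf = 0.0
    return {"mean": mean, "variance": variance, "energy": energy, "acf": acf}


def emd_features(imfs: List[Sequence[float]], n_imfs: int = 3) -> List[float]:
    """Concatenate the statistics of the first `n_imfs` IMFs into one vector."""
    features: List[float] = []
    for imf in imfs[:n_imfs]:
        stats = imf_statistics(imf)
        features.extend([stats["mean"], stats["variance"], stats["energy"], stats["acf"]])
    return features
```

With three IMFs each contributing four statistics, one feature profile yields a 12-value vector; how EC-PDH combines such vectors across its ASA and secondary-structure profiles to reach 218 features is not stated in the abstract.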
Affiliation(s)
- Zirui Fang
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei 230036, China
- Zixuan Li
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei 230036, China
- Ming Li
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei 230036, China
- Zhenyu Yue
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei 230036, China
- Ke Li
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei 230036, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China
3
Sun Y, Wu H, Xu Z, Yue Z, Li K. Prediction of hot spots in protein-DNA binding interfaces based on discrete wavelet transform and wavelet packet transform. BMC Bioinformatics 2023; 24:129. [PMID: 37016308 PMCID: PMC10074722 DOI: 10.1186/s12859-023-05263-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Accepted: 03/30/2023] [Indexed: 04/06/2023]
Abstract
BACKGROUND Identification of hot spots in protein-DNA binding interfaces is extremely important for understanding the underlying mechanisms of protein-DNA interactions and for drug design. Experimental methods for identifying hot spots are time-consuming and expensive, and most existing computational methods predict hot spots from traditional protein-DNA features without making full use of the effective information in those features. RESULTS In this work, a method named WTL-PDH is proposed for hot spot prediction. To deal with the imbalanced dataset, we used the Synthetic Minority Over-sampling Technique (SMOTE) to generate minority-class samples and balance the dataset. First, we extracted solvent-accessible surface area features and structural features, and then processed these traditional features using the discrete wavelet transform and the wavelet packet transform to extract wavelet energy and wavelet entropy information, obtaining 175 features in total. To obtain the best feature subset, we systematically evaluated these features under various feature selection strategies. Finally, a light gradient boosting machine (LightGBM) was used to build the model. CONCLUSIONS Our method achieved good results on the independent test set, with AUC, MCC and F1 score values of 0.838, 0.533 and 0.750, respectively. Compared with state-of-the-art methods, WTL-PDH generally achieves better performance in predicting hot spots. The dataset and source code are available at https://github.com/chase2555/WTL-PDH.
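A minimal sketch of wavelet energy and wavelet entropy features, assuming a Haar wavelet and an orthonormal multi-level discrete wavelet transform written by hand. The wavelet basis, number of decomposition levels and entropy definition used by WTL-PDH are not stated in the abstract, and the wavelet packet transform is not shown. The helper names are hypothetical. Here, energy is the sum of squared coefficients in each sub-band, and entropy is the Shannon entropy of the relative sub-band energies.

```python
import math
from typing import List, Sequence, Tuple


def haar_dwt(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT (signal length must be even)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_energy_entropy(signal: Sequence[float], levels: int) -> Tuple[List[float], float]:
    """Sub-band energies and Shannon wavelet entropy of a multi-level Haar DWT.

    Only the approximation branch is decomposed further at each level
    (a plain DWT, not a wavelet packet transform).
    """
    bands: List[List[float]] = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        bands.append(detail)  # detail coefficients D1, D2, ...
    bands.append(approx)  # final approximation coefficients
    energies = [sum(c * c for c in band) for band in bands]
    total = sum(energies)
    # Relative energy per sub-band; terms with p == 0 contribute nothing.
    probs = [e / total for e in energies] if total > 0 else []
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return energies, entropy
```

Because this Haar transform is orthonormal, the sub-band energies sum to the energy of the input signal, which gives a convenient sanity check; the entropy is 0 when all energy sits in one sub-band and grows as energy spreads across sub-bands.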
Affiliation(s)
- Yu Sun
- School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Hongwei Wu
- School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Zhengrong Xu
- School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Zhenyu Yue
- School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Ke Li
- School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China.
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei, 230601, Anhui, China.
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China.
4
Ovek D, Abali Z, Zeylan ME, Keskin O, Gursoy A, Tuncbag N. Artificial intelligence based methods for hot spot prediction. Curr Opin Struct Biol 2021; 72:209-218. [PMID: 34954608 DOI: 10.1016/j.sbi.2021.11.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/05/2021] [Revised: 10/07/2021] [Accepted: 11/08/2021] [Indexed: 11/29/2022]
Abstract
Proteins interact through their interfaces to fulfill essential functions in the cell. They bind to their partners in a highly specific manner and form complexes whose characterization is central to understanding the biological pathways they are involved in. Abnormal interactions may cause disease. Therefore, identifying small molecules that modulate protein interactions through their interfaces has high therapeutic potential. However, discovering such molecules is challenging. Most of the binding affinity of protein-protein complexes is attributed to a small set of amino acids at protein interfaces, known as hot spots. Recent studies demonstrate that drug-like small molecules may bind specifically to hot spots, which makes hot spot prediction crucial. As experimental data accumulate, artificial intelligence is increasingly being used for computational hot spot prediction. Here, we first review machine learning and deep learning approaches for computational hot spot prediction and then explain the significance of hot spots for drug design.
Affiliation(s)
- Damla Ovek
- College of Engineering, Koc University, 34450 Istanbul, Turkey
- Zeynep Abali
- College of Engineering, Koc University, 34450 Istanbul, Turkey
- Ozlem Keskin
- College of Engineering, Koc University, 34450 Istanbul, Turkey.
- Attila Gursoy
- College of Engineering, Koc University, 34450 Istanbul, Turkey.
- Nurcan Tuncbag
- College of Engineering, Koc University, 34450 Istanbul, Turkey; School of Medicine, Koc University, 34450 Istanbul, Turkey.