1. Lv SQ, Zeng X, Su GP, Du WF, Li Y, Wen ML. Improving Identification of Drug-Target Binding Sites Based on Structures of Targets Using Residual Graph Transformer Network. Biomolecules 2025; 15:221. [PMID: 40001524 PMCID: PMC11853427 DOI: 10.3390/biom15020221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2024] [Revised: 01/28/2025] [Accepted: 01/28/2025] [Indexed: 02/27/2025]
Abstract
Improving the identification of drug-target binding sites can significantly aid drug screening and design, thereby accelerating the drug development process. However, due to challenges such as insufficient fusion of multimodal information from targets and imbalanced datasets, enhancing the performance of drug-target binding site prediction models remains exceptionally difficult. Leveraging the structures of targets, we proposed a novel deep learning framework, RGTsite, which employs a residual graph transformer network to improve the identification of drug-target binding sites. First, a residual 1D convolutional neural network (1D-CNN) and the pre-trained model ProtT5 were employed to extract local and global sequence features from the target, respectively. These features were then combined with the physicochemical properties of amino acid residues to serve as the vertex features of the graph. Next, edge features were incorporated, and the residual graph transformer network (GTN) was applied to extract more comprehensive vertex features. Finally, a fully connected network was used to classify whether each vertex was a binding site. Experimental results showed that RGTsite outperformed existing state-of-the-art methods on key evaluation metrics, such as F1-score (F1) and Matthews correlation coefficient (MCC), across multiple benchmark datasets. Additionally, we conducted an interpretability analysis of RGTsite on real-world cases, and the results confirmed that RGTsite can effectively identify drug-target binding sites in practical applications.
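The two metrics named in this abstract, F1 and MCC, can be computed from per-residue confusion counts. The following is a minimal sketch, not the authors' evaluation code; the function name `f1_and_mcc` and the convention that label 1 marks a binding-site residue are assumptions made for illustration.

```python
import math


def f1_and_mcc(y_true, y_pred):
    """Return (F1, MCC) for binary labels; 1 is assumed to mark a binding-site residue."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    # MCC uses all four counts, which makes it informative on imbalanced data.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return f1, mcc


if __name__ == "__main__":
    # Toy example: 2 binding residues out of 8, one of them recovered.
    print(f1_and_mcc([1, 1, 0, 0, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0, 0, 0]))
```

Both metrics ignore the large pool of true negatives less readily than accuracy does, which is presumably why they are reported for the imbalanced datasets the abstract mentions.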
Affiliation(s)
- Shuang-Qing Lv
  - Faculty of Surveying and Information Engineering, West Yunnan University of Applied Sciences, Dali 671000, China
- Xin Zeng
  - College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Guang-Peng Su
  - College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Wen-Feng Du
  - College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Yi Li
  - College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Meng-Liang Wen
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming 650000, China
2. Chai L, Gao J, Li Z, Sun H, Liu J, Wang Y, Zhang L. Predicting CTCF cell type active binding sites in human genome. Sci Rep 2024; 14:31744. [PMID: 39738353 PMCID: PMC11686126 DOI: 10.1038/s41598-024-82238-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2024] [Accepted: 12/03/2024] [Indexed: 01/02/2025]
Abstract
The CCCTC-binding factor (CTCF) is pivotal in orchestrating diverse biological functions across the human genome, yet the mechanisms driving its cell type-active DNA binding affinity remain underexplored. Here, we collected ChIP-seq data from 67 cell lines in ENCODE, constructed a unique dataset of cell type-active CTCF binding sites (CBS), and trained convolutional neural networks (CNN) to dissect the patterns of CTCF binding activity. Our analysis reveals that transcription factors RAD21/SMC3 and chromatin accessibility are more predictive than sequence motifs and histone modifications. Integrating them achieved AUPRC values consistently above 0.868, highlighting their utility in deciphering CTCF transcription factor binding dynamics. This study provides a deeper understanding of the regulatory functions of CTCF via a machine learning framework.
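The AUPRC figure quoted above summarizes a precision-recall curve in a single number. One common estimator is average precision, sketched below under stated assumptions: the function name `average_precision` is hypothetical, tied scores are not treated specially, and the abstract does not say which estimator the authors used.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 0/1 ground truth (1 = active binding site, by assumption).
    scores: predicted scores, higher meaning more likely positive.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")

    # Rank examples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision at the rank where this positive is recovered.
            total += hits / rank
    return total / n_pos


if __name__ == "__main__":
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

A random classifier scores roughly the positive-class fraction on this metric, so a value such as 0.868 should be read against the class balance of the dataset.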
Affiliation(s)
- Lu Chai
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, People's Republic of China
- Jie Gao
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, People's Republic of China
- Zihan Li
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, People's Republic of China
- Hao Sun
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, People's Republic of China
- Junjie Liu
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, People's Republic of China
- Yong Wang
  - CEMS, NCMIS, HCMS, MDIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, People's Republic of China
- Lirong Zhang
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, People's Republic of China
3. Wang W, Su X, Liu D, Zhang H, Wang X, Zhou Y. Predicting DNA-binding protein and coronavirus protein flexibility using protein dihedral angle and sequence feature. Proteins 2023; 91:497-507. [PMID: 36321218 PMCID: PMC9877568 DOI: 10.1002/prot.26443] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/02/2022] [Revised: 09/07/2022] [Accepted: 10/20/2022] [Indexed: 11/07/2022]
Abstract
The flexibility of protein structure is related to various biological processes, such as molecular recognition, allosteric regulation, catalytic activity, and protein stability. At the molecular level, protein dynamics and flexibility are important factors for understanding protein function. DNA-binding proteins and coronavirus proteins are of great concern and are relatively unique proteins. However, exploring the flexibility of DNA-binding proteins and coronavirus proteins through experiments or calculations is a difficult process. Because protein dihedral rotational motion can be used to predict protein structural changes, it provides key information about local protein conformation. Therefore, this paper introduces DihProFle, a method that improves the accuracy of protein flexibility prediction for DNA-binding proteins and coronavirus proteins by incorporating calculated dihedral angle information. Based on protein dihedral angle information, protein evolution information, and the physicochemical properties of amino acids, DihProFle predicts protein flexibility in two cases, DNA-binding proteins and coronavirus proteins, and assigns a flexibility class to each protein sequence position. In this study, compared with flexibility prediction using sequence evolution information and the physicochemical properties of amino acids alone, prediction accuracy based on protein dihedral angle information, sequence evolution information, and the physicochemical properties of amino acids improved by 2.2% and 3.1% under the nonstrict and strict conditions, respectively. DihProFle also achieves better performance than previous methods for protein flexibility analysis. In addition, we further analyzed the correlation of amino acid properties and protein dihedral angles with residue flexibility. The results show that charged hydrophilic residues make up a higher proportion of the flexible regions, and that flexible and rigid regions tend to fall in different ranges of the protein dihedral angles (for example, residues whose ψ angle lies in the range 91°-120° are more often flexible than rigid). Therefore, the results indicate that hydrophilic residues and protein dihedral angle information play an important role in protein flexibility.
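The dihedral angles this abstract relies on can be computed from four consecutive backbone atom positions; for ψ of residue i these are N(i), CA(i), C(i), and N(i+1). The sketch below uses the standard atan2 formulation with plain tuples. It is an illustration only: the function name `dihedral_degrees` is hypothetical, and the abstract does not describe how DihProFle obtains its angles.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in (-180, 180], for four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("the two central points coincide")
    b1 = tuple(x / norm for x in b1)

    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))

    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates, not taken from any real structure.
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

An angle computed this way can then be assigned to a fixed-width bin, such as the 91°-120° range mentioned in the abstract, before being used as a categorical feature.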
Affiliation(s)
- Wei Wang
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
  - Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, Xinxiang, China
- Xili Su
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Dong Liu
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Hongjun Zhang
  - School of Computer Science and Technology, Anyang University, Anyang, China
- Xianfang Wang
  - College of Computer Science and Technology Engineering, Henan Institute of Technology, Xinxiang, China
- Yun Zhou
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
4. Yu Y, Xu S, He R, Liang G. Application of Molecular Simulation Methods in Food Science: Status and Prospects. Journal of Agricultural and Food Chemistry 2023; 71:2684-2703. [PMID: 36719790 DOI: 10.1021/acs.jafc.2c06789] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Indexed: 06/18/2023]
Abstract
Molecular simulation methods, such as molecular docking, molecular dynamics (MD) simulation, and quantum chemical (QC) calculation, have become popular as characterization and/or virtual screening tools because they can visually display interaction details that in vitro experiments cannot capture and can quickly screen bioactive compounds from large databases with millions of molecules. Currently, interdisciplinary research has expanded molecular simulation technology from computer-aided drug design (CADD) to food science. More food scientists are supporting their hypotheses and results with this technology. To better understand the use of molecular simulation methods, it is necessary to systematically summarize their latest applications and usage trends in food science research. However, this type of review article is rare. To bridge this gap, we have comprehensively summarized the principles, combined usage, and applications of molecular simulation methods in food science. We also analyzed the limitations and future trends and offered valuable strategies with the latest technologies to help food scientists use molecular simulation methods.
Affiliation(s)
- Yuandong Yu
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400030, China
- Shiqi Xu
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400030, China
- Ran He
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400030, China
- Guizhao Liang
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400030, China
5. Xiao P, Pan Y, Cai F, Tu H, Liu J, Yang X, Liang H, Zou X, Yang L, Duan J, Xv L, Feng L, Liu Z, Qian Y, Meng Y, Du J, Mei X, Lou T, Yin X, Tan Z. A deep learning based framework for the classification of multi-class capsule gastroscope image in gastroenterologic diagnosis. Front Physiol 2022; 13:1060591. [PMID: 36467700 PMCID: PMC9716070 DOI: 10.3389/fphys.2022.1060591] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/03/2022] [Accepted: 11/07/2022] [Indexed: 07/30/2023]
Abstract
Purpose: The purpose of this paper is to develop a method to automatically classify capsule gastroscope images into three categories in order to help prevent high-risk factors for carcinogenesis, such as atrophic gastritis (AG). To that end, we develop a deep learning framework based on transfer learning that classifies capsule gastroscope images as normal gastroscopic images, chronic erosive gastritis images, or gastric ulcer images. Method: We used the VGG-16, ResNet-50, and Inception V3 pre-trained models, fine-tuned them, and adjusted their hyperparameters for our classification problem. Results: A dataset containing 380 images was collected for each capsule gastroscope image category and divided into a training set and a test set in a ratio of 70% to 30%; three methods, VGG-16, ResNet-50, and Inception V3, were then applied to this dataset. We achieved the highest accuracy, 94.80%, by using VGG-16 to classify capsule gastroscopic images into the three categories. Our proposed approach classified capsule gastroscope images with respectable specificity and accuracy. Conclusion: Gastroscopy is the primary technique and industry standard for diagnosing and treating numerous stomach problems, and the capsule gastroscope is a new screening tool for gastric diseases. However, a number of factors, including the image quality of capsule endoscopy and the doctors' experience and fatigue, limit its effectiveness. Early identification is necessary for high-risk factors for carcinogenesis, such as atrophic gastritis (AG). Our suggested framework will help prevent incorrect diagnoses brought on by low image quality, individual experience, and inadequate gastroscopy inspection coverage, among other factors. As a result, the suggested approach will raise the standard of gastroscopy. Deep learning has great potential in gastritis image classification for assisting with achieving accurate diagnoses after endoscopic procedures.
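The 70%/30% division described in the Results can be illustrated with a per-class split. This is a sketch under assumptions: the abstract does not say whether the split was stratified or how it was randomized, and the function name `stratified_split`, the class labels, and the image identifiers below are hypothetical placeholders.

```python
import random


def stratified_split(items_by_class, train_percent=70, seed=0):
    """Split each class separately so both sets keep the same class balance."""
    rng = random.Random(seed)
    train, test = [], []
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        # Integer arithmetic avoids float rounding such as 380 * 0.7 != 266.
        cut = len(shuffled) * train_percent // 100
        train.extend((item, label) for item in shuffled[:cut])
        test.extend((item, label) for item in shuffled[cut:])
    return train, test


if __name__ == "__main__":
    # Placeholder identifiers standing in for 380 images per category.
    data = {
        "normal": [f"normal_{i}" for i in range(380)],
        "erosive_gastritis": [f"erosive_{i}" for i in range(380)],
        "ulcer": [f"ulcer_{i}" for i in range(380)],
    }
    train_set, test_set = stratified_split(data)
    print(len(train_set), len(test_set))
```

With 380 images per category, this yields 266 training and 114 test images per class.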
Affiliation(s)
- Ping Xiao
  - Health Management Center, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China
  - Department of Otorhinolaryngology Head and Neck Surgery, Shenzhen Children’s Hospital, Shenzhen, China
- Yuhang Pan
  - Health Management Center, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China
- Feiyue Cai
  - Health Management Center, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China
  - Shenzhen Nanshan District General Practice Alliance, Shenzhen, China
- Haoran Tu
  - Group International Division, Shenzhen Senior High School, Shenzhen, China
- Junru Liu
  - Health Management Center, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China
- Xuemei Yang
  - Health Management Center, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China
- Huanling Liang
  - Health Management Center, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China
- Xueqing Zou
  - Health Management Center, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China
- Li Yang
  - Health Management Center, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China
- Jueni Duan
  - Health Management Center, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China
- Long Xv
  - Department of Gastroenterology and Hepatology, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China
- Lijuan Feng
  - Department of Gastroenterology and Hepatology, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China
- Zhenyu Liu
  - Department of Gastroenterology and Hepatology, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China
- Yun Qian
  - Department of Gastroenterology and Hepatology, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China
- Yu Meng
  - Department of Gastroenterology and Hepatology, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China
- Jingfeng Du
  - Department of Gastroenterology and Hepatology, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China
- Xi Mei
  - Health Management Center, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China
- Ting Lou
  - Health Management Center, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China
- Xiaoxv Yin
  - School of Public Health, Huazhong University of Science and Technology, Wuhan, China
- Zhen Tan
  - Health Management Center, Shenzhen University General Hospital, Shenzhen University Clinical Medical Academy, Shenzhen University, Shenzhen, China
  - Shenzhen Nanshan District General Practice Alliance, Shenzhen, China
6. Wang W, Zhang Y, Liu D, Zhang H, Wang X, Zhou Y. Prediction of DNA-Binding Protein–Drug-Binding Sites Using Residue Interaction Networks and Sequence Feature. Front Bioeng Biotechnol 2022; 10:822392. [PMID: 35519609 PMCID: PMC9065339 DOI: 10.3389/fbioe.2022.822392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Accepted: 03/14/2022] [Indexed: 11/13/2022]
Abstract
Identification of protein–ligand binding sites plays a critical role in drug discovery. However, there is still a lack of targeted drug prediction for DNA-binding proteins. This study targets the binding sites between DNA-binding proteins and drugs by mining residue interaction network features, which can describe the local and global structure of amino acids, and combining them with sequence features. The predictor of DNA-binding protein–drug-binding sites is built by employing the Extreme Gradient Boosting (XGBoost) model with random under-sampling. We found that the residue interaction network features can better characterize DNA-binding proteins, and that binding sites with high betweenness and high closeness values are more likely to interact with drugs. The model shows that the residue interaction network features can be used as an important quantitative indicator of drug-binding sites, and this method achieves high predictive performance for DNA-binding protein–drug binding sites. This study will help in drug discovery research for DNA-binding proteins.
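Betweenness and closeness, the two network features singled out in this abstract, can be computed on a residue interaction network stored as an adjacency dictionary. The sketch below is illustrative only: how the authors build the network (for example, which contact cutoff they use) is not stated in the abstract, the residue names are toy values, and the function names are hypothetical. Betweenness uses Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque


def closeness(adj):
    """Closeness per node: (reachable nodes) / (sum of shortest-path lengths)."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        # Only nodes reachable from the source are counted.
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness(adj):
    """Unnormalized betweenness (Brandes) for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # accumulate dependencies back to the source
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2.0 for v, s in score.items()}


if __name__ == "__main__":
    # Toy network of three residues in a chain of contacts.
    network = {"ARG10": ["LYS11"], "LYS11": ["ARG10", "GLU12"], "GLU12": ["LYS11"]}
    print(betweenness(network), closeness(network))
```

In this toy chain the middle residue lies on the only shortest path between the other two, so it receives the highest betweenness and closeness, which mirrors the kind of residue the abstract reports as more likely to bind drugs.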
Affiliation(s)
- Wei Wang
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
  - Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
  - *Correspondence: Wei Wang; Dong Liu; Yun Zhou
- Yu Zhang
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Dong Liu
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
  - Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- HongJun Zhang
  - Computer Science and Technology, Anyang University, Anyang, China
- XianFang Wang
  - Computer Science and Technology, Henan Institute of Technology, Xinxiang, China
- Yun Zhou
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
7. Stringer B, de Ferrante H, Abeln S, Heringa J, Feenstra KA, Haydarlou R. PIPENN: protein interface prediction from sequence with an ensemble of neural nets. Bioinformatics 2022; 38:2111-2118. [PMID: 35150231 PMCID: PMC9004643 DOI: 10.1093/bioinformatics/btac071] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/03/2021] [Revised: 01/16/2022] [Accepted: 02/04/2022] [Indexed: 02/03/2023]
Abstract
Motivation: The interactions between proteins and other molecules are essential to many biological and cellular processes. Experimental identification of interface residues is a time-consuming, costly and challenging task, while protein sequence data are ubiquitous. Consequently, many computational and machine learning approaches have been developed over the years to predict such interface residues from sequence. However, the effectiveness of different Deep Learning (DL) architectures and learning strategies for protein-protein, protein-nucleotide and protein-small molecule interface prediction has not yet been investigated in great detail. Therefore, we here explore the prediction of protein interface residues using six DL architectures and various learning strategies with sequence-derived input features. Results: We constructed a large dataset dubbed BioDL, comprising protein-protein interactions from the PDB, and DNA/RNA and small molecule interactions from the BioLip database. We also constructed six DL architectures, and evaluated them on the BioDL benchmarks. This shows that no single architecture performs best on all instances. An ensemble architecture, which combines all six architectures, does consistently achieve peak prediction accuracy. We confirmed these results on the published benchmark set by Zhang and Kurgan (ZK448), and on our own existing curated homo- and heteromeric protein interaction dataset. Our PIPENN sequence-based ensemble predictor outperforms current state-of-the-art sequence-based protein interface predictors on ZK448 on all interaction types, achieving an AUC-ROC of 0.718 for protein-protein, 0.823 for protein-nucleotide and 0.842 for protein-small molecule. Availability and implementation: Source code and datasets are available at https://github.com/ibivu/pipenn/. Supplementary information: Supplementary data are available at Bioinformatics online.
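The idea of combining several per-residue predictors into one can be illustrated with plain probability averaging. This is a deliberately simple stand-in and an assumption: the abstract does not say how PIPENN combines its six architectures, and the function names `ensemble_average` and `call_interface`, as well as the 0.5 threshold, are hypothetical.

```python
def ensemble_average(model_probs):
    """Average per-residue interface probabilities across several models.

    model_probs: one list of probabilities per model, all of equal length.
    """
    if not model_probs:
        raise ValueError("at least one model output is required")
    length = len(model_probs[0])
    if any(len(probs) != length for probs in model_probs):
        raise ValueError("all models must score the same residues")
    # zip(*...) groups the scores that different models gave to one residue.
    return [sum(column) / len(model_probs) for column in zip(*model_probs)]


def call_interface(probs, threshold=0.5):
    """Turn averaged probabilities into 0/1 interface labels."""
    return [1 if p >= threshold else 0 for p in probs]


if __name__ == "__main__":
    # Toy outputs of three models for a four-residue sequence.
    outputs = [
        [0.9, 0.2, 0.6, 0.1],
        [0.8, 0.4, 0.3, 0.2],
        [0.7, 0.3, 0.6, 0.0],
    ]
    averaged = ensemble_average(outputs)
    print(averaged, call_interface(averaged))
```

Averaging tends to help when the individual models make different errors, which is consistent with the observation above that no single architecture performs best on all instances.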
Affiliation(s)
- Hans de Ferrante
  - Department of Computer Science, IBIVU—Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands
- Sanne Abeln
  - Department of Computer Science, IBIVU—Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands
- Jaap Heringa
  - Department of Computer Science, IBIVU—Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands
- K Anton Feenstra
  - Department of Computer Science, IBIVU—Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands