1
|
Shi MH, Zhang SW, Zhang QQ, Han Y, Zhang S. PLAGCA: Predicting protein-ligand binding affinity with the graph cross-attention mechanism. J Biomed Inform 2025; 165:104816. [PMID: 40139623 DOI: 10.1016/j.jbi.2025.104816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2024] [Revised: 03/13/2025] [Accepted: 03/14/2025] [Indexed: 03/29/2025]
Abstract
Accurate prediction of protein-ligand binding affinity plays a crucial role in drug discovery. However, determining protein-ligand binding affinity through biological experiments is both time-consuming and expensive. Although some computational methods have been developed to predict protein-ligand binding affinity, most existing methods extract the global features of proteins and ligands through separate encoders without extracting the local pocket interaction features of protein-ligand complexes, which limits prediction accuracy. In this work, we propose a novel Protein-Ligand binding Affinity prediction method (named PLAGCA) that introduces a Graph Cross-Attention mechanism to learn the local three-dimensional (3D) features of protein-ligand pockets and integrates the global sequence/string features and local graph interaction features of protein-ligand complexes. PLAGCA uses sequence encoding and self-attention to extract the protein/ligand global features from protein FASTA sequences and ligand SMILES strings, and adopts a graph neural network and cross-attention to extract the protein-ligand local interaction features from the molecular structures of protein binding pockets and ligands. All these features are concatenated and input into a multi-layer perceptron (MLP) to predict the protein-ligand binding affinity. The experimental results show that PLAGCA outperforms other state-of-the-art computational methods and can effectively predict protein-ligand binding affinity with superior generalization capability. PLAGCA can capture the critical functional residues that make important contributions to protein-ligand binding.
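The cross-attention step named in the abstract above can be illustrated with a minimal sketch. It shows plain scaled dot-product cross-attention in which ligand atom vectors act as queries over pocket residue vectors. The function names, the toy feature vectors, and the absence of learned projection matrices are all assumptions made for illustration; this is not the PLAGCA implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    Each query row (e.g. a ligand atom) attends over all key rows
    (e.g. pocket residues) and returns a weighted sum of the value rows.
    No learned projections are applied (illustrative simplification).
    """
    d = len(keys[0])
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights.append(softmax(scores))
    attended = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(len(values[0]))]
        for row in weights
    ]
    return attended, weights


# Hypothetical toy features: 2 ligand atoms and 3 pocket residues, dimension 2.
ligand_atoms = [[1.0, 0.0], [0.0, 1.0]]
pocket_residues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, weights = cross_attention(ligand_atoms, pocket_residues, pocket_residues)
```

Each row of `weights` is a probability distribution over pocket residues, which is the kind of quantity attention-based models inspect when they report which residues matter for binding.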
Affiliation(s)
- Ming-Hui Shi
- MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xian 710072, China.
| | - Shao-Wu Zhang
- MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xian 710072, China.
| | - Qing-Qing Zhang
- MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xian 710072, China
| | - Yong Han
- MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xian 710072, China
| | - Shanwen Zhang
- School of Computing, Xijing University, Xi'an, 710123, China
| |
|
2
|
Wang B, Zhang T, Liu Q, Sutcharitchan C, Zhou Z, Zhang D, Li S. Elucidating the role of artificial intelligence in drug development from the perspective of drug-target interactions. J Pharm Anal 2025; 15:101144. [PMID: 40099205 PMCID: PMC11910364 DOI: 10.1016/j.jpha.2024.101144] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/05/2024] [Revised: 10/29/2024] [Accepted: 11/08/2024] [Indexed: 03/19/2025] Open
Abstract
Drug development remains a critical issue in the field of biomedicine. With the rapid advancement of information technologies such as artificial intelligence (AI) and the advent of the big data era, AI-assisted drug development has become a new trend, particularly in predicting drug-target associations. To address the challenge of drug-target prediction, AI-driven models have emerged as powerful tools, offering innovative solutions by effectively extracting features from complex biological data, accurately modeling molecular interactions, and precisely predicting potential drug-target outcomes. Traditional machine learning (ML), network-based, and advanced deep learning architectures such as convolutional neural networks (CNNs), graph convolutional networks (GCNs), and transformers play a pivotal role. This review systematically compiles and evaluates AI algorithms for drug- and drug combination-target predictions, highlighting their theoretical frameworks, strengths, and limitations. CNNs effectively identify spatial patterns and molecular features critical for drug-target interactions. GCNs provide deep insights into molecular interactions via relational data, whereas transformers increase prediction accuracy by capturing complex dependencies within biological sequences. Network-based models offer a systematic perspective by integrating diverse data sources, and traditional ML efficiently handles large datasets to improve overall predictive accuracy. Collectively, these AI-driven methods are transforming drug-target predictions and advancing the development of personalized therapy. This review summarizes the application of AI in drug development, particularly in drug-target prediction, and offers recommendations on models and algorithms for researchers engaged in biomedical research. It also provides typical cases to better illustrate how AI can further accelerate development in the fields of biomedicine and drug discovery.
Affiliation(s)
- Boyang Wang
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRist, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Tingyu Zhang
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRist, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Qingyuan Liu
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRist, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Chayanis Sutcharitchan
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRist, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Ziyi Zhou
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRist, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Dingfan Zhang
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRist, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Shao Li
- Institute for TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRist, Department of Automation, Tsinghua University, Beijing, 100084, China
| |
|
3
|
Peng L, Liu X, Yang L, Liu L, Bai Z, Chen M, Lu X, Nie L. BINDTI: A Bi-Directional Intention Network for Drug-Target Interaction Identification Based on Attention Mechanisms. IEEE J Biomed Health Inform 2025; 29:1602-1612. [PMID: 38457318 DOI: 10.1109/jbhi.2024.3375025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/10/2024]
Abstract
The identification of drug-target interactions (DTIs) is an essential step in drug discovery. In vitro experimental methods are expensive, laborious, and time-consuming. Deep learning has shown promising progress in DTI prediction. However, precisely representing drug and protein features remains a major challenge for DTI prediction. Here, we developed an end-to-end DTI identification framework called BINDTI based on a bi-directional Intention network. First, each drug's features are encoded with graph convolutional networks from the 2D molecular graph derived from its SMILES string. Next, each protein's features are encoded from its amino acid sequence through a mixed model called ACmix, which integrates a self-attention mechanism and convolution. Third, drug and target features are fused through the bi-directional Intention network, which combines Intention and multi-head attention. Finally, unknown drug-target (DT) pairs are classified through a multilayer perceptron based on the fused DT features. The results demonstrate that BINDTI greatly outperformed four baseline methods (i.e., CPI-GNN, TransformerCPI, MolTrans, and IIFDTI) on the BindingDB, BioSNAP, DrugBank, and Human datasets. More importantly, it was better suited to predicting new DTIs than the four baseline methods on imbalanced datasets. Ablation experiments showed that both the bi-directional Intention network and ACmix could greatly advance DTI prediction. The fused feature visualization and case studies showed that the results predicted by BINDTI were largely consistent with the true ones. We anticipate that the proposed BINDTI framework can find new low-cost drug candidates, improve virtual drug screening, and further facilitate drug repositioning as well as drug discovery.
|
4
|
Peng L, Yang C, Yang J, Tu Y, Yu Q, Li Z, Chen M, Liang W. Drug Repositioning via Multi-View Representation Learning With Heterogeneous Graph Neural Network. IEEE J Biomed Health Inform 2025; 29:1668-1679. [PMID: 39074005 DOI: 10.1109/jbhi.2024.3434439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/31/2024]
Abstract
Exploring simple and efficient computational methods for drug repositioning has emerged as a popular and compelling topic in the realm of comprehensive drug development. The crux of this technology lies in identifying potential drug-disease associations, which can effectively mitigate the burdens caused by the exorbitant costs and lengthy periods of conventional drug development. However, existing computational drug repositioning methods continue to encounter challenges in accurately predicting associations between drugs and diseases. In this paper, we propose a Multi-view Representation Learning method (MRLHGNN) with a Heterogeneous Graph Neural Network for drug repositioning. This method is based on a collection of data from multiple biological entities associated with drugs or diseases. It consists of a view-specific feature aggregation module with meta-paths and an auto multi-view fusion encoder. To better utilize local structural and semantic information from specific views in the heterogeneous graph, MRLHGNN employs a feature aggregation model with variable-length meta-paths to expand the local receptive field. Additionally, it utilizes a transformer-based semantic aggregation module to aggregate semantic features across different view-specific graphs. Finally, potential drug-disease associations are obtained through a multi-view fusion decoder with an attention mechanism. Cross-validation experiments demonstrate the effectiveness and interpretability of MRLHGNN in comparison to nine state-of-the-art approaches. Case studies further reveal that MRLHGNN can serve as a powerful tool for drug repositioning.
|
5
|
Peng L, Mao J, Huang G, Han G, Liu X, Liao W, Tian G, Yang J. DO-GMA: An End-to-End Drug-Target Interaction Identification Framework with a Depthwise Overparameterized Convolutional Network and the Gated Multihead Attention Mechanism. J Chem Inf Model 2025; 65:1318-1337. [PMID: 39874533 DOI: 10.1021/acs.jcim.4c02088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2025]
Abstract
Identification of potential drug-target interactions (DTIs) is a crucial step in drug discovery and repurposing. Although deep learning effectively deciphers DTIs, most deep learning-based methods represent drug features from only a single perspective. Moreover, the fusion method of drug and protein features needs further refinement. To address these two problems, in this study, we develop a novel end-to-end framework named DO-GMA for potential DTI identification by incorporating a Depthwise Overparameterized convolutional neural network and the Gated Multihead Attention mechanism with shared-learned queries and bilinear model concatenation. DO-GMA first designs a depthwise overparameterized convolutional neural network to learn drug representations from their SMILES strings and protein representations from their amino acid sequences. Next, it extracts drug representations from their 2D molecular graphs through a graph convolutional network. Subsequently, it fuses drug and protein features by combining the gated attention mechanism and the multihead attention mechanism with shared-learned queries and bilinear model concatenation. Finally, it takes the fused drug-target features as inputs and builds a multilayer perceptron to classify unlabeled drug-target pairs (DTPs). DO-GMA was benchmarked against six recent DTI prediction methods (CPI-GNN, BACPI, CPGL, DrugBAN, BINDTI, and FOTF-CPI) under four different experimental settings on four DTI data sets (i.e., DrugBank, BioSNAP, C.elegans, and BindingDB). The results show that DO-GMA significantly outperformed the above six methods based on AUC, AUPR, accuracy, F1-score, and MCC. An ablation study, robust statistical analysis, sensitivity analysis of parameters, visualization of the fused features, computational cost analysis, and case analysis further validated the powerful DTI identification performance of DO-GMA. In addition, DO-GMA predicted that two drug-protein pairs (i.e., DB00568 and P06276, and DB09118 and Q9UQD0) could be interacting. DO-GMA is freely available at https://github.com/plhhnu/DO-GMA.
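Three of the threshold-based metrics listed in the abstract above (accuracy, F1-score, and MCC) follow directly from confusion-matrix counts. The sketch below is a minimal illustration on made-up binary labels; the function names and the toy data are assumptions, and the ranking metrics AUC and AUPR are deliberately left out.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, F1-score and Matthews correlation coefficient (MCC)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical labels for 8 drug-target pairs (1 = interacting).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

On imbalanced data sets MCC is often reported alongside accuracy because it uses all four confusion-matrix cells rather than only the correctly classified ones.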
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou 412007, Hunan, China
| | - Jiale Mao
- School of Computer Science, Hunan University of Technology, Zhuzhou 412007, Hunan, China
| | - Guohua Huang
- School of Information Technology and Administration, Hunan University of Finance and Economics, Changsha 410125, China
| | - Guosheng Han
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan 411100, Hunan, China
| | - Xin Liu
- College of Life Science and Chemistry, Hunan University of Technology, Zhuzhou 412007, Hunan, China
| | - Wen Liao
- School of Computer Science, Hunan University of Technology, Zhuzhou 412007, Hunan, China
| | - Geng Tian
- Geneis (Beijing) Co. Ltd., Beijing 100102, China
| | | |
|
6
|
Yuan Y, Chen S, Hu R, Wang X. MutualDTA: An Interpretable Drug-Target Affinity Prediction Model Leveraging Pretrained Models and Mutual Attention. J Chem Inf Model 2025; 65:1211-1227. [PMID: 39878060 DOI: 10.1021/acs.jcim.4c01893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2025]
Abstract
Efficient and accurate drug-target affinity (DTA) prediction can significantly accelerate the drug development process. Recently, deep learning models have been widely applied to DTA prediction and have achieved notable success. However, existing methods often encounter several common issues: first, the data representations lack sufficient information; second, the extracted features are not comprehensive; and third, most methods lack interpretability when modeling drug-target binding. To overcome these problems, we propose an interpretable deep learning model called MutualDTA for predicting DTA. MutualDTA leverages pretrained models to obtain accurate representations of drugs and targets. It also employs well-designed modules to extract hidden features from these representations. Furthermore, the interpretability of MutualDTA is realized by the Mutual-Attention module, which (i) establishes relationships between drugs and proteins from the perspective of intermolecular interactions between drug atoms and protein amino acid residues and (ii) allows MutualDTA to capture the binding sites based on attention scores. The test results on two benchmark data sets show that MutualDTA achieves the best performance among the 12 state-of-the-art models compared. Attention visualization experiments show that MutualDTA can capture partial interaction sites, which not only helps drug developers reduce the search space for binding sites but also demonstrates the interpretability of MutualDTA. Finally, the trained MutualDTA is applied to screen for high-affinity drugs targeting Alzheimer's disease (AD)-related proteins, and some of the screened drugs are present in the anti-AD drug library. These results demonstrate the reliability of MutualDTA in drug development.
Affiliation(s)
- Yongna Yuan
- School of Information Science & Engineering, Lanzhou University, Lanzhou 730000, China
| | - Siming Chen
- School of Information Science & Engineering, Lanzhou University, Lanzhou 730000, China
| | - Rizhen Hu
- School of Information Science & Engineering, Lanzhou University, Lanzhou 730000, China
| | - Xin Wang
- School of Information Science & Engineering, Lanzhou University, Lanzhou 730000, China
| |
|
7
|
Yang S, Bai M, Liu W, Li W, Zhong Z, Kwok LY, Dong G, Sun Z. Predicting Lactobacillus delbrueckii subsp. bulgaricus-Streptococcus thermophilus interactions based on a highly accurate semi-supervised learning method. SCIENCE CHINA. LIFE SCIENCES 2025; 68:558-574. [PMID: 39417929 DOI: 10.1007/s11427-023-2569-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 03/15/2024] [Indexed: 10/19/2024]
Abstract
Lactobacillus delbrueckii subsp. bulgaricus (L. bulgaricus) and Streptococcus thermophilus (S. thermophilus) are commonly used starters in milk fermentation. Fermentation experiments revealed that L. bulgaricus-S. thermophilus interactions (LbStI) substantially impact dairy product quality and production. Because traditional wet-lab experiments are time-consuming and labor-intensive for screening interaction combinations, an artificial intelligence-based method for screening interactive starter combinations is necessary. However, in current research on artificial intelligence-based interaction prediction in bioinformatics, most successful models adopt supervised learning methods, and there is a lack of research on interaction prediction with only a small number of labeled samples. Hence, this study aimed to develop a semi-supervised learning framework for predicting LbStI using genomic data from 362 isolates (181 per species). The framework consisted of a two-part model: a co-clustering prediction model (based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) dataset) and a Laplacian regularized least squares prediction model (based on K-mer analysis and gene composition of all isolates datasets). To enhance accuracy, we integrated the separate outcomes produced by each component of the two-part model to generate the final LbStI prediction results. Validation through milk fermentation experiments confirmed a high precision rate of 85% (17/20; validated with 20 randomly selected combinations of expected interacting isolates). Our data suggest that the biosynthetic pathways of cysteine, riboflavin, teichoic acid, and exopolysaccharides, as well as the ATP-binding cassette transport systems, contribute to the mutualistic relationship between these starter bacteria during milk fermentation. However, this finding requires further experimental verification. The presented model and data are valuable resources for academics and industry professionals interested in screening dairy starter cultures and understanding their interactions.
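The idea behind Laplacian-regularized least squares, mentioned in the abstract above, can be sketched in a simplified graph-only form: scores F are chosen to stay close to the known labels Y while varying smoothly over a similarity graph, which gives the closed form (I + λL)F = Y with graph Laplacian L = D − W. The sketch below is an illustrative simplification under that assumption, not the model used in the paper; the function names, the regularization weight `lam`, and the toy similarity matrix are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Move the row with the largest pivot into place for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def graph_regularized_scores(similarity, labels, lam=1.0):
    """Minimize ||F - Y||^2 + lam * F^T L F, i.e. solve (I + lam * L) F = Y."""
    n = len(labels)
    system = []
    for i in range(n):
        degree = sum(similarity[i])
        row = []
        for j in range(n):
            laplacian_ij = (degree if i == j else 0.0) - similarity[i][j]
            row.append((1.0 if i == j else 0.0) + lam * laplacian_ij)
        system.append(row)
    return solve_linear(system, labels)


# Hypothetical chain of three isolates: 0 - 1 - 2, only isolate 0 is labeled.
similarity = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
labels = [1.0, 0.0, 0.0]
scores = graph_regularized_scores(similarity, labels)
```

In this toy chain the single label spreads to its unlabeled neighbors with decreasing strength, which is the semi-supervised behavior that makes such models usable when only a few labeled samples are available.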
Affiliation(s)
- Shujuan Yang
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Collaborative Innovative Center for Lactic Acid Bacteria and Fermented Dairy Products, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, 010018, China
| | - Mei Bai
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Collaborative Innovative Center for Lactic Acid Bacteria and Fermented Dairy Products, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, 010018, China
| | - Weichi Liu
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, Hohhot, 010018, China
| | - Weicheng Li
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Collaborative Innovative Center for Lactic Acid Bacteria and Fermented Dairy Products, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, 010018, China
| | - Zhi Zhong
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Collaborative Innovative Center for Lactic Acid Bacteria and Fermented Dairy Products, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, 010018, China
| | - Lai-Yu Kwok
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Collaborative Innovative Center for Lactic Acid Bacteria and Fermented Dairy Products, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, 010018, China
| | - Gaifang Dong
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, 010018, China.
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, Hohhot, 010018, China.
| | - Zhihong Sun
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, 010018, China.
- Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, 010018, China.
- Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, 010018, China.
- Collaborative Innovative Center for Lactic Acid Bacteria and Fermented Dairy Products, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, 010018, China.
| |
|
8
|
Li Z, Zeng Y, Jiang M, Wei B. Deep Drug-Target Binding Affinity Prediction Base on Multiple Feature Extraction and Fusion. ACS OMEGA 2025; 10:2020-2032. [PMID: 39866608 PMCID: PMC11755178 DOI: 10.1021/acsomega.4c08048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2024] [Revised: 12/25/2024] [Accepted: 01/03/2025] [Indexed: 01/28/2025]
Abstract
Accurate drug-target binding affinity (DTA) prediction is crucial in drug discovery. Recently, deep learning methods for DTA prediction have made significant progress. However, two challenges remain: (1) recent models always ignore the correlations in drug and target data during drug/target representation and (2) the interaction learning of drug-target pairs is always done by simple concatenation, which is insufficient to explore their fusion. To overcome these challenges, we propose an end-to-end sequence-based model called BTDHDTA. In the feature extraction process, the bidirectional gated recurrent unit (GRU), transformer encoder, and dilated convolution are employed to extract global and local patterns of the drug and target inputs, together with the correlations between them. Additionally, a module combining convolutional neural networks with a Highway connection is introduced to fuse drug and protein deep features. We evaluate the performance of BTDHDTA on three benchmark data sets (Davis, KIBA, and Metz), demonstrating its superiority over several current state-of-the-art methods in key metrics such as Mean Squared Error (MSE), Concordance Index (CI), and Regression toward the mean ($R_m^2$). The results indicate that our method achieves better performance in DTA prediction. In the case study, we use the BTDHDTA model to predict the binding affinities between 3137 FDA-approved drugs and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) replication-related proteins, validating the model's effectiveness in practical scenarios.
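Two of the metrics named in the abstract above, MSE and the Concordance Index, are simple to compute from paired true and predicted affinities. The sketch below uses the common pairwise definition of CI (fraction of comparable pairs ranked in the right order, with prediction ties counted as half); the function names and toy affinity values are assumptions for illustration and are not taken from the paper.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true affinity are skipped; ties in the prediction
    count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue
            comparable += 1
            true_diff = y_true[i] - y_true[j]
            pred_diff = y_pred[i] - y_pred[j]
            if pred_diff == 0:
                concordant += 0.5
            elif (true_diff > 0) == (pred_diff > 0):
                concordant += 1.0
    return concordant / comparable if comparable else 0.0


# Hypothetical affinities for four drug-target pairs.
affinity_true = [5.0, 6.0, 7.0, 8.0]
affinity_pred = [5.5, 5.0, 7.5, 8.0]
mse = mean_squared_error(affinity_true, affinity_pred)
ci = concordance_index(affinity_true, affinity_pred)
```

MSE penalizes the size of each error, while CI only checks ranking, so a model can score well on one and poorly on the other; reporting both gives a fuller picture.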
Affiliation(s)
- Zepeng Li
- School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
| | - Yuni Zeng
- School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
| | - Mingfeng Jiang
- School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
| | - Bo Wei
- School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Longgang Research Institute, Zhejiang Sci-Tech University, Longgang 325000, Zhejiang, China
| |
|
9
|
Meng J, Zhang L, He Z, Hu M, Liu J, Bao W, Tian Q, Feng H, Liu H. Development of a machine learning-based target-specific scoring function for structure-based binding affinity prediction for human dihydroorotate dehydrogenase inhibitors. J Comput Chem 2025; 46:e27510. [PMID: 39325045 DOI: 10.1002/jcc.27510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Revised: 08/21/2024] [Accepted: 09/11/2024] [Indexed: 09/27/2024]
Abstract
Human dihydroorotate dehydrogenase (hDHODH) is a flavin mononucleotide-dependent enzyme that can limit de novo pyrimidine synthesis, making it a therapeutic target for diseases such as autoimmune disorders and cancer. In this study, using the docking structures of complexes generated by AutoDock Vina, we integrate interaction features and ligand features, and employ support vector regression to develop a target-specific scoring function for hDHODH (TSSF-hDHODH). The Pearson correlation coefficient values of TSSF-hDHODH in the cross-validation and external validation are 0.86 and 0.74, respectively, both of which are far superior to those of the classic scoring function AutoDock Vina and the random forest (RF)-based generic scoring function RF-Score. TSSF-hDHODH is further used for the virtual screening of potential inhibitors in the FDA-Approved & Pharmacopeia Drug Library. In conjunction with the results from molecular dynamics simulations, crizotinib is identified as a candidate for subsequent structural optimization. This study can be useful for the discovery of hDHODH inhibitors and the development of scoring functions for additional targets.
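The Pearson correlation coefficient used in the abstract above to compare scoring functions measures how linearly predicted scores track measured affinities. A minimal sketch follows; the function name and the toy inputs are assumptions for illustration, and the values have no connection to the reported 0.86 and 0.74.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measured affinities and two sets of predicted scores.
measured = [1.0, 2.0, 3.0]
perfectly_linear = [2.0, 4.0, 6.0]
partly_shuffled = [1.0, 3.0, 2.0]
r_linear = pearson_r(measured, perfectly_linear)
r_shuffled = pearson_r(measured, partly_shuffled)
```

Because the coefficient is invariant to linear rescaling, a scoring function does not need to output affinities in the experimental units to achieve a high value; it only needs to vary in proportion to them.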
Affiliation(s)
- Jinhui Meng
- School of Life Science, Liaoning University, Shenyang, Liaoning, China
| | - Li Zhang
- School of Life Science, Liaoning University, Shenyang, Liaoning, China
- Liaoning Provincial Key Laboratory of Computational Simulation and Information Processing of Biomacromolecules, Liaoning University, Shenyang, Liaoning, China
- Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Liaoning University, Shenyang, Liaoning, China
| | - Zhe He
- School of Life Science, Liaoning University, Shenyang, Liaoning, China
| | - Mengfeng Hu
- School of Life Science, Liaoning University, Shenyang, Liaoning, China
| | - Jinhan Liu
- School of Life Science, Liaoning University, Shenyang, Liaoning, China
| | - Wenzhuo Bao
- School of Life Science, Liaoning University, Shenyang, Liaoning, China
| | - Qifeng Tian
- School of Life Science, Liaoning University, Shenyang, Liaoning, China
| | - Huawei Feng
- School of Pharmacy, Liaoning University, Shenyang, Liaoning, China
| | - Hongsheng Liu
- Liaoning Provincial Key Laboratory of Computational Simulation and Information Processing of Biomacromolecules, Liaoning University, Shenyang, Liaoning, China
- Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Liaoning University, Shenyang, Liaoning, China
- School of Pharmacy, Liaoning University, Shenyang, Liaoning, China
| |
|
10
|
Ouyang X, Feng Y, Cui C, Li Y, Zhang L, Wang H. Improving generalizability of drug-target binding prediction by pre-trained multi-view molecular representations. Bioinformatics 2024; 41:btaf002. [PMID: 39776159 PMCID: PMC11751634 DOI: 10.1093/bioinformatics/btaf002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Revised: 12/12/2024] [Accepted: 01/06/2025] [Indexed: 01/11/2025] Open
Abstract
MOTIVATION: Most drugs start their journey inside the body by binding the right target proteins. For this reason, numerous efforts have been devoted to predicting drug-target binding during drug development. However, the inherent diversity among molecular properties, coupled with limited training data availability, poses challenges to the accuracy and generalizability of these methods beyond their training domain. RESULTS: In this work, we propose a neural network architecture for highly accurate and generalizable drug-target binding prediction, named Pre-trained Multi-view Molecular Representations (PMMR). The method uses pre-trained models to transfer representations of target proteins and drugs to the domain of drug-target binding prediction, mitigating the issue of poor generalizability stemming from limited data. Then, two typical representations of drug molecules, graphs and SMILES strings, are learned by a Graph Neural Network and a Transformer, respectively, to achieve complementarity between local and global features. PMMR was evaluated on drug-target affinity and interaction benchmark datasets, and it outperformed peer methods, especially in generalizability in cold-start scenarios. Furthermore, a case study of cyclin-dependent kinase 2 indicated that the method has potential for drug discovery. AVAILABILITY AND IMPLEMENTATION: https://github.com/NENUBioCompute/PMMR.
Affiliation(s)
- Xike Ouyang
- School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, Jilin 130117, China
| | - Yannuo Feng
- School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, Jilin 130117, China
| | - Chen Cui
- School of Computer Science and Engineering, Changchun University of Technology, Changchun, Jilin 130051, China
| | - Yunhe Li
- School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, Jilin 130117, China
| | - Li Zhang
- School of Computer Science and Engineering, Changchun University of Technology, Changchun, Jilin 130051, China
| | - Han Wang
- School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, Jilin 130117, China
|
11
|
Liu M, Meng X, Mao Y, Li H, Liu J. ReduMixDTI: Prediction of Drug-Target Interaction with Feature Redundancy Reduction and Interpretable Attention Mechanism. J Chem Inf Model 2024; 64:8952-8962. [PMID: 39570771 DOI: 10.1021/acs.jcim.4c01554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/10/2024]
Abstract
Identifying drug-target interactions (DTIs) is essential for drug discovery and development. Existing deep learning approaches to DTI prediction often employ powerful feature encoders to represent drugs and targets holistically, which usually cause significant redundancy and noise by neglecting the restricted binding regions. Furthermore, many previous DTI networks ignore or simplify the complex intermolecular interaction process involving diverse binding types, which significantly limits both predictive ability and interpretability. We propose ReduMixDTI, an end-to-end model that addresses feature redundancy and explicitly captures complex local interactions for DTI prediction. In this study, drug and target features are encoded by using graph neural networks and convolutional neural networks, respectively. These features are refined from channel and spatial perspectives to enhance the representations. The proposed attention mechanism explicitly models pairwise interactions between drug and target substructures, improving the model's understanding of binding processes. In extensive comparisons with seven state-of-the-art methods, ReduMixDTI demonstrates superior performance across three benchmark data sets and external test sets reflecting real-world scenarios. Additionally, we perform comprehensive ablation studies and visualize protein attention weights to enhance the interpretability. The results confirm that ReduMixDTI serves as a robust and interpretable model for reducing feature redundancy, contributing to advances in DTI prediction.
Affiliation(s)
- Mingqing Liu
- National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, Anhui, China
- Center for Advanced Interdisciplinary Science and Biomedicine of IHM, Division of Life Sciences and Medicine, University of Science and Technology of China MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei 230026, Anhui, China
| | - Xuechun Meng
- National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, Anhui, China
- Center for Advanced Interdisciplinary Science and Biomedicine of IHM, Division of Life Sciences and Medicine, University of Science and Technology of China MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei 230026, Anhui, China
| | - Yiyang Mao
- National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, Anhui, China
- Center for Advanced Interdisciplinary Science and Biomedicine of IHM, Division of Life Sciences and Medicine, University of Science and Technology of China MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei 230026, Anhui, China
| | - Hongqi Li
- Department of Geriatrics, The First Affiliated Hospital of USTC, University of Science and Technology of China, Hefei 230026, Anhui, China
| | - Ji Liu
- National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, Anhui, China
- Center for Advanced Interdisciplinary Science and Biomedicine of IHM, Division of Life Sciences and Medicine, University of Science and Technology of China MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei 230026, Anhui, China
- MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei 230026, Anhui, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230026, Anhui, China
|
12
|
Liang J, Hu Z, Bi Y, Cheng H, Guo WF. Multimodal multiobjective optimization with structural network control principles to optimize personalized drug targets for drug discovery of individual patients. Brief Bioinform 2024; 26:bbaf007. [PMID: 39835535 PMCID: PMC11747759 DOI: 10.1093/bib/bbaf007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2024] [Revised: 12/05/2024] [Accepted: 01/05/2025] [Indexed: 01/22/2025] Open
Abstract
Structural network control principles provided novel and efficient clues for the optimization of personalized drug targets (PDTs) related to state transitions of individual patients. However, most existing methods focus on one subnetwork or module as drug targets through the identification of the minimal set of driver nodes and ignore the state transition capabilities of other modules with different configurations of drug targets [i.e. multimodal drug targets (MDTs)] embedding the knowledge of previous drug targets (i.e. multiobjective optimization). Therefore, a novel multimodal multiobjective evolutionary optimization framework (called MMONCP) is proposed to optimize PDTs with network control principles. The key points of MMONCP are that a constrained multimodal multiobjective optimization problem is formed with discrete constraints on the decision space and multimodality characteristics, and a novel evolutionary algorithm denoted as CMMOEA-GLS-WSCD is designed by combining a global and local search strategy and a weighting-based special crowding distance strategy to balance the diversity of both objective and decision space. The experimental results on three cancer genomics data from The Cancer Genome Atlas indicate that MMONCP achieves a higher performance including algorithm convergence and diversity, the fraction of identified MDTs, and the area under the curve score than advanced algorithms. Additionally, MMONCP can detect the early state from the difference between the target activity and toxicity of MDTs and provide early treatment options for cancer treatment in precision medicine.
Affiliation(s)
- Jing Liang
- School of Electrical and Information Engineering, Zhengzhou University, No. 100, Science Avenue, Hightech District, Zhengzhou City 450001, Henan Province, China
- State Key Laboratory of Intelligent Agricultural Power Equipment, No. 39, Xiyuan Road, Jianxi District, Luoyang City 471039, Henan Province, China
| | - Zhuo Hu
- School of Electrical and Information Engineering, Zhengzhou University, No. 100, Science Avenue, Hightech District, Zhengzhou City 450001, Henan Province, China
| | - Ying Bi
- School of Electrical and Information Engineering, Zhengzhou University, No. 100, Science Avenue, Hightech District, Zhengzhou City 450001, Henan Province, China
| | - Han Cheng
- School of Life Sciences, Zhengzhou University, No. 100, Science Avenue, High-tech District, Zhengzhou City 450001, Henan Province, China
| | - Wei-Feng Guo
- School of Electrical and Information Engineering, Zhengzhou University, No. 100, Science Avenue, Hightech District, Zhengzhou City 450001, Henan Province, China
- State Key Laboratory of Intelligent Agricultural Power Equipment, No. 39, Xiyuan Road, Jianxi District, Luoyang City 471039, Henan Province, China
|
13
|
Chen J, Tao R, Qiu Y, Yuan Q. CMFHMDA: a prediction framework for human disease-microbe associations based on cross-domain matrix factorization. Brief Bioinform 2024; 25:bbae481. [PMID: 39327064 PMCID: PMC11427075 DOI: 10.1093/bib/bbae481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Revised: 08/27/2024] [Accepted: 09/12/2024] [Indexed: 09/28/2024] Open
Abstract
Predicting associations between microbes and diseases opens up new avenues for developing diagnostic, preventive, and therapeutic strategies. Given that laboratory-based biological tests to verify these associations are often time-consuming and expensive, there is a critical need for innovative computational frameworks to predict new microbe-disease associations. In this work, we introduce a novel prediction algorithm called Predicting Human Disease-Microbe Associations using Cross-Domain Matrix Factorization (CMFHMDA). Initially, we calculate the composite similarity of diseases and the Gaussian interaction profile similarity of microbes. We then apply the Weighted K Nearest Known Neighbors (WKNKN) algorithm to refine the microbe-disease association matrix. Our CMFHMDA model is subsequently developed by integrating the network data of both microbes and diseases to predict potential associations. The key innovations of this method include using the WKNKN algorithm to preprocess missing values in the association matrix and incorporating cross-domain information from microbes and diseases into the CMFHMDA model. To validate CMFHMDA, we employed three different cross-validation techniques to evaluate the model's accuracy. The results indicate that the CMFHMDA model achieved Area Under the Receiver Operating Characteristic Curve scores of 0.9172, 0.8551, and 0.9351 ± 0.0052 in global Leave-One-Out Cross-Validation (LOOCV), local LOOCV, and five-fold CV, respectively. Furthermore, many predicted associations have been confirmed by published experimental studies, establishing CMFHMDA as an effective tool for predicting potential disease-associated microbes.
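The abstract above mentions the Gaussian interaction profile similarity of microbes without stating its formula. The sketch below uses the definition commonly found in the association-prediction literature, K(i, j) = exp(-γ ||a_i - a_j||²) with γ = γ' / mean_i ||a_i||², and is not taken from the CMFHMDA paper itself. The function name `gip_similarity`, the parameter `gamma_prime`, and the toy matrix are assumptions made for illustration.

```python
import numpy as np

def gip_similarity(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `assoc`.

    `assoc` is a binary association matrix (rows: microbes, columns: diseases).
    The bandwidth normalisation below is the commonly used one and is an
    assumption, not a detail confirmed by the abstract.
    """
    assoc = np.asarray(assoc, dtype=float)
    sq_norms = np.sum(assoc ** 2, axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("association matrix contains no known associations")
    gamma = gamma_prime / mean_sq_norm
    # Pairwise squared Euclidean distances between row profiles.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * assoc @ assoc.T
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dist)

# Hypothetical toy data: 3 microbes x 3 diseases.
toy_assoc = np.array([[1, 0, 1],
                      [1, 0, 1],
                      [0, 1, 0]])
microbe_sim = gip_similarity(toy_assoc)
```

Microbes with identical association profiles (rows 0 and 1 here) receive similarity 1, and the similarity decays as the profiles diverge.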
Affiliation(s)
- Jing Chen
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, 215009 Suzhou, China
| | - Ran Tao
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, 215009 Suzhou, China
| | - Yi Qiu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, 215009 Suzhou, China
| | - Qun Yuan
- Suzhou Research Center of Medical School, Suzhou Hospital, Affiliated Hospital of Medical School, Nanjing University, 215153 Suzhou, China
|
14
|
Zhao L, Zhu Y, Wen N, Wang C, Wang J, Yuan Y. Drug-Target Binding Affinity Prediction in a Continuous Latent Space Using Variational Autoencoders. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:1458-1467. [PMID: 38767996 DOI: 10.1109/tcbb.2024.3402661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Accurate prediction of Drug-Target binding Affinity (DTA) is a daunting yet pivotal task in the sphere of drug discovery. Over the years, a plethora of deep learning-based DTA models have emerged, rendering promising results in predicting the binding affinities between drugs and their target proteins. However, in contrast to the conventional approach of modeling binding affinity in vector spaces, we propose a more nuanced modeling process in a continuous space to account for the diversity of input samples. Initially, the drug is encoded using the Simplified Molecular Input Line Entry System (SMILES), while the target sequences are characterized via a pretrained language model. Subsequently, highly correlative information is extracted utilizing residual gated convolutional neural networks. In a departure from existing deep learning-based models, our model learns the hidden representations of the drugs and targets jointly. Instead of employing two vectors, our hidden representations consist of two Gaussian distributions. To validate the effectiveness of our proposal, we conducted evaluations on commonly utilized benchmark datasets. The experimental outcomes corroborated that our method surpasses the state-of-the-art vectorial representation methods in terms of performance. This approach, therefore, offers potential enhancements in the precision of DTA predictions, potentially contributing to more efficient drug discovery processes.
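The abstract above describes hidden representations that are Gaussian distributions rather than fixed vectors. A minimal sketch of how such a representation is typically handled in a variational autoencoder follows: reparameterised sampling and the KL divergence to a standard normal prior. The function names, the latent size of 4, and the toy means and log-variances are assumptions for illustration; the abstract does not specify the authors' architecture or loss.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (reparameterisation trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * float(np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var))

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for one drug and one target (latent size 4).
drug_mu, drug_log_var = np.zeros(4), np.zeros(4)
target_mu, target_log_var = np.ones(4), np.full(4, -1.0)

z_drug = sample_latent(drug_mu, drug_log_var, rng)
z_target = sample_latent(target_mu, target_log_var, rng)

# A downstream affinity regressor would consume the joint sample.
joint = np.concatenate([z_drug, z_target])
kl_drug = kl_to_standard_normal(drug_mu, drug_log_var)
kl_target = kl_to_standard_normal(target_mu, target_log_var)
```

Sampling through `mu + sigma * eps` keeps the draw differentiable with respect to the encoder outputs, which is why this form is the usual choice when the latent code is a distribution.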
|
15
|
Chen J, Zhu Y, Yuan Q. Predicting potential microbe-disease associations based on dual branch graph convolutional network. J Cell Mol Med 2024; 28:e18571. [PMID: 39086148 PMCID: PMC11291560 DOI: 10.1111/jcmm.18571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Revised: 06/15/2024] [Accepted: 06/27/2024] [Indexed: 08/02/2024] Open
Abstract
Studying the association between microbes and diseases not only aids in the prevention and diagnosis of diseases, but also provides crucial theoretical support for new drug development and personalized treatment. Due to the time-consuming and costly nature of laboratory-based biological tests to confirm the relationship between microbes and diseases, there is an urgent need for innovative computational frameworks to anticipate new associations between microbes and diseases. Here, we propose a novel computational approach based on a dual branch graph convolutional network (GCN) module, abbreviated as DBGCNMDA, for identifying microbe-disease associations. First, DBGCNMDA calculates the similarity matrix of diseases and microbes by integrating functional similarity and Gaussian association spectrum kernel (GAPK) similarity. Then, semantic information from different biological networks is extracted by two GCN modules from different perspectives. Finally, the scores of microbe-disease associations are predicted based on the extracted features. The main innovation of this method lies in the use of two types of information for microbe/disease similarity assessment. Additionally, we extend the disease nodes to address the issue of insufficient features due to low data dimensionality. We optimize the connectivity between the homogeneous entities using random walk with restart (RWR), and then use the optimized similarity matrix as the initial feature matrix. In terms of network understanding, we design a dual branch GCN module, namely GlobalGCN and LocalGCN, to fine-tune node representations by introducing side information, including homologous neighbour nodes. We evaluate the accuracy of the DBGCNMDA model using five-fold cross-validation (5-fold-CV) technique. 
The results show that the area under the receiver operating characteristic curve (AUC) and area under the precision versus recall curve (AUPR) of the DBGCNMDA model in the 5-fold-CV are 0.9559 and 0.9630, respectively. The results from the case studies using published experimental data confirm a significant number of predicted associations, indicating that DBGCNMDA is an effective tool for predicting potential microbe-disease associations.
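The abstract above states that connectivity between homogeneous entities is optimized with random walk with restart (RWR) before the similarity matrix is used as the initial feature matrix. The sketch below shows the standard RWR iteration p ← (1 - r) W p + r p₀ run from every node at once. The restart probability, tolerance, column normalisation, and toy similarity matrix are assumptions and are not taken from the DBGCNMDA paper.

```python
import numpy as np

def random_walk_with_restart(sim, restart=0.5, tol=1e-10, max_iter=1000):
    """Return a matrix whose column j is the RWR steady state seeded at node j."""
    sim = np.asarray(sim, dtype=float)
    col_sums = sim.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0          # leave isolated nodes untouched
    transition = sim / col_sums            # column-stochastic transition matrix
    seeds = np.eye(sim.shape[0])           # one seed distribution per node
    probs = seeds.copy()
    for _ in range(max_iter):
        updated = (1.0 - restart) * transition @ probs + restart * seeds
        if np.abs(updated - probs).max() < tol:
            return updated
        probs = updated
    return probs

# Hypothetical similarity matrix for three diseases (or microbes).
toy_sim = np.array([[1.0, 0.5, 0.0],
                    [0.5, 1.0, 0.2],
                    [0.0, 0.2, 1.0]])
rwr_features = random_walk_with_restart(toy_sim)
```

Each column of `rwr_features` is a probability distribution over nodes, so indirectly connected nodes (0 and 2 here) obtain a small nonzero affinity that the raw similarity matrix did not contain.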
Affiliation(s)
- Jing Chen
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
| | - Yongjun Zhu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
| | - Qun Yuan
- Department of Respiratory Medicine, The Affiliated Suzhou Hospital of Nanjing University Medical School, Suzhou, China
|
16
|
Wu H, Liu J, Zhang R, Lu Y, Cui G, Cui Z, Ding Y. A review of deep learning methods for ligand based drug virtual screening. Fundam Res 2024; 4:715-737. [PMID: 39156568 PMCID: PMC11330120 DOI: 10.1016/j.fmre.2024.02.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/10/2024] [Accepted: 02/18/2024] [Indexed: 08/20/2024] Open
Abstract
Drug discovery is costly and time consuming, and modern drug discovery endeavors are progressively reliant on computational methodologies, aiming to mitigate temporal and financial expenditures associated with the process. In particular, the time required for vaccine and drug discovery is prolonged during emergency situations such as the coronavirus 2019 pandemic. Recently, the performance of deep learning methods in drug virtual screening has been particularly prominent. It has become a concern for researchers how to summarize the existing deep learning in drug virtual screening, select different models for different drug screening problems, exploit the advantages of deep learning models, and further improve the capability of deep learning in drug virtual screening. This review first introduces the basic concepts of drug virtual screening, common datasets, and data representation methods. Then, large numbers of common deep learning methods for drug virtual screening are compared and analyzed. In addition, a dataset of different sizes is constructed independently to evaluate the performance of each deep learning model for the difficult problem of large-scale ligand virtual screening. Finally, the existing challenges and future directions in the field of virtual screening are presented.
Affiliation(s)
- Hongjie Wu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
| | - Junkai Liu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
| | - Runhua Zhang
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
| | - Yaoyao Lu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
| | - Guozeng Cui
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
| | - Zhiming Cui
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
|
17
|
Zhao K, Zhao P, Wang S, Xia Y, Zhang G. FoldPAthreader: predicting protein folding pathway using a novel folding force field model derived from known protein universe. Genome Biol 2024; 25:152. [PMID: 38862984 PMCID: PMC11167914 DOI: 10.1186/s13059-024-03291-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 05/29/2024] [Indexed: 06/13/2024] Open
Abstract
Protein folding has become a tractable problem with the significant advances in deep learning-driven protein structure prediction. Here we propose FoldPAthreader, a protein folding pathway prediction method that uses a novel folding force field model derived by exploring the intrinsic relationship between protein evolution and folding in the known protein universe. The folding force field is then used to guide Monte Carlo conformational sampling, driving the protein chain to fold into its native state by exploring potential intermediates. On 30 example targets, FoldPAthreader predicts folding pathways consistent with biological experimental data for 70% of the proteins.
Affiliation(s)
- Kailong Zhao
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
| | - Pengxin Zhao
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
| | - Suhui Wang
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
| | - Yuhao Xia
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
| | - Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China.
|
18
|
Kalemati M, Zamani Emani M, Koohi S. DCGAN-DTA: Predicting drug-target binding affinity with deep convolutional generative adversarial networks. BMC Genomics 2024; 25:411. [PMID: 38724911 PMCID: PMC11080241 DOI: 10.1186/s12864-024-10326-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 04/19/2024] [Indexed: 05/13/2024] Open
Abstract
BACKGROUND In recent years, there has been a growing interest in utilizing computational approaches to predict drug-target binding affinity, aiming to expedite the early drug discovery process. To address the limitations of experimental methods, such as cost and time, several machine learning-based techniques have been developed. However, these methods encounter certain challenges, including the limited availability of training data, reliance on human intervention for feature selection and engineering, and a lack of validation approaches for robust evaluation in real-life applications. RESULTS To mitigate these limitations, in this study, we propose a method for drug-target binding affinity prediction based on deep convolutional generative adversarial networks. Additionally, we conducted a series of validation experiments and implemented adversarial control experiments using straw models. These experiments serve to demonstrate the robustness and efficacy of our predictive models. We conducted a comprehensive evaluation of our method by comparing it to baselines and state-of-the-art methods. Two recently updated datasets, namely the BindingDB and PDBBind, were used for this purpose. Our findings indicate that our method outperforms the alternative methods in terms of three performance measures when using warm-start data splitting settings. Moreover, when considering physicochemical-based cold-start data splitting settings, our method demonstrates superior predictive performance, particularly in terms of the concordance index. CONCLUSION The results of our study affirm the practical value of our method and its superiority over alternative approaches in predicting drug-target binding affinity across multiple validation sets. This highlights the potential of our approach in accelerating drug repurposing efforts, facilitating novel drug discovery, and ultimately enhancing disease treatment.
The data and source code for this study were deposited in the GitHub repository, https://github.com/mojtabaze7/DCGAN-DTA . Furthermore, the web server for our method is accessible at https://dcgan.shinyapps.io/bindingaffinity/ .
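The abstract above reports results in terms of the concordance index, a ranking metric widely used for binding affinity prediction. The sketch below computes it directly from its usual definition: the fraction of pairs with different measured affinities whose predictions are ordered the same way, with tied predictions counted as half. The quadratic pairwise loop and the toy values are illustrative assumptions; this is not the evaluation code from the DCGAN-DTA repository.

```python
from itertools import combinations

def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order."""
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue                      # equal affinities carry no ordering
        comparable += 1
        if p_i == p_j:
            concordant += 0.5             # tied prediction counts as half
        elif (p_i - p_j) * (t_i - t_j) > 0:
            concordant += 1.0             # same ordering in truth and prediction
    if comparable == 0:
        raise ValueError("no pairs with distinct true affinities")
    return concordant / comparable

# Hypothetical measured and predicted affinities.
measured = [5.0, 6.2, 7.1, 8.4]
ci_perfect = concordance_index(measured, [0.1, 0.2, 0.3, 0.4])
ci_reversed = concordance_index(measured, [0.4, 0.3, 0.2, 0.1])
ci_constant = concordance_index(measured, [1.0, 1.0, 1.0, 1.0])
```

A value of 1 means every pair is ranked correctly, 0.5 corresponds to an uninformative predictor, and 0 means the ranking is fully reversed.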
Affiliation(s)
- Mahmood Kalemati
- Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
| | - Mojtaba Zamani Emani
- Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
| | - Somayyeh Koohi
- Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.
|
19
|
Svensson E, Hoedt PJ, Hochreiter S, Klambauer G. HyperPCM: Robust Task-Conditioned Modeling of Drug-Target Interactions. J Chem Inf Model 2024; 64:2539-2553. [PMID: 38185877 PMCID: PMC11005051 DOI: 10.1021/acs.jcim.3c01417] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/04/2023] [Revised: 11/27/2023] [Accepted: 11/27/2023] [Indexed: 01/09/2024]
Abstract
A central problem in drug discovery is to identify the interactions between drug-like compounds and protein targets. Over the past few decades, various quantitative structure-activity relationship (QSAR) and proteo-chemometric (PCM) approaches have been developed to model and predict these interactions. While QSAR approaches solely utilize representations of the drug compound, PCM methods incorporate both representations of the protein target and the drug compound, enabling them to achieve above-chance predictive accuracy on previously unseen protein targets. Both QSAR and PCM approaches have recently been improved by machine learning and deep neural networks, that allow the development of drug-target interaction prediction models from measurement data. However, deep neural networks typically require large amounts of training data and cannot robustly adapt to new tasks, such as predicting interaction for unseen protein targets at inference time. In this work, we propose to use HyperNetworks to efficiently transfer information between tasks during inference and thus to accurately predict drug-target interactions on unseen protein targets. Our HyperPCM method reaches state-of-the-art performance compared to previous methods on multiple well-known benchmarks, including Davis, DUD-E, and a ChEMBL derived data set, and particularly excels at zero-shot inference involving unseen protein targets. Our method, as well as reproducible data preparation, is available at https://github.com/ml-jku/hyper-dti.
Affiliation(s)
- Emma Svensson
- ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, 431 83, Sweden
| | - Pieter-Jan Hoedt
- ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
| | - Sepp Hochreiter
- ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
- Institute of Advanced Research in Artificial Intelligence (IARAI), Vienna 1030, Austria
| | - Günter Klambauer
- ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
|
20
|
Zeng X, Li SJ, Lv SQ, Wen ML, Li Y. A comprehensive review of the recent advances on predicting drug-target affinity based on deep learning. Front Pharmacol 2024; 15:1375522. [PMID: 38628639 PMCID: PMC11019008 DOI: 10.3389/fphar.2024.1375522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 03/21/2024] [Indexed: 04/19/2024] Open
Abstract
Accurate calculation of drug-target affinity (DTA) is crucial for various applications in the pharmaceutical industry, including drug screening, design, and repurposing. However, traditional machine learning methods for calculating DTA often lack accuracy, which poses a significant challenge. Fortunately, deep learning has emerged as a promising approach in computational biology, leading to the development of various deep learning-based methods for DTA prediction. To support researchers in developing novel, high-precision methods, we have provided a comprehensive review of recent advances in predicting DTA using deep learning. We first conducted a statistical analysis of commonly used public datasets, providing essential information and introducing the fields in which these datasets are used. We further explored the common representations of sequences and structures of drugs and targets. These analyses served as the foundation for constructing DTA prediction methods based on deep learning. Next, we focused on explaining how deep learning models, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Transformer, and Graph Neural Networks (GNNs), were effectively employed in specific DTA prediction methods. We highlighted the unique advantages and applications of these models in the context of DTA prediction. Finally, we conducted a performance analysis of multiple state-of-the-art methods for predicting DTA based on deep learning. The comprehensive review aimed to help researchers understand the shortcomings and advantages of existing methods, and further develop high-precision DTA prediction tools to promote the development of drug discovery.
Affiliation(s)
- Xin Zeng
- College of Mathematics and Computer Science, Dali University, Dali, China
| | - Shu-Juan Li
- Yunnan Institute of Endemic Diseases Control and Prevention, Dali, China
| | - Shuang-Qing Lv
- Institute of Surveying and Information Engineering West Yunnan University of Applied Science, Dali, China
| | - Meng-Liang Wen
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China
| | - Yi Li
- College of Mathematics and Computer Science, Dali University, Dali, China
|
21
|
Peng L, Yang Y, Yang C, Li Z, Cheong N. HRGCNLDA: Forecasting of lncRNA-disease association based on hierarchical refinement graph convolutional neural network. Math Biosci Eng 2024; 21:4814-4834. [PMID: 38872515 DOI: 10.3934/mbe.2024212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2024]
Abstract
Long non-coding RNA (lncRNA) is considered to be a crucial regulator involved in various human biological processes, including the regulation of tumor immune checkpoint proteins. It has great potential as both a cancer biomolecular biomarker and therapeutic target. Nevertheless, conventional biological experimental techniques are both resource-intensive and laborious, making it essential to develop an accurate and efficient computational method to facilitate the discovery of potential links between lncRNAs and diseases. In this study, we proposed HRGCNLDA, a computational approach utilizing hierarchical refinement of graph convolutional neural networks for forecasting lncRNA-disease potential associations. This approach effectively addresses the over-smoothing problem that arises from stacking multiple layers of graph convolutional neural networks. Specifically, HRGCNLDA enhances the layer representation during message propagation and node updates, thereby amplifying the contribution of hidden layers that resemble the ego layer while reducing discrepancies. The results of the experiments showed that HRGCNLDA achieved the highest AUC-ROC (area under the receiver operating characteristic curve, AUC for short) and AUC-PR (area under the precision versus recall curve, AUPR for short) values compared to other methods. Finally, to further demonstrate the reliability and efficacy of our approach, we performed case studies on the case of three prevalent human diseases, namely, breast cancer, lung cancer and gastric cancer.
Affiliation(s)
- Li Peng
- College of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
- Hunan Key Laboratory for Service Computing and Novel Software Technology, Hunan University of Science and Technology, Xiangtan 411201, China
- Yujie Yang
- College of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
- Cheng Yang
- College of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
- Zejun Li
- School of Computer Science and Engineering, Hunan Institute of Technology, Hengyang 421002, China
- Ngai Cheong
- Faculty of Applied Sciences, Macao Polytechnic University, Macau 999078, China
|
22
|
Qi H, Yu T, Yu W, Liu C. Drug-target affinity prediction with extended graph learning-convolutional networks. BMC Bioinformatics 2024; 25:75. [PMID: 38365583 PMCID: PMC10874073 DOI: 10.1186/s12859-024-05698-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 02/12/2024] [Indexed: 02/18/2024] Open
Abstract
BACKGROUND: High-performance computing plays a pivotal role in computer-aided drug design, a field that holds significant promise in pharmaceutical research. The prediction of drug-target affinity (DTA) is a crucial stage in this process, potentially accelerating drug development through rapid and extensive preliminary compound screening, while also minimizing resource utilization and costs. Recently, the incorporation of deep learning into DTA prediction and the enhancement of its accuracy have emerged as key areas of interest in the research community. Drugs and targets can be characterized through various methods, including structure-based, sequence-based, and graph-based representations. Despite the progress in structure- and sequence-based techniques, they tend to provide limited feature information. Conversely, graph-based approaches have risen to prominence, attracting considerable attention for their comprehensive data representation capabilities. Recent studies have focused on constructing protein and drug molecular graphs using sequences and SMILES, subsequently deriving representations through graph neural networks. However, these graph-based approaches are limited by the use of a fixed adjacency matrix of protein and drug molecular graphs for graph convolution. This limitation restricts the learning of comprehensive feature representations from intricate compound and protein structures, consequently impeding the full potential of graph-based feature representation in DTA prediction. This, in turn, significantly impacts the models' generalization capabilities in the complex realm of drug discovery. RESULTS: To tackle these challenges, we introduce GLCN-DTA, a model specifically designed for DTA tasks. GLCN-DTA integrates a graph learning module into the existing graph architecture. This module is designed to learn a soft adjacency matrix, which effectively and efficiently refines the contextual structure of protein and drug molecular graphs. Compared to the conventional fixed adjacency matrix approach, this allows graph convolution to learn richer structural information from protein and drug molecular graphs, specifically tailored for DTA tasks. A series of experiments have been conducted to validate the efficacy of the proposed GLCN-DTA method across diverse scenarios. The results demonstrate that GLCN-DTA possesses advantages in terms of robustness and high accuracy. CONCLUSIONS: The proposed GLCN-DTA model enhances DTA prediction performance by introducing a novel framework that combines graph learning operations with graph convolution operations, thereby achieving richer representations. GLCN-DTA does not distinguish between different protein classifications, including structurally ordered and intrinsically disordered proteins, focusing instead on improving feature representation. Therefore, it may be more effective in scenarios involving structurally ordered proteins, while potentially being limited in contexts with intrinsically disordered proteins.
Affiliation(s)
- Haiou Qi
- Nursing Department, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, 310016, China
- Ting Yu
- Operating Room Department, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, 310016, China
- Wenwen Yu
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
- Chenxi Liu
- School of Medicine and Health Management, Tongji Medical School, Huazhong University of Science and Technology, Wuhan, 430030, China
|
23
|
Dehghan A, Abbasi K, Razzaghi P, Banadkuki H, Gharaghani S. CCL-DTI: contributing the contrastive loss in drug-target interaction prediction. BMC Bioinformatics 2024; 25:48. [PMID: 38291364 PMCID: PMC11264960 DOI: 10.1186/s12859-024-05671-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 09/13/2023] [Accepted: 01/22/2024] [Indexed: 02/01/2024] Open
Abstract
BACKGROUND: Drug-target interaction (DTI) prediction uses a drug molecule and a protein sequence as inputs to predict the binding affinity value. In recent years, deep learning-based models have attracted growing attention. These methods have two modules: the feature extraction module and the task prediction module. In most deep learning-based approaches, a simple task prediction loss (i.e., categorical cross entropy for the classification task and mean squared error for the regression task) is used to train the model. In machine learning, contrastive loss functions have been developed to learn a more discriminative feature space. In a deep learning-based model, a more discriminative feature space improves the performance of the task prediction module. RESULTS: In this paper, we use multimodal knowledge as input and propose an attention-based fusion technique to combine this knowledge. We also investigate how using a contrastive loss function alongside the task prediction loss could help the approach learn a more powerful model. Four contrastive loss functions are considered: (1) max-margin contrastive loss, (2) triplet loss, (3) multi-class N-pair loss objective, and (4) NT-Xent loss. The proposed model is evaluated using four well-known datasets: the Wang et al. dataset, Luo's dataset, Davis, and KIBA. CONCLUSIONS: After reviewing the state-of-the-art methods, we developed a multimodal feature extraction network by combining protein sequences and drug molecules, along with protein-protein interaction networks and drug-drug interaction networks. The results show that it performs significantly better than comparable state-of-the-art approaches.
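Two of the four contrastive losses listed in this abstract, the triplet loss and the NT-Xent loss, can be sketched in plain Python as follows. This is a generic textbook formulation, not the CCL-DTI implementation: the function names, the default margin and temperature, and the use of Euclidean distance for the triplet loss are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    return max(0.0, gap)


def nt_xent_loss(anchor, positive, others, temperature=0.5):
    """NT-Xent for one anchor: softmax cross-entropy over cosine similarities.

    `others` holds the remaining embeddings in the batch (the negatives).
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, o) / temperature) for o in others)
    return -math.log(pos / (pos + neg))


# Toy 2-D embeddings of drug-target pairs (values are illustrative only).
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])  # negative already far away
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])  # positive farther than negative
ntx = nt_xent_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
```

In a training setup such as the one the abstract describes, a term like this would be added to the task prediction loss; the weighting between the two terms is not stated in the abstract and is left out here.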
Affiliation(s)
- Alireza Dehghan
- Department of Bioinformatics, Kish International Campus, University of Tehran, Kish, 1417614411, Iran
- Karim Abbasi
- Laboratory of System Biology, Bioinformatics and Artificial Intelligence in Medicine (LBB&AI), Faculty of Mathematics and Computer Science, Kharazmi University, Tehran, 1417614411, Iran
- Parvin Razzaghi
- Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 4513766731, Iran
- Hossein Banadkuki
- Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, 1417614411, Iran
- Sajjad Gharaghani
- Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, 1417614411, Iran
|
24
|
Xu F, Hu H, Lin H, Lu J, Cheng F, Zhang J, Li X, Shuai J. scGIR: deciphering cellular heterogeneity via gene ranking in single-cell weighted gene correlation networks. Brief Bioinform 2024; 25:bbae091. [PMID: 38487851 PMCID: PMC10940817 DOI: 10.1093/bib/bbae091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 02/08/2024] [Accepted: 02/15/2024] [Indexed: 03/18/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for investigating cellular heterogeneity through high-throughput analysis of individual cells. Nevertheless, challenges arise from prevalent sequencing dropout events and noise effects, which affect subsequent analyses. Here, we introduce a novel algorithm, Single-cell Gene Importance Ranking (scGIR), which uses a single-cell gene correlation network to evaluate gene importance. The algorithm transforms single-cell sequencing data into a robust gene correlation network through statistical independence, with correlation edges weighted by gene expression levels. We then constructed a random walk model on the resulting weighted gene correlation network to rank the importance of genes. Our analysis of gene importance using the PageRank algorithm across nine authentic scRNA-seq datasets indicates that scGIR can effectively overcome technical noise, enabling the identification of cell types and inference of developmental trajectories. We demonstrated that the gene correlation edges, weighted by expression, play a critical role in enhancing the algorithm's performance. Our findings emphasize that scGIR excels at improving the clustering of cell subtypes, reverse-identifying differentially expressed marker genes, and uncovering genes with potential differential importance. Overall, we proposed a promising method capable of extracting more information from single-cell RNA sequencing datasets, potentially shedding new light on cellular processes and disease mechanisms.
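The random-walk ranking step described in this abstract can be illustrated with a generic weighted PageRank computed by power iteration. The sketch below is not the scGIR code: the function name `weighted_pagerank`, the damping factor of 0.85, and the toy edge weights are assumptions, and the way scGIR derives its edge weights from expression levels is not reproduced here.

```python
def weighted_pagerank(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """Rank nodes of a weighted directed graph by power iteration.

    `weights` maps each node to a dict {neighbor: non-negative edge weight}.
    An undirected correlation network is encoded by listing each edge in
    both directions.
    """
    nodes = sorted(set(weights) | {v for nbrs in weights.values() for v in nbrs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    out_sum = {u: sum(weights.get(u, {}).values()) for u in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing weight is spread uniformly.
        dangling = sum(rank[u] for u in nodes if out_sum[u] == 0)
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            if out_sum[u] == 0:
                continue
            for v, w in weights[u].items():
                # A walker leaves u along each edge in proportion to its weight.
                new[v] += damping * rank[u] * w / out_sum[u]
        if sum(abs(new[u] - rank[u]) for u in nodes) < tol:
            return new
        rank = new
    return rank


# Toy gene correlation network; gene names and weights are illustrative only.
network = {
    "g1": {"g2": 2.0, "g3": 1.0},
    "g2": {"g1": 2.0},
    "g3": {"g1": 1.0},
}
ranks = weighted_pagerank(network)
top_gene = max(ranks, key=ranks.get)
```

Genes would then be sorted by their rank value to obtain an importance ordering; heavier edges pass a larger share of a node's rank to the corresponding neighbor.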
Affiliation(s)
- Fei Xu
- Department of Physics, Anhui Normal University, Wuhu 241002, China
- Wenzhou Institute and Wenzhou Key Laboratory of Biophysics, University of Chinese Academy of Sciences, Wenzhou 325001, China
- Huan Hu
- Institute of Applied Genomics, Fuzhou University, Fuzhou 350108, China
- Hai Lin
- Wenzhou Institute and Wenzhou Key Laboratory of Biophysics, University of Chinese Academy of Sciences, Wenzhou 325001, China
- Jun Lu
- Department of Physics, Anhui Normal University, Wuhu 241002, China
- School of Medical Imageology, Wannan Medical College, Wuhu 241002, China
- Feng Cheng
- Department of Physics, and Fujian Provincial Key Lab for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
- Jiqian Zhang
- Department of Physics, Anhui Normal University, Wuhu 241002, China
- Xiang Li
- Department of Physics, and Fujian Provincial Key Lab for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
- Jianwei Shuai
- Wenzhou Institute and Wenzhou Key Laboratory of Biophysics, University of Chinese Academy of Sciences, Wenzhou 325001, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou 325001, China
|
25
|
Peng CX, Liang F, Xia YH, Zhao KL, Hou MH, Zhang GJ. Recent Advances and Challenges in Protein Structure Prediction. J Chem Inf Model 2024; 64:76-95. [PMID: 38109487 DOI: 10.1021/acs.jcim.3c01324] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 12/20/2023]
Abstract
Artificial intelligence has made significant advances in the field of protein structure prediction in recent years. In particular, DeepMind's end-to-end model, AlphaFold2, has demonstrated the capability to predict three-dimensional structures of numerous unknown proteins with accuracy levels comparable to those of experimental methods. This breakthrough has opened up new possibilities for understanding protein structure and function as well as accelerating drug discovery and other applications in biology and medicine. Despite the remarkable achievements of artificial intelligence in the field, some challenges and limitations remain. In this Review, we discuss the recent progress and some of the challenges in protein structure prediction. These challenges include predicting multidomain protein structures, protein complex structures, multiple conformational states of proteins, and protein folding pathways. Furthermore, we highlight directions in which further improvements can be made.
Affiliation(s)
- Chun-Xiang Peng
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Fang Liang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yu-Hao Xia
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kai-Long Zhao
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Ming-Hua Hou
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
|
26
|
Zeng X, Zhong KY, Jiang B, Li Y. Fusing Sequence and Structural Knowledge by Heterogeneous Models to Accurately and Interpretively Predict Drug-Target Affinity. Molecules 2023; 28:8005. [PMID: 38138496 PMCID: PMC10745601 DOI: 10.3390/molecules28248005] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/21/2023] [Revised: 12/06/2023] [Accepted: 12/06/2023] [Indexed: 12/24/2023] Open
Abstract
Drug-target affinity (DTA) prediction is crucial for understanding molecular interactions and aiding drug discovery and development. While various computational methods have been proposed for DTA prediction, their predictive accuracy remains limited because they fail to capture the structural nuances of interactions. With increasingly accurate and accessible structure prediction of targets, we developed a novel deep learning model, named S2DTA, to accurately predict DTA by fusing sequence features of drug SMILES, targets, and pockets and their corresponding graph structural features using heterogeneous models based on graph and semantic networks. Experimental findings showed that complex feature representations yielded negligible improvements in the model's performance. However, the integration of heterogeneous models demonstrably improved predictive accuracy. Compared with three state-of-the-art methods (DeepDTA, GraphDTA, and DeepDTAF), S2DTA performed better: it exhibited a 25.2% reduction in mean absolute error (MAE) and a 20.1% decrease in root mean square error (RMSE). Additionally, S2DTA showed improvements in other crucial metrics, including Pearson Correlation Coefficient (PCC), Spearman, Concordance Index (CI), and R², with these metrics increasing by 19.6%, 17.5%, 8.1%, and 49.4%, respectively. Finally, we conducted an interpretability analysis of S2DTA using a bidirectional self-attention mechanism. The analysis results supported that S2DTA is an effective and accurate tool for predicting DTA.
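Several of the regression metrics named in this abstract (MAE, RMSE, PCC, and CI) can be computed as in the following minimal Python sketch. The function names and the toy affinity values are hypothetical and not taken from the S2DTA paper; the concordance index follows the common convention of skipping pairs with equal true values and counting tied predictions as half-concordant, which is an assumption.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(y_true, y_pred):
    """Pearson correlation coefficient."""
    mt = sum(y_true) / len(y_true)
    mp = sum(y_pred) / len(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predictions are ordered correctly."""
    concordant = 0.0
    comparable = 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue  # pairs with equal true affinity are not comparable
            comparable += 1
            agreement = (y_pred[i] - y_pred[j]) * (y_true[i] - y_true[j])
            if agreement > 0:
                concordant += 1.0
            elif agreement == 0:
                concordant += 0.5  # tied predictions count as half
    return concordant / comparable


# Toy measured and predicted affinities (illustrative values only).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 5.0]
summary = {
    "MAE": mae(measured, predicted),
    "RMSE": rmse(measured, predicted),
    "PCC": pearson(measured, predicted),
    "CI": concordance_index(measured, predicted),
}
```

MAE and RMSE are errors, so lower is better, while PCC and CI measure agreement, so higher is better; this is why the abstract reports reductions for the first two and increases for the others.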
Affiliation(s)
- Xin Zeng
- College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Kai-Yang Zhong
- College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Bei Jiang
- Yunnan Key Laboratory of Screening and Research on Anti-Pathogenic Plant Resources from Western Yunnan, Dali University, Dali 671000, China
- Yi Li
- College of Mathematics and Computer Science, Dali University, Dali 671003, China
|
27
|
Li H, Wang S, Zheng W, Yu L. Multi-dimensional search for drug-target interaction prediction by preserving the consistency of attention distribution. Comput Biol Chem 2023; 107:107968. [PMID: 37844375 DOI: 10.1016/j.compbiolchem.2023.107968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Revised: 09/27/2023] [Accepted: 10/05/2023] [Indexed: 10/18/2023]
Abstract
Predicting drug-target interaction (DTI) is a crucial step in the process of drug repurposing and new drug development. Although the attention mechanism has been widely used to capture the interactions between drugs and targets, it mainly uses the Simplified Molecular Input Line Entry System (SMILES) and two-dimensional (2D) molecular graph features of drugs. In this paper, we propose a neural network model called MdDTI for DTI prediction. The model searches for binding sites that may interact with the target from the multiple dimensions of drug structure, namely the 2D substructures and the three-dimensional (3D) spatial structure. For the 2D substructures, we have developed a novel substructure decomposition strategy based on drug molecular graphs and compared its performance with the SMILES-based decomposition method. For the 3D spatial structure of drugs, we constructed spatial feature representation matrices for drugs based on the Cartesian coordinates of heavy atoms (without hydrogen atoms) in each drug. Finally, to ensure the search results of the model are consistent across multiple dimensions, we construct a consistency loss function. We evaluate MdDTI on four drug-target interaction datasets and three independent compound-protein affinity test sets. The results indicate that our model surpasses a series of state-of-the-art models. Case studies demonstrate that our model is capable of capturing the potential binding regions between drugs and targets, and it shows efficacy in drug repurposing. Our code is available at https://github.com/lhhu1999/MdDTI.
Affiliation(s)
- Huaihu Li
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Shunfang Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- The Key Lab of Intelligent Systems and Computing of Yunnan Province, Yunnan University, Kunming, Yunnan, China
- Weihua Zheng
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Li Yu
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
|
28
|
Zhang Y, Liu C, Liu M, Liu T, Lin H, Huang CB, Ning L. Attention is all you need: utilizing attention in AI-enabled drug discovery. Brief Bioinform 2023; 25:bbad467. [PMID: 38189543 PMCID: PMC10772984 DOI: 10.1093/bib/bbad467] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 09/30/2023] [Revised: 11/03/2023] [Accepted: 11/25/2023] [Indexed: 01/09/2024] Open
Abstract
Recently, attention mechanisms and derived models have gained significant traction in drug development due to their outstanding performance and interpretability in handling complex data structures. This review offers an in-depth exploration of the principles underlying attention-based models and their advantages in drug discovery. We further elaborate on their applications in various aspects of drug development, from molecular screening and target binding to property prediction and molecule generation. Finally, we discuss the current challenges faced in the application of attention mechanisms and artificial intelligence technologies, including data quality, model interpretability, and computational resource constraints, along with future directions for research. Given the accelerating pace of technological advancement, we believe that attention-based models will have an increasingly prominent role in future drug discovery. We anticipate that these models will usher in revolutionary breakthroughs in the pharmaceutical domain, significantly accelerating the pace of drug development.
Affiliation(s)
- Yang Zhang
- Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Caiqi Liu
- Department of Gastrointestinal Medical Oncology, Harbin Medical University Cancer Hospital, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
- Key Laboratory of Molecular Oncology of Heilongjiang Province, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
- Mujiexin Liu
- Chongqing Key Laboratory of Sichuan-Chongqing Co-construction for Diagnosis and Treatment of Infectious Diseases Integrated Traditional Chinese and Western Medicine, College of Medical Technology, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Tianyuan Liu
- Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Japan
- Hao Lin
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Cheng-Bing Huang
- School of Computer Science and Technology, Aba Teachers University, Aba, China
- Lin Ning
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
|
29
|
Zhang L, Wang CC, Zhang Y, Chen X. GPCNDTA: Prediction of drug-target binding affinity through cross-attention networks augmented with graph features and pharmacophores. Comput Biol Med 2023; 166:107512. [PMID: 37788507 DOI: 10.1016/j.compbiomed.2023.107512] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 08/08/2023] [Revised: 08/28/2023] [Accepted: 09/19/2023] [Indexed: 10/05/2023]
Abstract
Drug-target affinity prediction is a challenging task in drug discovery. The latest computational models have limitations in mining edge information in molecule graphs, accessing knowledge in pharmacophores, integrating multimodal data of the same biomolecule, and realizing effective interactions between two different biomolecules. To solve these problems, we proposed a method called Graph features and Pharmacophores augmented Cross-attention Networks based Drug-Target binding Affinity prediction (GPCNDTA). First, we used the GNN module, the linear projection unit, and the self-attention layer to extract features of drugs and proteins. Second, we devised intramolecular and intermolecular cross-attention to fuse features within each biomolecule and to model interactions between drugs and proteins, respectively. Finally, the linear projection unit was applied to obtain the final features of drugs and proteins, and the Multi-Layer Perceptron was employed to predict drug-target binding affinity. Three major innovations of GPCNDTA are as follows: (i) developing the residual CensNet and the residual EW-GCN to extract features of drug and protein graphs, respectively, (ii) regarding pharmacophores as a new type of prior to improve drug-target affinity prediction performance, and (iii) devising intramolecular and intermolecular cross-attention, in which the intramolecular cross-attention realizes the effective fusion of different modal data related to the same biomolecule, and the intermolecular cross-attention fulfills the information interaction between two different biomolecules in attention space. The test results on five benchmark datasets indicate that GPCNDTA achieves the best performance compared with state-of-the-art computational models. In addition, ablation experiments demonstrated the effectiveness of the GNN modules, pharmacophores, and the two cross-attention strategies in improving the prediction accuracy, stability, and reliability of GPCNDTA. In case studies, we applied GPCNDTA to predict binding affinities between 3C-like proteinase and 185 drugs, and observed that most binding affinities predicted by GPCNDTA are close to the corresponding experimental measurements.
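The intermolecular cross-attention idea mentioned in this abstract can be illustrated with a generic scaled dot-product cross-attention in plain Python, where drug features act as queries and protein features act as keys and values. This sketch is not the GPCNDTA architecture: it omits learned projection matrices, multiple heads, and residual connections, and the function names and toy vectors are assumptions.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention of each query over all key/value pairs.

    Returns (outputs, weights): one output vector and one weight row per query.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        row = softmax(scores)
        weights.append(row)
        outputs.append([
            sum(w * v[c] for w, v in zip(row, values))
            for c in range(len(values[0]))
        ])
    return outputs, weights


# Toy features: one drug token attends over two protein residues.
drug_tokens = [[1.0, 0.0]]
residue_keys = [[1.0, 0.0], [1.0, 0.0]]
residue_values = [[2.0, 0.0], [4.0, 0.0]]
attended, attn = cross_attention(drug_tokens, residue_keys, residue_values)
```

Swapping the roles (protein features as queries, drug features as keys and values) gives the attention in the other direction; each weight row sums to one and can be inspected to see which residues a drug token attends to.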
Affiliation(s)
- Li Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Chun-Chun Wang
- School of Science, Jiangnan University, Wuxi, 214122, China
- Yong Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Xing Chen
- School of Science, Jiangnan University, Wuxi, 214122, China
|
30
|
Zhu HT, Xia YH, Zhang GJ. E2EDA: Protein Domain Assembly Based on End-to-End Deep Learning. J Chem Inf Model 2023; 63:6451-6461. [PMID: 37788318 DOI: 10.1021/acs.jcim.3c01387] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 10/05/2023]
Abstract
With the development of deep learning, almost all single-domain proteins can be predicted at experimental resolution. However, the structure prediction of multi-domain proteins remains a challenge. Achieving end-to-end protein domain assembly and further improving the accuracy of the full-chain modeling by accurately predicting inter-domain orientation while improving the assembly efficiency will provide significant insights into structure-based drug discovery. In this work, we propose an End-to-End Domain Assembly method based on deep learning, named E2EDA. We first develop RMNet, an EfficientNetV2-based deep learning model that fuses multiple features using an attention mechanism to predict inter-domain rigid motion. Then, the predicted rigid motions are transformed into inter-domain spatial transformations to directly assemble the full-chain model. Finally, the scoring strategy RMscore is designed to select the best model from multiple assembled models. The experimental results show that the average TM-score of the model assembled by E2EDA on the benchmark set (282) is 0.827, which is better than those of other domain assembly methods SADA (0.792) and DEMO (0.730). Meanwhile, on our constructed multi-domain data set from AlphaFold DB, the model reassembled by E2EDA is 7.0% higher in TM-score compared to the full-chain model predicted by AlphaFold2, indicating that E2EDA can capture more accurate inter-domain orientations to improve the quality of the model predicted by AlphaFold2. Furthermore, compared to SADA and AlphaFold2, E2EDA reduced the average runtime on the benchmark by 64.7% and 19.2%, respectively, indicating that E2EDA can significantly improve assembly efficiency through an end-to-end approach. The online server is available at http://zhanglab-bioinf.com/E2EDA.
Affiliation(s)
- Hai-Tao Zhu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Yu-Hao Xia
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
|
31
|
Alghushairy O, Ali F, Alghamdi W, Khalid M, Alsini R, Asiry O. Machine learning-based model for accurate identification of druggable proteins using light extreme gradient boosting. J Biomol Struct Dyn 2023; 42:12330-12341. [PMID: 37850427 DOI: 10.1080/07391102.2023.2269280] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/09/2023] [Accepted: 10/04/2023] [Indexed: 10/19/2023]
Abstract
The identification of druggable proteins (DPs) is significant for the development of new drugs, personalized medicine, understanding of disease mechanisms, drug repurposing, and economic benefits. By identifying new druggable targets, researchers can develop new therapies for a range of diseases, leading to better patient outcomes. Identification of DPs by machine learning strategies is more efficient and cost-effective than conventional methods. In this study, a computational predictor, namely Drug-LXGB, is introduced to enhance the identification of DPs. Features are extracted by composition, transition, and distribution (CTD), composition of K-spaced amino acid pair (CKSAAP), pseudo-position-specific scoring matrix (PsePSSM), and a novel descriptor called multi-block pseudo amino acid composition (MB-PseAAC). The CTD, CKSAAP, PsePSSM, and MB-PseAAC feature vectors are combined, and sequential forward selection is used as the feature selection algorithm. The selected features are provided to random forest, extreme gradient boosting, and light eXtreme gradient boosting (LXGB) classifiers. The predictive performance of these learning methods is measured via 10-fold cross-validation. The LXGB-based model achieves better results than other existing predictors. Our novel protocol can play an active role in designing novel drugs and in exploring potential targets. This study will help capture a more universal view of a potential target. Communicated by Ramaswamy H. Sarma.
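One of the descriptors named in this abstract, the composition of K-spaced amino acid pairs (CKSAAP), can be sketched in a few lines of Python. This is a generic formulation rather than the Drug-LXGB code: the function name `cksaap` is hypothetical, and normalizing by the number of valid pairs found (so that non-standard residues are skipped) is an assumption.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def cksaap(sequence, k):
    """Frequencies of residue pairs separated by exactly k other residues.

    Returns a dict with 400 entries, one per ordered pair such as "AC".
    """
    counts = {"".join(pair): 0 for pair in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(sequence) - k - 1):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return {pair: 0.0 for pair in counts}
    return {pair: count / total for pair, count in counts.items()}


# Toy peptide; a full descriptor usually concatenates several gap sizes k.
features_k0 = cksaap("ACAC", 0)  # adjacent pairs: AC, CA, AC
features_k1 = cksaap("ACAC", 1)  # pairs one residue apart: AA, CC
feature_vector = [features_k0[p] for p in sorted(features_k0)] + \
                 [features_k1[p] for p in sorted(features_k1)]
```

Each gap size contributes 400 values, so concatenating gaps 0 through some maximum quickly produces a wide vector, which is one reason a feature selection step such as the sequential forward selection mentioned above is useful.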
Affiliation(s)
- Omar Alghushairy
- Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
- Farman Ali
- Department of Software Engineering, Sarhad University of Science and Information Technology Peshawar Mardan Campus, Peshawar, Pakistan
- Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Majdi Khalid
- Department of Computer Science, College of Computers and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia
- Raed Alsini
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Othman Asiry
- Department of Information Technology, College of Computing and Information Technology at Khulais, University of Jeddah, Jeddah, Saudi Arabia
|
32
|
Pan J, You Z, You W, Zhao T, Feng C, Zhang X, Ren F, Ma S, Wu F, Wang S, Sun Y. PTBGRP: predicting phage-bacteria interactions with graph representation learning on microbial heterogeneous information network. Brief Bioinform 2023; 24:bbad328. [PMID: 37742053 DOI: 10.1093/bib/bbad328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 08/14/2023] [Accepted: 08/30/2023] [Indexed: 09/25/2023] Open
Abstract
Identifying potential bacteriophage (phage) candidates to treat bacterial infections plays an essential role in the research of human pathogens. Computational approaches are recognized as a valid way to predict bacteria and target phages. However, most of the current methods only utilize lower-order biological information without considering the higher-order connectivity patterns, which help to improve the predictive accuracy. Therefore, we developed a novel microbial heterogeneous interaction network (MHIN)-based model called PTBGRP to predict new phages for bacterial hosts. Specifically, PTBGRP first constructs an MHIN by integrating phage-bacteria interaction (PBI) and six bacteria-bacteria interaction networks with their biological attributes. Then, different representation learning methods are deployed to extract higher-level biological features and lower-level topological features from the MHIN. Finally, PTBGRP employs a deep neural network as the classifier to predict unknown PBI pairs based on the fused biological information. Experimental results demonstrated that PTBGRP achieves the best performance on the corresponding ESKAPE pathogens and PBI dataset when compared with state-of-the-art methods. In addition, case studies of Klebsiella pneumoniae and Staphylococcus aureus further indicate that the consideration of rich heterogeneous information enables PTBGRP to accurately predict PBI from a more comprehensive perspective. The webserver of the PTBGRP predictor is freely available at http://120.77.11.78/PTBGRP/.
Affiliation(s)
- Jie Pan
  - Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
- Zhuhong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Wencai You
  - Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
- Tian Zhao
  - Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
- Chenlu Feng
  - Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
- Xuexia Zhang
  - North China Pharmaceutical Group, Shijiazhuang 050015, Hebei, China
  - National Microbial Medicine Engineering & Research Center, Shijiazhuang 050015, Hebei, China
- Fengzhi Ren
  - North China Pharmaceutical Group, Shijiazhuang 050015, Hebei, China
  - National Microbial Medicine Engineering & Research Center, Shijiazhuang 050015, Hebei, China
- Sanxing Ma
  - Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
- Fan Wu
  - Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
- Shiwei Wang
  - Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
- Yanmei Sun
  - Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
|
33
|
Peng L, Tan J, Xiong W, Zhang L, Wang Z, Yuan R, Li Z, Chen X. Deciphering ligand-receptor-mediated intercellular communication based on ensemble deep learning and the joint scoring strategy from single-cell transcriptomic data. Comput Biol Med 2023; 163:107137. [PMID: 37364528 DOI: 10.1016/j.compbiomed.2023.107137] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 02/26/2023] [Revised: 05/18/2023] [Accepted: 06/04/2023] [Indexed: 06/28/2023]
Abstract
BACKGROUND: Cell-cell communication in the tumor microenvironment is vital to tumorigenesis, tumor progression, and therapy. Inferring intercellular communication helps elucidate the molecular mechanisms of tumor growth, progression, and metastasis. METHODS: Focusing on ligand-receptor co-expression, we developed an ensemble deep learning framework, CellComNet, to decipher ligand-receptor-mediated cell-cell communication from single-cell transcriptomic data. First, credible ligand-receptor interactions (LRIs) are captured by integrating data arrangement, feature extraction, dimension reduction, and LRI classification based on an ensemble of a heterogeneous Newton boosting machine and a deep neural network. Next, known and identified LRIs are screened based on single-cell RNA sequencing (scRNA-seq) data in certain tissues. Finally, cell-cell communication is inferred by incorporating the scRNA-seq data, the screened LRIs, and a joint scoring strategy that combines expression thresholding and the expression product of ligands and receptors. RESULTS: CellComNet was compared with four competing protein-protein interaction prediction models (PIPR, XGBoost, DNNXGB, and OR-RCNN) and obtained the best AUCs and AUPRs on four LRI datasets, demonstrating the strongest LRI classification ability among them. CellComNet was further applied to analyze intercellular communication in human melanoma and head and neck squamous cell carcinoma (HNSCC) tissues. The results demonstrate that cancer-associated fibroblasts communicate strongly with melanoma cells and that endothelial cells communicate strongly with HNSCC cells. CONCLUSIONS: CellComNet efficiently identified credible LRIs and significantly improved cell-cell communication inference performance. We anticipate that CellComNet can contribute to anticancer drug design and tumor-targeted therapy.
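To make the final step of this abstract concrete, the following is a minimal sketch of a joint score that combines expression thresholding with the ligand-receptor expression product. The abstract does not say how the two scores are combined, so the max-normalised average used here is an assumption, as are the function names, the cutoff of 0.5, and the toy expression values. It is not CellComNet's implementation.

```python
def threshold_score(sender, receiver, pairs, cutoff):
    """Count ligand-receptor pairs where both genes exceed the cutoff."""
    return sum(
        1
        for ligand, receptor in pairs
        if sender.get(ligand, 0.0) > cutoff and receiver.get(receptor, 0.0) > cutoff
    )


def product_score(sender, receiver, pairs):
    """Sum of ligand expression (sender) times receptor expression (receiver)."""
    return sum(
        sender.get(ligand, 0.0) * receiver.get(receptor, 0.0)
        for ligand, receptor in pairs
    )


def joint_scores(mean_expr, pairs, cutoff=0.5):
    """Score every ordered (sender, receiver) cell-type pair.

    mean_expr maps cell type -> {gene: mean expression}.
    ASSUMPTION: the two scores are each divided by their maximum over all
    cell-type pairs and then averaged; the source does not specify this.
    """
    cell_types = list(mean_expr)
    thr, prod = {}, {}
    for s in cell_types:
        for r in cell_types:
            thr[(s, r)] = threshold_score(mean_expr[s], mean_expr[r], pairs, cutoff)
            prod[(s, r)] = product_score(mean_expr[s], mean_expr[r], pairs)
    max_thr = max(thr.values()) or 1      # avoid division by zero
    max_prod = max(prod.values()) or 1.0
    return {k: 0.5 * (thr[k] / max_thr + prod[k] / max_prod) for k in thr}


# Toy, made-up mean expression values for two cell types.
mean_expr = {
    "CAF": {"TGFB1": 2.0, "EGFR": 0.1},
    "Tumor": {"TGFB1": 0.2, "TGFBR2": 1.5, "EGFR": 1.0, "EGF": 0.0},
}
lr_pairs = [("TGFB1", "TGFBR2"), ("EGF", "EGFR")]
scores = joint_scores(mean_expr, lr_pairs)
```

In this toy input the CAF-to-Tumor direction receives the highest score because it is the only direction in which a ligand and its receptor both pass the cutoff.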
Affiliation(s)
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China; College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Jingwei Tan
  - School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Wei Xiong
  - School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Li Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, Jiangsu, China
- Zhao Wang
  - School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Ruya Yuan
  - School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Zejun Li
  - School of Computer Science, Hunan Institute of Technology, Hengyang, 421002, Hunan, China
- Xing Chen
  - School of Science, Jiangnan University, Wuxi, 214122, Jiangsu, China
|
34
|
Lv J, Liu G, Ju Y, Huang H, Sun Y. AADB: A Manually Collected Database for Combinations of Antibiotics With Adjuvants. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2827-2836. [PMID: 37279138 DOI: 10.1109/tcbb.2023.3283221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Antimicrobial resistance is a global public health concern. The lack of innovation in antibiotic development has led to renewed interest in antibiotic adjuvants. However, no database has collected antibiotic adjuvants. Herein, we build a comprehensive database named Antibiotic Adjuvant DataBase (AADB) by manually collecting relevant literature. Specifically, AADB includes 3,035 combinations of antibiotics with adjuvants, covering 83 antibiotics, 226 adjuvants, and 325 bacterial strains. AADB provides user-friendly interfaces for searching and downloading, so users can easily obtain these datasets for further analysis. We also collected related datasets (e.g., chemogenomic and metabolomic data) and proposed a computational strategy to dissect them. As a test case, we identified 10 candidates for minocycline, 6 of which are known adjuvants that synergize with minocycline to inhibit the growth of E. coli BW25113. We hope that AADB can help users to identify effective antibiotic adjuvants. AADB is freely available at http://www.acdb.plus/AADB.
|
35
|
Binatlı OC, Gönen M. MOKPE: drug-target interaction prediction via manifold optimization based kernel preserving embedding. BMC Bioinformatics 2023; 24:276. [PMID: 37407927 DOI: 10.1186/s12859-023-05401-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/22/2023] [Accepted: 06/25/2023] [Indexed: 07/07/2023] Open
Abstract
BACKGROUND: In many applications of bioinformatics, data stem from distinct heterogeneous sources. One well-known example is the identification of drug-target interactions (DTIs), which is of significant importance in drug discovery. In this paper, we propose a novel framework, manifold optimization based kernel preserving embedding (MOKPE), to efficiently solve the problem of modeling heterogeneous data. Our model projects heterogeneous drug and target data into a unified embedding space by simultaneously preserving drug-target interactions and drug-drug and target-target similarities. RESULTS: We performed ten replications of ten-fold cross validation on four different drug-target interaction network data sets for predicting DTIs for previously unseen drugs. The classification evaluation metrics showed better or comparable performance compared to previous similarity-based state-of-the-art methods. We also evaluated MOKPE on predicting unknown DTIs of a given network. Our implementation of the proposed algorithm in R, together with the scripts that replicate the reported experiments, is publicly available at https://github.com/ocbinatli/mokpe.
Affiliation(s)
- Oğuz C Binatlı
  - Graduate School of Sciences and Engineering, Koç University, 34450, Istanbul, Turkey
- Mehmet Gönen
  - Department of Industrial Engineering, College of Engineering, Koç University, 34450, Istanbul, Turkey
  - School of Medicine, Koç University, 34450, Istanbul, Turkey
|
36
|
Wang F, Yang H, Wu Y, Peng L, Li X. SAELGMDA: Identifying human microbe-disease associations based on sparse autoencoder and LightGBM. Front Microbiol 2023; 14:1207209. [PMID: 37415823 PMCID: PMC10320730 DOI: 10.3389/fmicb.2023.1207209] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/17/2023] [Accepted: 05/18/2023] [Indexed: 07/08/2023] Open
Abstract
Introduction: Identifying complex associations between diseases and microbes is important for understanding disease pathogenesis and designing therapeutic strategies. Biomedical experiment-based Microbe-Disease Association (MDA) detection methods are expensive, time-consuming, and laborious. Methods: Here, we developed a computational method called SAELGMDA for potential MDA prediction. First, microbe similarity and disease similarity are computed by integrating their functional similarity and Gaussian interaction profile kernel similarity. Second, each microbe-disease pair is represented as a feature vector by combining the microbe and disease similarity matrices. Next, the obtained feature vectors are mapped to a low-dimensional space by a Sparse AutoEncoder. Finally, unknown microbe-disease pairs are classified by a Light Gradient Boosting Machine (LightGBM). Results: SAELGMDA was compared with four state-of-the-art MDA methods (MNNMDA, GATMDA, NTSHMDA, and LRLSHMDA) under five-fold cross validation on diseases, microbes, and microbe-disease pairs on the HMDAD and Disbiome databases. The results show that SAELGMDA achieved the best accuracy, Matthews correlation coefficient, AUC, and AUPR under the majority of conditions, outperforming the other four MDA prediction models. In particular, SAELGMDA obtained the best AUCs of 0.8358 and 0.9301 under cross validation on diseases, 0.9838 and 0.9293 under cross validation on microbes, and 0.9857 and 0.9358 under cross validation on microbe-disease pairs on the HMDAD and Disbiome databases, respectively. Colorectal cancer, inflammatory bowel disease, and lung cancer are diseases that severely threaten human health. We used SAELGMDA to find possible microbes for the three diseases. The results indicate potential associations between Clostridium coccoides and colorectal cancer and between Sphingomonadaceae and inflammatory bowel disease. In addition, Veillonella may be associated with autism. The inferred MDAs need further validation. Conclusion: We anticipate that SAELGMDA will contribute to the identification of new MDAs.
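As an illustration of the first step named in this abstract, here is a minimal sketch of Gaussian interaction profile (GIP) kernel similarity in its commonly used form: each entity's interaction profile is its row of the binary association matrix, and the kernel bandwidth is normalised by the mean squared norm of the profiles. The function name, the default `gamma_prime=1.0`, and the toy matrix are assumptions for illustration; the abstract does not state the exact settings SAELGMDA uses.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity between matrix rows.

    profiles: list of equal-length binary rows, one per entity
              (e.g. one row per microbe, one column per disease).
    Returns an n x n similarity matrix as a list of lists.
    """
    n = len(profiles)
    # Bandwidth: gamma_prime divided by the mean squared norm of all profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm

    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim


# Toy, made-up association matrix: 3 microbes (rows) x 3 diseases (columns).
toy_associations = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
similarity = gip_kernel(toy_associations)
```

Applying the same function to the transposed matrix would give the disease-side similarity. Integrating it with functional similarity, as the abstract describes, is a separate step that is not sketched here.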
Affiliation(s)
- Feixiang Wang
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Huandong Yang
  - Department of Gastrointestinal Surgery, Yidu Central Hospital of Weifang, Weifang, China
- Yan Wu
  - Geneis (Beijing) Co., Ltd., Beijing, China
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Xiaoling Li
  - The Second Department of Oncology, Beidahuang Industry Group General Hospital, Harbin, China
  - The Second Department of Oncology, Heilongjiang Second Cancer Hospital, Harbin, China
|