1
Dong K, Lin X, Zhang Y. Molecular property prediction based on graph contrastive learning with partial feature masking. J Mol Graph Model 2025; 138:109014. [PMID: 40120380 DOI: 10.1016/j.jmgm.2025.109014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Revised: 01/24/2025] [Accepted: 03/10/2025] [Indexed: 03/25/2025]
Abstract
Molecular representation learning facilitates multiple downstream tasks such as molecular property prediction (MPP) and drug design. Recent studies have shown great promise in applying self-supervised learning (SSL) to cope with data scarcity in MPP. Contrastive learning (CL) is a typical SSL method used to learn prior knowledge so that the trained model generalizes better on various downstream tasks. One important issue in CL is how to generate augmented samples that preserve the core molecular semantics of each training sample, which may significantly affect the gains from the CL strategy. To address this issue, we propose the partial Feature Masking-based molecular Graph Contrastive Learning model (FMGCL). FMGCL constructs the masked molecular graph by masking partial features of each atom and bond in the featured molecular graph. Since the masked molecular graphs preserve the chemical structure of the molecules, they do not violate the chemical semantics of molecules, which is beneficial for capturing valuable prior knowledge of molecules during pre-training. Then, FMGCL fine-tunes the pre-trained encoder on the featured molecular graph for downstream tasks. Moreover, we propose using the relative distance between samples within a batch to enhance performance in regression tasks. Experiments on 12 benchmark datasets from MoleculeNet and ChEMBL showed the superiority of FMGCL.
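The masking step described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `MASK` sentinel, the `mask_partial_features` helper, the masking ratios, and the toy feature vectors are hypothetical and are not taken from the FMGCL paper. The sketch only shows the general idea that some entries of each atom and bond feature vector are hidden while the atoms and the bond list (the chemical structure) stay untouched.

```python
import random

MASK = -1  # hypothetical sentinel marking a hidden feature entry


def mask_partial_features(features, mask_ratio, rng):
    """Return a copy of the feature vectors with a fraction of each vector's
    entries replaced by MASK. The number of vectors is never changed, so the
    graph keeps every atom and bond."""
    masked = []
    for vec in features:
        k = max(1, int(round(mask_ratio * len(vec))))  # entries to hide per vector
        hidden = set(rng.sample(range(len(vec)), k))
        masked.append([MASK if i in hidden else v for i, v in enumerate(vec)])
    return masked


# Toy graph with three heavy atoms in a chain; feature values are made up.
atoms = [[6, 4, 0, 3], [6, 4, 0, 2], [8, 2, 0, 1]]
bonds = [(0, 1), (1, 2)]             # topology, left unchanged by masking
bond_feats = [[1, 0, 0], [1, 0, 0]]

rng = random.Random(0)
masked_atoms = mask_partial_features(atoms, 0.25, rng)
masked_bonds = mask_partial_features(bond_feats, 0.34, rng)
```

Because only feature entries are hidden, the masked view and the original view share the same connectivity, which is the property the abstract credits for preserving chemical semantics. A contrastive objective would then treat the two views of one molecule as a positive pair.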
Affiliation(s)
- Kunjie Dong
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China.
- Xiaohui Lin
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China.
- Yanhui Zhang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China.
2
Pala MA. Graph-Aware AURALSTM: An Attentive Unified Representation Architecture with BiLSTM for Enhanced Molecular Property Prediction. Mol Divers 2025:10.1007/s11030-025-11197-4. [PMID: 40279083 DOI: 10.1007/s11030-025-11197-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2025] [Accepted: 04/12/2025] [Indexed: 04/26/2025]
Abstract
Predicting molecular properties with high accuracy is essential across scientific fields, from drug discovery and biotechnology to materials science and environmental research. In biomedical sciences, accurate molecular property prediction is crucial for elucidating disease mechanisms, identifying potential drug candidates, and optimising various processes. However, existing approaches, often based on low-dimensional representations, fail to capture the intricate spatial and structural complexities of molecular data. This study introduces a novel hybrid deep learning model, the Graph-Aware AURA-LSTM (Attentive Unified Representation Architecture-Long Short-Term Memory), designed to determine molecular properties with unprecedented accuracy using advanced graph representations. AURA-LSTM combines multiple Graph Neural Network (GNN) architectures, specifically Graph Convolutional Networks (GCNs), Graph Attention Networks (GATs), and Graph Isomorphism Networks (GINs), in a parallel structure to comprehensively capture the multidimensional structural features of molecules. Within this architecture, GCNs incorporate local structural relationships, GATs apply attention mechanisms to highlight critical structural elements, and GINs capture intricate molecular details through isomorphic distinction, resulting in a richly detailed feature matrix. A BiLSTM layer then processes this feature matrix, evaluating temporal relationships to enhance molecular feature classification. Evaluated on eight benchmark datasets, AURA-LSTM demonstrated superior performance, consistently achieving over 90% accuracy and outperforming state-of-the-art methods. These results position AURA-LSTM as a robust tool for molecular feature classification, uniquely capable of integrating temporally aware insights from distinct GNN architectures.
Affiliation(s)
- Muhammed Ali Pala
  - Department of Electrical and Electronics Engineering, Faculty of Technology, Sakarya University of Applied Sciences, 54050, Sakarya, Turkey.
  - Biomedical Technologies Application and Research Center (BIYOTAM), Sakarya University of Applied Sciences, Sakarya, Turkey.
3
Zheng S, Zhang C, Chen Y, Chen M. Graph and Multi-Level Sequence Fusion Learning for Predicting the Molecular Activity of BACE-1 Inhibitors. Int J Mol Sci 2025; 26:1681. [PMID: 40004143 PMCID: PMC11855840 DOI: 10.3390/ijms26041681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2025] [Revised: 02/12/2025] [Accepted: 02/14/2025] [Indexed: 02/27/2025] Open
Abstract
The development of BACE-1 (β-site amyloid precursor protein cleaving enzyme 1) inhibitors is a crucial focus in exploring early treatments for Alzheimer's disease (AD). Recently, graph neural networks (GNNs) have demonstrated significant advantages in predicting molecular activity. However, their reliance on graph structures alone often neglects explicit sequence-level semantic information. To address this limitation, we proposed a Graph and multi-level Sequence Fusion Learning (GSFL) model for predicting the molecular activity of BACE-1 inhibitors. First, molecular graph structures generated from SMILES strings were encoded using GNNs with an atomic-level characteristic attention mechanism. Next, substrings at the functional-group, ion, and atomic levels were extracted from SMILES strings and encoded using a BiLSTM-Transformer framework equipped with a hierarchical attention mechanism. Finally, these features were fused to predict the activity of BACE-1 inhibitors. A dataset of 1548 compounds with BACE-1 activity measurements was curated from the ChEMBL database. In the classification experiment, the model achieved an accuracy of 0.941 on the training set and 0.877 on the test set. For the test set, it delivered a sensitivity of 0.852, a specificity of 0.894, an MCC of 0.744, an F1-score of 0.872, a PRC of 0.869, and an AUC of 0.915. Compared to traditional computer-aided drug design methods and other machine learning algorithms, the proposed model can effectively improve the accuracy of molecular activity prediction for BACE-1 inhibitors and has potential application value.
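For readers less familiar with the threshold-based metrics quoted in this abstract (accuracy, sensitivity, specificity, MCC, F1-score), the following sketch shows their standard definitions in terms of a binary confusion matrix. The counts used here are hypothetical and chosen only for illustration; they are not the GSFL test-set confusion matrix, which the abstract does not report.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall on the active class
    specificity = tn / (tn + fp)            # recall on the inactive class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, not taken from the paper.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

PRC and AUC are ranking metrics computed from predicted scores rather than from a single confusion matrix, so they are left out of this sketch.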
Affiliation(s)
- Shaohua Zheng
  - College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
- Changwang Zhang
  - College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
- Youjia Chen
  - College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
- Meimei Chen
  - College of Traditional Chinese Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou 350122, China
4
Xu Y, Liu X, Xia W, Ge J, Ju CW, Zhang H, Zhang JZ. ChemXTree: A Feature-Enhanced Graph Neural Network-Neural Decision Tree Framework for ADMET Prediction. J Chem Inf Model 2024; 64:8440-8452. [PMID: 39497657 PMCID: PMC11600499 DOI: 10.1021/acs.jcim.4c01186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Revised: 10/18/2024] [Accepted: 10/29/2024] [Indexed: 11/07/2024]
Abstract
The rapid progression of machine learning, especially deep learning (DL), has catalyzed a new era in drug discovery, introducing innovative approaches for predicting molecular properties. Despite the many methods available for feature representation, efficiently utilizing rich, high-dimensional information remains a significant challenge. Our work introduces ChemXTree, a novel graph-based model that integrates a Gate Modulation Feature Unit (GMFU) and neural decision tree (NDT) in the output layer to address this challenge. Extensive evaluations on benchmark data sets, including MoleculeNet and eight additional drug databases, have demonstrated ChemXTree's superior performance, surpassing or matching the current state-of-the-art models. Visualization techniques clearly demonstrate that ChemXTree significantly improves the separation between substrates and nonsubstrates in the latent space. In summary, ChemXTree demonstrates a promising approach for integrating advanced feature extraction with neural decision trees, offering significant improvements in predictive accuracy for drug discovery tasks and opening new avenues for optimizing molecular properties.
Affiliation(s)
- Yuzhi Xu
  - Shanghai Frontiers Science Center of Artificial Intelligence and Deep Learning and NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Xinxin Liu
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
  - Department of Materials Science and Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Wei Xia
  - Shanghai Frontiers Science Center of Artificial Intelligence and Deep Learning and NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Jiankai Ge
  - Chemical and Biomolecular Engineering, University of Illinois at Urbana−Champaign, Urbana, Illinois 61801, United States
- Cheng-Wei Ju
  - Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois 60615, United States
- Haiping Zhang
  - Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen 518055, China
- John Z.H. Zhang
  - Shanghai Frontiers Science Center of Artificial Intelligence and Deep Learning and NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen 518055, China
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, 200062 Shanghai, China
5
Zhang Y, Bai X. Geometry-Augmented Molecular Representation Learning for Property Prediction. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:1518-1528. [PMID: 38758624 DOI: 10.1109/tcbb.2024.3402337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/19/2024]
Abstract
Accurate molecular representation plays a crucial role in expediting the process of drug discovery. Graph neural networks (GNNs) have demonstrated robust capabilities in molecular representation learning, adept at capturing structural and spatial information in molecular graphs. For molecular representation learning, most previous GNN methods are specialized in dealing with 2D or 3D molecular data formats. By further fusing the geometric attributes and structural features of molecules, we can elevate the performance of molecular representation. To realize this, we present a novel geometry-augmented molecular representation learning model, designed to effectively encode both the 2D structural and 3D spatial information inherent in molecular graphs. By incorporating structural and spatial information as attention biases in the graph Transformer framework, our model offers a comprehensive architecture that introduces molecular structural details at both atom and bond levels. We further propose a geometry information fusion module to encode the geometry information within 3D molecular graphs. The experimental results show the efficacy of our model, demonstrating its ability to achieve competitive performance when compared to state-of-the-art (SOTA) models in various property prediction tasks.
6
Hou L, Xiang H, Zeng X, Cao D, Zeng L, Song B. Attribute-guided prototype network for few-shot molecular property prediction. Brief Bioinform 2024; 25:bbae394. [PMID: 39133096 PMCID: PMC11318080 DOI: 10.1093/bib/bbae394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 07/08/2024] [Accepted: 07/27/2024] [Indexed: 08/13/2024] Open
Abstract
Molecular property prediction (MPP) plays a crucial role in the drug discovery process, providing valuable insights for molecule evaluation and screening. Although deep learning has achieved numerous advances in this area, its success often depends on the availability of substantial labeled data. Few-shot MPP is a more challenging scenario, which aims to identify unseen properties with only a few available molecules. In this paper, we propose an attribute-guided prototype network (APN) to address the challenge. APN first introduces a molecular attribute extractor, which can not only extract three different types of fingerprint attributes (single fingerprint attributes, dual fingerprint attributes, triplet fingerprint attributes) by considering seven circular-based, five path-based, and two substructure-based fingerprints, but also automatically extract deep attributes from self-supervised learning methods. Furthermore, APN designs the Attribute-Guided Dual-channel Attention module to learn the relationship between the molecular graphs and attributes and refine the local and global representation of the molecules. Compared with existing works, APN leverages high-level human-defined attributes and helps the model to explicitly generalize knowledge in molecular graphs. Experiments on benchmark datasets show that APN can achieve state-of-the-art performance in most cases and demonstrate that the attributes are effective for improving few-shot MPP performance. In addition, the strong generalization ability of APN is verified by conducting experiments on data from different domains.
Affiliation(s)
- Linlin Hou
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
  - Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai 201109, China
- Hongxin Xiang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
  - Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai 201109, China
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410083, China
- Li Zeng
  - Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai 201109, China
- Bosheng Song
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
7
Kang L, Zhou S, Fang S, Liu S. Adapting differential molecular representation with hierarchical prompts for multi-label property prediction. Brief Bioinform 2024; 25:bbae438. [PMID: 39252594 PMCID: PMC11383732 DOI: 10.1093/bib/bbae438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 08/05/2024] [Accepted: 08/21/2024] [Indexed: 09/11/2024] Open
Abstract
Accurate prediction of molecular properties is crucial in drug discovery. Traditional methods often overlook that real-world molecules typically exhibit multiple property labels with complex correlations. To this end, we propose a novel framework, HiPM, which stands for Hierarchical Prompted Molecular representation learning framework. HiPM leverages task-aware prompts to enhance the differential expression of tasks in molecular representations and mitigate negative transfer caused by conflicts in individual task information. Our framework comprises two core components: the Molecular Representation Encoder (MRE) and the Task-Aware Prompter (TAP). MRE employs a hierarchical message-passing network architecture to capture molecular features at both the atom and motif levels. Meanwhile, TAP uses an agglomerative hierarchical clustering algorithm to construct a prompt tree that reflects task affinity and distinctiveness, enabling the model to consider multi-granular correlation information among tasks and thereby handle the complexity of multi-label property prediction effectively. Extensive experiments demonstrate that HiPM achieves state-of-the-art performance across various multi-label datasets, offering a novel perspective on multi-label molecular representation learning.
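To make the idea of building a tree over tasks by agglomerative hierarchical clustering concrete, here is a small pure-Python sketch. It is an illustration only: the abstract does not state which linkage or task-distance measure HiPM uses, so average linkage is assumed, and the task names and distance matrix are hypothetical.

```python
def agglomerate(names, dist):
    """Average-linkage agglomerative clustering over a symmetric distance
    matrix. Returns the merge history as a nested-tuple tree of names."""
    # Each cluster stores (member indices, subtree built so far).
    clusters = {i: ([i], names[i]) for i in range(len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                ma, mb = clusters[a][0], clusters[b][0]
                # Average pairwise distance between the two clusters.
                d = sum(dist[i][j] for i in ma for j in mb) / (len(ma) * len(mb))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = (clusters[a][0] + clusters[b][0], (clusters[a][1], clusters[b][1]))
        del clusters[a], clusters[b]
        clusters[next_id] = merged
        next_id += 1
    return next(iter(clusters.values()))[1]


# Hypothetical tasks and pairwise task distances (smaller means more related).
tasks = ["tox_A", "tox_B", "sol_A", "sol_B"]
distances = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
tree = agglomerate(tasks, distances)
print(tree)
```

In a tree like this, leaves correspond to individual tasks and internal nodes to groups of related tasks, which is one way a model could attach prompts at several levels of granularity.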
Affiliation(s)
- Linjia Kang
  - College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Songhua Zhou
  - College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Shuyan Fang
  - College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Shichao Liu
  - College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
8
Zhang R, Lin Y, Wu Y, Deng L, Zhang H, Liao M, Peng Y. MvMRL: a multi-view molecular representation learning method for molecular property prediction. Brief Bioinform 2024; 25:bbae298. [PMID: 38920342 PMCID: PMC11200189 DOI: 10.1093/bib/bbae298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 05/09/2024] [Accepted: 06/07/2024] [Indexed: 06/27/2024] Open
Abstract
Effective molecular representation learning is very important for Artificial Intelligence-driven Drug Design because it affects the accuracy and efficiency of molecular property prediction and other molecular modeling tasks. However, previous molecular representation learning studies often suffer from limitations, such as over-reliance on a single molecular representation, failure to fully capture both local and global information in molecular structure, and ineffective integration of multiscale features from different molecular representations. These limitations restrict the complete and accurate representation of molecular structure and properties, ultimately impacting the accuracy of predicting molecular properties. To this end, we propose a novel multi-view molecular representation learning method called MvMRL, which can incorporate feature information from multiple molecular representations and capture both local and global information from different views, thus improving molecular property prediction. Specifically, MvMRL consists of four parts: a multiscale CNN-SE Simplified Molecular Input Line Entry System (SMILES) learning component and a multiscale Graph Neural Network encoder to extract local feature information and global feature information from the SMILES view and the molecular graph view, respectively; a Multi-Layer Perceptron network to capture complex non-linear relationship features from the molecular fingerprint view; and a dual cross-attention component to deeply fuse feature information across the multiple views for predicting molecular properties. We evaluate the performance of MvMRL on 11 benchmark datasets, and experimental results show that MvMRL outperforms state-of-the-art methods, indicating its rationality and effectiveness in molecular property prediction. The source code of MvMRL is available at https://github.com/jedison-github/MvMRL.
Affiliation(s)
- Ru Zhang
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, No. 175, Mingxiu East Road, Xixiang Tang District, Nanning 530001, China
- Yanmei Lin
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, No. 175, Mingxiu East Road, Xixiang Tang District, Nanning 530001, China
  - Center for Applied Mathematics of Guangxi, Nanning Normal University, 508 Xinning Road, Wuming District, Nanning 530100, China
- Yijia Wu
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, No. 175, Mingxiu East Road, Xixiang Tang District, Nanning 530001, China
- Lei Deng
  - School of Computer Science and Engineering, Central South University, 932 Lushan South Road, Changsha 410083, China
- Hao Zhang
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen 518000, China
- Mingzhi Liao
  - Center of Bioinformatics, College of Life Sciences, Northwest A&F University, 3 Taicheng Road, Yangling, Shaanxi 712100, China
- Yuzhong Peng
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, No. 175, Mingxiu East Road, Xixiang Tang District, Nanning 530001, China
  - Guangxi Academy of Sciences, 174 East University Road, Nanning 530007, China
9
Song L, Zhu H, Wang K, Li M. LGGA-MPP: Local Geometry-Guided Graph Attention for Molecular Property Prediction. J Chem Inf Model 2024; 64:3105-3113. [PMID: 38516950 DOI: 10.1021/acs.jcim.3c02058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2024]
Abstract
Molecular property prediction is a fundamental task of drug discovery. With the rapid development of deep learning, computational approaches for predicting molecular properties are experiencing increasing popularity. However, these existing methods often ignore the 3D information on molecules, which is critical in molecular representation learning. In the past few years, several self-supervised learning (SSL) approaches have been proposed to exploit the geometric information by using pre-training on 3D molecular graphs and fine-tuning on 2D molecular graphs. Most of these approaches are based on the global geometry of molecules, and there is still a challenge in capturing the local structure and local interpretability. To this end, we propose local geometry-guided graph attention (LGGA), which integrates local geometry into the attention mechanism and message-passing of graph neural networks (GNNs). LGGA introduces a novel method to model molecules, enhancing the model's ability to capture intricate local structural details. Experiments on various data sets demonstrate that the integration of local geometry has a significant impact on the improved results, and our model outperforms the state-of-the-art methods for molecular property prediction, establishing its potential as a promising tool in drug discovery and related fields.
Affiliation(s)
- Lei Song
  - School of Software, XinJiang University, Urumqi 830091, China
- Huimin Zhu
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Kaili Wang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
10
Kengkanna A, Ohue M. Enhancing property and activity prediction and interpretation using multiple molecular graph representations with MMGX. Commun Chem 2024; 7:74. [PMID: 38580841 PMCID: PMC10997661 DOI: 10.1038/s42004-024-01155-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 03/18/2024] [Indexed: 04/07/2024] Open
Abstract
Graph Neural Networks (GNNs) excel in compound property and activity prediction, but the choice of molecular graph representations significantly influences model learning and interpretation. While atom-level molecular graphs resemble natural topology, they overlook key substructures or functional groups, and their interpretation only partially aligns with chemical intuition. Recent research suggests alternative representations that use reduced molecular graphs to integrate higher-level chemical information and leverage both representations for modeling. However, there is a lack of studies on the applicability and impact of different molecular graphs on model learning and interpretation. Here, we introduce MMGX (Multiple Molecular Graph eXplainable discovery), investigating the effects of multiple molecular graphs, including Atom, Pharmacophore, JunctionTree, and FunctionalGroup, on model learning and interpretation from various perspectives. Our findings indicate that multiple graphs improve model performance, though to varying degrees depending on the dataset. Interpretation from multiple graphs in different views provides more comprehensive features and potential substructures consistent with background knowledge. These results help to understand model decisions and offer valuable insights for subsequent tasks. The concept of multiple molecular graph representations and diverse interpretation perspectives has broad applicability across tasks, architectures, and explanation techniques, enhancing model learning and interpretation for relevant applications in drug discovery.
Affiliation(s)
- Apakorn Kengkanna
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Kanagawa, 226-8501, Japan
- Masahito Ohue
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Kanagawa, 226-8501, Japan.
11
Zhou L, Peng X, Zeng L, Peng L. Finding potential lncRNA-disease associations using a boosting-based ensemble learning model. Front Genet 2024; 15:1356205. [PMID: 38495672 PMCID: PMC10940470 DOI: 10.3389/fgene.2024.1356205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 02/01/2024] [Indexed: 03/19/2024] Open
Abstract
Introduction: Long non-coding RNAs (lncRNAs) have been used clinically as potential prognostic biomarkers of various types of cancer. Identifying associations between lncRNAs and diseases helps capture potential biomarkers and design efficient therapeutic options for diseases. Wet experiments for identifying these associations are costly and laborious. Methods: We developed LDA-SABC, a novel boosting-based framework for lncRNA-disease association (LDA) prediction. LDA-SABC extracts LDA features based on singular value decomposition (SVD) and classifies lncRNA-disease pairs (LDPs) by incorporating LightGBM and AdaBoost into the convolutional neural network. Results: The LDA-SABC performance was evaluated under five-fold cross validations (CVs) on lncRNAs, diseases, and LDPs. It clearly outperformed four other classical LDA inference methods (SDLDA, LDNFSGB, LDASR, and IPCAF) in terms of precision, recall, accuracy, F1 score, AUC, and AUPR. Based on the accurate LDA prediction performance of LDA-SABC, we used it to find potential lncRNA biomarkers for lung cancer. The results suggested that 7SK and HULC could have a relationship with non-small-cell lung cancer (NSCLC) and lung adenocarcinoma (LUAD), respectively. Conclusion: We hope that our proposed LDA-SABC method can help improve LDA identification.
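As a rough illustration of SVD-based feature extraction for lncRNA-disease pairs, the sketch below factorizes a small binary association matrix and concatenates the resulting lncRNA and disease vectors into one pair feature. The abstract only says that features are "based on SVD", so the details here are assumptions: the toy matrix, the rank `k`, the square-root scaling of singular values, and the helper names `svd_pair_features` and `pair_vector` are all hypothetical.

```python
import numpy as np


def svd_pair_features(assoc, k):
    """Truncated SVD of an lncRNA x disease association matrix.
    Returns rank-k feature matrices for lncRNAs (rows) and diseases (columns)."""
    u, s, vt = np.linalg.svd(assoc, full_matrices=False)
    root = np.sqrt(s[:k])            # split singular values between both sides
    lnc = u[:, :k] * root            # shape: (n_lncRNAs, k)
    dis = vt[:k, :].T * root         # shape: (n_diseases, k)
    return lnc, dis


def pair_vector(lnc, dis, i, j):
    """Feature vector for the pair (lncRNA i, disease j) by concatenation."""
    return np.concatenate([lnc[i], dis[j]])


# Toy association matrix: 4 lncRNAs x 3 diseases, 1 = known association.
assoc = np.array(
    [[1, 0, 1],
     [0, 1, 0],
     [1, 0, 0],
     [0, 1, 1]],
    dtype=float,
)
lnc_feats, dis_feats = svd_pair_features(assoc, k=2)
x = pair_vector(lnc_feats, dis_feats, 0, 2)  # input for a downstream classifier
```

Pair vectors built this way could be passed to any classifier; the paper's own pipeline combines a convolutional neural network with LightGBM and AdaBoost, which this sketch does not attempt to reproduce.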
Affiliation(s)
- Liqian Zhou
  - School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan, China
- Xinhuai Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan, China
- Lijun Zeng
  - School of Computer Science, Hunan Institute of Technology, Hengyang, China
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan, China
12
Wang J, Zhang L, Sun J, Yang X, Wu W, Chen W, Zhao Q. Predicting drug-induced liver injury using graph attention mechanism and molecular fingerprints. Methods 2024; 221:18-26. [PMID: 38040204 DOI: 10.1016/j.ymeth.2023.11.014] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 10/21/2023] [Revised: 11/14/2023] [Accepted: 11/25/2023] [Indexed: 12/03/2023] Open
Abstract
Drug-induced liver injury (DILI) is a significant issue in drug development and clinical treatment due to its potential to cause liver dysfunction or damage, which, in severe cases, can lead to liver failure or even fatality. DILI has numerous pathogenic factors, many of which remain incompletely understood. Consequently, it is imperative to devise methodologies and tools for anticipatory assessment of DILI risk in the initial phases of drug development. In this study, we present DMFPGA, a novel deep learning predictive model designed to predict DILI. To provide a comprehensive description of molecular properties, we employ a multi-head graph attention mechanism to extract features from the molecular graphs, representing characteristics at the level of compound nodes. Additionally, we combine multiple fingerprints of molecules to capture features at the molecular level of compounds. The fusion of molecular fingerprints and graph features can more fully express the properties of compounds. Subsequently, we employ a fully connected neural network to classify compounds as either DILI-positive or DILI-negative. To rigorously evaluate DMFPGA's performance, we conduct a 5-fold cross-validation experiment. The obtained results demonstrate the superiority of our method over four existing state-of-the-art computational approaches, exhibiting an average AUC of 0.935 and an average ACC of 0.934. We believe that DMFPGA is helpful for early-stage DILI prediction and assessment in drug development.
Affiliation(s)
- Jifeng Wang
  - School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
- Li Zhang
  - School of Life Science, Liaoning University, Shenyang 110036, China
- Jianqiang Sun
  - School of Information Science and Engineering, Linyi University, Linyi 276000, China
- Xin Yang
  - School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
- Wei Wu
  - School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
- Wei Chen
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China.
- Qi Zhao
  - School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China.