1
Dong K, Lin X, Zhang Y. Molecular property prediction based on graph contrastive learning with partial feature masking. J Mol Graph Model 2025; 138:109014. [PMID: 40120380 DOI: 10.1016/j.jmgm.2025.109014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Revised: 01/24/2025] [Accepted: 03/10/2025] [Indexed: 03/25/2025]
Abstract
Molecular representation learning facilitates multiple downstream tasks such as molecular property prediction (MPP) and drug design. Recent studies have shown great promise in applying self-supervised learning (SSL) to cope with data scarcity in MPP. Contrastive learning (CL) is a typical SSL method used to learn prior knowledge so that the trained model generalizes better across downstream tasks. An important issue in CL is how to generate augmented samples that preserve the core molecular semantics of each training sample, which can significantly affect the gains of the CL strategy. To address this issue, we propose the partial Feature Masking-based molecular Graph Contrastive Learning model (FMGCL). FMGCL constructs the masked molecular graph by masking partial features of each atom and bond in the featurized molecular graph. Because the masked molecular graphs preserve the chemical structure of the molecules, they do not violate the molecules' chemical semantics, which is beneficial for capturing valuable prior knowledge during pre-training. FMGCL then fine-tunes the pre-trained encoder on the featurized molecular graph for downstream tasks. Moreover, we propose using the relative distance between samples within a batch to enhance performance on regression tasks. Experiments on 12 benchmark datasets from MoleculeNet and ChEMBL showed the superiority of FMGCL.
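The masking step described above can be illustrated with a minimal sketch, assuming a featurized graph stored as plain Python lists of per-atom and per-bond feature vectors plus an edge list. The helper names (`mask_partial_features`, `make_masked_view`), the 0.3 mask ratio, and the use of zero as the mask value are illustrative assumptions, not FMGCL's published settings. What the sketch shows is that only feature entries are hidden, while the edge list (the chemical structure) is copied unchanged.

```python
import random
from typing import List, Tuple

Features = List[List[float]]   # one feature vector per atom or per bond
Edges = List[Tuple[int, int]]  # bond list as (atom_i, atom_j) pairs


def mask_partial_features(feats: Features, mask_ratio: float,
                          rng: random.Random, mask_value: float = 0.0) -> Features:
    """Copy feats, replacing a random subset of entries in each row with mask_value."""
    masked = []
    for row in feats:
        k = max(1, round(mask_ratio * len(row)))   # number of entries to hide in this row
        hidden = set(rng.sample(range(len(row)), k))
        masked.append([mask_value if j in hidden else v for j, v in enumerate(row)])
    return masked


def make_masked_view(atoms: Features, bonds: Features, edges: Edges,
                     mask_ratio: float = 0.3, seed: int = 0):
    """Hypothetical view generator: features are partially masked, topology is kept."""
    rng = random.Random(seed)
    return (mask_partial_features(atoms, mask_ratio, rng),
            mask_partial_features(bonds, mask_ratio, rng),
            list(edges))  # chemical structure is left untouched


# Toy three-atom chain (C-C-O) with made-up feature values.
atoms = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
bonds = [[1.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
view_a = make_masked_view(atoms, bonds, edges, seed=1)
view_b = make_masked_view(atoms, bonds, edges, seed=2)
```

Two views produced with different seeds would form a positive pair for a contrastive loss; the loss itself and the encoder are not shown.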
Affiliation(s)
- Kunjie Dong
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China.
- Xiaohui Lin
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China.
- Yanhui Zhang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China.
2
Xiu J, Yang H, Shen X, Xing Y, Li W, Han W. Exploring Hidden Dangers: Predicting Mycotoxin-like Toxicity and Mapping Toxicological Networks in Hepatocellular Carcinoma. J Chem Inf Model 2025. [PMID: 40393043 DOI: 10.1021/acs.jcim.5c00171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2025]
Abstract
Mycotoxins are potent triggers of hepatocellular carcinoma (HCC) due to their intricate interplay with cellular macromolecules and signaling pathways. This study integrates machine learning and biomolecular analyses to elucidate the mechanisms underlying mycotoxin-induced hepatocarcinogenesis. Using a data set of 1767 mycotoxins and 1706 non-mycotoxin fungal metabolites, we evaluated 51 machine learning models. The KPGT model achieved optimal performance with an ROC-AUC of 0.979 and balanced accuracy of 0.930. Clustering analysis identified six distinct mycotoxin clusters with unique structural features. Network toxicology analysis revealed distinct protein-protein interaction patterns across different mycotoxin clusters, identifying key regulatory proteins including EGFR, SRC, and ESR1. GO enrichment analysis uncovered cluster-specific effects on protein complexes and macromolecular assemblies, particularly in membrane organization and vesicular transport. KEGG pathway analysis demonstrated systematic perturbation of major signaling cascades, with each mycotoxin cluster distinctly modulating protein kinase networks and receptor tyrosine kinase pathways. Molecular docking analyses validated these interactions, with binding affinities ranging from -9.6 to -4.7 kcal/mol. Notably, cluster 5 showed strong binding to SRC (-9.6 kcal/mol), EGFR (-9.5 kcal/mol), and ESR1 (-7.8 kcal/mol), providing structural insights into toxin-macromolecule recognition. These findings enhance our understanding of mycotoxin-protein interactions in HCC development and suggest potential therapeutic strategies targeting these macromolecular interfaces.
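The two headline metrics above can be computed as follows. This is a plain-Python sketch of the standard definitions (balanced accuracy as the mean of sensitivity and specificity; ROC-AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half). It is not taken from the study's code, and the toy labels and scores are made up.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall for binary labels (1 = mycotoxin, 0 = non-mycotoxin)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) ROC-AUC; O(P*N), fine for illustration only."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(balanced_accuracy([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1]))  # 0.625
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))                # 0.75
```

Because balanced accuracy averages recall over the two classes, it is not inflated by whichever class is larger, although the 1767/1706 split reported here is already close to balanced.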
Affiliation(s)
- Jian Xiu
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Sciences, Jilin University, Changchun 130012, China
- Hengzheng Yang
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Sciences, Jilin University, Changchun 130012, China
- Xiaoli Shen
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Sciences, Jilin University, Changchun 130012, China
- Yuenan Xing
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Sciences, Jilin University, Changchun 130012, China
- Wannan Li
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Sciences, Jilin University, Changchun 130012, China
- Weiwei Han
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Sciences, Jilin University, Changchun 130012, China
3
Yuan N, Guan D, Li S, Zhang L, Zhu Q. Enhancing Neurodegenerative Disease Diagnosis Through Confidence-Driven Dynamic Spatio-Temporal Convolutional Network. IEEE Trans Neural Syst Rehabil Eng 2025; 33:1715-1728. [PMID: 40293889 DOI: 10.1109/tnsre.2025.3564983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2025]
Abstract
Dynamic brain networks are more effective than static networks in characterizing the evolving patterns of brain functional connectivity, making them a more promising tool for diagnosing neurodegenerative diseases. However, existing classification methods for dynamic brain networks often rely on sliding windows to extract multi-window features, leading to suboptimal performance due to the spatio-temporal coupling on these windows and limited ability to effectively integrate complex topological features. To address these limitations, we propose a novel method called Confidence-Driven Dynamic Spatio-Temporal Convolutional Network (CD-DSTCN). First, our proposed method employs a spatio-temporal convolutional network integrated with a temporal attention mechanism to extract spatio-temporal features within each window. By propagating information across temporal windows during spatial convolution, the method effectively captures and integrates complex temporal and spatial dependencies. Second, each window generates an output probability, which quantifies prediction confidence based on the true class probability (TCP). This confidence score serves as a weight to assess the relative importance of different time windows. Finally, the confidence-weighted fused features are passed through a multilayer perceptron (MLP) for final classification. Extensive experiments on Alzheimer's and Parkinson's datasets show that the proposed method outperforms the state-of-the-art algorithms and can provide valuable biomarkers for brain disease diagnosis. Our code is publicly available at: https://github.com/YNingCode/CD-DSTCN.
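A minimal sketch of confidence-weighted window fusion, assuming each window produces class logits and a feature vector. Here a window's weight is its true class probability (the softmax probability assigned to the ground-truth class), normalized across windows. That only works when the label is known, as during training; the abstract does not say how CD-DSTCN estimates confidence at test time (one common approach in the TCP literature is to train an auxiliary network to regress TCP), so the helper names and the normalization below are assumptions.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tcp_weights(window_logits: List[List[float]], true_class: int) -> List[float]:
    """True class probability per window, normalized so the weights sum to 1."""
    tcp = [softmax(logits)[true_class] for logits in window_logits]
    total = sum(tcp)
    return [t / total for t in tcp]


def fuse_windows(window_feats: List[List[float]], weights: List[float]) -> List[float]:
    """Confidence-weighted sum of per-window feature vectors (input to the MLP)."""
    dim = len(window_feats[0])
    return [sum(w * feats[d] for w, feats in zip(weights, window_feats)) for d in range(dim)]


# Two windows: the first is confident about class 0, the second is undecided.
weights = tcp_weights([[2.0, 0.0], [0.0, 0.0]], true_class=0)
fused = fuse_windows([[1.0, 0.0], [0.0, 1.0]], weights)
```

The confident window receives roughly 0.64 of the total weight and the undecided one roughly 0.36, so the fused vector leans toward the features of the window whose prediction was more reliable.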
4
Zhai J, Qi X, Cai L, Liu Y, Tang H, Xie L, Wang J. NNKcat: deep neural network to predict catalytic constants (Kcat) by integrating protein sequence and substrate structure with enhanced data imbalance handling. Brief Bioinform 2025; 26:bbaf212. [PMID: 40370097 PMCID: PMC12078937 DOI: 10.1093/bib/bbaf212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2025] [Revised: 04/14/2025] [Accepted: 04/21/2025] [Indexed: 05/16/2025]
Abstract
The catalytic constant (Kcat) describes how efficiently an enzyme catalyzes a reaction. The Kcat value of an enzyme-substrate pair indicates the rate at which the enzyme converts substrate into product under saturating substrate conditions. However, constructing robust prediction models for this important property is challenging. Most existing models, including the one recently published in Nature Catalysis (Li et al.), suffer from overfitting. In this study, we proposed a novel protocol for constructing Kcat prediction models, introducing an intermediate step in which substrate and protein processors are developed separately. The substrate processor analyzes Simplified Molecular Input Line Entry System (SMILES) strings using a graph neural network model, Attentive FP, while the protein processor abstracts protein sequence information using a long short-term memory architecture. This protocol not only mitigates the impact of data imbalance in the original dataset but also provides greater flexibility in customizing the general-purpose Kcat prediction model to improve prediction accuracy for specific enzyme classes. Our general-purpose Kcat prediction model demonstrates significantly enhanced stability and slightly better accuracy (R2 value of 0.54 versus 0.50) than Li et al.'s model on the same dataset. Additionally, our modeling protocol allows the general-purpose Kcat model to be fine-tuned for specific enzyme categories through focused learning. Using Cytochrome P450 (CYP450) enzymes as a case study, we achieved a best R2 value of 0.64 for the focused model. The high performance and extensibility of the model guarantee its broad application in enzyme engineering and drug research and development.
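The R2 values quoted above are coefficients of determination. Assuming the usual definition R2 = 1 - SS_res / SS_tot (the abstract does not say whether a squared Pearson correlation is reported instead, which can give a different number), the metric can be computed as below; the target and prediction values are toy numbers, not Kcat data.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1.0 - ss_res / ss_tot


print(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))  # approximately 0.98
```

Under this definition R2 can fall below zero when a model predicts worse than the mean of the targets, so a gain from 0.50 to 0.54 is a modest improvement in explained variance.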
Affiliation(s)
- Jingchen Zhai
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States
- Xiguang Qi
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States
- Lianjin Cai
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States
- Yue Liu
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States
- Haocheng Tang
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States
- Lei Xie
- Department of Computer Science, Hunter College, The City University of New York, 695 Park Ave, New York, NY 10065, United States
- Helen & Robert Appel Alzheimer's Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, 413 E 69th St, New York, NY 10021, United States
- Junmei Wang
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States
5
Pala MA. Graph-Aware AURALSTM: An Attentive Unified Representation Architecture with BiLSTM for Enhanced Molecular Property Prediction. Mol Divers 2025:10.1007/s11030-025-11197-4. [PMID: 40279083 DOI: 10.1007/s11030-025-11197-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2025] [Accepted: 04/12/2025] [Indexed: 04/26/2025]
Abstract
Predicting molecular properties with high accuracy is essential across scientific fields, from drug discovery and biotechnology to materials science and environmental research. In biomedical sciences, accurate molecular property prediction is crucial for elucidating disease mechanisms, identifying potential drug candidates, and optimising various processes. However, existing approaches, often based on low-dimensional representations, fail to capture the intricate spatial and structural complexities of molecular data. This study introduces a novel hybrid deep learning model, the Graph-Aware AURA-LSTM (Attentive Unified Representation Architecture-Long Short-Term Memory), designed to determine molecular properties with unprecedented accuracy using advanced graph representations. AURA-LSTM combines multiple Graph Neural Network (GNN) architectures, specifically Graph Convolutional Networks (GCNs), Graph Attention Networks (GATs), and Graph Isomorphism Networks (GINs), in a parallel structure to comprehensively capture the multidimensional structural features of molecules. Within this architecture, GCNs incorporate local structural relationships, GATs apply attention mechanisms to highlight critical structural elements, and GINs capture intricate molecular details through isomorphic distinction, resulting in a richly detailed feature matrix. A BiLSTM layer then processes this feature matrix, evaluating temporal relationships to enhance molecular feature classification. Evaluated on eight benchmark datasets, AURA-LSTM demonstrated superior performance, consistently achieving over 90% accuracy and outperforming state-of-the-art methods. These results position AURA-LSTM as a robust tool for molecular feature classification, uniquely capable of integrating temporally aware insights from distinct GNN architectures.
Affiliation(s)
- Muhammed Ali Pala
- Department of Electrical and Electronics Engineering, Faculty of Technology, Sakarya University of Applied Sciences, 54050, Sakarya, Turkey.
- Biomedical Technologies Application and Research Center (BIYOTAM), Sakarya University of Applied Sciences, Sakarya, Turkey.
6
Jing Y, Zhao G, Xu Y, McGuire T, Hou G, Zhao J, Chen M, Lopez O, Xue Y, Xie XQ. GCN-BBB: Deep Learning Blood-Brain Barrier (BBB) Permeability PharmacoAnalytics with Graph Convolutional Neural (GCN) Network. AAPS J 2025; 27:73. [PMID: 40180695 DOI: 10.1208/s12248-025-01059-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2024] [Accepted: 03/19/2025] [Indexed: 04/05/2025]
Abstract
The Blood-Brain Barrier (BBB) is a selective barrier between the Central Nervous System (CNS) and the peripheral system, regulating the distribution of molecules. BBB permeability has been crucial in CNS-targeting drug development, such as glioblastoma-related drug discovery. In addition, many CNS conditions, including neurological disorders like Alzheimer's Disease (AD) and drug abuse, still present significant challenges. Conversely, cannabinoid drugs that do not cross the BBB are needed to avoid off-target CNS psychotropic effects. In vitro and in vivo experiments measuring BBB permeability are costly and low throughput. Computational pharmacoanalytics modeling, particularly using deep-learning Graph Neural Networks (GNNs), offers a promising alternative. GNNs excel at capturing intricate relationships in graph-based information, such as small-molecule structures. In this study, we developed GNN models for BBB permeability using the graph representation of drugs. The GNNs were compared with other algorithms using molecular fingerprints or physicochemical descriptors. On a dataset of 1924 molecules, the best GNN model, a convolutional graph neural network using a normalized Laplacian matrix (GCN_2), achieved a precision of 0.94, recall of 0.96, F1 score of 0.95, and MCC score of 0.77. This model outperformed other machine learning algorithms that used molecular fingerprints. The findings indicate that the graph representation of small molecules combined with GNN architectures is powerful in predicting BBB permeability with high accuracy and recall. The developed GNN model can be used in the initial screening stage of new drug development.
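The abstract names a GCN variant built on a normalized Laplacian matrix (GCN_2). A common form, assumed here rather than confirmed by the paper, is the renormalized propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the molecular adjacency matrix, I adds self-loops, and D is the degree matrix of A + I. The plain-Python sketch below applies one such layer; the weight matrix is a placeholder, not a trained value.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2, with D the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: normalized neighborhood aggregation, linear map, ReLU."""
    out = matmul(matmul(normalized_adjacency(adj), h), w)
    return [[max(0.0, x) for x in row] for row in out]


# Two bonded atoms with 1-dim features; identity weight for readability.
print(gcn_layer([[0.0, 1.0], [1.0, 0.0]], [[1.0], [3.0]], [[1.0]]))  # [[2.0], [2.0]]
```

The symmetric normalization keeps aggregated features on a comparable scale for atoms with different numbers of bonds; a graph-level BBB prediction would additionally need a pooling step and a classifier, which are not shown.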
Affiliation(s)
- Yankang Jing
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, Pharmacometrics & System Pharmacology (PSP) Pharmacoanalytics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, 15261, United States of America
- National Center of Excellence for Computational Drug Abuse Research University of Pittsburgh, Pittsburgh, PA, 15261, United States of America
- Guangyi Zhao
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, Pharmacometrics & System Pharmacology (PSP) Pharmacoanalytics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, 15261, United States of America
- National Center of Excellence for Computational Drug Abuse Research University of Pittsburgh, Pittsburgh, PA, 15261, United States of America
- Yuanyuan Xu
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, Pharmacometrics & System Pharmacology (PSP) Pharmacoanalytics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, 15261, United States of America
- National Center of Excellence for Computational Drug Abuse Research University of Pittsburgh, Pittsburgh, PA, 15261, United States of America
- Terence McGuire
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, Pharmacometrics & System Pharmacology (PSP) Pharmacoanalytics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, 15261, United States of America
- National Center of Excellence for Computational Drug Abuse Research University of Pittsburgh, Pittsburgh, PA, 15261, United States of America
- Ganqian Hou
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, Pharmacometrics & System Pharmacology (PSP) Pharmacoanalytics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, 15261, United States of America
- National Center of Excellence for Computational Drug Abuse Research University of Pittsburgh, Pittsburgh, PA, 15261, United States of America
- Jack Zhao
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, Pharmacometrics & System Pharmacology (PSP) Pharmacoanalytics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, 15261, United States of America
- National Center of Excellence for Computational Drug Abuse Research University of Pittsburgh, Pittsburgh, PA, 15261, United States of America
- Maozi Chen
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, Pharmacometrics & System Pharmacology (PSP) Pharmacoanalytics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, 15261, United States of America
- National Center of Excellence for Computational Drug Abuse Research University of Pittsburgh, Pittsburgh, PA, 15261, United States of America
- Oscar Lopez
- Department of Neurology, Psychiatry and Clinical & Translational Sciences, Alzheimer's Disease Research Center, University of Pittsburgh, Pittsburgh, 15260, United States of America.
- Ying Xue
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, Pharmacometrics & System Pharmacology (PSP) Pharmacoanalytics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, 15261, United States of America.
- National Center of Excellence for Computational Drug Abuse Research University of Pittsburgh, Pittsburgh, PA, 15261, United States of America.
- Department of Pharmacy and Therapeutics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, 15261, United States of America.
- Xiang-Qun Xie
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, Pharmacometrics & System Pharmacology (PSP) Pharmacoanalytics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, 15261, United States of America.
- National Center of Excellence for Computational Drug Abuse Research University of Pittsburgh, Pittsburgh, PA, 15261, United States of America.
- Drug Discovery Institute, University of Pittsburgh, Pittsburgh, 15261, United States of America.
- Department of Computational Biology and Department of Structural Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15261, United States of America.
7
Ci L, Li B, Xu J, Peng S, Jiang L, Long W. MulAFNet: Integrating Multiple Molecular Representations for Enhanced Property Prediction. ACS OMEGA 2025; 10:12043-12053. [PMID: 40191315 PMCID: PMC11966294 DOI: 10.1021/acsomega.4c09884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Revised: 02/12/2025] [Accepted: 02/28/2025] [Indexed: 04/09/2025]
Abstract
In computer-aided drug design, molecular representation plays a crucial role. Most existing multimodal approaches simply concatenate different feature representations without adequately addressing how these features should be integrated. To address this issue, this study proposes a network framework that integrates multimodal representations using a multihead attention flow (MulAFNet). MulAFNet uses a SMILES string representation and two levels of molecular graph representation: atom-level and functional group-level graphs. Pretraining tasks are established for each of these three representations, which are then fused in downstream tasks to predict molecular properties. Experiments on six classification data sets and three regression data sets demonstrate that using multiple molecular representations as input has a significant impact on the results. In particular, our fusion method outperforms other state-of-the-art methods in molecular property prediction. Additionally, comparative experiments on fusion methods and ablation studies further validate the effectiveness of MulAFNet. The results demonstrate that multiple molecular feature representations provide a more comprehensive molecular understanding and that appropriate pretraining tasks enhance molecular property prediction.
Affiliation(s)
- Lei Ci
- School of Information Engineering, Huzhou University, Huzhou 313000, China
- Beilei Li
- Huzhou Fengshengwan Aquatic Products Co., Ltd, Huzhou 313000, China
- Jiahao Xu
- School of Information Engineering, Huzhou University, Huzhou 313000, China
- Sihua Peng
- College of Public Health, University of Georgia, Athens, Georgia 30602, United States
- Linhua Jiang
- School of Information Engineering, Huzhou University, Huzhou 313000, China
- Wei Long
- School of Information Engineering, Huzhou University, Huzhou 313000, China
8
Song W, Peng R, Yu H, Zhan M, Liu G, Li W, Ren G, Zhu B, Tang Y. Cocry-pred: A Dynamic Resource Propagation Method for Cocrystal Prediction. J Chem Inf Model 2025; 65:2868-2881. [PMID: 40070082 DOI: 10.1021/acs.jcim.5c00179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2025]
Abstract
Drug cocrystallization is a powerful strategy to enhance drug properties by modifying their physicochemical characteristics without altering their chemical structure. However, identifying suitable coformers remains a challenging and resource-intensive task. To streamline this process, we developed a novel cocrystal prediction model, Cocry-pred, which uses the Network-Based Inference (NBI) algorithm, a dynamic resource propagation method, to recommend coformers for target molecules based on topological data from the cocrystal network and molecular substructure information. We evaluated the impact of 13 types of molecular fingerprints and of different numbers of propagation rounds on model performance. Additionally, to achieve optimal performance, we introduced three key hyperparameters, α (node weights), β (edge weights), and γ (penalty for high-degree nodes), to balance the influence of various factors within the composite network. The best-performing Cocry-pred model achieved an AUC of 0.885 and an RS of 0.108. To validate the reliability of the model, we used it to predict potential coformers for Apatinib. Seven Apatinib cocrystals were subsequently synthesized experimentally, and single-crystal structures were obtained for two of them. This advancement highlights the potential of Cocry-pred as a powerful tool, offering significant improvements in efficiency and providing valuable insights for cocrystal screening and design.
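For readers unfamiliar with network-based inference, the sketch below shows the classic two-step resource-allocation form of NBI on a bipartite molecule-coformer network: each known coformer of the query molecule starts with one unit of resource, which is spread evenly to its neighboring molecules and then spread evenly back to coformers. It is a simplification under stated assumptions: it ignores the substructure information, runs a single propagation round rather than a tuned number, and omits Cocry-pred's α, β, and γ weights, whose exact form the abstract does not give. The toy network is made up.

```python
from typing import List


def nbi_scores(adj: List[List[int]], query: int) -> List[float]:
    """Two-step NBI (resource allocation) on a bipartite 0/1 matrix adj[molecule][coformer]."""
    n_mol, n_cof = len(adj), len(adj[0])
    deg_mol = [sum(row) for row in adj]
    deg_cof = [sum(adj[m][c] for m in range(n_mol)) for c in range(n_cof)]
    # Initial resource: one unit on each coformer already linked to the query molecule.
    f0 = [float(adj[query][c]) for c in range(n_cof)]
    # Step 1: every coformer splits its resource evenly among its molecules.
    f1 = [sum(adj[m][c] * f0[c] / deg_cof[c] for c in range(n_cof) if deg_cof[c])
          for m in range(n_mol)]
    # Step 2: every molecule splits what it received evenly among its coformers.
    return [sum(adj[m][c] * f1[m] / deg_mol[m] for m in range(n_mol) if deg_mol[m])
            for c in range(n_cof)]


# Rows: molecules M0..M2; columns: coformers C0..C2 (toy links).
adj = [[1, 1, 0],
       [1, 0, 1],
       [0, 1, 1]]
scores = nbi_scores(adj, query=0)
# Rank coformers not yet paired with M0 by score; here only C2 is a candidate.
candidates = sorted((c for c in range(3) if adj[0][c] == 0), key=lambda c: -scores[c])
```

Resource is conserved in this form: the scores sum to the number of coformers initially linked to the query (2.0 here), so scores are comparable across candidate coformers for the same query molecule.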
Affiliation(s)
- Wenxiang Song
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Ren Peng
- State Key Laboratory of Bioreactor Engineering, Engineering Research Centre of Pharmaceutical Process Chemistry, Ministry of Education, Laboratory of Pharmaceutical Crystal Engineering & Technology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Hongbo Yu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Meiling Zhan
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Guobin Ren
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- State Key Laboratory of Bioreactor Engineering, Engineering Research Centre of Pharmaceutical Process Chemistry, Ministry of Education, Laboratory of Pharmaceutical Crystal Engineering & Technology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Bin Zhu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- State Key Laboratory of Bioreactor Engineering, Engineering Research Centre of Pharmaceutical Process Chemistry, Ministry of Education, Laboratory of Pharmaceutical Crystal Engineering & Technology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
9
Pang X, Lu M, Yang Y, Cao H, Sun Y, Zhou Z, Wang L, Liang Y. Screening of estrogen receptor activity of per- and polyfluoroalkyl substances based on deep learning and in vivo assessment. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2025; 369:125843. [PMID: 39947576 DOI: 10.1016/j.envpol.2025.125843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2024] [Revised: 01/17/2025] [Accepted: 02/10/2025] [Indexed: 02/18/2025]
Abstract
Over the past decades, exposure to per- and polyfluoroalkyl substances (PFAS), a group of synthetic chemicals notorious for their environmental persistence, has been shown to pose increased health risks. Although some PFAS have been reported in previous studies to have endocrine-disrupting toxicity, accurate deep learning-based prediction models, and knowledge of the underlying structural characteristics related to the effect of molecular fluorination, remain limited. To address these issues, we proposed a stacking deep learning architecture, GXDNet, that integrates molecular descriptors and molecular graphs to predict the estrogen receptor α (ERα) activities of compounds, enhancing generalization ability compared with previous models. Subsequently, we screened the ERα activity of 10,067 PFAS molecules using the GXDNet model and identified potential ERα binders. The representative PFAS molecules with the top docking scores showed that introducing fluorinated alkane chains significantly increased the binding affinities of parent molecules for ERα, suggesting that the combination of phenol structural fragments and fluorinated alkane chains has a synergistic effect in improving ligand binding to ERα. The binding modes, SHapley Additive Explanations analysis, and attention maps emphasized the importance of π-π stacking and hydrogen bonding interactions with the phenol group, while the fluorinated alkane chain enhanced the interaction with the hydrophobic amino acids of the active pocket. Experimental validation using zebrafish models further confirmed the ERα activity of the representative PFAS molecules. Overall, the current computational workflow is beneficial for the toxicological screening of emerging PFAS and for accelerating the development of eco-friendly PFAS molecules, thereby mitigating the environmental and health risks associated with PFAS exposure.
Affiliation(s)
- Xudi Pang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan, 430056, China
- Miao Lu
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan, 430056, China
- Ying Yang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan, 430056, China
- Huiming Cao
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan, 430056, China.
- Yuzhen Sun
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan, 430056, China
- Zhen Zhou
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan, 430056, China
- Ling Wang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan, 430056, China.
- Yong Liang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan, 430056, China
10
Zhang Y, Huang J, Li X, Sun W, Zhang N, Zhang J, Chen T, Wang L. Self-awareness of retrosynthesis via chemically inspired contrastive learning for reinforced molecule generation. Brief Bioinform 2025; 26:bbaf185. [PMID: 40254835 PMCID: PMC12009711 DOI: 10.1093/bib/bbaf185] [Received: 11/29/2024] [Revised: 03/19/2025] [Accepted: 03/30/2025] [Indexed: 04/22/2025]
Abstract
Recent progress of deep generative models in modeling complex real-world data distributions has enabled the generation of novel compounds with potential therapeutic applications for various diseases. However, most studies fail to optimize the properties of generated molecules from the perspective of the intrinsic nature of chemical reactions. In this work, we propose a novel molecule generation model that overcomes this limitation through deep reinforcement learning, in which an agent learns to optimize the properties of molecules initialized with a chemically inspired contrastive pretrained model. We assess the generation model by evaluating its ability to generate inhibitors against two prominent therapeutic targets in cancer treatment. Experimental results show that our model generates 100% valid and novel structures. It also outperforms several baselines in generating molecules with fewer structural alerts. More importantly, molecules generated by our model show potent biological activities against the ataxia telangiectasia and Rad3-related (ATR) and cyclin-dependent kinase 9 (CDK9) targets in wet-lab experiments.
Affiliation(s)
- Yi Zhang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, No. 382 Waihuan East Road, Higher Education Mega Center, Guangzhou 510006, China
- Jindi Huang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, No. 382 Waihuan East Road, Higher Education Mega Center, Guangzhou 510006, China
- Xinze Li
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, No. 382 Waihuan East Road, Higher Education Mega Center, Guangzhou 510006, China
- Wenqi Sun
- Guizhou Provincial Engineering Technology Research Center for Chemical Drug R&D, College of Pharmacy, Guizhou Medical University, No. 6 Ankang Avenue, Guian New District, Guiyang 561113, China
- Nana Zhang
- Guizhou Provincial Engineering Technology Research Center for Chemical Drug R&D, College of Pharmacy, Guizhou Medical University, No. 6 Ankang Avenue, Guian New District, Guiyang 561113, China
- Jiquan Zhang
- Guizhou Provincial Engineering Technology Research Center for Chemical Drug R&D, College of Pharmacy, Guizhou Medical University, No. 6 Ankang Avenue, Guian New District, Guiyang 561113, China
- Tiegen Chen
- Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Zhongshan Life Science Park, No. 10 Heqing Road, Tsui Hang New District, Zhongshan 528400, China
- Ling Wang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, No. 382 Waihuan East Road, Higher Education Mega Center, Guangzhou 510006, China
11
Park S, Lee S, Pak M, Kim S. Dual Representation Learning for Predicting Drug-Side Effect Frequency Using Protein Target Information. IEEE J Biomed Health Inform 2025; 29:1817-1827. [PMID: 38241108 DOI: 10.1109/jbhi.2024.3350083] [Indexed: 01/21/2024]
Abstract
Knowledge of unintended drug effects is critical for assessing treatment risk and for drug repurposing. Numerous existing studies predict the presence of drug side effects, but only four of them predict side-effect frequency. Unfortunately, current prediction methods 1) do not utilize drug targets, 2) perform poorly for unseen drugs, and 3) do not use multiple heterogeneous drug features. We propose a novel deep learning-based model for predicting drug-side effect frequency. Our model simultaneously uses heterogeneous features to create drug embeddings, including target protein information, molecular graphs, fingerprints, and chemical similarity. Furthermore, the model represents drugs and side effects in a common vector space, learning dual representation vectors for drugs and side effects, respectively. We also used the AdaBoost method to extend the model's predictive power to drugs without clear target proteins. We achieved state-of-the-art performance over existing methods in predicting side-effect frequencies, especially for unseen drugs. Ablation studies show that our model effectively combines and utilizes heterogeneous drug features. Moreover, when target information was given, drugs with explicit targets were predicted better than drugs without explicit targets.
12
Cai L, He Y, Fu X, Zhuo L, Zou Q, Yao X. AEGNN-M: A 3D Graph-Spatial Co-Representation Model for Molecular Property Prediction. IEEE J Biomed Health Inform 2025; 29:1726-1734. [PMID: 38386576 DOI: 10.1109/jbhi.2024.3368608] [Indexed: 02/24/2024]
Abstract
Improving the drug development process can speed the introduction of novel drugs that meet the demands of precision medicine. Accurately predicting molecular properties remains a fundamental challenge in drug discovery and development. Many computer-aided drug discovery (CADD) methods are now widely used for molecular property prediction. However, most of them analyze molecules using low-dimensional representations such as SMILES notations, molecular fingerprints, and molecular graph-based descriptors. Only a few approaches have incorporated high-dimensional spatial structural representations of molecules. Building on advances in artificial intelligence, we introduce AEGNN-M, a 3D graph-spatial co-representation model that combines two graph neural networks, GAT and EGNN. AEGNN-M learns from both molecular graph representations and 3D spatial structural representations to predict molecular properties accurately. We compared AEGNN-M with state-of-the-art deep learning methods on seven public datasets, three regression datasets, and 14 breast cancer cell line phenotype screening datasets. Extensive experimental results demonstrate the satisfactory performance of the AEGNN-M model. Furthermore, we analyzed how the different modules within AEGNN-M, and the spatial structural representations in particular, affect the model's performance. Interpretability analysis also revealed the significance of specific atoms in determining particular molecular properties.
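AEGNN-M pairs a GAT with an EGNN. The sketch below illustrates why an EGNN-style layer can use 3D coordinates without depending on the molecule's absolute position or orientation. It is a minimal pure-Python version of one generic E(n)-equivariant update, not AEGNN-M's implementation. Several simplifications are assumptions made for illustration: the learned networks (φ_e, φ_x, φ_h) are replaced by fixed toy functions, each atom carries a single scalar feature, and the normalising constant is taken as 1/degree.

```python
import math
from typing import List, Tuple

Vec = List[float]


def egnn_layer(
    h: List[float], x: List[Vec], edges: List[Tuple[int, int]]
) -> Tuple[List[float], List[Vec]]:
    """One simplified EGNN-style update (toy functions stand in for MLPs).

    h: one scalar feature per atom (real models use feature vectors).
    x: 3D coordinates per atom.
    edges: undirected bonds as (i, j) index pairs.
    """
    n = len(h)
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    new_h: List[float] = []
    new_x: List[Vec] = []
    for i in range(n):
        agg_m = 0.0
        shift = [0.0, 0.0, 0.0]
        for j in neighbors[i]:
            diff = [x[i][k] - x[j][k] for k in range(3)]  # relative vector
            d2 = sum(c * c for c in diff)  # squared distance: E(n)-invariant
            m_ij = math.tanh(h[i] + h[j] - 0.1 * d2)  # toy phi_e
            w = 0.5 * m_ij  # toy phi_x: scalar weight on the relative vector
            for k in range(3):
                shift[k] += w * diff[k]
            agg_m += m_ij
        deg = max(len(neighbors[i]), 1)
        # x_i' = x_i + C * sum_j (x_i - x_j) * phi_x(m_ij), with C = 1/deg here
        new_x.append([x[i][k] + shift[k] / deg for k in range(3)])
        new_h.append(h[i] + agg_m)  # toy phi_h: residual sum of messages
    return new_h, new_x
```

Messages depend only on squared interatomic distances, and coordinates move only along relative vectors x_i - x_j. As a result, translating or rotating the input translates or rotates the output coordinates, while the atom features stay unchanged.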
13
Monsia R, Bhattacharyya S. Efficient and Explainable Virtual Screening of Molecules through Fingerprint-Generating Networks Integrated with Artificial Neural Networks. ACS OMEGA 2025; 10:4896-4911. [PMID: 39959102 PMCID: PMC11822703 DOI: 10.1021/acsomega.4c10289] [Received: 11/12/2024] [Revised: 01/07/2025] [Accepted: 01/13/2025] [Indexed: 02/18/2025]
Abstract
A machine learning-based drug screening technique has been developed and optimized using a novel stitched neural network architecture, in which trainable, graph convolution-based fingerprints serve as the base of an artificial neural network. The architecture is efficient, explainable, and performant as a tool for the binary classification of ligands based on a user-chosen docking score threshold. Assessment using two standardized virtual screening databases showed that the architecture learns molecular features and substructures and predicts ligand classes from binding affinity values more effectively than similar contemporary counterparts. To highlight the architecture's utility to groups and laboratories with varying resources, experiments were also carried out using randomly sampled small molecules from the ZINC database and their computational docking scores against six drug-design-relevant proteins. The new architecture proved more efficient at screening out molecules that bind less favorably to a specific target, thereby retaining top-hit molecules. The neural fingerprint-based model was also compared with similar protocols developed using Morgan fingerprints. It was better at retaining the best ligands while filtering molecules at a higher relative rate. Lastly, an investigation of the model's explainability revealed that it accurately emphasized important chemical substructures and atoms through the intermediate fingerprint. These substructures and atoms, in turn, contributed heavily to the final prediction of a ligand as binding tightly to a given protein.
Affiliation(s)
- Sudeep Bhattacharyya
- Department of Chemistry and Biochemistry, University of Wisconsin–Eau Claire, Eau Claire, Wisconsin 54701, United States
14
Zhang Z, Gao R, Zhao M, Zhang X, Gao H, Qi Y, Wang R, Li Y. Computational Methods for Predicting Chemical Reactivity of Covalent Compounds. J Chem Inf Model 2025; 65:1140-1154. [PMID: 39823568 DOI: 10.1021/acs.jcim.4c01591] [Indexed: 01/19/2025]
Abstract
In recent decades, covalent inhibitors have emerged as a promising therapeutic strategy, leveraging their unique mechanism of forming covalent bonds with target proteins. This approach offers advantages such as prolonged drug efficacy, precise targeting, and the potential to overcome resistance. However, the inherent reactivity of covalent compounds presents significant challenges, leading to off-target effects and toxicities. Accurately predicting and modulating this reactivity has therefore become a critical focus in the field. In this work, we compiled a data set of 419 cysteine-targeted covalent compounds and their reactivities through an extensive literature review. We evaluated the intrinsic reactivity of these compounds using machine learning, deep learning, and quantum mechanical calculations. Our FP-Stack models achieved Pearson and Spearman correlations of approximately 0.80 and 0.75, respectively, on the test set. This enables rapid and accurate reactivity predictions, significantly reducing computational costs and streamlining structural handling and experimental procedures. Experimental validation on acrylamide compounds underscored the predictive efficacy of our model. This study presents an efficient computational tool for predicting the reactivity of covalent compounds and is expected to offer valuable insights for guiding covalent drug discovery and development.
Affiliation(s)
- Zhe Zhang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Ruyu Gao
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Meiling Zhao
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Xiangying Zhang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Haotian Gao
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Yifei Qi
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Renxiao Wang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Yan Li
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
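The FP-Stack results in this abstract are reported as two different correlation coefficients. The small stdlib-only sketch below shows how the two differ; it is not the authors' evaluation code. Pearson's r measures linear agreement between predicted and measured reactivity. Spearman's ρ is Pearson's r computed on ranks, with tied values sharing their average rank, so it rewards any monotonic agreement.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (undefined for constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Consider predictions that are perfectly ordered but related nonlinearly to the measured values, such as x = [1, 2, 3, 4, 5] against y = [1, 4, 9, 16, 100]. Spearman's ρ is then 1 while Pearson's r is noticeably lower. This is one reason papers report both.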
15
Mei S. Single-task regression naturally adapts to multi-species (eco)toxicological modelling: a case study on animals. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2025; 32:4910-4925. [PMID: 39891811 DOI: 10.1007/s11356-025-36025-y] [Received: 09/24/2024] [Accepted: 01/24/2025] [Indexed: 02/03/2025]
Abstract
In silico (eco)toxicological modelling has become increasingly popular among environmental chemists for accelerating the assessment of how hazardous chemicals affect the environment, animal well-being and human health. Existing local and multi-task models commonly show restricted extensibility in multi-species modelling scenarios. In this work, we propose a single-task regression strategy that naturally adapts to (eco)toxicological measurements on multiple species. Unlike multi-task regression, it does not require a certain number of pesticides to be tested in common across species. This strategy treats all species equally in one integral model, facilitating data augmentation and the inter-species transfer of common patterns of fragmental toxicity. We aggregated 37,305 measurements of 29,140 pesticides on 10 tested groups of animals to train four machine learning models: extreme gradient boosting (XGBoost), deep neural networks (DNN), random forest (RF) and support vector regression (SVR). Five-fold stratified cross-validation shows that XGBoost outperforms the other three models, with an overall R² of 0.67, RMSE of 0.44 and MAE of 0.29. Compared with local models focusing on one animal group, the proposed single-task regression model increases R² by 0.08 to 0.49. XGBoost feature importance shows that Morgan bit 389 (five-atom fraction of the aromatic ring) is the most important feature for both single-task regression and single-animal regression. Lastly, taking the pesticides parathion and dimethoate as control baselines, we demonstrate the credibility of several case studies in terms of toxicity profile similarity and pesticide structural similarity.
Affiliation(s)
- Suyu Mei
- Software College, Shenyang Normal University, Shenyang, 110034, China.
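The single-task strategy pools every (compound, species) measurement into one regression table instead of building a compound-by-species label matrix. The sketch below shows one plausible way to do this. It assumes, although the abstract does not say so, that species identity is one-hot encoded and appended to a molecular fingerprint. `build_pooled_dataset`, the record layout, and the toy data are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (compound_id, species, measured toxicity value)
Record = Tuple[str, str, float]


def build_pooled_dataset(
    records: List[Record],
    fingerprints: Dict[str, List[int]],
) -> Tuple[List[List[int]], List[float], List[str]]:
    """Pool all (compound, species) measurements into one single-task table.

    Each row is the compound's fingerprint bits followed by a one-hot species
    indicator (an assumed encoding). Unlike a multi-task label matrix, no
    compound needs to be measured on every species: each measurement simply
    becomes its own row.
    """
    species_list = sorted({s for _, s, _ in records})
    index = {s: i for i, s in enumerate(species_list)}
    X: List[List[int]] = []
    y: List[float] = []
    for compound, species, value in records:
        one_hot = [0] * len(species_list)
        one_hot[index[species]] = 1
        X.append(list(fingerprints[compound]) + one_hot)
        y.append(value)
    return X, y, species_list
```

The resulting `X` and `y` can be fed to any single-output regressor, such as the XGBoost model used in the study. A compound measured on only one species still contributes a row, which is what removes the multi-task requirement for shared compounds.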
16
Pang X, He X, Yang Y, Wang L, Sun Y, Cao H, Liang Y. NeuTox 2.0: A hybrid deep learning architecture for screening potential neurotoxicity of chemicals based on multimodal feature fusion. ENVIRONMENT INTERNATIONAL 2025; 195:109244. [PMID: 39742830 DOI: 10.1016/j.envint.2024.109244] [Received: 09/03/2024] [Revised: 12/09/2024] [Accepted: 12/25/2024] [Indexed: 01/04/2025]
Abstract
Chemically induced neurotoxicity is a critical aspect of chemical safety assessment. Traditional experimental methods are costly, which calls for the development of high-throughput virtual screening. However, the small size of neurotoxicity datasets has limited the application of advanced deep learning techniques. This study developed NeuTox 2.0, a hybrid deep learning architecture that uses multimodal feature fusion to improve prediction accuracy and generalization ability. We incorporated transfer learning based on self-supervised learning, graph neural networks, and molecular fingerprints/descriptors. Four datasets were used to profile neurotoxicity, covering blood-brain barrier (BBB) permeability, neuronal cytotoxicity, microelectrode array-based neural activity, and mammalian neurotoxicity. Comprehensive performance evaluations demonstrated that NeuTox 2.0 has relatively higher predictive capability across all statistical metrics. Specifically, NeuTox 2.0 performs remarkably well on three of the four datasets. On the BBB dataset, it does not outperform the PaDEL descriptor model, but its performance closely approximates that of the top single-modal model. The ablation experiments indicated that NeuTox 2.0 can learn deeper structural differences between molecules from the various feature extractors. They also indicated that it can capture complex interactions and mapping relationships between modalities, thereby improving neurotoxicity prediction. Evaluations of anti-noise ability indicated that NeuTox 2.0 is considerably more robust to noise than traditional machine learning. We applied the NeuTox 2.0 model to predict the neurotoxicity of 315,790 compounds in the REACH database. The results showed that 701 compounds exhibited potential neurotoxicity in the four neurotoxicity-related predictions. In conclusion, NeuTox 2.0 can be used as an efficient tool for early neurotoxicity screening of environmental chemicals.
Affiliation(s)
- Xudi Pang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Xuejun He
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Ying Yang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Ling Wang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yuzhen Sun
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Huiming Cao
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China.
- Yong Liang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
17
Peng J, Fu L, Yang G, Cao D. Advanced AI-Driven Prediction of Pregnancy-Related Adverse Drug Reactions. J Chem Inf Model 2024; 64:9286-9298. [PMID: 39611337 DOI: 10.1021/acs.jcim.4c01657] [Indexed: 11/30/2024]
Abstract
Ensuring drug safety during pregnancy is critical because of the potential risks to both mother and fetus. However, the exclusion of pregnant women from clinical trials complicates the assessment of adverse drug reactions (ADRs) in this population. This study aimed to develop and validate risk prediction models for pregnancy-related ADRs of drugs using advanced machine learning (ML) and deep learning (DL) techniques, leveraging real-world data from the FDA Adverse Event Reporting System. We explored three methods for classifying drugs into high-risk and low-risk categories: the Information Component (IC), the Reporting Odds Ratio (ROR), and the 95% confidence interval of the ROR. DL models, including Directed Message Passing Neural Networks (DMPNN), Graph Neural Networks, and Graph Convolutional Networks, were developed and compared with traditional ML models such as Random Forest, Support Vector Machines, and XGBoost. Among these, the DMPNN model, which integrated molecular graph information and molecular descriptors, exhibited the highest predictive performance, particularly at the preferred term level. The model was validated against external data sets from SIDER and DailyMed, demonstrating strong generalizability. Additionally, the model was applied to assess the risk of 22 oral hypoglycemic drugs, and potential substructure alerts for pregnancy-related ADRs were identified. These findings suggest that the DMPNN model is a valuable tool for predicting ADRs in pregnant women. It offers a significant advance in drug safety assessment and crucial insights for safer medication use during pregnancy.
Affiliation(s)
- Jinfu Peng
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172 Tongzipo Road, Changsha 410031, Hunan, China
- Li Fu
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172 Tongzipo Road, Changsha 410031, Hunan, China
- Guoping Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172 Tongzipo Road, Changsha 410031, Hunan, China
- The Third Xiangya Hospital, Central South University, No. 138 Tongzipo Road, Changsha 410031, Hunan, China
- Dongshen Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172 Tongzipo Road, Changsha 410031, Hunan, China
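The IC and ROR criteria in the preceding abstract are standard disproportionality measures computed from a 2x2 table of spontaneous reports. Let a count reports with the drug and the event, b the drug with other events, c other drugs with the event, and d other drugs with other events, and let N = a + b + c + d. Then ROR = ad/(bc), its log has standard error sqrt(1/a + 1/b + 1/c + 1/d), and the crude IC is log2(a·N / ((a + b)(a + c))). The sketch below rests on three assumptions: it uses the crude (unshrunk) IC rather than a Bayesian BCPNN estimate, a Wald interval for the ROR, and an illustrative "lower 95% bound > 1" high-risk rule. The study's exact thresholds are not given in the abstract.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Contingency:
    """2x2 counts of spontaneous reports; all cells must be > 0."""
    a: int  # target drug, target adverse event
    b: int  # target drug, all other events
    c: int  # all other drugs, target event
    d: int  # all other drugs, all other events


def ror_with_ci(t: Contingency, z: float = 1.96) -> Tuple[float, float, float]:
    """Reporting odds ratio with a Wald 95% CI computed on the log scale."""
    ror = (t.a * t.d) / (t.b * t.c)
    se = math.sqrt(1 / t.a + 1 / t.b + 1 / t.c + 1 / t.d)
    lo = math.exp(math.log(ror) - z * se)
    hi = math.exp(math.log(ror) + z * se)
    return ror, lo, hi


def information_component(t: Contingency) -> float:
    """Crude IC = log2(observed / expected), without Bayesian shrinkage."""
    n = t.a + t.b + t.c + t.d
    expected = (t.a + t.b) * (t.a + t.c) / n
    return math.log2(t.a / expected)


def is_high_risk(t: Contingency) -> bool:
    """Illustrative rule (assumed): signal if the ROR's lower 95% bound > 1."""
    _, lo, _ = ror_with_ci(t)
    return lo > 1.0
```

With no disproportionality, for example a=10, b=90, c=100, d=900, the ROR is exactly 1 and the IC is 0. Such a drug-event pair would not be flagged.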
18
Zhao D, Zhang Y, Chen Y, Li B, Zhou W, Wang L. Highly Accurate and Explainable Predictions of Small-Molecule Antioxidants for Eight In Vitro Assays Simultaneously through an Alternating Multitask Learning Strategy. J Chem Inf Model 2024; 64:9098-9110. [PMID: 38888465 DOI: 10.1021/acs.jcim.4c00748] [Indexed: 06/20/2024]
Abstract
Small-molecule antioxidants can inhibit or retard oxidation reactions and protect cells against free radical damage, and thus play a key role in food, cosmetics, pharmaceuticals, the environment, and materials. Experimentally driven antioxidant discovery is the major paradigm, and computationally assisted antioxidant discovery is rarely reported. In this study, a functional-group-based alternating multitask self-supervised molecular representation learning method is proposed. It simultaneously predicts the antioxidant activities of small molecules in eight commonly used in vitro antioxidant assays. Extensive evaluation results reveal that, compared with the baseline models, the multitask FG-BERT model achieves the best overall predictive performance on the test sets. Its average F1, BA, ROC-AUC, and PRC-AUC values are the highest, at 0.860, 0.880, 0.954, and 0.937, respectively. Y-scrambling tests further demonstrate that the model's performance is not the result of chance correlation and that its predictive capability is reliable. Additionally, the interpretability of the multitask FG-BERT model makes it easy to identify the key structural fragments/groups that contribute significantly to the antioxidant effect of a given molecule. Finally, an online antioxidant activity prediction platform called AOP (freely available at https://aop.idruglab.cn/) and its local version were developed on top of the multitask FG-BERT model for experts and nonexperts in the field. We anticipate that it will contribute to the discovery of novel small-molecule antioxidants.
Affiliation(s)
- Duancheng Zhao
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yanhong Zhang
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yihao Chen
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Biaoshun Li
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Wenguang Zhou
- Central Laboratory of The Sixth Affiliated Hospital, School of Medicine, South China University of Technology, Foshan 528200, China
- Ling Wang
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
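The Y-scrambling (y-randomization) test mentioned in this abstract refits a model on randomly permuted labels. If the original model reflects genuine structure-activity relationships, its score should sit well above the scrambled scores. The sketch below is generic: the hypothetical `y_scrambling` helper accepts any fit-and-score callback. The leave-one-out 1-nearest-neighbour classifier is a stand-in for FG-BERT, chosen only so the example runs with the standard library.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A callback that trains on (X, y) and returns a performance score.
FitAndScore = Callable[[Sequence[Sequence[float]], Sequence[int]], float]


def y_scrambling(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    fit_and_score: FitAndScore,
    n_rounds: int = 20,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Return the score on true labels and the scores after label permutation."""
    rng = random.Random(seed)
    true_score = fit_and_score(X, y)
    scrambled: List[float] = []
    for _ in range(n_rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the link between structures and labels
        scrambled.append(fit_and_score(X, y_perm))
    return true_score, scrambled


def loo_1nn_accuracy(X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (toy model)."""
    correct = 0
    for i, xi in enumerate(X):
        best_j = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, X[j])),
        )
        correct += int(y[best_j] == y[i])
    return correct / len(X)
```

A model that has learned real signal scores clearly higher on the true labels than the average over scrambled copies. If the two are close, the apparent performance may be due to chance correlation.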
19
Cao PY, He Y, Cui MY, Zhang XM, Zhang Q, Zhang HY. Group graph: a molecular graph representation with enhanced performance, efficiency and interpretability. J Cheminform 2024; 16:133. [PMID: 39609909 PMCID: PMC11606038 DOI: 10.1186/s13321-024-00933-x] [Received: 06/07/2024] [Accepted: 11/15/2024] [Indexed: 11/30/2024]
Abstract
The exploration of chemical space holds promise for developing influential chemical entities. Molecular representations, which reflect features of molecular structure in silico, help in navigating chemical space appropriately. Atom-level molecular representations, such as SMILES and atom graphs, can sometimes lead to confusing interpretations of chemical substructures. Substructure-level molecular representations instead encode important substructures into molecular features. They not only provide more information for predicting molecular properties and drug‒drug interactions but also help to interpret the correlations between molecular properties and substructures. However, it remains challenging to represent an entire molecular structure both intactly and simply with a substructure-level representation. In this study, we developed a novel substructure-level molecular representation, named the group graph. The group graph offers three advantages. (a) Its substructures reflect the diversity and consistency of different molecular datasets. (b) It retains molecular structural features with minimal information loss. This is indicated by a graph isomorphism network (GIN) on the group graph, which performs well in predicting molecular properties and drug‒drug interactions, with higher accuracy and efficiency than models on other molecular graphs, even without any pretraining. (c) A molecular property may change when a substructure is substituted with another of differing importance in the group graph, which facilitates the detection of activity cliffs. In addition, we successfully used the GIN on the group graph to predict structural modifications that improve blood‒brain barrier permeability (BBBP). The group graph is therefore well suited to representing local molecular characteristics and global features simultaneously.
Scientific contribution: The group graph, as a substructure-level molecular representation, retains molecular structural features with minimal information loss. As a result, it shows superior performance in predicting molecular properties and drug‒drug interactions, with enhanced efficiency and interpretability.
Affiliation(s)
- Piao-Yang Cao
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
- Yang He
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
- Ming-Yang Cui
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
- Xiao-Min Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
- Qingye Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
- Hong-Yu Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China.
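Any substructure-level graph, including the group graph in the preceding abstract, relies on one core operation: contracting atoms into substructure nodes. The minimal sketch below assumes the atom-to-substructure assignment is already known; the group graph paper's own fragmentation rules are not reproduced. Each group becomes a node, bonds inside a group disappear, and two groups are linked whenever a bond crosses between them. `contract_to_group_graph` is a hypothetical helper.

```python
from typing import Dict, List, Set, Tuple


def contract_to_group_graph(
    bonds: List[Tuple[int, int]],
    atom_to_group: Dict[int, str],
) -> Tuple[List[str], Set[Tuple[str, str]]]:
    """Collapse an atom graph into a substructure-level graph.

    bonds: atom-level edges (i, j).
    atom_to_group: substructure label for every atom (supplied by hand here).
    Returns sorted group nodes and undirected group-group edges.
    """
    groups = sorted(set(atom_to_group.values()))
    edges: Set[Tuple[str, str]] = set()
    for i, j in bonds:
        gi, gj = atom_to_group[i], atom_to_group[j]
        if gi != gj:  # bonds inside one group vanish into the node
            edges.add((min(gi, gj), max(gi, gj)))
    return groups, edges


# 4-hydroxybenzoic acid: ring atoms 0-5, hydroxyl O on atom 0, carboxyl on atom 3
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (3, 7), (7, 8), (7, 9)]
atom_to_group = {i: "ring" for i in range(6)}
atom_to_group.update({6: "hydroxyl", 7: "carboxyl", 8: "carboxyl", 9: "carboxyl"})
groups, edges = contract_to_group_graph(bonds, atom_to_group)
```

In the example, the hydroxyl and carboxyl groups both attach to the ring but not to each other, so the group graph has three nodes and two edges. Repeated substructures need distinct labels (for example "hydroxyl_1" and "hydroxyl_2"); otherwise they would merge into a single node.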
20
Xu Y, Liu X, Xia W, Ge J, Ju CW, Zhang H, Zhang JZ. ChemXTree: A Feature-Enhanced Graph Neural Network-Neural Decision Tree Framework for ADMET Prediction. J Chem Inf Model 2024; 64:8440-8452. [PMID: 39497657 PMCID: PMC11600499 DOI: 10.1021/acs.jcim.4c01186] [Received: 07/08/2024] [Revised: 10/18/2024] [Accepted: 10/29/2024] [Indexed: 11/07/2024]
Abstract
The rapid progression of machine learning, especially deep learning (DL), has catalyzed a new era in drug discovery, introducing innovative approaches for predicting molecular properties. Despite the many methods available for feature representation, efficiently utilizing rich, high-dimensional information remains a significant challenge. Our work introduces ChemXTree, a novel graph-based model that integrates a Gate Modulation Feature Unit (GMFU) and neural decision tree (NDT) in the output layer to address this challenge. Extensive evaluations on benchmark data sets, including MoleculeNet and eight additional drug databases, have demonstrated ChemXTree's superior performance, surpassing or matching the current state-of-the-art models. Visualization techniques clearly demonstrate that ChemXTree significantly improves the separation between substrates and nonsubstrates in the latent space. In summary, ChemXTree demonstrates a promising approach for integrating advanced feature extraction with neural decision trees, offering significant improvements in predictive accuracy for drug discovery tasks and opening new avenues for optimizing molecular properties.
Affiliation(s)
- Yuzhi Xu
- Shanghai Frontiers Science Center of Artificial Intelligence and Deep Learning and NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
- Department of Chemistry, New York University, New York, New York 10003, United States
- Xinxin Liu
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Department of Materials Science and Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Wei Xia
- Shanghai Frontiers Science Center of Artificial Intelligence and Deep Learning and NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
- Department of Chemistry, New York University, New York, New York 10003, United States
- Jiankai Ge
- Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Cheng-Wei Ju
- Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois 60615, United States
- Haiping Zhang
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen 518055, China
- John Z.H. Zhang
- Shanghai Frontiers Science Center of Artificial Intelligence and Deep Learning and NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
- Department of Chemistry, New York University, New York, New York 10003, United States
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen 518055, China
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, 200062 Shanghai, China
21
Yang Y, Yang Z, Pang X, Cao H, Sun Y, Wang L, Zhou Z, Wang P, Liang Y, Wang Y. Molecular designing of potential environmentally friendly PFAS based on deep learning and generative models. Sci Total Environ 2024; 953:176095. [PMID: 39245376 DOI: 10.1016/j.scitotenv.2024.176095] [Received: 07/04/2024] [Revised: 09/03/2024] [Accepted: 09/04/2024] [Indexed: 09/10/2024]
Abstract
Perfluoroalkyl and polyfluoroalkyl substances (PFAS) are widely used across a spectrum of industrial and consumer goods. Nonetheless, their persistent nature and tendency to accumulate in biological systems pose substantial environmental and health threats. Consequently, striking a balance between maximizing product efficiency and minimizing environmental and health risks by tailoring the molecular structure of PFAS has become a pivotal challenge in the fields of environmental chemistry and sustainable development. To address this issue, a computational workflow was proposed for designing an environmentally friendly PFAS by incorporating deep learning (DL) and molecular generative models. The hybrid DL architecture MolHGT+ based on heterogeneous graph neural network with transformer-like attention was applied to predict the surface tension, bioaccumulation, and hepatotoxicity of the molecules. Through virtual screening of the PFAS master database using MolHGT+, the findings indicate that incorporating the siloxane group and betaine fragment can effectively decrease both the bioaccumulation and hepatotoxicity of PFAS while preserving low surface tension. In addition, molecular generative models were employed to create a structurally diverse pool of novel PFASs with the aforementioned hit molecules serving as the initial template structures. Overall, our study presents a promising AI-driven method for advancing the development of environmentally friendly PFAS.
Affiliation(s)
- Ying Yang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Zeguo Yang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Xudi Pang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Huiming Cao
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China.
- Yuzhen Sun
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Ling Wang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Zhen Zhou
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Pu Wang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yong Liang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China.
- Yawei Wang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
22
Yang K, Cheng J, Cao S, Pan X, Shen HB, Yuan Y. Predicting transcriptional changes induced by molecules with MiTCP. Brief Bioinform 2024; 26:bbaf006. [PMID: 39847444 PMCID: PMC11756340 DOI: 10.1093/bib/bbaf006] [Received: 08/26/2024] [Revised: 12/05/2024] [Accepted: 01/21/2025] [Indexed: 01/24/2025]
Abstract
Studying the changes in cellular transcriptional profiles induced by small molecules can significantly advance our understanding of cellular state alterations and response mechanisms under chemical perturbations, which plays a crucial role in drug discovery and screening processes. Considering that experimental measurements need substantial time and cost, we developed a deep learning-based method called Molecule-induced Transcriptional Change Predictor (MiTCP) to predict changes in transcriptional profiles (CTPs) of 978 landmark genes induced by molecules. MiTCP utilizes graph neural network-based approaches to simultaneously model molecular structure representation and gene co-expression relationships, and integrates them for CTP prediction. After training on the L1000 dataset, MiTCP achieves an average Pearson correlation coefficient (PCC) of 0.482 on the test set and an average PCC of 0.801 for predicting the top 50 differentially expressed genes, which outperforms other existing methods. Furthermore, we used MiTCP to predict CTPs of three cancer drugs, palbociclib, irinotecan and goserelin, and performed gene enrichment analysis on the top differentially expressed genes and found that the enriched pathways and Gene Ontology terms are highly relevant to the corresponding diseases, which reveals the potential of MiTCP in drug development.
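For readers less familiar with the two metrics quoted above, the following Python sketch shows one plausible way to compute an average per-sample Pearson correlation between predicted and observed transcriptional-change profiles, and the same average restricted to each sample's most strongly changed genes. It is not MiTCP's evaluation code: selecting the "top 50 differentially expressed genes" by absolute observed change, and the helper names `pearson` and `mean_pcc`, are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def mean_pcc(predicted: List[List[float]], observed: List[List[float]],
             top_k: Optional[int] = None) -> float:
    """Average the per-sample PCC between predicted and observed profiles.

    Each row is one molecule-induced change profile (978 landmark genes in
    L1000-style data). If top_k is set, each sample's PCC uses only the top_k
    genes with the largest absolute observed change (an assumed definition).
    """
    scores = []
    for pred, obs in zip(predicted, observed):
        if top_k is not None:
            ranked = sorted(range(len(obs)), key=lambda i: abs(obs[i]), reverse=True)
            keep = ranked[:top_k]
            pred = [pred[i] for i in keep]
            obs = [obs[i] for i in keep]
        scores.append(pearson(pred, obs))
    return sum(scores) / len(scores)


# Toy example: two samples with four genes each instead of 978.
observed = [[1.0, -2.0, 0.5, 3.0], [0.2, 0.1, -1.5, 2.5]]
predicted = [[0.8, -1.5, 0.0, 2.0], [0.0, 0.3, -1.0, 2.0]]
print(round(mean_pcc(predicted, observed), 3))
print(round(mean_pcc(predicted, observed, top_k=2), 3))  # two distinct points always give PCC of +/-1
```

Averaging per-sample correlations, rather than pooling every gene from every sample into a single correlation, keeps a few samples with large overall shifts from dominating the score; the abstract does not say which convention the authors used.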
Affiliation(s)
- Kaiyuan Yang
- Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Jiabei Cheng
- Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Shenghao Cao
- Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Xiaoyong Pan
- Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Hong-Bin Shen
- Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Ye Yuan
- Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- State Key Laboratory of Biopharmaceutical Preparation and Delivery, Institute of Process Engineering, Chinese Academy of Sciences, 1 North 2nd Street, Zhongguancun, Haidian District, Beijing 100190, China
23
Kang Y, Xia Q, Jiang Y, Li Z. MVGNet: Prediction of PI3K Inhibitors Using Multitask Learning and Multiview Frameworks. ACS Omega 2024; 9:45159-45168. [PMID: 39554430 PMCID: PMC11561616 DOI: 10.1021/acsomega.4c06224] [Received: 07/05/2024] [Revised: 10/09/2024] [Accepted: 10/15/2024] [Indexed: 11/19/2024]
Abstract
PI3K (phosphatidylinositol 3-kinase) is an intracellular phosphatidylinositol kinase composed of a regulatory subunit, p85, and a catalytic subunit, p110. Based on the different structures of the p110 catalytic subunit, PI3K can be divided into four isoforms: PI3Kα, PI3Kβ, PI3Kγ, and PI3Kδ. As molecularly targeted drugs, PI3K inhibitors have demonstrated antiproliferative effects on tumor cells and can also induce cancer cell death. In this study, a multiview deep learning framework (MVGNet) is proposed, which integrates fragment-based pharmacophore information and utilizes multitask learning to capture correlation information between subtasks. This framework predicts the inhibitory activity of molecules against the four PI3K isoforms (PI3Kα, PI3Kβ, PI3Kγ, and PI3Kδ). Compared to baseline prediction models based on three traditional machine learning methods (RF, SVM, and XGBoost) and four deep learning algorithms (GAT, D-MPNN, CMPNN, and KANO), our model demonstrates superior performance. The evaluation results show that our model achieves the highest average AUC-ROC and AUC-PR values on the test set, which are 0.927 ± 0.006 and 0.980 ± 0.002, respectively. This study provides a reference for exploring the structure-activity relationship of PI3K inhibitors.
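The abstract does not state exactly what the "average AUC-ROC" is averaged over; a common convention in multitask evaluation, assumed here, is to compute AUC-ROC separately for each task (here, each PI3K isoform) and take the mean. The Python sketch below illustrates that convention using the rank (Mann-Whitney) form of AUC-ROC, the probability that a randomly chosen active scores above a randomly chosen inactive. Skipping compounds that lack a label for a given isoform is an assumption about how missing multitask labels might be handled, and the scores are invented toy numbers, not MVGNet outputs.

```python
from typing import Dict, List, Optional, Tuple


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC-ROC as P(score of a random positive > score of a random negative).

    Ties count as half a win. The pairwise loop is O(P * N): fine for a sketch,
    too slow for large test sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_task_auc(tasks: Dict[str, Tuple[List[Optional[int]], List[float]]]) -> float:
    """Average per-task AUC-ROC, skipping compounds that are unlabeled (None) for a task."""
    aucs = []
    for labels, scores in tasks.values():
        kept = [(y, s) for y, s in zip(labels, scores) if y is not None]
        aucs.append(roc_auc([y for y, _ in kept], [s for _, s in kept]))
    return sum(aucs) / len(aucs)


# Toy labels (1 = inhibitor, 0 = non-inhibitor, None = not measured) and model scores.
example_tasks = {
    "PI3Kα": ([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4]),
    "PI3Kβ": ([1, 0, None, 0], [0.6, 0.6, 0.1, 0.3]),
    "PI3Kγ": ([0, 1, 0, 1], [0.8, 0.3, 0.1, 0.9]),
    "PI3Kδ": ([1, 1, 0], [0.5, 0.4, 0.45]),
}
print(mean_task_auc(example_tasks))  # (1.0 + 0.75 + 0.75 + 0.5) / 4 = 0.75
```

The rank form equals the area under the ROC curve traced over all thresholds; library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently by sorting.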
Affiliation(s)
- Yanlei Kang
- Zhejiang Province Key Laboratory of Smart Management & Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou 313000, Zhejiang Province, China
- Qiwei Xia
- Zhejiang Province Key Laboratory of Smart Management & Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou 313000, Zhejiang Province, China
- Yunliang Jiang
- Zhejiang Province Key Laboratory of Smart Management & Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou 313000, Zhejiang Province, China
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, Zhejiang Province, China
- Zhong Li
- Zhejiang Province Key Laboratory of Smart Management & Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou 313000, Zhejiang Province, China
24
Yang R, Zhou H, Wang F, Yang G. DigFrag as a digital fragmentation method used for artificial intelligence-based drug design. Commun Chem 2024; 7:258. [PMID: 39528759 PMCID: PMC11555370 DOI: 10.1038/s42004-024-01346-5] [Received: 07/04/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024]
Abstract
Fragment-Based Drug Design (FBDD) plays a pivotal role in the field of drug discovery and development. The construction of high-quality fragment libraries is a critical step in FBDD. Conventional fragmentation approaches often rely on rigid rules and chemical intuition, limiting their adaptability to diverse molecular structures. The rapid development of Artificial Intelligence (AI) technology offers a transformative opportunity to rethink traditional methods. Here, we present DigFrag, a digital fragmentation method that highlights important substructures by focusing locally within the molecular graph. In addition, we feed the fragments segmented by machine intelligence and human expertise into the deep generative model to compare the preference for data from different sources. Experimental results show that the structural diversity of fragments segmented by DigFrag is higher, and more desirable compounds are generated based on these fragments. These results also demonstrate that data generated based on AI methods may be more suitable for AI models. Moreover, a user-friendly platform called MolFrag (https://dpai.ccnu.edu.cn/MolFrag/) is developed based on various fragmentation techniques to support molecular segmentation.
Affiliation(s)
- Ruoqi Yang
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China
- Hao Zhou
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China
- Fan Wang
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China.
- Guangfu Yang
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China.
25
Wang G, Feng H, Du M, Feng Y, Cao C. Multimodal Representation Learning via Graph Isomorphism Network for Toxicity Multitask Learning. J Chem Inf Model 2024; 64:8322-8338. [PMID: 39432821 DOI: 10.1021/acs.jcim.4c01061] [Indexed: 10/23/2024]
Abstract
Toxicity is paramount for comprehending compound properties, particularly in the early stages of drug design. Due to the diversity and complexity of toxic effects, computationally predicting compound toxicity remains challenging. To address this challenge, we propose a multimodal representation learning model, termed multimodal graph isomorphism network (MMGIN), for compound toxicity multitask learning. Based on fingerprints and molecular graphs of compounds, our MMGIN model incorporates a multimodal representation learning model to acquire a comprehensive compound representation. This model adopts a two-channel structure to independently learn fingerprint representation and molecular graph representation. Subsequently, two feedforward neural networks utilize the learned multimodal compound representation to perform multitask learning, encompassing compound toxicity classification and multiple compound category classification simultaneously. To test the effectiveness of our model, we constructed a novel data set, termed the compound toxicity multitask learning (CTMTL) data set, derived from the TOXRIC data set. We compare our MMGIN model with other representative machine learning and deep learning models on the CTMTL and Tox21 data sets. The experimental results demonstrate significant advancements achieved by our MMGIN model. Furthermore, the ablation study underscores the effectiveness of the introduced fingerprints, molecular graphs, the multimodal representation learning model, and the multitask learning model, showcasing the model's superior predictive capability and robustness.
Affiliation(s)
- Guishen Wang
- School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun, 130012 Jilin, China
- Hui Feng
- School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun, 130012 Jilin, China
- Mengyan Du
- School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun, 130012 Jilin, China
- Yuncong Feng
- School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun, 130012 Jilin, China
- Chen Cao
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Longmian Avenue No. 101, Nanjing, 211166 Jiangsu, China
26
Lin M, Cai J, Wei Y, Peng X, Luo Q, Li B, Chen Y, Wang L. MalariaFlow: A comprehensive deep learning platform for multistage phenotypic antimalarial drug discovery. Eur J Med Chem 2024; 277:116776. [PMID: 39173285 DOI: 10.1016/j.ejmech.2024.116776] [Received: 05/11/2024] [Revised: 07/31/2024] [Accepted: 08/01/2024] [Indexed: 08/24/2024]
Abstract
Malaria remains a significant global health challenge due to the growing drug resistance of Plasmodium parasites and the failure to block transmission within the human host. While machine learning (ML) and deep learning (DL) methods have shown promise in accelerating antimalarial drug discovery, the performance of deep learning models based on molecular graph and other co-representation approaches warrants further exploration. Current research has overlooked mutant strains of the malaria parasite with varying degrees of sensitivity or resistance, and has not covered the prediction of inhibitory activities across the three major life cycle stages (liver, asexual blood, and gametocyte) within the human host, which is crucial for both treatment and transmission blocking. In this study, we manually curated a benchmark antimalarial activity dataset comprising 407,404 unique compounds and 410,654 bioactivity data points across ten Plasmodium phenotypes and three stages. The performance was systematically compared among two fingerprint-based ML models (RF:Morgan and XGBoost:Morgan), four graph-based DL models (GCN, GAT, MPNN, and Attentive FP), and three co-representation DL models (FP-GNN, HiGNN, and FG-BERT), revealing that: 1) The FP-GNN model achieved the best predictive performance, outperforming the other methods in distinguishing active and inactive compounds across balanced, more positive, and more negative datasets, with an overall AUROC of 0.900; 2) Fingerprint-based ML models outperformed graph-based DL models on large datasets (>1000 compounds), but the three co-representation DL models were able to incorporate domain-specific chemical knowledge to bridge this gap, achieving better predictive performance. These findings provide valuable guidance for selecting appropriate ML and DL methods for antimalarial activity prediction tasks.
The interpretability analysis of the FP-GNN model revealed its ability to accurately capture the key structural features responsible for the liver- and blood-stage activities of the known antimalarial drug atovaquone. Finally, we developed a web server, MalariaFlow, incorporating these high-quality models for antimalarial activity prediction, virtual screening, and similarity search, successfully predicting novel triple-stage antimalarial hits validated through experimental testing, demonstrating its effectiveness and value in discovering potential multistage antimalarial drug candidates.
Affiliation(s)
- Mujie Lin
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Junxi Cai
- School of Civil Engineering and Transportation, South China University of Technology, Guangzhou, 510006, China
- Yuancheng Wei
- School of Software Engineering, South China University of Technology, Guangzhou, 510006, China
- Xinru Peng
- School of Software Engineering, South China University of Technology, Guangzhou, 510006, China
- Qianhui Luo
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Biaoshun Li
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Yihao Chen
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Ling Wang
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China.
27
Wang L, Wang S, Yang H, Li S, Wang X, Zhou Y, Tian S, Liu L, Bai F. Conformational Space Profiling Enhances Generic Molecular Representation for AI-Powered Ligand-Based Drug Discovery. Adv Sci (Weinh) 2024; 11:e2403998. [PMID: 39206753 PMCID: PMC11516098 DOI: 10.1002/advs.202403998] [Received: 04/16/2024] [Revised: 06/25/2024] [Indexed: 09/04/2024]
Abstract
The molecular representation model is a neural network that converts molecular representations (SMILES strings or molecular graphs) into feature vectors, and is an essential module applied across a wide range of artificial intelligence-driven drug discovery scenarios. However, current molecular representation models rarely consider the three-dimensional conformational space of molecules, losing sight of the dynamic nature of small molecules as well as the essence of molecular conformational space that covers the heterogeneity of molecule properties, such as the multi-target mechanism of action, recognition of different biomolecules, and dynamics in the cytoplasm and membrane. In this study, a new model named GeminiMol is proposed to incorporate conformational space profiles into molecular representation learning, capturing the complicated interplay between the molecular structure and the conformational space. Although GeminiMol is pre-trained on a relatively small-scale molecular dataset (39,290 molecules), it shows balanced and superior performance not only on 67 molecular property predictions but also on 73 cellular activity predictions and 171 zero-shot tasks (including virtual screening and target identification). By capturing the molecular conformational space profile, the strategy paves the way for rapid exploration of chemical space and facilitates changing paradigms for drug design.
Affiliation(s)
- Lin Wang
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Shihang Wang
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Hao Yang
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Shiwei Li
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Xinyu Wang
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Yongqi Zhou
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Siyuan Tian
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Lu Liu
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Fang Bai
- Shanghai Institute for Advanced Immunochemical Studies, School of Life Science and Technology, Information Science and Technology, ShanghaiTech University, Shanghai Clinical Research and Trial Center, Shanghai 201210, China
28
He G, Liu S, Liu Z, Wang C, Zhang K, Li H. Prototype-based contrastive substructure identification for molecular property prediction. Brief Bioinform 2024; 25:bbae565. [PMID: 39494969 PMCID: PMC11533112 DOI: 10.1093/bib/bbae565] [Received: 04/04/2024] [Revised: 08/11/2024] [Accepted: 10/22/2024] [Indexed: 11/05/2024]
Abstract
Substructure-based representation learning has emerged as a powerful approach to featurize complex attributed graphs, with promising results in molecular property prediction (MPP). However, existing MPP methods mainly rely on manually defined rules to extract substructures. It remains an open challenge to adaptively identify meaningful substructures from numerous molecular graphs to accommodate MPP tasks. To this end, this paper proposes Prototype-based cOntrastive Substructure IdentificaTion (POSIT), a self-supervised framework to autonomously discover substructural prototypes across graphs so as to guide end-to-end molecular fragmentation. During pre-training, POSIT emphasizes two key aspects of substructure identification: firstly, it imposes a soft connectivity constraint to encourage the generation of topologically meaningful substructures; secondly, it aligns resultant substructures with derived prototypes through a prototype-substructure contrastive clustering objective, ensuring attribute-based similarity within clusters. In the fine-tuning stage, a cross-scale attention mechanism is designed to integrate substructure-level information to enhance molecular representations. The effectiveness of the POSIT framework is demonstrated by experimental results from diverse real-world datasets, covering both classification and regression tasks. Moreover, visualization analysis validates the consistency of chemical priors with identified substructures. The source code is publicly available at https://github.com/VRPharmer/POSIT.
Affiliation(s)
- Gaoqi He
- School of Computer Science and Technology, East China Normal University, 200062 Shanghai, China
- Shun Liu
- School of Computer Science and Technology, East China Normal University, 200062 Shanghai, China
- Zhuoran Liu
- School of Computer Science and Technology, East China Normal University, 200062 Shanghai, China
- Changbo Wang
- School of Computer Science and Technology, East China Normal University, 200062 Shanghai, China
- Kai Zhang
- School of Computer Science and Technology, East China Normal University, 200062 Shanghai, China
- Honglin Li
- Innovation Center for AI and Drug Discovery, East China Normal University, 200062 Shanghai, China
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, 200237 Shanghai, China
29
Jiang X, Tan L, Zou Q. DGCL: dual-graph neural networks contrastive learning for molecular property prediction. Brief Bioinform 2024; 25:bbae474. [PMID: 39331017 PMCID: PMC11428321 DOI: 10.1093/bib/bbae474] [Received: 05/17/2024] [Revised: 08/16/2024] [Accepted: 09/13/2024] [Indexed: 09/28/2024]
Abstract
In this paper, we propose DGCL, a dual-graph neural network (GNN)-based contrastive learning (CL) method integrated with mixed molecular fingerprints (MFPs) for molecular property prediction. The DGCL-MFP method contains two stages. In the first, pretraining stage, we utilize two different GNNs as encoders to construct CL, rather than generating enhanced (augmented) graphs as in earlier methods. Specifically, DGCL aggregates and enhances features of the same molecule with the Graph Isomorphism Network and the Graph Attention Network; representations extracted from the same molecule serve as positive samples, and those from other molecules serve as negative ones. In the downstream-task training stage, features extracted from the two pretrained graph networks and the carefully selected MFPs are concatenated to predict molecular properties. Our experiments show that DGCL enhances the performance of existing GNNs, matching or surpassing state-of-the-art self-supervised learning models on multiple benchmark datasets. Specifically, DGCL increases the average performance on classification tasks by 3.73% and improves performance on the regression task Lipo by 0.126. Through ablation studies, we validate the impact of network fusion strategies and MFPs on model performance. In addition, DGCL's predictive performance is further enhanced by weighting different molecular features based on the Extended Connectivity Fingerprint. The code and datasets of DGCL will be made publicly available.
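The pairing rule described in this abstract (the GIN and GAT representations of the same molecule form a positive pair, while other molecules provide negatives) maps naturally onto an InfoNCE-style objective. The abstract does not give DGCL's actual loss, so the Python sketch below is an assumption throughout: it uses cosine similarity, a temperature `tau` of 0.1, in-batch negatives and a single direction (GIN to GAT). The lists `z_gin` and `z_gat` are hypothetical stand-ins for the two encoders' outputs.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cross_encoder_info_nce(z_gin: List[Vector], z_gat: List[Vector], tau: float = 0.1) -> float:
    """Mean InfoNCE loss where row i of z_gin and row i of z_gat embed the same molecule.

    (z_gin[i], z_gat[i]) is the positive pair; z_gat[j] for j != i act as negatives.
    """
    n = len(z_gin)
    total = 0.0
    for i in range(n):
        logits = [cosine(z_gin[i], z_gat[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / n


# Toy batch of three molecules with 2-D embeddings from each (hypothetical) encoder.
z_gin = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
z_gat_matched = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
z_gat_mismatched = [z_gat_matched[1], z_gat_matched[2], z_gat_matched[0]]
print(cross_encoder_info_nce(z_gin, z_gat_matched))     # small: positives agree
print(cross_encoder_info_nce(z_gin, z_gat_mismatched))  # large: pairs are scrambled
```

In practice such a loss is usually symmetrized (averaging the GIN-to-GAT and GAT-to-GIN directions) and computed with batched tensor operations; the scalar loops here are only for readability.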
Affiliation(s)
- Xiuyu Jiang
- School of Computer Science and Engineering, Sun Yat-sen University, Waihuan East Street, Guangzhou 510006, China
- Liqin Tan
- School of Computer Science and Engineering, Sun Yat-sen University, Waihuan East Street, Guangzhou 510006, China
- Qingsong Zou
- School of Computer Science and Engineering, Sun Yat-sen University, Waihuan East Street, Guangzhou 510006, China
30
Niu Z, Xiao X, Wu W, Cai Q, Jiang Y, Jin W, Wang M, Yang G, Kong L, Jin X, Yang G, Chen H. PharmaBench: Enhancing ADMET benchmarks with large language models. Sci Data 2024; 11:985. [PMID: 39256394 PMCID: PMC11387650 DOI: 10.1038/s41597-024-03793-0] [Received: 04/08/2024] [Accepted: 08/19/2024] [Indexed: 09/12/2024]
Abstract
Accurately predicting ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) properties early in drug development is essential for selecting compounds with optimal pharmacokinetics and minimal toxicity. Existing ADMET-related benchmark sets are limited in utility due to their small dataset sizes and the lack of representation of compounds used in drug discovery projects. These shortcomings hinder their application in model building for drug discovery. To address this issue, we propose a multi-agent data mining system based on Large Language Models that effectively identifies experimental conditions within 14,401 bioassays. This approach facilitates merging entries from different sources, culminating in the creation of PharmaBench. Additionally, we have developed a data processing workflow to integrate data from various sources, resulting in 156,618 raw entries. Through this workflow, we constructed PharmaBench, a comprehensive benchmark set for ADMET properties, which comprises eleven ADMET datasets and 52,482 entries. This benchmark set is designed to serve as an open-source dataset for the development of AI models relevant to drug discovery projects.
Affiliation(s)
- Zhangming Niu
- MindRank AI, Hangzhou, Zhejiang, China
- National Heart and Lung Institute, Imperial College London, London, SW7 2AZ, UK
- Xianglu Xiao
- MindRank AI, Hangzhou, Zhejiang, China
- Bioengineering Department and Imperial-X, Imperial College London, London, W12 7SL, UK
- Wenfan Wu
- MindRank AI, Hangzhou, Zhejiang, China
- Department of Bioinformatics and Systems Biology, Huazhong University of Science and Technology College of Life Sciences and Technology, Wuhan, Hubei, China
- Guangzhou National Laboratory, Guangzhou, 510005, China
- Qiwei Cai
- MindRank AI, Hangzhou, Zhejiang, China
- Xurui Jin
- MindRank AI, Hangzhou, Zhejiang, China
- Guang Yang
- National Heart and Lung Institute, Imperial College London, London, SW7 2AZ, UK.
- Bioengineering Department and Imperial-X, Imperial College London, London, W12 7SL, UK.
- Cardiovascular Research Centre, Royal Brompton Hospital, London, SW3 6NP, UK.
- School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK.
- Hongming Chen
- Department of Bioinformatics and Systems Biology, Huazhong University of Science and Technology College of Life Sciences and Technology, Wuhan, Hubei, China.
- Guangzhou National Laboratory, Guangzhou, 510005, China.
- School of pharmaceutical sciences, Guangzhou Medical University, Guangzhou, 511495, China.
31
Zhu Y, Zhang Y, Li X, Wang L. 3MTox: A motif-level graph-based multi-view chemical language model for toxicity identification with deep interpretation. J Hazard Mater 2024; 476:135114. [PMID: 38986414 DOI: 10.1016/j.jhazmat.2024.135114] [Received: 04/27/2024] [Revised: 06/24/2024] [Accepted: 07/04/2024] [Indexed: 07/12/2024]
Abstract
Toxicity identification plays a key role in maintaining human health, as it can alert humans to the potential hazards caused by long-term exposure to a wide variety of chemical compounds. Experimental methods for determining toxicity are time-consuming and costly, whereas computational methods offer an alternative for the early identification of toxicity. For example, some classical machine learning (ML) and deep learning (DL) methods demonstrate excellent performance in toxicity prediction. However, these methods also have drawbacks, such as over-reliance on handcrafted features and a tendency to overfit. Developing novel models with superior predictive performance therefore remains an urgent task. In this study, we propose a motif-level graph-based multi-view pretraining language model, called 3MTox, for toxicity identification. The 3MTox model uses Bidirectional Encoder Representations from Transformers (BERT) as the backbone framework and a motif graph as input. Extensive experiments showed that 3MTox achieved state-of-the-art performance on toxicity benchmark datasets and outperformed the baseline models considered. In addition, the model's interpretability allows it to quickly and accurately identify toxicity sites in a given molecule, thereby contributing to the determination of toxicity status and associated analyses. We consider 3MTox to be among the most promising tools currently available for toxicity identification.
Affiliation(s)
- Yingying Zhu
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yanhong Zhang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Xinze Li
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Ling Wang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China.
32
Zhang Q, Mao D, Tu Y, Wu YY. A New Fingerprint and Graph Hybrid Neural Network for Predicting Molecular Properties. J Chem Inf Model 2024; 64:5853-5866. [PMID: 39052623 DOI: 10.1021/acs.jcim.4c00586] [Indexed: 07/27/2024]
Abstract
Machine learning plays a role in accelerating drug discovery, and the design of effective machine learning models is crucial for accurately predicting molecular properties. Molecules are typically characterized by molecular fingerprints and molecular graphs, which are fed into a multilayer perceptron (MLP) and into variants of graph neural networks, such as graph attention networks (GATs), respectively. Because fingerprints come in diverse types and have high dimensionality, models may contain many relatively irrelevant or redundant features. Meanwhile, although the GAT excels at heterogeneous graph tasks, it cannot extract collaborative information from neighboring nodes, so it fails to capture the joint influence of adjacent groups on an atom. To overcome these challenges, we introduce a hybrid model combining an improved GAT and an MLP. Within the GAT, a recurrent neural network is employed to capture collaborative information. To address the dimensionality issue, we propose a feature selection algorithm based on the principle of maximizing relevance while minimizing redundancy. In experiments on 13 public data sets and 14 breast cell lines, our model outperformed state-of-the-art deep learning and traditional machine learning algorithms. Additionally, a series of ablation experiments demonstrated the advantages of the improved version, as well as its antinoise capability and interpretability. These results indicate that our model holds promising prospects for practical applications.
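As an illustration only, the selection principle named in this abstract (maximize relevance to the target, minimize redundancy with features already chosen, often called mRMR) can be sketched as a greedy loop. This is a minimal sketch, not the authors' algorithm: it assumes absolute Pearson correlation as the measure of both relevance and redundancy (classic mRMR uses mutual information), and the function names `pearson` and `mrmr_select` are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def mrmr_select(columns, target, k):
    """Greedily pick k feature indices from `columns` (one list of values per feature).

    Score of a candidate = |corr(candidate, target)|
                           - mean |corr(candidate, already selected feature)|.
    """
    relevance = [abs(pearson(col, target)) for col in columns]
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]  # first pick: most relevant feature
            redundancy = mean(abs(pearson(columns[j], columns[s])) for s in selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, with a target built as 2a + b from two independent signals a = [1, -1, 1, -1] and b = [1, 1, -1, -1], and candidate features [a, a, b], the sketch picks a first and then b rather than the duplicate of a, because the duplicate's relevance is cancelled by its redundancy.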
Affiliation(s)
- Qingtian Zhang
- College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
- Dangxin Mao
- College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
- Yusong Tu
- College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
- Yuan-Yan Wu
- College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
33
Tan X, Liu Q, Fang Y, Zhu Y, Chen F, Zeng W, Ouyang D, Dong J. Predicting Peptide Permeability Across Diverse Barriers: A Systematic Investigation. Mol Pharm 2024; 21:4116-4127. [PMID: 39031123 DOI: 10.1021/acs.molpharmaceut.4c00478] [Indexed: 07/22/2024]
Abstract
Peptide-based therapeutics hold immense promise for the treatment of various diseases. However, their effectiveness is often hampered by poor cell membrane permeability, which hinders targeted intracellular delivery and oral drug development. This study addressed this challenge by introducing a novel graph neural network (GNN) framework and advanced machine learning algorithms to build predictive models for peptide permeability. Our models offer systematic evaluation across diverse peptides (natural, modified, linear, and cyclic) and across cell lines and assays [Caco-2, Ralph Russ canine kidney (RRCK), and parallel artificial membrane permeability assay (PAMPA)]. Predictive models for linear and cyclic peptides in Caco-2 and RRCK cell lines were constructed for the first time, achieving coefficients of determination (R²) of 0.708, 0.484, 0.553, and 0.528 on the test set, respectively. Notably, the GNN framework performed better in permeability prediction with larger data sets and improved the accuracy of cyclic peptide prediction in the PAMPA assay. The R² increased by about 0.32 compared with the reported models. Furthermore, the important molecular structural features that contribute to good permeability were interpreted, and the influence of cell lines, peptide modification, and cyclization on permeability was revealed. To facilitate broader use, we deployed these models on the user-friendly KNIME platform (https://github.com/ifyoungnet/PharmPapp). This work provides a rapid and reliable strategy for systematically assessing peptide permeability, aiding researchers in drug delivery optimization, peptide preselection during drug discovery, and potentially the design of targeted peptide-based materials.
Affiliation(s)
- Xiaorong Tan
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, China
- Qianhui Liu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, China
- Yanpeng Fang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, China
- Yingli Zhu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, China
- Fei Chen
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, China
- Wenbin Zeng
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, China
- Defang Ouyang
- Institute of Chinese Medical Sciences (ICMS), State Key Laboratory of Quality Research in Chinese Medicine, University of Macau, Macau 999078, China
- Jie Dong
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, China
34
Gu X, Myung Y, Rodrigues CHM, Ascher DB. EFG-CS: Predicting chemical shifts from amino acid sequences with protein structure prediction using machine learning and deep learning models. Protein Sci 2024; 33:e5096. [PMID: 38979954 PMCID: PMC11232051 DOI: 10.1002/pro.5096] [Received: 12/17/2023] [Revised: 05/06/2024] [Accepted: 06/15/2024] [Indexed: 07/10/2024]
Abstract
Nuclear magnetic resonance (NMR) crystallography is one of the main methods in structural biology for analyzing protein stereochemistry and structure. The chemical shift of the resonance frequency reflects the chemical environment of nuclei in a molecule, so that nuclei in different environments produce distinct NMR signals. Determining chemical shifts from NMR signals can be challenging, because an NMR structure does not necessarily provide all the required chemical shift information; this makes predictive models essential for accurately deducing chemical shifts, either from protein structures or, ideally, directly from amino acid sequences. Here, we present EFG-CS, a web server that specializes in chemical shift prediction. EFG-CS employs a machine learning-based transfer prediction model for backbone atom chemical shift prediction, using ESMFold-predicted protein structures. Additionally, EFG-CS incorporates a graph neural network-based model to provide comprehensive side-chain atom chemical shift predictions. Our method demonstrated reliable performance in backbone atom prediction, achieving comparable accuracy with root mean square errors (RMSE) of 0.30 ppm for H, 0.22 ppm for Hα, 0.89 ppm for C, 0.89 ppm for Cα, 0.84 ppm for Cβ, and 1.69 ppm for N. Our approach also predicted side-chain atom chemical shifts, achieving RMSE values of 0.71 ppm for Hβ, 0.74-1.15 ppm for Hδ, and 0.58-0.94 ppm for Hγ, using only amino acid sequences without homology or feature curation. This work shows for the first time that generative AI protein models can predict NMR shifts with accuracy nearly comparable to experimental models. The web server is freely available at https://biosig.lab.uq.edu.au/efg_cs, and the chemical shift prediction results can be downloaded in tabular format and visualized in 3D.
Affiliation(s)
- Xiaotong Gu
- The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Yoochan Myung
- The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Carlos H. M. Rodrigues
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- David B. Ascher
- The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
35
Lavecchia A. Advancing drug discovery with deep attention neural networks. Drug Discov Today 2024; 29:104067. [PMID: 38925473 DOI: 10.1016/j.drudis.2024.104067] [Received: 05/17/2024] [Revised: 06/10/2024] [Accepted: 06/19/2024] [Indexed: 06/28/2024]
Abstract
In the dynamic field of drug discovery, deep attention neural networks are revolutionizing our approach to complex data. This review explores the attention mechanism and its extended architectures, including graph attention networks (GATs), transformers, bidirectional encoder representations from transformers (BERT), generative pre-trained transformers (GPTs) and bidirectional and auto-regressive transformers (BART). Delving into their core principles and multifaceted applications, we uncover their pivotal roles in catalyzing de novo drug design, predicting intricate molecular properties and deciphering elusive drug-target interactions. Despite challenges, these attention-based architectures hold unparalleled promise to drive transformative breakthroughs and accelerate progress in pharmaceutical research.
Affiliation(s)
- Antonio Lavecchia
- Drug Discovery Laboratory, Department of Pharmacy, University of Napoli Federico II, I-80131 Naples, Italy.
36
Kang L, Zhou S, Fang S, Liu S. Adapting differential molecular representation with hierarchical prompts for multi-label property prediction. Brief Bioinform 2024; 25:bbae438. [PMID: 39252594 PMCID: PMC11383732 DOI: 10.1093/bib/bbae438] [Received: 05/28/2024] [Revised: 08/05/2024] [Accepted: 08/21/2024] [Indexed: 09/11/2024]
Abstract
Accurate prediction of molecular properties is crucial in drug discovery. Traditional methods often overlook the fact that real-world molecules typically exhibit multiple property labels with complex correlations. To this end, we propose a novel framework, HiPM, which stands for Hierarchical Prompted Molecular representation learning framework. HiPM leverages task-aware prompts to enhance the differential expression of tasks in molecular representations and to mitigate negative transfer caused by conflicts between individual tasks' information. Our framework comprises two core components: the Molecular Representation Encoder (MRE) and the Task-Aware Prompter (TAP). MRE employs a hierarchical message-passing network architecture to capture molecular features at both the atom and motif levels. Meanwhile, TAP uses an agglomerative hierarchical clustering algorithm to construct a prompt tree that reflects task affinity and distinctiveness, enabling the model to consider multi-granular correlation information among tasks and thereby handle the complexity of multi-label property prediction effectively. Extensive experiments demonstrate that HiPM achieves state-of-the-art performance across various multi-label datasets, offering a novel perspective on multi-label molecular representation learning.
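The prompt tree in HiPM is built by agglomerative hierarchical clustering over tasks. A minimal sketch of such a clustering step follows; it assumes Jaccard similarity between binary task label columns as the task-affinity measure and average linkage as the merge criterion (both are assumptions, not details taken from the paper), and the names `jaccard`, `average_linkage`, and `build_task_tree` are hypothetical. It repeatedly merges the two most similar task clusters until a single tree remains.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two binary label columns (assumed task-affinity proxy)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def average_linkage(labels, members_a, members_b):
    """Mean pairwise Jaccard similarity between two clusters of task names."""
    sims = [jaccard(labels[x], labels[y]) for x in members_a for y in members_b]
    return sum(sims) / len(sims)


def build_task_tree(labels):
    """Agglomerative clustering of tasks into a nested-tuple tree.

    labels maps task name -> list of 0/1 labels over the same molecules.
    Leaves of the returned tree are task names; inner nodes are 2-tuples.
    """
    # Each cluster is (subtree, member task names); start with one cluster per task.
    clusters = [(name, [name]) for name in labels]
    while len(clusters) > 1:
        # Pick the pair of clusters with the highest average-linkage similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(labels, clusters[p[0]][1], clusters[p[1]][1]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]
```

With three tasks, t1 = [1, 1, 0, 0], t2 = [1, 1, 0, 1], and t3 = [0, 0, 1, 1], the sketch merges t1 and t2 first (they share most positive labels) and returns the nested tree ("t3", ("t1", "t2")).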
Affiliation(s)
- Linjia Kang
- College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Songhua Zhou
- College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Shuyan Fang
- College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Shichao Liu
- College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
37
Liu H, Hu B, Chen P, Wang X, Wang H, Wang S, Wang J, Lin B, Cheng M. Docking Score ML: Target-Specific Machine Learning Models Improving Docking-Based Virtual Screening in 155 Targets. J Chem Inf Model 2024; 64:5413-5426. [PMID: 38958413 DOI: 10.1021/acs.jcim.4c00072] [Indexed: 07/04/2024]
Abstract
In drug discovery, molecular docking methods face challenges in accurately predicting binding energy. Scoring functions used in molecular docking often fail to simulate complex protein-ligand interactions fully and accurately, leading to biases and inaccuracies in virtual screening and target prediction. We introduce "Docking Score ML", developed from an analysis of over 200,000 docked complexes from 155 known cancer-treatment targets. The scoring functions are founded on bioactivity data sourced from ChEMBL and have been fine-tuned using both supervised machine learning and deep learning techniques. We validated our approach extensively, including validation of the selectivity mechanism, tests on the DUD-E, DUD-AD, and LIT-PCBA data sets, and a multitarget analysis of drugs such as sunitinib. To enhance prediction accuracy, feature fusion techniques were explored. By combining a Graph Convolutional Network (GCN) with multiple docking functions, our methods clearly outperformed conventional approaches. These advantages demonstrate that Docking Score ML is an efficient and accurate tool for virtual screening and reverse docking.
Affiliation(s)
- Haihan Liu
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Baichun Hu
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Peiying Chen
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Xiao Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Hanxun Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Shizun Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Jian Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Bin Lin
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Maosheng Cheng
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
38
Shi S, Fu L, Yi J, Yang Z, Zhang X, Deng Y, Wang W, Wu C, Zhao W, Hou T, Zeng X, Lyu A, Cao D. ChemFH: an integrated tool for screening frequent false positives in chemical biology and drug discovery. Nucleic Acids Res 2024; 52:W439-W449. [PMID: 38783035 PMCID: PMC11223804 DOI: 10.1093/nar/gkae424] [Received: 02/06/2024] [Revised: 04/25/2024] [Accepted: 05/10/2024] [Indexed: 05/25/2024]
Abstract
High-throughput screening rapidly tests an extensive array of chemical compounds to identify hit compounds for specific biological targets in drug discovery. However, false-positive results disrupt hit compound screening, wasting time and resources. To address this, we propose ChemFH, an integrated online platform for rapid virtual evaluation of potential false positives, including colloidal aggregators, spectroscopic interference compounds, firefly luciferase inhibitors, chemically reactive compounds, promiscuous compounds, and other assay interferences. Using a dataset of 823 391 compounds, we constructed high-quality prediction models with multi-task directed message-passing network (DMPNN) architectures combined with uncertainty estimation, yielding an average AUC value of 0.91. Furthermore, ChemFH incorporates 1441 representative alert substructures derived from the collected data and ten commonly used frequent hitter screening rules. ChemFH was validated with an external set of 75 compounds. Subsequently, the virtual screening capability of ChemFH was confirmed through its application to five virtual screening libraries. In addition, ChemFH underwent further validation on two natural products and FDA-approved drugs, yielding reliable and accurate results. ChemFH is a comprehensive, reliable, and computationally efficient screening pipeline that facilitates the identification of true positive results in assays, contributing to enhanced efficiency and success rates in drug discovery. ChemFH is freely available via https://chemfh.scbdd.com/.
Affiliation(s)
- Shaohua Shi
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR, 999077, P.R. China
- Li Fu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Jiacai Yi
- School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Ziyi Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Xiaochen Zhang
- School of Information Technology, Shangqiu Normal University, Shangqiu, Henan 476000, P.R. China
- Youchao Deng
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Wenxuan Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Chengkun Wu
- School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Wentao Zhao
- School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, P.R. China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, P.R. China
- Aiping Lyu
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR, 999077, P.R. China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
39
Fu L, Shi S, Yi J, Wang N, He Y, Wu Z, Peng J, Deng Y, Wang W, Wu C, Lyu A, Zeng X, Zhao W, Hou T, Cao D. ADMETlab 3.0: an updated comprehensive online ADMET prediction platform enhanced with broader coverage, improved performance, API functionality and decision support. Nucleic Acids Res 2024; 52:W422-W431. [PMID: 38572755 PMCID: PMC11223840 DOI: 10.1093/nar/gkae236] [Received: 01/29/2024] [Revised: 03/10/2024] [Accepted: 03/21/2024] [Indexed: 04/05/2024]
Abstract
ADMETlab 3.0 is the second updated version of the web server that provides a comprehensive and efficient platform for evaluating ADMET-related parameters, as well as physicochemical properties and medicinal chemistry characteristics involved in the drug discovery process. This new release addresses the limitations of the previous version and offers broader coverage, improved performance, API functionality, and decision support. For supporting data and endpoints, this version includes 119 features, an increase of 31 over the previous version. The number of entries, over 400 000, is 1.5 times that of the previous version. ADMETlab 3.0 incorporates a multi-task DMPNN architecture coupled with molecular descriptors, an approach that not only maintains calculation speed by computing all endpoints simultaneously but also achieves superior accuracy and robustness. In addition, an API has been introduced to meet the growing demand for programmatic access to large amounts of data in ADMETlab 3.0. Moreover, this version includes uncertainty estimates in the prediction results, aiding the confident selection of candidate compounds for further studies and experiments. ADMETlab 3.0 is publicly accessible without registration at: https://admetlab3.scbdd.com.
Affiliation(s)
- Li Fu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Shaohua Shi
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR, 999077, P.R. China
- Jiacai Yi
- School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Ningning Wang
- Xiangya Hospital of Central South University, Changsha, Hunan 410008, P.R. China
- Yuanhang He
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Zhenxing Wu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, P.R. China
- Jinfu Peng
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Youchao Deng
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Wenxuan Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Chengkun Wu
- School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Aiping Lyu
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR, 999077, P.R. China
- Xiangxiang Zeng
- Department of Computer Science, Hunan University, Changsha, Hunan 410082, P.R. China
- Wentao Zhao
- School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, P.R. China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
40
Yang J, Hu Z, Zhang L, Peng B. Predicting Drugs Suspected of Causing Adverse Drug Reactions Using Graph Features and Attention Mechanisms. Pharmaceuticals (Basel) 2024; 17:822. [PMID: 39065673 PMCID: PMC11279999 DOI: 10.3390/ph17070822] [Received: 05/27/2024] [Revised: 06/12/2024] [Accepted: 06/20/2024] [Indexed: 07/28/2024]
Abstract
BACKGROUND Adverse drug reactions (ADRs) are unintended harmful reactions that occur after a medication is administered for therapeutic purposes and are unrelated to the drug's intended pharmacological action. In the United States, ADRs account for 6% of all hospital admissions annually, and the cost of ADR-related illnesses in 2016 was estimated at USD 528.4 billion. Increasing awareness of ADRs is an effective way to prevent them, and assessing the suspected drugs in adverse events helps raise that awareness. METHODS In this study, a suspect drug assisted judgment model (SDAJM) is designed to identify suspected drugs in adverse events. The framework uses a graph isomorphism network (GIN) and an attention mechanism to extract features from patients' demographic information, drug information, and ADR information. RESULTS Comparisons with other models across various tests show that this model performs well in predicting the suspected drugs in adverse reaction events. ADR signal detection was conducted on a group of cardiovascular system drugs, and case analyses were performed on two classic drugs, Mexiletine and Captopril, as well as on two classic antithyroid drugs. The results indicate that the model can accomplish the task of predicting drug ADRs. Validation using benchmark datasets from ten drug discovery domains shows that the model is applicable to classification tasks on the Tox21 and SIDER datasets. CONCLUSIONS This study applies deep learning methods to construct the SDAJM model for three purposes: (1) identifying drugs suspected to cause adverse drug events (ADEs), (2) predicting the ADRs of drugs, and (3) other drug discovery tasks. The results indicate that this method can offer new directions for research in the field of ADRs.
Affiliation(s)
- Bin Peng
- College of Public Health, Chongqing Medical University, Chongqing 401331, China; (J.Y.); (Z.H.); (L.Z.)
41
Cui Z, Ma R, Yang CH, Malpani A, Chu TN, Ghazi A, Davis JW, Miles BJ, Lau C, Liu Y, Hung AJ. Capturing relationships between suturing sub-skills to improve automatic suturing assessment. NPJ Digit Med 2024; 7:152. [PMID: 38862627 PMCID: PMC11167055 DOI: 10.1038/s41746-024-01143-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 05/22/2024] [Indexed: 06/13/2024]
Abstract
Suturing skill scores have demonstrated strong predictive capability for patient functional recovery. Suturing can be broken down into several substeps, such as needle repositioning and needle entry angle. Artificial intelligence (AI) systems have been explored to automate suturing skill scoring. Traditional approaches to skill assessment typically evaluate the individual sub-skills required for particular substeps in isolation. However, surgical procedures require the integration and coordination of multiple sub-skills to achieve successful outcomes, and existing studies have established significant associations among technical sub-skills. In this paper, we propose a framework for joint skill assessment that accounts for the interconnected nature of the sub-skills required in surgery. The known prior relationships among sub-skills are first identified. Our proposed AI system then uses these prior relationships to score suturing skill in every sub-skill domain simultaneously. Our approach effectively improves skill assessment performance by exploiting the prior relationships among sub-skills. Through this approach to joint skill assessment, we aspire to enhance the evaluation of surgical proficiency and ultimately improve patient outcomes in surgery.
Affiliation(s)
- Zijun Cui
- University of Southern California, Los Angeles, CA, USA
- Runzhuo Ma
- Department of Urology, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Cherine H Yang
- Department of Urology, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Timothy N Chu
- University of Southern California, Los Angeles, CA, USA
- Ahmed Ghazi
- Johns Hopkins University, Baltimore, MD, USA
- John W Davis
- University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Yan Liu
- University of Southern California, Los Angeles, CA, USA
- Andrew J Hung
- Department of Urology, Cedars-Sinai Medical Center, Los Angeles, CA, USA
42
Ru J, Zhu Z, Shi J. Spatial and geometric learning for classification of breast tumors from multi-center ultrasound images: a hybrid learning approach. BMC Med Imaging 2024; 24:133. [PMID: 38840240 PMCID: PMC11155188 DOI: 10.1186/s12880-024-01307-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2024] [Accepted: 05/27/2024] [Indexed: 06/07/2024]
Abstract
BACKGROUND Breast cancer is the most common cancer among women, and ultrasound is a common tool for early screening. Deep learning techniques are now applied as auxiliary tools that provide predictions to help doctors decide whether further examinations or treatments are needed. This study aimed to develop a hybrid learning approach for breast ultrasound classification that extracts more potential features from local and multi-center ultrasound data. METHODS We proposed a hybrid learning approach to classify breast tumors as benign or malignant. Three multi-center datasets (BUSI, BUS, OASBUD) were used to pretrain a model by federated learning, and the model was then fine-tuned locally on each dataset. The proposed model consists of a convolutional neural network (CNN) and a graph neural network (GNN), which extract features from images at a spatial level and from graphs at a geometric level, respectively. The input images are small and free of pixel-level labels, and the input graphs are generated automatically in an unsupervised manner, which saves labor and memory costs. RESULTS The classification AUC-ROC of our proposed method is 0.911, 0.871 and 0.767 for BUSI, BUS and OASBUD, and the balanced accuracy is 87.6%, 85.2% and 61.4%, respectively. The results show that our method outperforms conventional methods. CONCLUSIONS Our hybrid approach can learn inter-features across multi-center data and intra-features of local data. It shows potential for aiding doctors in early-stage breast tumor classification in ultrasound.
Affiliation(s)
- Jintao Ru
- Department of Medical Engineering, Shaoxing Hospital of Traditional Chinese Medicine, Shaoxing, Zhejiang, People's Republic of China
- Zili Zhu
- Department of Radiology, The First Affiliated Hospital of Ningbo University, Ningbo, Zhejiang, People's Republic of China
- Jialin Shi
- Rehabilitation Medicine Institute, Zhejiang Rehabilitation Medical Center, Hangzhou, Zhejiang, People's Republic of China
43
Qian X, Ju B, Shen P, Yang K, Li L, Liu Q. Meta Learning with Attention Based FP-GNNs for Few-Shot Molecular Property Prediction. ACS Omega 2024; 9:23940-23948. [PMID: 38854580 PMCID: PMC11154901 DOI: 10.1021/acsomega.4c02147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Revised: 05/09/2024] [Accepted: 05/14/2024] [Indexed: 06/11/2024]
Abstract
Molecular property prediction holds significant importance in drug discovery, enabling the identification of biologically active compounds with favorable drug-like properties. However, the low-data problem, arising from the scarcity of labeled data in drug discovery, poses a substantial obstacle to accurate prediction. To address this challenge, we introduce a novel architecture, AttFPGNN-MAML, for few-shot molecular property prediction. The proposed approach incorporates a hybrid feature representation to enrich molecular representations and to model task-specific intermolecular relationships. Using ProtoMAML, a meta-learning strategy, our model is trained and adapted to new tasks. Evaluation on two few-shot data sets, MoleculeNet and FS-Mol, demonstrates our method's superior performance in three out of four tasks and across various support set sizes. These results validate the effectiveness of our method for few-shot molecular property prediction. The source code is publicly available at https://github.com/sanomics-lab/AttFPGNN-MAML.
Affiliation(s)
- Xiaoliang Qian
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- SanOmics AI Co., Ltd., Hangzhou 311103, China
- Bin Ju
- SanOmics AI Co., Ltd., Hangzhou 311103, China
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310009, China
- Ping Shen
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310009, China
- Keda Yang
- Shulan International Medical College, Zhejiang Shuren University, Hangzhou 310015, China
- Li Li
- Department of Hepatobiliary Surgery, The First People’s Hospital of Kunming, Kunming 650034, China
- Qi Liu
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department of Tongji Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
44
Zhang R, Lin Y, Wu Y, Deng L, Zhang H, Liao M, Peng Y. MvMRL: a multi-view molecular representation learning method for molecular property prediction. Brief Bioinform 2024; 25:bbae298. [PMID: 38920342 PMCID: PMC11200189 DOI: 10.1093/bib/bbae298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 05/09/2024] [Accepted: 06/07/2024] [Indexed: 06/27/2024]
Abstract
Effective molecular representation learning is very important for artificial intelligence-driven drug design because it affects the accuracy and efficiency of molecular property prediction and other molecular modeling tasks. However, previous molecular representation learning studies often suffer from limitations, such as over-reliance on a single molecular representation, failure to fully capture both local and global information in molecular structure, and ineffective integration of multiscale features from different molecular representations. These limitations prevent a complete and accurate representation of molecular structure and properties, ultimately reducing the accuracy of molecular property prediction. To this end, we propose a novel multi-view molecular representation learning method called MvMRL, which incorporates feature information from multiple molecular representations and captures both local and global information from different views, thus improving molecular property prediction. Specifically, MvMRL consists of four parts: a multiscale CNN-SE Simplified Molecular Input Line Entry System (SMILES) learning component and a multiscale Graph Neural Network encoder, which extract local and global feature information from the SMILES view and the molecular graph view, respectively; a Multi-Layer Perceptron network that captures complex non-linear relationship features from the molecular fingerprint view; and a dual cross-attention component that deeply fuses feature information from the multiple views to predict molecular properties. We evaluate the performance of MvMRL on 11 benchmark datasets, and the experimental results show that MvMRL outperforms state-of-the-art methods, indicating its rationality and effectiveness in molecular property prediction. The source code of MvMRL is available at https://github.com/jedison-github/MvMRL.
Affiliation(s)
- Ru Zhang
- Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, No. 175, Mingxiu East Road, Xixiang Tang District, Nanning 530001, China
- Yanmei Lin
- Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, No. 175, Mingxiu East Road, Xixiang Tang District, Nanning 530001, China
- Center for Applied Mathematics of Guangxi, Nanning Normal University, 508 Xinning Road, Wuming District, Nanning 530100, China
- Yijia Wu
- Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, No. 175, Mingxiu East Road, Xixiang Tang District, Nanning 530001, China
- Lei Deng
- School of Computer Science and Engineering, Central South University, 932 Lushan South Road, Changsha 410083, China
- Hao Zhang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen 518000, China
- Mingzhi Liao
- Center of Bioinformatics, College of Life Sciences, Northwest A&F University, 3 Taicheng Road, Yangling, Shaanxi 712100, China
- Yuzhong Peng
- Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, No. 175, Mingxiu East Road, Xixiang Tang District, Nanning 530001, China
- Guangxi Academy of Sciences, 174 East University Road, Nanning 530007, China
45
Pang Y, Chen Y, Lin M, Zhang Y, Zhang J, Wang L. MMSyn: A New Multimodal Deep Learning Framework for Enhanced Prediction of Synergistic Drug Combinations. J Chem Inf Model 2024; 64:3689-3705. [PMID: 38676916 DOI: 10.1021/acs.jcim.4c00165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2024]
Abstract
Combination therapy is a promising strategy for the successful treatment of cancer. The large number of possible combinations, however, means that screening for synergistic drug combinations in vitro is laborious and expensive. Nevertheless, thanks to the availability of high-throughput screening data and advances in computational techniques, deep learning (DL) can be a useful tool for predicting synergistic drug combinations. In this study, we proposed a multimodal DL framework, MMSyn, for the prediction of synergistic drug combinations. First, three types of features were extracted from the drug molecules: structure, fingerprint, and string encoding. Then, gene expression data, DNA copy number, and pathway activity were used to describe cancer cell lines. Finally, these processed features were integrated using an attention mechanism and an interaction module and then fed into a multilayer perceptron to predict drug synergy. Experimental results showed that our method outperformed five state-of-the-art DL methods and three traditional machine learning models for drug combination prediction. We verified that MMSyn achieved superior performance in stratified cross-validation settings for both drug combinations and cell lines. Moreover, we performed a set of ablation experiments to illustrate the effectiveness of each component and of the model as a whole. In addition, visual representations and case studies further confirmed the effectiveness of our model. All results showed that MMSyn can be used as a powerful tool for predicting synergistic drug combinations.
Affiliation(s)
- Yu Pang
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yihao Chen
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Mujie Lin
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yanhong Zhang
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Jiquan Zhang
- Guizhou Provincial Engineering Technology Research Center for Chemical Drug R&D, College of Pharmacy, Guizhou Medical University, Guiyang 550025, P. R. China
- Ling Wang
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
46
Yao R, Shen Z, Xu X, Ling G, Xiang R, Song T, Zhai F, Zhai Y. Knowledge mapping of graph neural networks for drug discovery: a bibliometric and visualized analysis. Front Pharmacol 2024; 15:1393415. [PMID: 38799167 PMCID: PMC11116974 DOI: 10.3389/fphar.2024.1393415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 04/12/2024] [Indexed: 05/29/2024]
Abstract
Introduction In recent years, graph neural networks have been extensively applied to drug discovery research. Although researchers have made significant progress in this field, bibliometric research on it remains limited. The purpose of this study is to conduct a comprehensive bibliometric analysis of graph neural network applications in drug discovery in order to identify current research hotspots and trends and to serve as a reference for future research. Methods Publications from 2017 to 2023 on the application of graph neural networks in drug discovery were collected from the Web of Science Core Collection. Bibliometrix, VOSviewer, and CiteSpace were the main tools used for the bibliometric analysis. Results and Discussion A total of 652 papers from 48 countries/regions were included. Research interest in this field is continuously increasing. China and the United States have a significant advantage in terms of funding, number of publications, and collaborations with other institutions and countries. Although some cooperation networks have formed in this field, extensive worldwide cooperation still needs to be strengthened. The keyword analysis showed that graph neural networks have primarily been applied to drug-target interaction, drug repurposing, and drug-drug interaction, while graph convolutional neural networks and their related optimization methods are currently the core algorithms in this field. Data availability and ethical supervision, balancing computing resources, and developing novel graph neural network models with better interpretability are the key technical issues currently faced. This paper uses bibliometric approaches to analyze the current state, hotspots, and trends of graph neural network applications in drug discovery, as well as the current issues and challenges in this field. These findings provide researchers with valuable insights into the current status and future directions of this field.
Affiliation(s)
- Fei Zhai
- Faculty of Medical Device, Shenyang Pharmaceutical University, Shenyang, China
- Yuxuan Zhai
- Faculty of Medical Device, Shenyang Pharmaceutical University, Shenyang, China
47
Zhang X, Sheng Y, Liu X, Yang J, Goddard III WA, Ye C, Zhang W. Polymer-Unit Graph: Advancing Interpretability in Graph Neural Network Machine Learning for Organic Polymer Semiconductor Materials. J Chem Theory Comput 2024; 20:2908-2920. [PMID: 38551455 DOI: 10.1021/acs.jctc.3c01385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/10/2024]
Abstract
The graph representation of complex materials plays a crucial role in inorganic and organic materials investigations for developing data-centric materials science, such as approaches using graph neural networks (GNNs). However, the currently prevalent GNN models are primarily employed for periodic crystals and organic small-molecule data, and they still face challenges in interpretability and computational efficiency when applied to polymer monomer and organic macromolecule data. There is still a lack of graph representations of organic polymers and macromolecules specifically tailored for GNN models to explore their structural characteristics. The Polymer-unit Graph, a novel coarse-grained graph representation method introduced in this study, is dedicated to expressing and analyzing polymers and macromolecules. By incorporating the Polymer-unit Graph into GNN models and analyzing an organic semiconductor (OSC) materials database, it becomes possible to uncover intricate structure-property relationships involving branched-chain engineering, fluorination substitution, and donor-acceptor combination effects on the elementary structure of OSC polymers. Furthermore, the Polymer-unit Graph enables visualization of the relationship between target properties and polymer units while reducing training time by 98% and minimizing molecular graph representation models. In conclusion, the Polymer-unit Graph successfully integrates the concept of the polymer unit into the field of GNNs, enabling more accurate analysis and understanding of organic polymers and macromolecules.
Affiliation(s)
- Xinyue Zhang
- Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
- Ye Sheng
- Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
- Xiumin Liu
- Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
- Key Laboratory of Soft Chemistry and Functional Materials of MOE, School of Chemistry and Chemical Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China
- Jiong Yang
- Materials Genome Institute, Shanghai University, Shanghai 200444, PR China
- William A Goddard III
- Materials and Process Simulation Center (MSC), California Institute of Technology, Pasadena, California 91125, United States
- Caichao Ye
- Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
- Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen 518055, PR China
- Wenqing Zhang
- Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
48
Wu K, Yang X, Wang Z, Li N, Zhang J, Liu L. Data-balanced transformer for accelerated ionizable lipid nanoparticles screening in mRNA delivery. Brief Bioinform 2024; 25:bbae186. [PMID: 38670158 PMCID: PMC11052633 DOI: 10.1093/bib/bbae186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Revised: 02/26/2024] [Accepted: 04/05/2024] [Indexed: 04/28/2024]
Abstract
Despite the widespread clinical use of ionizable lipid nanoparticles (LNPs) for messenger RNA (mRNA) delivery, the mRNA drug delivery system faces an efficiency challenge in the screening of LNPs. Traditional screening methods often require substantial experimental time and incur high research and development costs. To accelerate the early development stage of LNPs, we propose TransLNP, a transformer-based transfection prediction model designed to aid in the selection of LNPs for mRNA drug delivery systems. TransLNP uses two types of molecular information to perceive the relationship between structure and transfection efficiency: coarse-grained atomic sequence information and fine-grained atomic spatial relationship information. Because existing experimental LNP data are scarce, we find that pretraining the molecular model, through reconstruction of atomic 3D coordinates and masked atom prediction, is crucial for better understanding the task of predicting LNP properties. In addition, data imbalance is particularly prominent in the real-world exploration of LNPs. We introduce the BalMol block to address this problem by smoothing the distribution of labels and molecular features. Our approach outperforms state-of-the-art methods in transfection property prediction under both random and scaffold data splitting. Additionally, we establish a relationship between molecular structural similarity and transfection differences, selecting 4267 pairs of molecular transfection cliffs (pairs of molecules that exhibit high structural similarity but significant differences in transfection efficiency), thereby revealing the primary source of prediction errors. The code, model and data are publicly available at https://github.com/wklix/TransLNP.
Affiliation(s)
- Kun Wu
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xiulong Yang
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zixu Wang
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Na Li
- National Facility for Protein Science in Shanghai, Zhangjiang Laboratory, Shanghai Advanced Research Institute, Chinese Academy of Sciences
- Jialu Zhang
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Lizhuang Liu
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
- University of Chinese Academy of Sciences, Beijing 100049, China
49
Yang Z, Wang L, Yang Y, Pang X, Sun Y, Liang Y, Cao H. Screening of the Antagonistic Activity of Potential Bisphenol A Alternatives toward the Androgen Receptor Using Machine Learning and Molecular Dynamics Simulation. Environ Sci Technol 2024; 58:2817-2829. [PMID: 38291630 DOI: 10.1021/acs.est.3c09779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Over the past few decades, extensive research has indicated that exposure to bisphenol A (BPA) increases health risks in humans. Toxicological studies have demonstrated that BPA can bind to the androgen receptor (AR), resulting in endocrine-disrupting effects. Recent investigations have detected many BPA alternatives as major pollutants in various environmental media. However, experimental evaluations of BPA alternatives have not been systematically carried out to assess their chemical safety or the effects of their structural characteristics on AR antagonistic activity. To promote the green development of BPA alternatives, high-throughput toxicological screening is fundamental for prioritizing chemical tests. Therefore, we proposed a hybrid deep learning architecture that combines molecular descriptors and molecular graphs to predict AR antagonistic activity. Compared with previous models, this hybrid architecture can extract substantial chemical information from various molecular representations to improve the model's generalization ability for BPA alternatives. Our predictions suggest that lignin-derivable bisguaiacols, as alternatives to BPA, are more likely than bisphenol analogues to be AR nonantagonists. Additionally, molecular dynamics (MD) simulations identified the dihydrotestosterone-bound pocket, rather than the surface, as the major binding site of bisphenol analogues. The conformational change of the key helix H12 from an agonistic to an antagonistic conformation can be evaluated qualitatively by accelerated MD simulations to explain the underlying mechanism. Overall, our computational study is helpful for the toxicological screening of BPA alternatives and the design of environmentally friendly BPA alternatives.
Affiliation(s)
- Zeguo Yang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Ling Wang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Ying Yang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Xudi Pang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yuzhen Sun
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
| | - Yong Liang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
| | - Huiming Cao
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
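The hybrid idea in this abstract, combining molecular descriptors with a learned graph representation, can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes one round of sum-aggregation message passing with mean pooling as a stand-in for the graph branch, plain concatenation as the fusion step, and a logistic output for the antagonist/nonantagonist decision. The names `graph_embedding`, `hybrid_features` and `predict_antagonist_prob` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def graph_embedding(atom_feats: List[List[float]], adj: Dict[int, List[int]]) -> List[float]:
    """Hypothetical graph branch: one sum-aggregation step, then mean pooling."""
    n, d = len(atom_feats), len(atom_feats[0])
    updated = []
    for i in range(n):
        # Each atom keeps its own features and adds those of its bonded neighbours.
        h = list(atom_feats[i])
        for j in adj.get(i, []):
            for k in range(d):
                h[k] += atom_feats[j][k]
        updated.append(h)
    # Mean-pool the updated atom vectors into one graph-level vector.
    return [sum(h[k] for h in updated) / n for k in range(d)]


def hybrid_features(descriptors: Sequence[float], atom_feats, adj) -> List[float]:
    """Concatenate the descriptor vector with the graph-level embedding."""
    return list(descriptors) + graph_embedding(atom_feats, adj)


def predict_antagonist_prob(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic output layer: probability that the molecule is an AR antagonist."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy two-atom molecule with one bond (0-1) and a single descriptor value.
feats = hybrid_features([0.5], [[1.0, 0.0], [0.0, 1.0]], {0: [1], 1: [0]})
print(feats)  # [0.5, 1.0, 1.0]
print(predict_antagonist_prob(feats, [0.0, 0.0, 0.0], 0.0))  # 0.5
```

In a trained model the weights would be learned and the graph branch would be a deeper network; the point here is only that descriptor and graph information end up in one vector before the classifier.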
50
Ma M, Lei X. A deep learning framework for predicting molecular property based on multi-type features fusion. Comput Biol Med 2024; 169:107911. [PMID: 38160501 DOI: 10.1016/j.compbiomed.2023.107911] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/28/2023] [Revised: 12/18/2023] [Accepted: 12/24/2023] [Indexed: 01/03/2024]
Abstract
Extracting expressive molecular features is essential for molecular property prediction. Sequence-based representation is a common way to represent molecules, but it ignores their structural information, while molecular graph representation has a limited ability to express 3D structure. In this article, we try to exploit the advantages of different types of representations simultaneously for molecular property prediction. We therefore propose a fusion model named DLF-MFF, which integrates multi-type molecular features. Specifically, we first extract four different types of features from molecular fingerprints, 2D molecular graphs, 3D molecular graphs and molecular images. Then, to learn each type of molecular feature individually, we use four essential deep learning frameworks, one for each of the four molecular representations. The final molecular representation is created by integrating the four feature vectors and is fed into the prediction layer to predict molecular properties. We compare DLF-MFF with 7 state-of-the-art methods on 6 benchmark datasets covering multiple molecular properties; the experimental results show that DLF-MFF achieves state-of-the-art performance on all 6 datasets. Moreover, DLF-MFF is applied to identify potential anti-SARS-CoV-2 inhibitors from 2500 drugs. We predict the probability of each drug being a 3CL protease inhibitor and also calculate the binding affinity score between each drug and the 3CL protease. The results show that DLF-MFF performs better in identifying anti-SARS-CoV-2 inhibitors. This work is expected to offer novel research perspectives for the accurate prediction of molecular properties and to provide valuable insights into drug repurposing for COVID-19.
Affiliation(s)
- Mei Ma
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China; School of Mathematics and Statistics, Qinghai Normal University, Qinghai, 810000, China
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China.
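A minimal sketch of the fusion step described in this abstract, assuming the four encoders (fingerprint, 2D graph, 3D graph, image) have already produced fixed-length vectors. The abstract only says the vectors are "integrated"; concatenation followed by a single linear prediction layer is an assumption here, and `fuse` and `linear_head` are hypothetical names, not part of DLF-MFF.

```python
from typing import List, Sequence


def fuse(views: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-representation feature vectors into one molecular vector."""
    fused: List[float] = []
    for v in views:
        fused.extend(v)
    return fused


def linear_head(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Single linear prediction layer (e.g. for a regression target)."""
    if len(x) != len(weights):
        raise ValueError("input and weight lengths differ")
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Placeholder outputs of the four hypothetical encoders for one molecule.
fingerprint_vec = [1.0, 0.0]
graph2d_vec = [2.0]
graph3d_vec = [3.0]
image_vec = [4.0]

mol_vec = fuse([fingerprint_vec, graph2d_vec, graph3d_vec, image_vec])
print(mol_vec)                               # [1.0, 0.0, 2.0, 3.0, 4.0]
print(linear_head(mol_vec, [1.0] * 5, 0.0))  # 10.0
```

Concatenation keeps every branch's features intact and leaves it to the prediction layer to learn how much each representation contributes; attention-based or gated fusion would be alternatives, but the abstract does not say which scheme DLF-MFF uses.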