1
Gao X, Yan M, Zhang C, Wu G, Shang J, Zhang C, Yang K. MDNN-DTA: a multimodal deep neural network for drug-target affinity prediction. Front Genet 2025; 16:1527300. [PMID: 40182923; PMCID: PMC11965683; DOI: 10.3389/fgene.2025.1527300] [Received: 11/13/2024] [Accepted: 02/24/2025] [Indexed: 04/05/2025]
Abstract
Determining drug-target affinity (DTA) is a pivotal step in drug discovery, where in silico methods can significantly improve efficiency and reduce costs. Artificial intelligence (AI), especially deep learning models, can automatically extract high-dimensional features from the biological sequences of drug molecules and target proteins. This technology demonstrates lower complexity in DTA prediction compared to traditional experimental methods, particularly when handling large-scale data. In this study, we introduce a multimodal deep neural network model for DTA prediction, referred to as MDNN-DTA. This model employs Graph Convolutional Networks (GCN) and Convolutional Neural Networks (CNN) to extract features from the drug and protein sequences, respectively. One notable strength of our method is its ability to accurately predict DTA directly from the sequences of the target proteins, obviating the need for protein 3D structures, which are frequently unavailable in drug discovery. To comprehensively extract features from the protein sequence, we leverage an ESM pre-trained model for extracting biochemical features and design a specific Protein Feature Extraction (PFE) block for capturing both global and local features of the protein sequence. Furthermore, a Protein Feature Fusion (PFF) Block is engineered to augment the integration of multi-scale protein features derived from the abovementioned techniques. We then compare MDNN-DTA with other models on the same dataset, conducting a series of ablation experiments to assess the performance and efficacy of each component. The results highlight the advantages and effectiveness of the MDNN-DTA method.
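The GCN drug branch summarized above can be illustrated with the widely used symmetric-normalized propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). This is a minimal plain-Python sketch on a made-up three-atom graph. It is not MDNN-DTA's published code. The adjacency matrix, atom features, weights, and the mean readout that produces one drug vector are all illustrative assumptions, and `gcn_layer` and `mean_readout` are hypothetical helper names.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each atom keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Node degrees in A + I.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization keeps high-degree atoms from dominating.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, x) for x in row] for row in out]


def mean_readout(h: Matrix) -> List[float]:
    """Average node embeddings into one fixed-size molecule vector (assumed readout)."""
    return [sum(col) / len(h) for col in zip(*h)]


# Toy drug graph: a three-atom chain 0-1-2 (illustrative, not real data).
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # one-hot atom types
weights = [[1.0, -1.0], [0.5, 1.0]]              # stand-in for learned weights
hidden = gcn_layer(adjacency, features, weights)
drug_vector = mean_readout(hidden)
```

In a trained model the weights are learned and several such layers are stacked. This sketch only shows how one layer mixes each atom's features with those of its neighbors.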
Affiliation(s)
- Xu Gao
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
- National Supercomputing Center in Zhengzhou, Zhengzhou, China
- Mengfan Yan
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
- National Supercomputing Center in Zhengzhou, Zhengzhou, China
- Chengwei Zhang
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
- National Supercomputing Center in Zhengzhou, Zhengzhou, China
- Gang Wu
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
- National Supercomputing Center in Zhengzhou, Zhengzhou, China
- Jiandong Shang
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
- National Supercomputing Center in Zhengzhou, Zhengzhou, China
- Congxiang Zhang
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
- National Supercomputing Center in Zhengzhou, Zhengzhou, China
- Kecheng Yang
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
- National Supercomputing Center in Zhengzhou, Zhengzhou, China
2
Zhao PC, Wei XX, Wang Q, Wang QH, Li JN, Shang J, Lu C, Shi JY. Single-step retrosynthesis prediction via multitask graph representation learning. Nat Commun 2025; 16:814. [PMID: 39827189; PMCID: PMC11742932; DOI: 10.1038/s41467-025-56062-y] [Received: 08/10/2023] [Accepted: 01/08/2025] [Indexed: 01/22/2025]
Abstract
Inferring appropriate synthesis reaction routes (i.e., retrosynthesis) for newly designed molecules is vital. Recently, computational methods have produced promising single-step retrosynthesis predictions. However, template-based methods are limited by the known synthesis templates, template-free methods are weakly interpretable, and semi-template-based methods make insufficient use of the associations between chemical entities. To address these issues, this paper leverages the intra-associations between synthons, the inter-associations between synthons and leaving groups (LGs), and the intra-associations between LGs. It develops a multitask graph representation learning model for single-step retrosynthesis prediction (Retro-MTGR) that solves reaction centre deduction and LG identification simultaneously. A comparison with 16 state-of-the-art methods first demonstrates the superiority of Retro-MTGR. Then, its robustness and scalability, as well as the contributions of its crucial components, are validated. More importantly, it can determine whether a bond can be a reaction centre and which LGs are appropriate for a given synthon. The answers reflect underlying chemical synthesis rules, especially the opposite electrical properties between chemical entities (e.g., reaction sites, synthons, and LGs). Finally, case studies demonstrate that the retrosynthesis routes inferred by Retro-MTGR are promising for single-step synthesis reactions. The code and data of this study are freely available at https://doi.org/10.5281/zenodo.14346324 .
Affiliation(s)
- Peng-Cheng Zhao
- School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Xue-Xin Wei
- School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Qiong Wang
- School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Qi-Hao Wang
- School of Chemistry and Chemical Engineering, Northwestern Polytechnical University, Xi'an, China
- Jia-Ning Li
- School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Jie Shang
- School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Cheng Lu
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Jian-Yu Shi
- School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
3
Zhu M, Xiao Z, Zhang T, Lu G. Construction of interpretable ensemble learning models for predicting bioaccumulation parameters of organic chemicals in fish. J Hazard Mater 2025; 482:136606. [PMID: 39579709; DOI: 10.1016/j.jhazmat.2024.136606] [Received: 10/10/2024] [Revised: 11/14/2024] [Accepted: 11/19/2024] [Indexed: 11/25/2024]
Abstract
Accurate prediction of bioaccumulation parameters is essential for assessing the exposure, hazards, and risks of chemicals. However, most prediction models for bioaccumulation parameters are individual models that rely on a single algorithm and lack model interpretation. The inherent constraints of a single algorithm limit their prediction accuracy, and the lack of interpretation leaves them weakly interpretable. Ensemble learning (EL), which combines multiple algorithms, coupled with the SHapley Additive exPlanations (SHAP) method, may overcome these limitations. Herein, EL models were constructed for three bioaccumulation parameters using datasets covering 2496 chemicals. The EL models demonstrated superior prediction accuracy compared with both the individual models developed in this study and those from previous research, achieving a coefficient of determination of up to 0.861 on the validation sets. Applicability domains were characterized using a structure-activity landscape-based (ADSAL) methodology. The optimal EL models, together with the ADSAL, were successfully used to predict bioaccumulation parameters for 4374 chemicals included in the Inventory of Existing Chemical Substances of China. Model interpretation using the SHAP method offered insight into key features influencing bioaccumulation potential, including hydrophobicity, water solubility, polarizability, ionization potential, molecular weight, and molecular volume. Overall, the study provides data and models to support the sound management and risk assessment of chemicals.
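The abstract reports accuracy as the coefficient of determination, R² = 1 - SS_res / SS_tot. As a hedged illustration only, the sketch below averages the predictions of two hypothetical base models and compares R² values on invented data. It assumes simple unweighted averaging, because the study's actual ensembling strategy is not stated here.

```python
from statistics import mean
from typing import List, Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def average_ensemble(predictions: List[Sequence[float]]) -> List[float]:
    """Unweighted mean of several base models' predictions, sample by sample."""
    return [mean(values) for values in zip(*predictions)]


# Hypothetical log-scale bioaccumulation values for four chemicals.
y_obs = [1.0, 2.0, 3.0, 4.0]
model_a = [1.2, 2.1, 2.7, 4.3]  # hypothetical base model A
model_b = [0.8, 1.7, 3.2, 3.9]  # hypothetical base model B
ensemble = average_ensemble([model_a, model_b])
```

Here the two base models err in partly opposite directions, so averaging cancels some of their errors. That is the usual intuition for why an ensemble can beat its members, though it is not guaranteed on real data.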
Affiliation(s)
- Minghua Zhu
- Key Laboratory of Integrated Regulation and Resources Development of Shallow Lakes of Ministry of Education, Hohai University, Nanjing 210098, China; College of Environment, Hohai University, Nanjing 210098, China
- Zijun Xiao
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Tao Zhang
- State Key Laboratory of Urban Water Resources and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
- Guanghua Lu
- Key Laboratory of Integrated Regulation and Resources Development of Shallow Lakes of Ministry of Education, Hohai University, Nanjing 210098, China; College of Environment, Hohai University, Nanjing 210098, China
4
Yu X, Chen Y, Chen L, Li W, Wang Y, Tang Y, Liu G. GCLmf: A Novel Molecular Graph Contrastive Learning Framework Based on Hard Negatives and Application in Toxicity Prediction. Mol Inform 2025; 44:e202400169. [PMID: 39421969; DOI: 10.1002/minf.202400169] [Received: 05/14/2024] [Revised: 09/23/2024] [Accepted: 09/24/2024] [Indexed: 10/19/2024]
Abstract
In silico methods for predicting chemical toxicity can decrease costs and increase efficiency in the early stage of drug discovery. However, because sufficient and reliable toxicity data are scarce, constructing robust and accurate prediction models is challenging. Contrastive learning, a type of self-supervised learning, leverages large unlabeled data sets to obtain more expressive molecular representations, which can boost prediction performance on downstream tasks. While molecular graph contrastive learning has attracted growing attention, current models neglect the quality of the negative set. Here, we proposed a self-supervised pretraining deep learning framework named GCLmf. We used molecular fragments that meet specific conditions as hard negative samples to improve the quality of the negative set and thus increase the difficulty of the proxy tasks during pre-training, so that informative representations are learned. GCLmf showed excellent predictive power on various molecular property benchmarks and high performance on 33 toxicity tasks in comparison with multiple baselines. In addition, we investigated the necessity of introducing hard negatives in model building and the impact of the proportion of hard negatives on the model.
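Graph contrastive pre-training is commonly optimized with an InfoNCE-style loss. This loss rewards an anchor embedding for being closer to its positive view than to every negative. The sketch below assumes that formulation (GCLmf's exact objective may differ). It shows why adding a hard negative makes the proxy task harder: the loss rises. Here a hard negative means something like an embedding of a fragment that resembles the anchor. All vectors and the temperature `tau` are illustrative values.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], tau: float = 0.1) -> float:
    """-log( e^(s+/tau) / (e^(s+/tau) + sum_k e^(s_k/tau)) ) for one anchor."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


anchor = [1.0, 0.2, 0.0]      # embedding of a molecular graph
positive = [0.9, 0.3, 0.1]    # embedding of an augmented view of the same molecule
easy_negatives = [[-1.0, 0.5, 0.0], [0.0, -1.0, 1.0]]  # unrelated molecules
hard_negative = [0.95, 0.1, 0.05]  # e.g. a fragment embedding close to the anchor

loss_easy = info_nce(anchor, positive, easy_negatives)
loss_hard = info_nce(anchor, positive, easy_negatives + [hard_negative])
```

Because a negative that is similar to the anchor contributes a large term to the denominator, the encoder must learn finer distinctions to drive the loss down. This matches the motivation given in the abstract.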
Affiliation(s)
- Xinxin Yu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Yuanting Chen
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Long Chen
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Yuhao Wang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
5
Peng J, Fu L, Yang G, Cao D. Advanced AI-Driven Prediction of Pregnancy-Related Adverse Drug Reactions. J Chem Inf Model 2024; 64:9286-9298. [PMID: 39611337; DOI: 10.1021/acs.jcim.4c01657] [Indexed: 11/30/2024]
Abstract
Ensuring drug safety during pregnancy is critical because of the potential risks to both mother and fetus. However, the exclusion of pregnant women from clinical trials complicates the assessment of adverse drug reactions (ADRs) in this population. This study aimed to develop and validate risk prediction models for pregnancy-related ADRs of drugs using advanced Machine Learning (ML) and Deep Learning (DL) techniques, leveraging real-world data from the FDA Adverse Event Reporting System. We explored three methods for classifying drugs into high-risk and low-risk categories: the Information Component (IC), the Reporting Odds Ratio (ROR), and the 95% confidence interval of the ROR. DL models, including Directed Message Passing Neural Networks (DMPNN), Graph Neural Networks, and Graph Convolutional Networks, were developed and compared with traditional ML models such as Random Forest, Support Vector Machines, and XGBoost. Among these, the DMPNN model, which integrated molecular graph information and molecular descriptors, exhibited the highest predictive performance, particularly at the preferred term level. The model was validated against external data sets from SIDER and DailyMed, demonstrating strong generalizability. Additionally, the model was applied to assess the risk of 22 oral hypoglycemic drugs, and potential substructure alerts for pregnancy-related ADRs were identified. These findings suggest that the DMPNN model is a valuable tool for predicting ADRs in pregnant women, offering a significant advance in drug safety assessment and providing crucial insights for safer medication use during pregnancy.
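The ROR and its 95% confidence interval are standard disproportionality statistics computed from a 2×2 table of spontaneous reports. The sketch below is a minimal illustration under stated assumptions:

- The counts are invented.
- The "lower 95% bound above 1" high-risk rule is a common pharmacovigilance convention, not necessarily the study's threshold.
- The IC uses the simple shrinkage form log2((O + 0.5) / (E + 0.5)) instead of a full Bayesian confidence propagation neural network.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Counts:
    """2x2 contingency table of spontaneous reports (hypothetical layout)."""
    a: int  # target drug AND pregnancy-related event
    b: int  # target drug AND any other event
    c: int  # all other drugs AND pregnancy-related event
    d: int  # all other drugs AND any other event


def ror_with_ci(t: Counts, z: float = 1.96) -> Tuple[float, float, float]:
    """Reporting odds ratio (a/b)/(c/d) with a log-scale Wald 95% interval."""
    ror = (t.a / t.b) / (t.c / t.d)
    se = math.sqrt(1 / t.a + 1 / t.b + 1 / t.c + 1 / t.d)
    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper


def information_component(t: Counts) -> float:
    """Shrunk observed-to-expected ratio on the log2 scale."""
    n = t.a + t.b + t.c + t.d
    expected = (t.a + t.b) * (t.a + t.c) / n
    return math.log2((t.a + 0.5) / (expected + 0.5))


def is_high_risk(t: Counts) -> bool:
    """Assumed rule: flag a signal when the lower 95% bound of the ROR exceeds 1."""
    _, lower, _ = ror_with_ci(t)
    return lower > 1.0


# Invented counts for illustration only.
table = Counts(a=20, b=180, c=400, d=19400)
ror, lo, hi = ror_with_ci(table)
ic = information_component(table)
```

Tables with a zero cell would need a continuity correction before taking logarithms; this sketch omits that step.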
Affiliation(s)
- Jinfu Peng
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172 Tongzipo Road, Changsha 410031, Hunan, China
- Li Fu
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172 Tongzipo Road, Changsha 410031, Hunan, China
- Guoping Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172 Tongzipo Road, Changsha 410031, Hunan, China
- The Third Xiangya Hospital, Central South University, No. 138 Tongzipo Road, Changsha 410031, Hunan, China
- Dongshen Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172 Tongzipo Road, Changsha 410031, Hunan, China
6
Liu Q, He D, Fan M, Wang J, Cui Z, Wang H, Mi Y, Li N, Meng Q, Hou Y. Prediction and Interpretation Microglia Cytotoxicity by Machine Learning. J Chem Inf Model 2024; 64:9306-9326. [PMID: 38949724; DOI: 10.1021/acs.jcim.4c00366] [Indexed: 07/02/2024]
Abstract
Ameliorating microglia-mediated neuroinflammation is a crucial strategy in developing new drugs for neurodegenerative diseases. Plant compounds are an important screening target for the discovery of drugs to treat neurodegenerative diseases. However, because of the spatial complexity of phytochemicals, it is particularly important in the early stages of compound screening to evaluate the effectiveness of compounds while excluding cytotoxic substances. Traditional high-throughput screening methods suffer from high cost and low efficiency. A computational model based on machine learning provides a novel avenue for cytotoxicity determination. In this study, a microglia cytotoxicity classifier was developed using a machine learning approach. First, we proposed a data splitting strategy based on the molecular Murcko generic scaffold. Under this split, three machine learning approaches were coupled with three kinds of molecular representation methods to construct microglia cytotoxicity classifiers. These classifiers were then compared and assessed by predictive accuracy, balanced accuracy, F1-score, and Matthews correlation coefficient. Then, recursive feature elimination with a support vector classifier (RFE-SVC) was applied to the high-dimensional molecular fingerprints to reduce their dimensionality and further improve model performance. Among all the microglia cytotoxicity classifiers, the SVM coupled with the ECFP4 fingerprint after feature selection (ECFP4-RFE-SVM) obtained the most accurate classification on the test set (ACC of 0.99, BA of 0.99, F1-score of 0.99, MCC of 0.97). Finally, the Shapley additive explanations (SHAP) method was used to interpret the microglia cytotoxicity classifier, and key substructures, expressed as SMARTS patterns, were identified as structural alerts.
Experimental results show that ECFP4-RFE-SVM has reliable classification capability for microglia cytotoxicity. They also show that SHAP not only provides a rational explanation for microglia cytotoxicity predictions but also offers guidance for subsequent molecular modifications aimed at reducing cytotoxicity.
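A scaffold-based split places every molecule sharing a Murcko generic scaffold on the same side of the train/test boundary, so the test set probes chemotypes the model has not seen. The sketch below assumes the scaffold keys were computed beforehand (for example with RDKit's `MurckoScaffold` module). It uses a common largest-group-first greedy assignment, which may differ from the authors' procedure. It also computes balanced accuracy and the Matthews correlation coefficient from a binary confusion matrix. Scaffold keys, labels, and predictions are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def scaffold_split(scaffolds: Sequence[str],
                   test_fraction: float = 0.2) -> Tuple[List[int], List[int]]:
    """Assign whole scaffold groups to train (largest groups first) or test."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    # Largest scaffold families go to training; rare chemotypes end up in test.
    ordered = sorted(groups.values(), key=len, reverse=True)
    train_cap = len(scaffolds) - round(test_fraction * len(scaffolds))
    train: List[int] = []
    test: List[int] = []
    for members in ordered:
        # A scaffold group is never split across train and test.
        if len(train) + len(members) <= train_cap:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


def balanced_accuracy_and_mcc(y_true: Sequence[int],
                              y_pred: Sequence[int]) -> Tuple[float, float]:
    """Balanced accuracy and Matthews correlation coefficient for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    ba = 0.5 * (tp / (tp + fn) + tn / (tn + fp))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return ba, mcc


# Hypothetical scaffold keys for ten molecules (stand-ins for generic Murcko SMILES).
scaffold_keys = ["A", "A", "A", "B", "B", "C", "C", "D", "E", "F"]
train_idx, test_idx = scaffold_split(scaffold_keys)

# Hypothetical labels (1 = cytotoxic) and predictions on a test set.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
preds = [1, 1, 0, 0, 0, 0, 0, 1]
ba, mcc = balanced_accuracy_and_mcc(labels, preds)
```

Computing the training capacity as an integer count avoids floating-point edge cases when deciding whether a group still fits.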
Affiliation(s)
- Qing Liu
- College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Dakuo He
- College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Mengmeng Fan
- College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Jinpeng Wang
- College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Zeyu Cui
- College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Hao Wang
- College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Yan Mi
- Key Laboratory of Bioresource Research and Development of Liaoning Province, College of Life and Health Sciences, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang 110169, P. R. China
- Ning Li
- School of Traditional Chinese Materia Medica, Key Laboratory for TCM Material Basis Study and Innovative Drug Development of Shenyang City, Shenyang Pharmaceutical University, Shenyang 110016, P. R. China
- Qingqi Meng
- Key Laboratory of Bioresource Research and Development of Liaoning Province, College of Life and Health Sciences, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang 110169, P. R. China
- Yue Hou
- Key Laboratory of Bioresource Research and Development of Liaoning Province, College of Life and Health Sciences, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang 110169, P. R. China
7
Ren JN, Chen Q, Ye HYX, Cao C, Guo YM, Yang JR, Wang H, Khan MZI, Chen JZ. FGTN: Fragment-based graph transformer network for predicting reproductive toxicity. Arch Toxicol 2024; 98:4077-4092. [PMID: 39292235; DOI: 10.1007/s00204-024-03866-4] [Received: 05/10/2024] [Accepted: 09/10/2024] [Indexed: 09/19/2024]
Abstract
Reproductive toxicity is an important issue in chemical safety. Traditional laboratory testing methods are costly and time-consuming and raise ethical issues. Only a few in silico models have been reported to predict human reproductive toxicity, and none of them makes full use of the topological information of compounds. In addition, most existing atom-based graph neural network methods attribute model predictions to individual nodes or edges rather than to chemically meaningful fragments or substructures. In the current study, we develop a novel fragment-based graph transformer network (FGTN) approach to build a QSAR model of human reproductive toxicity that considers the internal topological structure of compounds. In the FGTN model, a compound is represented as a graph with fragments as nodes and the bonds linking pairs of fragments as edges. A super molecule-level node is further introduced and connected to all fragment nodes by undirected edges, so that global molecular features can be obtained from the fragment embeddings. The FGTN model achieved an accuracy (ACC) of 0.861 and an area under the receiver operating characteristic curve (AUC) of 0.914 on nonredundant blind tests, outperforming traditional fingerprint-based machine learning models and an atom-based GCN model. The FGTN model can attribute toxic predictions to fragments, generating specific structural alerts for positive compounds. Moreover, FGTN may also be able to distinguish various chemical isomers. We believe that FGTN can serve as a reliable and effective tool for human reproductive toxicity prediction, contributing to the advancement of chemical safety assessment.
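The FGTN graph described above can be sketched with plain data structures: fragments as nodes, inter-fragment bonds as edges, and one super molecule-level node linked to every fragment. The abstract does not describe how molecules are fragmented. The fragment list and linking bonds below are therefore hypothetical inputs for N-methylbenzamide, and `build_fragment_graph` is an illustrative helper rather than FGTN code.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class FragmentGraph:
    nodes: List[str]  # fragment labels; the super node is appended last
    edges: Set[Tuple[int, int]] = field(default_factory=set)  # undirected, stored as (min, max)

    def add_edge(self, u: int, v: int) -> None:
        self.edges.add((min(u, v), max(u, v)))

    def neighbors(self, u: int) -> List[int]:
        """Sorted indices of nodes sharing an edge with node u."""
        return sorted(v if w == u else w for w, v in self.edges if u in (w, v))


def build_fragment_graph(fragments: List[str],
                         links: List[Tuple[int, int]]) -> FragmentGraph:
    graph = FragmentGraph(nodes=list(fragments))
    for u, v in links:  # bonds that join two fragments
        graph.add_edge(u, v)
    super_idx = len(graph.nodes)  # index of the super molecule-level node
    graph.nodes.append("[SUPER]")
    for i in range(super_idx):  # connect the super node to every fragment
        graph.add_edge(i, super_idx)
    return graph


# Hypothetical fragments of N-methylbenzamide: phenyl - amide - methyl.
frags = ["c1ccccc1", "C(=O)N", "C"]
graph = build_fragment_graph(frags, [(0, 1), (1, 2)])
```

A graph transformer would then attend over these nodes. Reading out the super node's embedding is one way to obtain the global molecular features mentioned in the abstract.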
Affiliation(s)
- Jia-Nan Ren
- College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Qiang Chen
- College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Hong-Yu-Xiang Ye
- College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Cheng Cao
- College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Polytechnic Institute, Zhejiang University, 269 Shixiang Rd., Hangzhou, 310015, Zhejiang, China
- Ya-Min Guo
- College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Jin-Rong Yang
- College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Polytechnic Institute, Zhejiang University, 269 Shixiang Rd., Hangzhou, 310015, Zhejiang, China
- Hao Wang
- College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Muhammad Zafar Irshad Khan
- College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
- Jian-Zhong Chen
- College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058, Zhejiang, China
8
Yang Y, Yang Z, Pang X, Cao H, Sun Y, Wang L, Zhou Z, Wang P, Liang Y, Wang Y. Molecular designing of potential environmentally friendly PFAS based on deep learning and generative models. Sci Total Environ 2024; 953:176095. [PMID: 39245376; DOI: 10.1016/j.scitotenv.2024.176095] [Received: 07/04/2024] [Revised: 09/03/2024] [Accepted: 09/04/2024] [Indexed: 09/10/2024]
Abstract
Perfluoroalkyl and polyfluoroalkyl substances (PFAS) are widely used across a spectrum of industrial and consumer goods. Nonetheless, their persistent nature and tendency to accumulate in biological systems pose substantial environmental and health threats. Consequently, striking a balance between maximizing product efficiency and minimizing environmental and health risks by tailoring the molecular structure of PFAS has become a pivotal challenge in the fields of environmental chemistry and sustainable development. To address this issue, a computational workflow was proposed for designing an environmentally friendly PFAS by incorporating deep learning (DL) and molecular generative models. The hybrid DL architecture MolHGT+ based on heterogeneous graph neural network with transformer-like attention was applied to predict the surface tension, bioaccumulation, and hepatotoxicity of the molecules. Through virtual screening of the PFAS master database using MolHGT+, the findings indicate that incorporating the siloxane group and betaine fragment can effectively decrease both the bioaccumulation and hepatotoxicity of PFAS while preserving low surface tension. In addition, molecular generative models were employed to create a structurally diverse pool of novel PFASs with the aforementioned hit molecules serving as the initial template structures. Overall, our study presents a promising AI-driven method for advancing the development of environmentally friendly PFAS.
Affiliation(s)
- Ying Yang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Zeguo Yang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Xudi Pang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Huiming Cao
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yuzhen Sun
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Ling Wang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Zhen Zhou
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Pu Wang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yong Liang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yawei Wang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China; State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
9
Yang R, Zhou H, Wang F, Yang G. DigFrag as a digital fragmentation method used for artificial intelligence-based drug design. Commun Chem 2024; 7:258. [PMID: 39528759; PMCID: PMC11555370; DOI: 10.1038/s42004-024-01346-5] [Received: 07/04/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024]
Abstract
Fragment-Based Drug Design (FBDD) plays a pivotal role in the field of drug discovery and development. The construction of high-quality fragment libraries is a critical step in FBDD. Conventional fragmentation approaches often rely on rigid rules and chemical intuition, limiting their adaptability to diverse molecular structures. The rapid development of Artificial Intelligence (AI) technology offers a transformative opportunity to rethink traditional methods. Here, we present DigFrag, a digital fragmentation method that highlights important substructures by focusing locally within the molecular graph. In addition, we feed the fragments segmented by machine intelligence and human expertise into the deep generative model to compare the preference for data from different sources. Experimental results show that the structural diversity of fragments segmented by DigFrag is higher, and more desirable compounds are generated based on these fragments. These results also demonstrate that data generated based on AI methods may be more suitable for AI models. Moreover, a user-friendly platform called MolFrag ( https://dpai.ccnu.edu.cn/MolFrag/ ) is developed based on various fragmentation techniques to support molecular segmentation.
Affiliation(s)
- Ruoqi Yang
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China
- Hao Zhou
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China
- Fan Wang
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China
- Guangfu Yang
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China
10
Wang G, Feng H, Du M, Feng Y, Cao C. Multimodal Representation Learning via Graph Isomorphism Network for Toxicity Multitask Learning. J Chem Inf Model 2024; 64:8322-8338. [PMID: 39432821; DOI: 10.1021/acs.jcim.4c01061] [Indexed: 10/23/2024]
Abstract
Toxicity is paramount for understanding compound properties, particularly in the early stages of drug design. Because toxic effects are diverse and complex, predicting compound toxicity remains challenging. To address this challenge, we propose a multimodal representation learning model, termed the multimodal graph isomorphism network (MMGIN), for compound toxicity multitask learning. Based on the fingerprints and molecular graphs of compounds, MMGIN incorporates a multimodal representation learning model to acquire a comprehensive compound representation. This model adopts a two-channel structure to learn the fingerprint representation and the molecular graph representation independently. Subsequently, two feedforward neural networks use the learned multimodal compound representation to perform multitask learning, covering compound toxicity classification and multiple compound category classification simultaneously. To test the effectiveness of the model, we constructed a new data set, termed the compound toxicity multitask learning (CTMTL) data set, derived from the TOXRIC data set. We compare MMGIN with other representative machine learning and deep learning models on the CTMTL and Tox21 data sets. The experimental results demonstrate significant improvements achieved by MMGIN. Furthermore, the ablation study underscores the contributions of the introduced fingerprints, molecular graphs, multimodal representation learning model, and multitask learning model, showcasing the model's superior predictive capability and robustness.
Affiliation(s)
- Guishen Wang
- School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun, 130012 Jilin, China
- Hui Feng
- School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun, 130012 Jilin, China
- Mengyan Du
- School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun, 130012 Jilin, China
- Yuncong Feng
- School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun, 130012 Jilin, China
- Chen Cao
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Longmian Avenue No. 101, Nanjing, 211166 Jiangsu, China
11
Wang N, Li X, Xiao J, Liu S, Cao D. Data-driven toxicity prediction in drug discovery: Current status and future directions. Drug Discov Today 2024; 29:104195. [PMID: 39357621 DOI: 10.1016/j.drudis.2024.104195] [Received: 06/05/2024] [Revised: 09/13/2024] [Accepted: 09/26/2024] [Indexed: 10/04/2024]
Abstract
Early toxicity assessment plays a vital role in the drug discovery process because of its significant influence on the attrition rate of candidates. Recently, continual advances in information technology have greatly promoted the development of toxicity prediction. To give an overview of the current state of data-driven toxicity prediction, we reviewed relevant studies and summarized them in three main respects: the features and difficulties of toxicity prediction, the evolution of modeling approaches, and the available tools for toxicity prediction. For each part, we describe the research status, existing challenges, and feasible solutions. Finally, we put forward several new directions and suggestions for toxicity prediction.
Affiliation(s)
- Ningning Wang
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; The Hunan Institute of Pharmacy Practice and Clinical Research, Changsha 410008 Hunan, PR China
- Xinliang Li
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; The Hunan Institute of Pharmacy Practice and Clinical Research, Changsha 410008 Hunan, PR China
- Jing Xiao
- Hunan Institute for Drug Control, Changsha 410001 Hunan, PR China
- Shao Liu
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; The Hunan Institute of Pharmacy Practice and Clinical Research, Changsha 410008 Hunan, PR China
- Dongsheng Cao
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, PR China
12
Collins JW, Ebrahimkhani M, Ramirez D, Deiloff J, Gonzalez M, Abedi M, Philippe-Venec L, Cole BM, Moore B, Nwankwo JO. Attentive graph neural network models for the prediction of blood brain barrier permeability. bioRxiv 2024:2024.10.12.617907. [PMID: 39463958 PMCID: PMC11507759 DOI: 10.1101/2024.10.12.617907] [Indexed: 10/29/2024]
Abstract
The unique endothelial cells and tight junctions of the blood brain barrier (BBB) selectively regulate the passage of molecules into the central nervous system (CNS), preventing pathogen entry and maintaining neural homeostasis. Various neurological conditions and neurodegenerative diseases benefit from small molecules capable of BBB penetration (BBBP) that can elicit a therapeutic effect. Predicting BBBP often involves in silico assessment of molecular properties such as lipophilicity (log P) and polar surface area (PSA) using the CNS multiparameter optimization (MPO) method. This study curated an open-source dataset to rigorously benchmark machine learning (ML) and neural network (NN) models against each other and against MPO for predicting BBBP. Our analysis demonstrated that AI models, especially attentive NNs using stereochemical features, significantly outperform MPO in predicting BBBP. An attentive graph neural network (GNN), which we refer to as CANDID-CNS™, achieved an AUROC 0.23-0.26 higher than MPO on the full test sets and 0.17-0.19 higher on stereoisomer-filtered subsets. For stereoisomers that differ in BBBP, which MPO cannot distinguish, attentive GNNs classify these correctly with AUROC and MCC values comparable to or better than those MPO achieves on less difficult test molecules. These findings suggest that integrating attentive GNN models into pharmaceutical drug discovery can substantially improve prediction rates, thereby reducing the timeline and cost and increasing the probability of success in designing brain-penetrant therapeutics for a wide variety of neurological and neurodegenerative diseases.
13
Tan Z, Zhao Y, Lin K, Zhou T. Multi-task pretrained language model with novel application domains enables more comprehensive health and ecological toxicity prediction. J Hazard Mater 2024; 477:135265. [PMID: 39038381 DOI: 10.1016/j.jhazmat.2024.135265] [Received: 04/13/2024] [Revised: 06/29/2024] [Accepted: 07/18/2024] [Indexed: 07/24/2024]
Abstract
In silico models for screening substances of health and ecological concern are essential for effective chemical management. However, current data-driven toxicity prediction models face formidable challenges related to expressive capacity, data scarcity, and reliability. This study therefore introduces TOX-BERT, a SMILES-based pretrained model for screening health and ecological toxicity. The results show that masked atom recovery pretraining and multi-task learning offer promising solutions for enhancing model capacity and addressing data scarcity. Two novel application domain (AD) parameters, termed PCA-AD and LDS, were proposed to improve the prediction reliability of TOX-BERT, yielding accuracy above 90% and a mean absolute error (MAE) below 0.52. TOX-BERT was applied to 18,905 IECSC chemicals, revealing distinct toxicity relationships, such as that between cardiotoxicity and acute ecotoxicity, that align with experimental studies. Beyond previous PBT screening, 156 potential high-risk chemicals for specific endpoints were identified, covering 7 categories. Furthermore, a SMILES-based toxicity site detection approach was developed for structural toxicity analysis. These advances have profound implications for addressing the challenges faced by current data-driven toxicity prediction models, and TOX-BERT emerges as a valuable tool for more comprehensive, reliable, and applicable predictions of health and ecological toxicity in chemical risk assessment and management.
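Masked atom recovery pretraining hides some atoms in a SMILES string and trains the model to restore them. The sketch below shows only the input-corruption step, under stated assumptions: a regular-expression tokenizer of the kind widely used for SMILES, a hypothetical `[MASK]` token, and a 15% masking rate borrowed from BERT-style pretraining. TOX-BERT's actual tokenizer and masking scheme may differ.

```python
import random
import re

# Regex tokenizer widely used for SMILES: bracket atoms, Br/Cl, ring-bond digits, bonds, branches.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)
# Tokens that denote atoms (as opposed to bonds, branches, or ring closures).
ATOM_TOKEN = re.compile(r"^(\[[^\]]+]|Br|Cl|B|C|N|O|S|P|F|I|b|c|n|o|s|p)$")
MASK = "[MASK]"  # hypothetical mask token


def tokenize(smiles):
    """Split a SMILES string into tokens and check that nothing was skipped."""
    tokens = SMILES_TOKEN.findall(smiles)
    assert "".join(tokens) == smiles, "tokenizer did not cover the full string"
    return tokens


def mask_atoms(smiles, rate=0.15, rng=None):
    """Replace a fraction of atom tokens with MASK.

    Returns (masked_tokens, targets), where targets maps token position -> original atom,
    i.e. the labels the model would be trained to recover.
    """
    rng = rng or random.Random()
    tokens = tokenize(smiles)
    atom_positions = [i for i, t in enumerate(tokens) if ATOM_TOKEN.match(t)]
    n_mask = max(1, round(rate * len(atom_positions)))
    chosen = rng.sample(atom_positions, n_mask)
    masked = list(tokens)
    targets = {}
    for i in chosen:
        targets[i] = masked[i]
        masked[i] = MASK
    return masked, targets
```

For aspirin (`CC(=O)Oc1ccccc1C(=O)O`, 13 atom tokens), a 15% rate masks two atoms; ring-closure digits and bond symbols are never masked.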
Affiliation(s)
- Zhichao Tan
- The State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, 1239 Siping Road, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, 1515 North Zhongshan Rd. (No. 2), Shanghai 200092, PR China.
- Youcai Zhao
- The State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, 1239 Siping Road, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, 1515 North Zhongshan Rd. (No. 2), Shanghai 200092, PR China.
- Kunsen Lin
- The State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, 1239 Siping Road, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, 1515 North Zhongshan Rd. (No. 2), Shanghai 200092, PR China.
- Tao Zhou
- The State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, 1239 Siping Road, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, 1515 North Zhongshan Rd. (No. 2), Shanghai 200092, PR China.
14
Xiao Z, Zhu M, Chen J, You Z. Integrated Transfer Learning and Multitask Learning Strategies to Construct Graph Neural Network Models for Predicting Bioaccumulation Parameters of Chemicals. Environ Sci Technol 2024; 58:15650-15660. [PMID: 39051472 DOI: 10.1021/acs.est.4c02421] [Indexed: 07/27/2024]
Abstract
Accurate prediction of parameters related to the environmental exposure of chemicals is crucial for the sound management of chemicals. However, the lack of large training data sets may result in poor prediction accuracy and robustness. Herein, an integrated transfer learning (TL) and multitask learning (MTL) strategy was proposed for constructing a graph neural network (GNN) model (abbreviated as the TL-MTL-GNN model), using n-octanol/water partition coefficients as the source domain. The TL-MTL-GNN model was trained to predict three bioaccumulation parameters based on enlarged data sets covering 2496 compounds, each with at least one bioaccumulation parameter. Results show that the TL-MTL-GNN model outperformed single-task GNN models with and without TL, as well as conventional machine learning models trained on molecular descriptors or fingerprints. Applicability domains were characterized by a state-of-the-art structure-activity landscape-based methodology (abbreviated as ADSAL). The TL-MTL-GNN model coupled with the optimal ADSAL was employed to predict bioaccumulation parameters for around 60,000 chemicals, identifying more than 13,000 compounds as bioaccumulative. The high predictive accuracy and robustness of the TL-MTL-GNN model demonstrate the feasibility of integrating TL and MTL strategies to model small data sets. The strategy holds significant potential for addressing small-data challenges in modeling environmental chemicals.
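Because each of the 2496 compounds has at least one, but not necessarily all three, bioaccumulation parameters, a multitask model needs a loss that ignores missing labels. A minimal pure-Python sketch of such a masked mean squared error follows; encoding missing values as `None` and weighting the tasks equally are assumptions for illustration, not details reported in the study.

```python
def masked_multitask_mse(predictions, targets):
    """Per-task mean squared error that skips missing targets (None).

    predictions, targets: lists of rows, one row per compound and one column per task.
    Returns (overall_loss, per_task_losses); a task with no labels gets None.
    """
    n_tasks = len(predictions[0])
    sq_err = [0.0] * n_tasks
    counts = [0] * n_tasks
    for pred_row, true_row in zip(predictions, targets):
        for t in range(n_tasks):
            if true_row[t] is None:
                continue  # label absent for this compound: contributes nothing to the loss
            sq_err[t] += (pred_row[t] - true_row[t]) ** 2
            counts[t] += 1
    per_task = [s / c if c else None for s, c in zip(sq_err, counts)]
    labelled = [loss for loss in per_task if loss is not None]
    # Equal weighting across tasks that have at least one label (an assumption).
    overall = sum(labelled) / len(labelled) if labelled else 0.0
    return overall, per_task
```

Averaging within each task first keeps a task with many labels from dominating a task with few, which matters when the parameter data sets differ greatly in size.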
Affiliation(s)
- Zijun Xiao
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Minghua Zhu
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Key Laboratory of Integrated Regulation and Resources Development of Shallow Lakes of Ministry of Education, College of Environment, Hohai University, Nanjing 210098, China
- Jingwen Chen
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Zecang You
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
15
Wang T, Du Z, Zhuo L, Fu X, Zou Q, Yao X. MultiCBlo: Enhancing predictions of compound-induced inhibition of cardiac ion channels with advanced multimodal learning. Int J Biol Macromol 2024; 276:133825. [PMID: 39002900 DOI: 10.1016/j.ijbiomac.2024.133825] [Received: 04/06/2024] [Revised: 07/09/2024] [Accepted: 07/10/2024] [Indexed: 07/15/2024]
Abstract
Predicting compound-induced inhibition of cardiac ion channels is crucial yet challenging, and it significantly affects assessments of cardiac drug efficacy and safety. Although various computational methods have been developed to predict compound-induced inhibition of cardiac ion channels, their performance remains limited. Most methods struggle to fuse multi-source data and rely solely on training with specific datasets, leading to poor accuracy and generalization. We introduce MultiCBlo, a model that fuses multimodal information through a progressive learning approach and is designed to predict compound-induced inhibition of cardiac ion channels with high accuracy. MultiCBlo employs progressive multimodal information fusion to integrate the compound's SMILES sequence, graph structure, and fingerprint, enhancing its representation. To our knowledge, this is the first application of progressive multimodal learning to predicting compound-induced inhibition of cardiac ion channels. The objective of this study was to predict compound-induced inhibition of three major cardiac ion channels: hERG, Cav1.2, and Nav1.5. The results indicate that MultiCBlo significantly outperforms current models on this task. We hope that MultiCBlo will facilitate cardiac drug development and reduce compound toxicity risks. Code and data are accessible at: https://github.com/taowang11/MultiCBlo. The online prediction platform is freely accessible at: https://huggingface.co/spaces/wtttt/PCICB.
Affiliation(s)
- Tao Wang
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, 325027 Wenzhou, China
- Zhenya Du
- Guangzhou Xinhua University, 510520 Guangzhou, China
- Linlin Zhuo
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, 325027 Wenzhou, China
- Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, 410012 Changsha, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, 611730 Chengdu, China
- Xiaojun Yao
- Faculty of Applied Sciences, Macao Polytechnic University, 999078 Macao, China
16
Xu H, Zhao Y, Zhang Y, Han J, Zan P, He S, Bo X. Deep active learning with high structural discriminability for molecular mutagenicity prediction. Commun Biol 2024; 7:1071. [PMID: 39217273 PMCID: PMC11366013 DOI: 10.1038/s42003-024-06758-6] [Received: 10/09/2023] [Accepted: 08/21/2024] [Indexed: 09/04/2024]
Abstract
The assessment of mutagenicity is essential in drug discovery, as mutagens may cause cancer and germ cell damage. Although in silico methods have been proposed for mutagenicity prediction, their performance is hindered by the scarcity of labeled molecules, and experimental mutagenicity testing, which supplies those labels, can be time-consuming and costly. One way to reduce the annotation cost is active learning, in which the algorithm actively selects the most valuable molecules from a vast chemical space and presents them to an oracle (e.g., a human expert) for annotation, thereby rapidly improving the model's predictive performance at a smaller annotation cost. In this paper, we propose muTOX-AL, a deep active learning framework that can actively explore the chemical space and identify the most valuable molecules, achieving competitive performance with a small number of labeled samples. The experimental results show that, compared with a random sampling strategy, muTOX-AL can reduce the number of training molecules by about 57%. Additionally, muTOX-AL exhibits outstanding molecular structural discriminability, allowing it to pick molecules with high structural similarity but opposite properties.
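The query step of pool-based active learning can be illustrated with uncertainty sampling, one common selection rule: score each unlabeled molecule by the entropy of the model's predicted mutagenicity probability and send the most uncertain ones to the oracle. This is a generic sketch; the abstract does not state muTOX-AL's own selection criterion, and the molecule IDs and probabilities below are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a Bernoulli prediction; maximal (1.0) at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_queries(pool_probs, batch_size):
    """Return the `batch_size` unlabeled molecules whose predictions are most uncertain.

    pool_probs: dict mapping molecule ID -> predicted probability of being mutagenic.
    """
    ranked = sorted(pool_probs, key=lambda mol: binary_entropy(pool_probs[mol]), reverse=True)
    return ranked[:batch_size]


if __name__ == "__main__":
    pool = {"mol_a": 0.97, "mol_b": 0.52, "mol_c": 0.10, "mol_d": 0.45, "mol_e": 0.80}
    print(select_queries(pool, batch_size=2))  # prints ['mol_b', 'mol_d']
```

In a full loop, the selected molecules would be labelled by the oracle, added to the training set, and the model retrained before the next query round.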
Affiliation(s)
- Huiyan Xu
- Shanghai Key Laboratory of Power Station Automation Technology, School of Mechatronics Engineering and Automation, Shanghai University, Shanghai, China
- Academy of Military Medical Sciences, Beijing, China
- Yanpeng Zhao
- Academy of Military Medical Sciences, Beijing, China
- Yixin Zhang
- Academy of Military Medical Sciences, Beijing, China
- Junshan Han
- Academy of Military Medical Sciences, Beijing, China
- Peng Zan
- Shanghai Key Laboratory of Power Station Automation Technology, School of Mechatronics Engineering and Automation, Shanghai University, Shanghai, China
- Song He
- Academy of Military Medical Sciences, Beijing, China
- Xiaochen Bo
- Academy of Military Medical Sciences, Beijing, China
17
Srisongkram T. DeepRA: A novel deep learning-read-across framework and its application in non-sugar sweeteners mutagenicity prediction. Comput Biol Med 2024; 178:108731. [PMID: 38870727 DOI: 10.1016/j.compbiomed.2024.108731] [Received: 04/05/2024] [Revised: 05/07/2024] [Accepted: 06/08/2024] [Indexed: 06/15/2024]
Abstract
Non-sugar sweeteners (NSSs), or artificial sweeteners, have been used as food chemicals since World War II. However, NSSs also raise concerns about their mutagenicity. Evaluating the mutagenic potential of NSSs is crucial for food safety, and this step is required for every new chemical registration in the food and pharmaceutical industries. Because a computational assessment requires less time, less money, and fewer animals than in vivo experiments, this study developed a novel computational method combining an ensemble convolutional deep neural network with read-across algorithms, called DeepRA, to classify the mutagenicity of chemicals. The mutagenicity data were obtained from a curated Ames test data set. The DeepRA model was developed using both molecular descriptors and molecular fingerprints, and it provided accurate and reliable mutagenicity classification on an independent test set. The model was then used to examine NSS-related chemicals, enabling evaluation of the mutagenicity of NSS-like substances. Finally, the model is publicly available at https://github.com/taraponglab/deepra for further use in chemical regulation and risk assessment.
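Read-across predicts a property of a query chemical from its most similar labelled neighbours. The sketch below shows that idea in its simplest form: Tanimoto similarity on fingerprints represented as sets of "on" bit indices, followed by a similarity-weighted vote over the k nearest neighbours. It is not the DeepRA ensemble, which couples read-across with convolutional networks; the fingerprints, labels, and the neutral 0.5 fallback are hypothetical choices for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def read_across(query_fp, training, k=3):
    """Similarity-weighted vote of the k most similar training chemicals.

    training: list of (fingerprint, label) pairs, label 1 = mutagenic, 0 = non-mutagenic.
    Returns the similarity-weighted fraction of mutagenic neighbours (0.0-1.0).
    """
    neighbours = sorted(training, key=lambda item: tanimoto(query_fp, item[0]), reverse=True)[:k]
    weights = [tanimoto(query_fp, fp) for fp, _ in neighbours]
    total = sum(weights)
    if total == 0.0:
        return 0.5  # no structural similarity at all: return an uninformative score
    return sum(w * label for w, (_, label) in zip(weights, neighbours)) / total
```

Weighting by similarity lets a close analogue outvote a distant one; in practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets.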
Affiliation(s)
- Tarapong Srisongkram
- Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, 40002, Thailand.
18
Shen C, Song J, Hsieh CY, Cao D, Kang Y, Ye W, Wu Z, Wang J, Zhang O, Zhang X, Zeng H, Cai H, Chen Y, Chen L, Luo H, Zhao X, Jian T, Chen T, Jiang D, Wang M, Ye Q, Wu J, Du H, Shi H, Deng Y, Hou T. DrugFlow: An AI-Driven One-Stop Platform for Innovative Drug Discovery. J Chem Inf Model 2024; 64:5381-5391. [PMID: 38920405 DOI: 10.1021/acs.jcim.4c00621] [Indexed: 06/27/2024]
Abstract
Artificial intelligence (AI)-aided drug design has had unprecedented effects on modern drug discovery, but there is still an urgent need for user-friendly interfaces that bridge the gap between these sophisticated tools and scientists, particularly those who are less computer-savvy. Herein, we present DrugFlow, an AI-driven one-stop platform that offers a clean, convenient, cloud-based interface to streamline early drug discovery workflows. By seamlessly integrating a range of innovative AI algorithms covering molecular docking, quantitative structure-activity relationship modeling, molecular generation, ADMET (absorption, distribution, metabolism, excretion, and toxicity) prediction, and virtual screening, DrugFlow offers effective AI solutions for almost all crucial stages of early drug discovery, including hit identification and hit/lead optimization. We hope that the platform can provide valuable guidance for real-world drug design and discovery. The platform is available at https://drugflow.com.
Affiliation(s)
- Chao Shen
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jianfei Song
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Chang-Yu Hsieh
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Wenling Ye
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Odin Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hao Zeng
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Heng Cai
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Yu Chen
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Linkang Chen
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Hao Luo
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Xinda Zhao
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Tianye Jian
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Tong Chen
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Mingyang Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Qing Ye
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jialu Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hongyan Du
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hui Shi
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Yafeng Deng
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Department of Automation, Tsinghua University, Beijing 100084, China
- Tingjun Hou
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
19
Shi S, Fu L, Yi J, Yang Z, Zhang X, Deng Y, Wang W, Wu C, Zhao W, Hou T, Zeng X, Lyu A, Cao D. ChemFH: an integrated tool for screening frequent false positives in chemical biology and drug discovery. Nucleic Acids Res 2024; 52:W439-W449. [PMID: 38783035 PMCID: PMC11223804 DOI: 10.1093/nar/gkae424] [Received: 02/06/2024] [Revised: 04/25/2024] [Accepted: 05/10/2024] [Indexed: 05/25/2024]
Abstract
High-throughput screening rapidly tests an extensive array of chemical compounds to identify hit compounds for specific biological targets in drug discovery. However, false-positive results disrupt hit screening, wasting time and resources. To address this, we propose ChemFH, an integrated online platform that facilitates rapid virtual evaluation of potential false positives, including colloidal aggregators, spectroscopic interference compounds, firefly luciferase inhibitors, chemically reactive compounds, promiscuous compounds, and other assay interferences. Leveraging a dataset of 823 391 compounds, we constructed high-quality prediction models using multi-task directed message-passing network (DMPNN) architectures combined with uncertainty estimation, yielding an average AUC of 0.91. In addition, ChemFH incorporates 1441 representative alert substructures derived from the collected data and ten commonly used frequent-hitter screening rules. ChemFH was validated with an external set of 75 compounds, and its virtual screening capability was subsequently confirmed through application to five virtual screening libraries. ChemFH also underwent additional validation on two natural products and FDA-approved drugs, yielding reliable and accurate results. ChemFH is a comprehensive, reliable, and computationally efficient screening pipeline that facilitates the identification of true positive results in assays, contributing to enhanced efficiency and success rates in drug discovery. ChemFH is freely available via https://chemfh.scbdd.com/.
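ChemFH reports an average AUC across its multi-task models. As a reminder of how such a figure is typically computed, the sketch below derives each task's ROC AUC as the probability that a randomly chosen positive scores above a randomly chosen negative (ties count half), then takes an unweighted mean over tasks. The task names and scores are hypothetical, and whether ChemFH averages in exactly this way is an assumption.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison (the Mann-Whitney statistic); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_task_auc(tasks):
    """tasks: dict task_name -> (labels, scores). Returns (mean AUC, per-task AUCs)."""
    per_task = {name: roc_auc(y, s) for name, (y, s) in tasks.items()}
    return sum(per_task.values()) / len(per_task), per_task


if __name__ == "__main__":
    demo = {
        "aggregator": ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
        "luciferase": ([1, 0, 1, 0], [0.8, 0.3, 0.7, 0.2]),
    }
    print(mean_task_auc(demo))  # prints (0.875, {'aggregator': 0.75, 'luciferase': 1.0})
```

The pairwise loop is quadratic in data size and is meant only to make the definition explicit; library implementations sort the scores instead.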
Affiliation(s)
- Shaohua Shi
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR, 999077, P.R. China
- Li Fu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Jiacai Yi
- School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Ziyi Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Xiaochen Zhang
- School of Information Technology, Shangqiu Normal University, Shangqiu, Henan 476000, P.R. China
- Youchao Deng
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Wenxuan Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Chengkun Wu
- School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Wentao Zhao
- School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, P.R. China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, P.R. China
- Aiping Lyu
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR, 999077, P.R. China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
20
Zhou Y, Ning C, Tan Y, Li Y, Wang J, Shu Y, Liang S, Liu Z, Wang Y. ToxMPNN: A deep learning model for small molecule toxicity prediction. J Appl Toxicol 2024; 44:953-964. [PMID: 38409892 DOI: 10.1002/jat.4591] [Received: 11/08/2023] [Revised: 01/23/2024] [Accepted: 02/02/2024] [Indexed: 02/28/2024]
Abstract
Machine learning (ML) has shown great promise in predicting the toxicity of small molecules. However, the data available for such predictions are often limited. Because models trained on a single toxicity endpoint perform unsatisfactorily, we collected toxic small molecules with multiple toxicity endpoints from a previous study. The dataset comprises 27 toxicity endpoints grouped into seven toxicity classes: carcinogenicity and mutagenicity, acute oral toxicity, respiratory toxicity, irritation and corrosion, cardiotoxicity, CYP450, and endocrine disruption. In addition, a binary Common-Toxicity classification task was added based on this dataset. To improve model performance, we added marketed drugs as negative samples. This study presents ToxMPNN, a toxicity prediction model based on the message passing neural network (MPNN) architecture, aimed at predicting the toxicity of small molecules. The results demonstrate that ToxMPNN outperforms other models in capturing toxic features within molecular structures, yielding more precise predictions with a test ROC_AUC of 0.886 on the Toxicity_drug dataset. Furthermore, adding marketed drugs as negative samples not only improved predictive performance on the binary Common-Toxicity task but also enhanced the stability of the model's predictions. These results show that the graph-based deep learning (DL) algorithms in this study can serve as a trustworthy and effective tool for assessing small molecule toxicity in the development of new drugs.
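ToxMPNN builds on the message passing neural network idea, in which each atom repeatedly aggregates information from its bonded neighbours before the atom states are pooled into a molecule-level vector. The pure-Python sketch below shows one message passing step and a sum-pooling readout on a toy graph. The scalar weights (`self_weight`, `neighbour_weight`) and the ReLU update are simplifying assumptions standing in for the learned neural networks of a real MPNN, and none of the names come from ToxMPNN.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def message_passing_step(features, adjacency, self_weight=0.5, neighbour_weight=1.0):
    """One round: h_v' = ReLU(self_weight * h_v + neighbour_weight * sum of h_u over neighbours u).

    features: list of per-atom feature vectors (lists of floats).
    adjacency: list of neighbour-index lists (undirected bonds).
    """
    updated = []
    for v, h_v in enumerate(features):
        message = [0.0] * len(h_v)
        for u in adjacency[v]:
            message = [m + x for m, x in zip(message, features[u])]  # sum-aggregate neighbours
        updated.append([relu(self_weight * a + neighbour_weight * m) for a, m in zip(h_v, message)])
    return updated


def readout(features):
    """Sum-pool atom states into one molecule-level vector."""
    return [sum(column) for column in zip(*features)]


if __name__ == "__main__":
    # Toy 3-atom chain (heavy atoms of ethanol, C-C-O) with one-hot features [is_C, is_O].
    atoms = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    bonds = [[1], [0, 2], [1]]
    h = message_passing_step(atoms, bonds)
    print(readout(h))  # prints [4.0, 1.5]
```

Stacking several such steps lets information travel several bonds, and the pooled vector would then feed a classifier head for each toxicity endpoint.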
Affiliation(s)
- Yini Zhou
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, China
- Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, China
- Chao Ning
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, China
- Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, China
- Yijun Tan
- School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, China
- Yaqi Li
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, China
- Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, China
- Jiaxu Wang
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, China
- Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, China
- Yuanyuan Shu
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, China
- Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, China
- Songping Liang
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, China
- Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, China
- Zhonghua Liu
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, China
- Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, China
- Ying Wang
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, China
- Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, China
21
Wu H, Liu J, Zhang R, Lu Y, Cui G, Cui Z, Ding Y. A review of deep learning methods for ligand based drug virtual screening. Fundam Res 2024; 4:715-737. [PMID: 39156568 PMCID: PMC11330120 DOI: 10.1016/j.fmre.2024.02.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/10/2024] [Accepted: 02/18/2024] [Indexed: 08/20/2024]
Abstract
Drug discovery is costly and time-consuming, and modern drug discovery increasingly relies on computational methods to reduce the time and financial expense of the process. In particular, lengthy vaccine and drug discovery timelines are especially problematic during emergencies such as the coronavirus disease 2019 (COVID-19) pandemic. Recently, deep learning methods have performed particularly well in drug virtual screening. Researchers now face the questions of how to summarize existing deep learning approaches to drug virtual screening, select suitable models for different screening problems, exploit the advantages of deep learning models, and further improve their capability in this task. This review first introduces the basic concepts of drug virtual screening, common datasets, and data representation methods. Then, a large number of common deep learning methods for drug virtual screening are compared and analyzed. In addition, datasets of different sizes are constructed independently to evaluate the performance of each deep learning model on the difficult problem of large-scale ligand virtual screening. Finally, the existing challenges and future directions in the field of virtual screening are presented.
Affiliation(s)
- Hongjie Wu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Junkai Liu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Runhua Zhang
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yaoyao Lu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Guozeng Cui
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Zhiming Cui
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
22
Duan Y, Yang X, Zeng X, Wang W, Deng Y, Cao D. Enhancing Molecular Property Prediction through Task-Oriented Transfer Learning: Integrating Universal Structural Insights and Domain-Specific Knowledge. J Med Chem 2024; 67:9575-9586. [PMID: 38748846 DOI: 10.1021/acs.jmedchem.4c00692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2024]
Abstract
Precisely predicting molecular properties is crucial in drug discovery, but the scarcity of labeled data makes it challenging to apply deep learning methods. While large-scale self-supervised pretraining has proven effective, it often neglects domain-specific knowledge. To tackle this issue, we introduce Task-Oriented Multilevel Learning based on BERT (TOML-BERT), a dual-level pretraining framework that considers both the structural patterns and the domain knowledge of molecules. TOML-BERT achieved state-of-the-art prediction performance on 10 pharmaceutical datasets. It can mine contextual information within molecular structures and extract domain knowledge from large amounts of pseudo-labeled data. The dual-level pretraining achieved significant positive transfer, with its two components making complementary contributions. Interpretability analysis showed that the effectiveness of the dual-level pretraining lies in the prior learning of a task-related molecular representation. Overall, TOML-BERT demonstrates the potential of combining multiple pretraining tasks to extract task-oriented knowledge, advancing molecular property prediction in drug discovery.
Affiliation(s)
- Yanjing Duan
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Xixi Yang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410013, P. R. China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410013, P. R. China
- Wenxuan Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Youchao Deng
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
23
Rahu I, Kull M, Kruve A. Predicting the Activity of Unidentified Chemicals in Complementary Bioassays from the HRMS Data to Pinpoint Potential Endocrine Disruptors. J Chem Inf Model 2024; 64:3093-3104. [PMID: 38523265 PMCID: PMC11040721 DOI: 10.1021/acs.jcim.3c02050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Revised: 03/07/2024] [Accepted: 03/15/2024] [Indexed: 03/26/2024]
Abstract
Most chemicals detected via nontarget liquid chromatography high-resolution mass spectrometry (HRMS) in environmental samples remain unidentified, which challenges the ability of existing machine learning models to pinpoint potential endocrine disruptors (EDs). Here, we predict the activity of unidentified chemicals across 12 ED-related bioassays within the Tox21 10K dataset. For this purpose, single- and multi-output models were trained using various machine learning algorithms with molecular fingerprint features as input. To evaluate the models under near real-world conditions, Monte Carlo sampling was implemented for the first time. This technique enables probabilistic fingerprint features, derived from experimental HRMS data with SIRIUS+CSI:FingerID, to be used as input for models trained on true binary fingerprint features. Depending on the bioassay, the lowest false-positive rate at 90% recall ranged from 0.251 (sr.mmp, mitochondrial membrane potential) to 0.824 (nr.ar, androgen receptor), consistent with the performance trends of models submitted to the Tox21 Data Challenge. These findings underscore the informativeness of fingerprint features compiled from HRMS data for predicting endocrine-disrupting activity. Moreover, an in-depth SHapley Additive exPlanations (SHAP) analysis revealed the models' ability to pinpoint structural patterns linked to the modes of action of active chemicals. Although the single-output models outperformed the multi-output models, the latter's potential should not be disregarded for similar tasks in in silico toxicology. This study represents a significant advance in identifying potentially toxic chemicals within complex mixtures without unambiguous identification, reducing the post-processing workload in nontarget HRMS by up to 75%.
Affiliation(s)
- Ida Rahu
- Institute of Computer Science, University of Tartu, Narva mnt 18, Tartu 51009, Estonia
- Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, Stockholm SE-106 91, Sweden
- Meelis Kull
- Institute of Computer Science, University of Tartu, Narva mnt 18, Tartu 51009, Estonia
- Anneli Kruve
- Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, Stockholm SE-106 91, Sweden
- Department of Environmental Science, Stockholm University, Svante Arrhenius Väg 16, Stockholm SE-106 91, Sweden
24
Shkil DO, Muhamedzhanova AA, Petrov PI, Skorb EV, Aliev TA, Steshin IS, Tumanov AV, Kislinskiy AS, Fedorov MV. Expanding Predictive Capacities in Toxicology: Insights from Hackathon-Enhanced Data and Model Aggregation. Molecules 2024; 29:1826. [PMID: 38675645 PMCID: PMC11055041 DOI: 10.3390/molecules29081826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 04/11/2024] [Accepted: 04/15/2024] [Indexed: 04/28/2024]
Abstract
In predictive toxicology for small molecules, the applicability domain of QSAR models is often limited by the coverage of chemical space in the training set. Consequently, classical models fail to provide reliable predictions for wide classes of molecules. However, innovative data collection methods such as intensive hackathons promise to quickly expand the chemical space available for model construction. Combined with algorithmic refinement methods, these tools can address the challenges of toxicity prediction, enhancing both the robustness and the applicability of the resulting models. This study aimed to investigate the roles of gradient boosting and strategic data aggregation in improving the predictive ability of models for the toxicity of small organic molecules. We focused on evaluating the impact of incorporating fragment features and of expanding the chemical space, facilitated by a comprehensive dataset procured in an open hackathon. We used gradient boosting techniques, accounting for critical features such as structural fragments or functional groups often associated with manifestations of toxicity.
Affiliation(s)
- Dmitrii O. Shkil
- Syntelly LLC, Moscow 121205, Russia; (A.A.M.); (I.S.S.); (A.V.T.); (A.S.K.)
- Moscow Institute of Physics and Technology, Moscow 141700, Russia
- Ekaterina V. Skorb
- Infochemistry Scientific Center, ITMO University, Saint-Petersburg 191002, Russia; (E.V.S.); (T.A.A.)
- Timur A. Aliev
- Infochemistry Scientific Center, ITMO University, Saint-Petersburg 191002, Russia; (E.V.S.); (T.A.A.)
- Ilya S. Steshin
- Syntelly LLC, Moscow 121205, Russia; (A.A.M.); (I.S.S.); (A.V.T.); (A.S.K.)
- Maxim V. Fedorov
- Kharkevich Institute for Information Transmission Problems of Russian Academy of Sciences, Moscow 127994, Russia
25
Zhao L, Xue Q, Zhang H, Hao Y, Yi H, Liu X, Pan W, Fu J, Zhang A. CatNet: Sequence-based deep learning with cross-attention mechanism for identifying endocrine-disrupting chemicals. J Hazard Mater 2024; 465:133055. [PMID: 38016311 DOI: 10.1016/j.jhazmat.2023.133055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 11/02/2023] [Accepted: 11/20/2023] [Indexed: 11/30/2023]
Abstract
Endocrine-disrupting chemicals (EDCs) pose significant environmental and health risks because of their potential to interfere with nuclear receptors (NRs), key regulators of physiological processes. Despite these evident risks, most existing research focuses narrowly on the interaction between compounds and an individual NR target, neglecting a comprehensive assessment across the entire NR family. In response, this study assembled a comprehensive human NR dataset capturing 49,244 interactions between 35,467 unique compounds and 42 NRs. We introduced "CatNet", a cross-attention network framework that integrates compound and protein representations through cross-attention mechanisms. CatNet achieved excellent performance, with an area under the receiver operating characteristic curve (AUC-ROC) of 0.916 on the test set, and generalized reliably to unseen compound-NR pairs. A distinguishing feature of this work is the model's capacity to extend to novel targets. Beyond its predictive accuracy, CatNet offers a mechanistic perspective on compound-NR interactions through feature visualization. To broaden the utility of this work, we also developed a graphical user interface that enables researchers to predict chemical binding to diverse NRs. Our model enables the prediction of human NR-related EDCs and shows potential for identifying EDCs related to other targets.
Affiliation(s)
- Lu Zhao
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100049, PR China
- Qiao Xue
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China.
- Huazhou Zhang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100049, PR China
- Yuxing Hao
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100049, PR China
- Hang Yi
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, PR China
- Xian Liu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China
- Wenxiao Pan
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China
- Jianjie Fu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, PR China; School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310012, PR China
- Aiqian Zhang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China; Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100049, PR China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, PR China; School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310012, PR China.
26
Wang M, Wu Z, Wang J, Weng G, Kang Y, Pan P, Li D, Deng Y, Yao X, Bing Z, Hsieh CY, Hou T. Genetic Algorithm-Based Receptor Ligand: A Genetic Algorithm-Guided Generative Model to Boost the Novelty and Drug-Likeness of Molecules in a Sampling Chemical Space. J Chem Inf Model 2024; 64:1213-1228. [PMID: 38302422 DOI: 10.1021/acs.jcim.3c01964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Deep learning (DL)-based de novo molecular design has recently gained significant attention. Although numerous DL-based generative models have been developed successfully for designing novel compounds, most of the generated molecules lack sufficiently novel scaffolds or strongly drug-like profiles. These issues may not be fully captured by the metrics commonly used to assess molecular generative models, such as novelty, diversity, and the quantitative estimate of drug-likeness score. To address these limitations, we propose GARel (genetic algorithm-based receptor-ligand interaction generator), a genetic algorithm-guided framework for training a DL-based generative model to produce drug-like molecules with novel scaffolds. To train GARel efficiently, we used a dense net to update the parameters based on molecules with novel scaffolds and drug-like features. To demonstrate its capability, we used GARel to design inhibitors for three targets: AA2AR, EGFR, and SARS-CoV-2. The results indicate that GARel-generated molecules feature more diverse and novel scaffolds, more desirable physicochemical properties, and favorable docking scores. Compared with other generative models, GARel makes significant progress in balancing novelty and drug-likeness, providing a promising direction for the further development of DL-based de novo design methodology with potential impact on drug discovery.
Affiliation(s)
- Mingyang Wang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Zhengjian Wu
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- School of Computer Science, Wuhan University, Wuhan 430072, Hubei, China
- Jike Wang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Gaoqi Weng
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Peichen Pan
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dan Li
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Xiaojun Yao
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, Macau Institute for Applied Research in Medicine and Health, State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa, Macau 999078, China
- Zhitong Bing
- Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou, Gansu 730000, China
- Chang-Yu Hsieh
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
27
Wu H, Liu J, Jiang T, Zou Q, Qi S, Cui Z, Tiwari P, Ding Y. AttentionMGT-DTA: A multi-modal drug-target affinity prediction using graph transformer and attention mechanism. Neural Netw 2024; 169:623-636. [PMID: 37976593 DOI: 10.1016/j.neunet.2023.11.018] [Citation(s) in RCA: 36] [Impact Index Per Article: 36.0] [Received: 06/26/2023] [Revised: 09/29/2023] [Accepted: 11/07/2023] [Indexed: 11/19/2023]
Abstract
Accurate prediction of drug-target affinity (DTA) is a crucial step in drug discovery and design. Traditional experiments are expensive and time-consuming. Recently, deep learning methods have achieved notable performance improvements in DTA prediction. However, one challenge for deep learning-based models is obtaining appropriate and accurate representations of drugs and targets; in particular, target representations have not been explored effectively. Another challenge, also important for predicting DTA, is how to comprehensively capture the interaction information between different instances. In this study, we propose AttentionMGT-DTA, a multi-modal attention-based model for DTA prediction. AttentionMGT-DTA represents drugs and targets by a molecular graph and a binding pocket graph, respectively. Two attention mechanisms are adopted to integrate information across different protein modalities and to model the interactions within drug-target pairs. Experimental results showed that the proposed model outperformed state-of-the-art baselines on two benchmark datasets. In addition, AttentionMGT-DTA offers high interpretability by modeling the interaction strength between drug atoms and protein residues. Our code is available at https://github.com/JK-Liu7/AttentionMGT-DTA.
Affiliation(s)
- Hongjie Wu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China.
- Junkai Liu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China; Yangtze Delta Region Institute(Quzhou), University of Electronic Science and Technology of China, Quzhou, 324003, China.
- Tengsheng Jiang
- Gusu School, Nanjing Medical University, Suzhou, 215009, China.
- Quan Zou
- Yangtze Delta Region Institute(Quzhou), University of Electronic Science and Technology of China, Quzhou, 324003, China.
- Shujie Qi
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China.
- Zhiming Cui
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China.
- Prayag Tiwari
- School of Information Technology, Halmstad University, Sweden.
- Yijie Ding
- Yangtze Delta Region Institute(Quzhou), University of Electronic Science and Technology of China, Quzhou, 324003, China.
28
Wang J, Zhang L, Sun J, Yang X, Wu W, Chen W, Zhao Q. Predicting drug-induced liver injury using graph attention mechanism and molecular fingerprints. Methods 2024; 221:18-26. [PMID: 38040204 DOI: 10.1016/j.ymeth.2023.11.014] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 10/21/2023] [Revised: 11/14/2023] [Accepted: 11/25/2023] [Indexed: 12/03/2023]
Abstract
Drug-induced liver injury (DILI) is a significant issue in drug development and clinical treatment because it can cause liver dysfunction or damage, which in severe cases can lead to liver failure or even death. DILI has numerous pathogenic factors, many of which remain incompletely understood. Consequently, methodologies and tools are urgently needed to assess DILI risk in the early phases of drug development. In this study, we present DMFPGA, a novel deep learning model designed to predict DILI. To describe molecular properties comprehensively, we employ a multi-head graph attention mechanism to extract features from molecular graphs, representing characteristics at the node level of compounds. Additionally, we combine multiple molecular fingerprints to capture features at the molecular level. Fusing molecular fingerprints with graph features expresses compound properties more fully. A fully connected neural network then classifies compounds as either DILI-positive or DILI-negative. To rigorously evaluate DMFPGA's performance, we conduct 5-fold cross-validation. The results demonstrate the superiority of our method over four existing state-of-the-art computational approaches, with an average AUC of 0.935 and an average ACC of 0.934. We believe that DMFPGA can help with early-stage DILI prediction and assessment in drug development.
Affiliation(s)
- Jifeng Wang
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
- Li Zhang
- School of Life Science, Liaoning University, Shenyang 110036, China
- Jianqiang Sun
- School of Information Science and Engineering, Linyi University, Linyi 276000, China
- Xin Yang
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
- Wei Wu
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
- Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China.
- Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China.
29
Yu Z, Wu Z, Zhou M, Cao K, Li W, Liu G, Tang Y. EDC-Predictor: A Novel Strategy for Prediction of Endocrine-Disrupting Chemicals by Integrating Pharmacological and Toxicological Profiles. Environ Sci Technol 2023; 57:18013-18025. [PMID: 37053516 DOI: 10.1021/acs.est.2c08558] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/19/2023]
Abstract
Identifying endocrine-disrupting chemicals (EDCs) is crucial for reducing human health risks. However, identification is difficult because of the complex mechanisms of EDCs. In this study, we propose a novel strategy named EDC-Predictor that integrates pharmacological and toxicological profiles to predict EDCs. Unlike conventional methods that focus on only a few nuclear receptors (NRs), EDC-Predictor considers a broader range of targets. It characterizes compounds, including both EDCs and non-EDCs, using computational target profiles from network-based and machine learning-based methods. The best model built on these target profiles outperformed models built on molecular fingerprints. In a case study predicting NR-related EDCs, EDC-Predictor showed a wider applicability domain and higher accuracy than four previous tools. Another case study further demonstrated that EDC-Predictor could predict EDCs that target proteins other than NRs. Finally, a free web server was developed to make EDC prediction easier (http://lmmd.ecust.edu.cn/edcpred/). In summary, EDC-Predictor could be a powerful tool for EDC prediction and drug safety assessment.
Affiliation(s)
- Zhuohang Yu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Zengrui Wu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Moran Zhou
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Kangjia Cao
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
30
Du BX, Xu Y, Yiu SM, Yu H, Shi JY. ADMET property prediction via multi-task graph learning under adaptive auxiliary task selection. iScience 2023; 26:108285. [PMID: 38026198 PMCID: PMC10654589 DOI: 10.1016/j.isci.2023.108285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 09/18/2023] [Accepted: 10/18/2023] [Indexed: 12/01/2023]
Abstract
Evaluating the absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of drug-like compounds is a critical step in lead optimization. Classical single-task learning (STL) predicts individual ADMET endpoints effectively when labels are abundant. Multi-task learning (MTL), by contrast, can predict multiple ADMET endpoints with fewer labels, but ensuring task synergy and highlighting key molecular substructures remain challenging. To tackle these issues, this work develops a multi-task graph learning framework for predicting multiple ADMET properties of drug-like small molecules (MTGL-ADMET) based on a new MTL paradigm, "one primary, multiple auxiliaries." The framework first combines status theory with maximum flow to select auxiliary tasks, and then introduces a primary-task-centric MTL model with integrated modules. MTGL-ADMET not only outperforms existing STL and MTL methods but also offers a transparent view of crucial molecular substructures. This work is expected to promote lead compound discovery and optimization in drug discovery.
Affiliation(s)
- Bing-Xue Du
- School of Life Sciences, Northwestern Polytechnical University, Xi’an 710072, China
- Yi Xu
- School of Life Sciences, Northwestern Polytechnical University, Xi’an 710072, China
- Siu-Ming Yiu
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Hui Yu
- School of Life Sciences, Northwestern Polytechnical University, Xi’an 710072, China
- Jian-Yu Shi
- School of Life Sciences, Northwestern Polytechnical University, Xi’an 710072, China
31
Wu Z, Wang J, Du H, Jiang D, Kang Y, Li D, Pan P, Deng Y, Cao D, Hsieh CY, Hou T. Chemistry-intuitive explanation of graph neural networks for molecular property prediction with substructure masking. Nat Commun 2023; 14:2585. [PMID: 37142585 PMCID: PMC10160109 DOI: 10.1038/s41467-023-38192-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 09/17/2022] [Accepted: 04/12/2023] [Indexed: 05/06/2023] [Open access]
Abstract
Graph neural networks (GNNs) have been widely used in molecular property prediction, but explaining their black-box predictions remains a challenge. Most existing explanation methods for GNNs in chemistry attribute model predictions to individual nodes, edges, or fragments that are not necessarily derived from a chemically meaningful segmentation of molecules. To address this challenge, we propose a method named substructure mask explanation (SME). SME builds on well-established molecular segmentation methods and provides interpretations that align with chemists' understanding. We apply SME to elucidate how GNNs learn to predict aqueous solubility, genotoxicity, cardiotoxicity, and blood-brain barrier permeation for small molecules. SME's interpretations are consistent with chemical intuition, alert chemists to unreliable model performance, and guide structural optimization toward target properties. Hence, we believe that SME empowers chemists to confidently mine structure-activity relationships (SAR) from reliable GNNs through a transparent inspection of how GNNs pick up useful signals when learning from data.
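The core idea of substructure masking can be sketched in a few lines: score each chemically defined fragment by how much the model's prediction changes when that fragment is masked. This is a minimal illustration under stated assumptions, not the SME implementation. The `Predictor` interface, `make_toy_additive_model`, and the fragment names are hypothetical; the toy additive model stands in for a trained GNN, and the fragments stand in for a real segmentation (for example, BRICS fragments or Murcko scaffolds).

```python
from typing import Callable, Dict, FrozenSet, List, Mapping, Set

# Hypothetical interface: a model maps the set of masked atom indices to a prediction.
Predictor = Callable[[FrozenSet[int]], float]


def substructure_attributions(
    predict: Predictor, substructures: Mapping[str, Set[int]]
) -> Dict[str, float]:
    """Attribution of a substructure = prediction(intact) - prediction(substructure masked)."""
    intact = predict(frozenset())
    return {
        name: intact - predict(frozenset(atoms))
        for name, atoms in substructures.items()
    }


def make_toy_additive_model(atom_contrib: List[float]) -> Predictor:
    """Toy stand-in for a trained GNN: sums the contributions of unmasked atoms."""
    def predict(masked: FrozenSet[int]) -> float:
        return sum(c for i, c in enumerate(atom_contrib) if i not in masked)
    return predict


if __name__ == "__main__":
    model = make_toy_additive_model([0.5, -1.0, 2.0, 0.3])
    fragments = {"fragment_A": {0, 1}, "fragment_B": {2, 3}}  # hypothetical segmentation
    print(substructure_attributions(model, fragments))
```

Because the toy model is additive and the two fragments partition the atoms, the attributions (-0.5 and 2.3) sum to the intact prediction (1.8). A real GNN is non-additive, so fragment attributions will generally not sum this neatly.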
Affiliation(s)
- Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
- Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
- National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, 430072, Hubei, P.R. China
- Hongyan Du
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
- Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- Dan Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004, Hunan, P.R. China
- Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
32
Wang D, Wu Z, Shen C, Bao L, Luo H, Wang Z, Yao H, Kong DX, Luo C, Hou T. Learning with uncertainty to accelerate the discovery of histone lysine-specific demethylase 1A (KDM1A/LSD1) inhibitors. Brief Bioinform 2023; 24:6961473. [PMID: 36573494 DOI: 10.1093/bib/bbac592] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/05/2022] [Revised: 12/01/2022] [Accepted: 12/02/2022] [Indexed: 12/28/2022] [Open access]
Abstract
Machine learning, including modern deep learning, has been used extensively in drug design and screening. However, reliably predicting molecular properties remains challenging in out-of-domain regimes, even for deep neural networks. It is therefore important to understand the uncertainty of model predictions, especially when those predictions are used to guide further experiments. In this study, we explored the utility and effectiveness of evidential uncertainty in compound screening and proposed an evidential Graphormer model for the uncertainty-guided discovery of KDM1A/LSD1 inhibitors. Benchmarking showed that (i) Graphormer had predictive power comparable to state-of-the-art models, and (ii) evidential regression produced well-ranked uncertainty estimates and calibrated predictions. We then applied time-splitting to the curated KDM1A/LSD1 dataset to simulate out-of-distribution predictions. Retrospective virtual screening showed that the evidential uncertainties helped reduce false positives among the top-acquired compounds and thus enabled higher experimental validation rates. The trained model was then used to virtually screen an independent in-house compound set, and the top 50 compounds under each of two ranking strategies were validated experimentally. Overall, our study highlights the importance of understanding prediction uncertainty, which can serve as an interpretable dimension of model predictions.
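The abstract does not give the form of the evidential head. A minimal sketch, assuming the deep evidential regression formulation of Amini et al. (2020): for each compound the network outputs Normal-Inverse-Gamma parameters (gamma, nu, alpha, beta), from which the prediction, the aleatoric variance, and the epistemic variance follow in closed form. The `NIGOutput` class, the compound names, and the ranking rule `rank_by_lower_bound` (prediction minus k times the epistemic standard deviation) are hypothetical illustrations; the abstract does not say which two ranking strategies were used.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class NIGOutput:
    """Normal-Inverse-Gamma parameters assumed to be predicted for one compound."""
    gamma: float  # predicted mean activity
    nu: float     # evidence for the mean, must be > 0
    alpha: float  # shape parameter, must be > 1 for finite variances
    beta: float   # scale parameter, must be > 0


def evidential_moments(p: NIGOutput) -> Tuple[float, float, float]:
    """Return (prediction, aleatoric variance, epistemic variance)."""
    if p.nu <= 0 or p.alpha <= 1 or p.beta <= 0:
        raise ValueError("expected nu > 0, alpha > 1, beta > 0")
    prediction = p.gamma                           # E[mu]
    aleatoric = p.beta / (p.alpha - 1.0)           # E[sigma^2]
    epistemic = p.beta / (p.nu * (p.alpha - 1.0))  # Var[mu]
    return prediction, aleatoric, epistemic


def rank_by_lower_bound(
    preds: Dict[str, NIGOutput], k: float = 1.0, top_n: int = 50
) -> List[str]:
    """Hypothetical strategy: rank by prediction minus k * epistemic standard deviation."""
    def score(name: str) -> float:
        mean, _, epistemic = evidential_moments(preds[name])
        return mean - k * math.sqrt(epistemic)
    return sorted(preds, key=score, reverse=True)[:top_n]


if __name__ == "__main__":
    preds = {
        "cpd_a": NIGOutput(gamma=7.0, nu=0.5, alpha=2.0, beta=1.0),   # high mean, little evidence
        "cpd_b": NIGOutput(gamma=6.8, nu=10.0, alpha=3.0, beta=0.5),  # slightly lower mean, confident
    }
    print(rank_by_lower_bound(preds, k=0.0))  # mean only
    print(rank_by_lower_bound(preds, k=1.0))  # uncertainty-penalized
```

With `k=0.0` the ranking follows the predicted mean alone and `cpd_a` comes first; with `k=1.0`, the epistemic standard deviation of `cpd_a` (about 1.41) outweighs its 0.2 advantage in the mean, so `cpd_b` moves ahead. This kind of reordering is one way low-evidence compounds can be pushed out of a top-50 list.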
Affiliation(s)
- Dong Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Lingjie Bao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hao Luo
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Zhe Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hucheng Yao
- State Key Laboratory of Agricultural Microbiology, Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- De-Xin Kong
- State Key Laboratory of Agricultural Microbiology, Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Cheng Luo
- The Center for Chemical Biology, Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
33
Wu L, Yan B, Han J, Li R, Xiao J, He S, Bo X. TOXRIC: a comprehensive database of toxicological data and benchmarks. Nucleic Acids Res 2023; 51:D1432-D1445. [PMID: 36400569 PMCID: PMC9825425 DOI: 10.1093/nar/gkac1074] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/14/2022] [Revised: 10/10/2022] [Accepted: 10/26/2022] [Indexed: 11/20/2022] [Open access]
Abstract
The toxic effects of compounds on the environment, humans, and other organisms have been a major focus of many research areas, including drug discovery and ecological research. Identifying potential toxicity early in compound/drug discovery is critical. The rapid development of computational methods for evaluating various toxicity categories has increased the need for a comprehensive, system-level collection of toxicological data, associated attributes, and benchmarks. Toward this goal, we present TOXRIC (https://toxric.bioinforai.tech/), a database with comprehensive toxicological data, standardized attribute data, practical benchmarks, informative visualization of molecular representations, and an intuitive function interface. TOXRIC contains 113 372 compounds, 13 toxicity categories, 1474 toxicity endpoints covering in vivo and in vitro endpoints, and 39 feature types covering structural, target, transcriptome, metabolic, and other descriptors. All curated endpoint and feature datasets can be retrieved, downloaded, and used directly as outputs or inputs of machine learning (ML)-based prediction models. Beyond serving as a data repository, TOXRIC also visualizes benchmarks and molecular representations for all endpoint datasets, helping researchers understand and select optimal feature types, molecular representations, and baseline algorithms for each endpoint prediction task. We believe that the rich compound toxicology information, ML-ready datasets, benchmarks, and molecular representation distributions can greatly facilitate toxicological investigations, the interpretation of toxicological mechanisms, compound/drug discovery, and the development of computational methods.
Affiliation(s)
- Lianlian Wu
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
- Bowei Yan
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Institute of Biomedical Sciences, Human Phenome Institute, Fudan University, Shanghai 200433, China
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing 102206, China
- Junshan Han
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Ruijiang Li
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Jian Xiao
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008, Hunan, China
- Institute for Rational and Safe Medication Practices, National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008, Hunan, China
- Song He
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaochen Bo
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
34
Bao L, Wang Z, Wu Z, Luo H, Yu J, Kang Y, Cao D, Hou T. Kinome-wide polypharmacology profiling of small molecules by multi-task graph isomorphism network approach. Acta Pharm Sin B 2023; 13:54-67. [PMID: 36815050 PMCID: PMC9939366 DOI: 10.1016/j.apsb.2022.05.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/21/2022] [Revised: 04/15/2022] [Accepted: 04/30/2022] [Indexed: 11/18/2022] [Open access]
Abstract
Predicting the interactions between small molecules and their targets plays an important role in various drug development applications, such as lead discovery, drug repurposing, and the elucidation of potential drug side effects. A variety of machine learning-based models have therefore been developed to predict these interactions. In this study, a model called auxiliary multi-task graph isomorphism network with uncertainty weighting (AMGU) was developed to predict the inhibitory activities of small molecules against 204 different kinases, based on the multi-task Graph Isomorphism Network (MT-GIN) combined with auxiliary learning and an uncertainty weighting strategy. AMGU outperformed descriptor-based models and state-of-the-art graph neural network (GNN) models on the internal test set. It also performed much better on two external test sets, suggesting that its transfer learning capacity enhances generalizability. A naïve model-agnostic interpretation method for GNNs, called edges masking, was then devised to explain the underlying predictive mechanisms, and the interpretations for 5 typical epidermal growth factor receptor (EGFR) inhibitors were consistent with their structure-activity relationships. Finally, a free online web server called KIP was developed to predict the kinome-wide polypharmacology effects of small molecules (http://cadd.zju.edu.cn/kip).
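The abstract names an uncertainty weighting strategy without giving its form. A widely used formulation (Kendall, Gal and Cipolla, 2018) learns one log-variance s_i per task and minimizes the sum over tasks of exp(-s_i) * L_i + s_i; the sketch below assumes that form, which may differ in detail from AMGU's. Holding the task losses fixed and descending on s_i alone shows the mechanism: each s_i settles at log L_i, so the effective weight exp(-s_i) becomes 1 / L_i and tasks with larger (noisier) losses are down-weighted automatically. The function names and loss values are hypothetical.

```python
import math
from typing import List, Sequence


def uncertainty_weighted_loss(task_losses: Sequence[float], log_vars: Sequence[float]) -> float:
    """Total loss: sum_i exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2) is learned per task."""
    if len(task_losses) != len(log_vars):
        raise ValueError("need one log-variance per task")
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))


def fit_log_vars(task_losses: Sequence[float], lr: float = 0.1, steps: int = 500) -> List[float]:
    """Gradient descent on the s_i alone, with task losses held fixed (illustration only)."""
    if any(loss <= 0 for loss in task_losses):
        raise ValueError("task losses must be positive")
    log_vars = [0.0] * len(task_losses)
    for _ in range(steps):
        # d/ds [exp(-s) * L + s] = 1 - exp(-s) * L, which is zero at s = log(L)
        log_vars = [s - lr * (1.0 - math.exp(-s) * loss)
                    for loss, s in zip(task_losses, log_vars)]
    return log_vars


if __name__ == "__main__":
    losses = [0.5, 2.0, 8.0]  # made-up per-task (e.g. per-kinase) losses
    s = fit_log_vars(losses)
    weights = [math.exp(-si) for si in s]
    print([round(w, 3) for w in weights])  # approaches 1 / L_i for each task
    print(round(uncertainty_weighted_loss(losses, s), 3))
```

In real training the s_i are updated jointly with the network weights, so the task losses move as well; the fixed-loss loop above only isolates how the learned variances rebalance tasks.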
Affiliation(s)
- Lingjie Bao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Zhe Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Hao Luo
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Jiahui Yu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Corresponding authors. Tel./fax: +86 571 88208412.
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, China
- Corresponding authors. Tel./fax: +86 571 88208412.
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, China
- Corresponding authors. Tel./fax: +86 571 88208412.
35
Becker M, Dai J, Chang AL, Feyaerts D, Stelzer IA, Zhang M, Berson E, Saarunya G, De Francesco D, Espinosa C, Kim Y, Marić I, Mataraso S, Payrovnaziri SN, Phongpreecha T, Ravindra NG, Shome S, Tan Y, Thuraiappah M, Xue L, Mayo JA, Quaintance CC, Laborde A, King LS, Dhabhar FS, Gotlib IH, Wong RJ, Angst MS, Shaw GM, Stevenson DK, Gaudilliere B, Aghaeepour N. Revealing the impact of lifestyle stressors on the risk of adverse pregnancy outcomes with multitask machine learning. Front Pediatr 2022; 10:933266. [PMID: 36582513 PMCID: PMC9793100 DOI: 10.3389/fped.2022.933266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 11/14/2022] [Indexed: 12/15/2022] [Open access]
Abstract
Psychosocial and stress-related factors (PSFs), defined as internal or external stimuli that induce biological changes, are potentially modifiable factors associated with adverse pregnancy outcomes (APOs) and are accessible targets for interventions. Although individual APOs have been shown to be connected to PSFs, APOs are biologically interconnected and relatively infrequent, and are therefore challenging to model. In this context, multi-task machine learning (MML) is an ideal tool for exploring the interconnectedness of APOs on the one hand and for building on joint combinatorial outcomes to increase predictive power on the other. Additionally, by integrating single-cell immunological profiling of the underlying biological processes, the effects of stress-based therapeutics may become measurable, facilitating the development of precision medicine approaches.
Objectives: The primary objectives were to jointly model multiple APOs and their connection to stress early in pregnancy, and to explore the underlying biology to guide the development of accessible and measurable interventions.
Materials and Methods: In a prospective cohort study, PSFs were assessed during the first trimester with an extensive self-administered questionnaire for 200 women. We used MML to simultaneously model and predict APOs (severe preeclampsia, superimposed preeclampsia, gestational diabetes, and early gestational age) as well as several risk factors (BMI, diabetes, hypertension) for these patients based on PSFs. Strongly interrelated stressors were categorized to identify potential therapeutic targets. Furthermore, for a subset of 14 women, we modeled the connections from PSFs to the maternal immune system and from the immune system to APOs by building corresponding ML models based on an extensive single-cell immune dataset generated by mass cytometry (cytometry by time of flight, CyTOF).
Results: Jointly modeling APOs in an MML setting significantly increased modeling capabilities and yielded a highly predictive integrated model of APOs, underscoring their interconnectedness. Most APOs were associated with mental health, life stress, and perceived health risks. Biologically, stressors were associated with specific immune characteristics revolving around CD4/CD8 T cells, and immune characteristics predicted from stress were in turn associated with APOs.
Conclusions: Simultaneously elucidating the connections among stress, multiple APOs, and immune characteristics has the potential to facilitate the implementation of ML-based, individualized, integrative models of pregnancy in clinical decision making. The modifiable nature of stressors may enable the development of accessible interventions, with success tracked through immune characteristics.
Affiliation(s)
- Martin Becker
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Chair for Intelligent Data Analytics, Institute for Visual and Analytic Computing, Department of Computer Science and Electrical Engineering, University of Rostock, Rostock, Germany
- Jennifer Dai
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Alan L. Chang
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Dorien Feyaerts
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Ina A. Stelzer
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Miao Zhang
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Eloise Berson
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Pathology, Stanford University, Palo Alto, CA, United States
- Geetha Saarunya
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Davide De Francesco
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Camilo Espinosa
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Yeasul Kim
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Ivana Marić
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Samson Mataraso
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Seyedeh Neelufar Payrovnaziri
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Thanaphong Phongpreecha
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Department of Pathology, Stanford University, Palo Alto, CA, United States
- Neal G. Ravindra
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Sayane Shome
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Yuqi Tan
- Department of Microbiology & Immunology, Stanford University, Palo Alto, CA, United States
- Baxter Laboratory for Stem Cell Biology, Stanford University, Palo Alto, CA, United States
- Melan Thuraiappah
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Lei Xue
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Jonathan A. Mayo
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Ana Laborde
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Lucy S. King
- Department of Psychology, Stanford University, Palo Alto, CA, United States
- Firdaus S. Dhabhar
- Department of Psychiatry & Behavioral Science, University of Miami, Miami, FL, United States
- Department of Microbiology & Immunology, University of Miami, Miami, FL, United States
- Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL, United States
- Miller School of Medicine, University of Miami, Miami, FL, United States
- Ian H. Gotlib
- Department of Psychology, Stanford University, Palo Alto, CA, United States
- Ronald J. Wong
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
- Martin S. Angst
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Gary M. Shaw
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- David K. Stevenson
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Brice Gaudilliere
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Nima Aghaeepour
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Palo Alto, CA, United States
- Department of Pediatrics, Stanford University, Palo Alto, CA, United States
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
36
Wu J, Wang J, Wu Z, Zhang S, Deng Y, Kang Y, Cao D, Hsieh CY, Hou T. ALipSol: An Attention-Driven Mixture-of-Experts Model for Lipophilicity and Solubility Prediction. J Chem Inf Model 2022; 62:5975-5987. [PMID: 36417544 DOI: 10.1021/acs.jcim.2c01290] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/24/2022]
Abstract
Lipophilicity (logD) and aqueous solubility (logSw) play a central role in drug development, yet accurately predicting them remains an open problem because of data scarcity. Current methods neglect the intrinsic relationships between physicochemical properties and usually ignore ionization effects. Here, we propose an attention-driven mixture-of-experts (MoE) model named ALipSol, which explicitly reproduces the hierarchy of task relationships. Following the divide-and-conquer principle, we break down each complex endpoint (logD or logSw) into simpler ones (acidic pKa, basic pKa, and logP) and allocate a specific expert network to each subproblem. We then apply transfer learning to extract knowledge from related tasks, alleviating the problem of limited data. Additionally, we replace the gating network with an attention mechanism to better capture dynamic task relationships on a per-example basis, and we adopt local fine-tuning and consensus prediction to further boost model performance. Extensive evaluations confirm the effectiveness of ALipSol, which improves RMSE by 8.04%, 2.49%, 8.57%, 12.8%, and 8.60% on the Lipop, ESOL, AqSolDB, external logD, and external logS data sets, respectively, compared with Attentive FP and state-of-the-art in silico tools. In particular, the model shows more significant advantages (Welch's t-test) with small training data, implying high robustness and generalizability. Interpretability analysis shows that the atom contributions learned by ALipSol are more reasonable than those of the vanilla Attentive FP, and that the substitution effects in benzene derivatives agree well with empirical constants, revealing the model's potential to extract useful patterns from data and to guide lead optimization.
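Why logP and pKa are informative subtasks for logD can be seen from a textbook relation, sketched below as an illustration rather than as ALipSol's method (ALipSol learns these relationships with expert networks). The sketch assumes that only the fully neutral species partitions into octanol and that the acidic and basic sites ionize independently, which ignores ion-pair partitioning and zwitterion effects; under those assumptions the Henderson-Hasselbalch equation gives logD from logP and the pKa values. The function name `log_d` and the default pH of 7.4 are assumptions for illustration.

```python
import math
from typing import Optional


def log_d(log_p: float, pka_acid: Optional[float] = None,
          pka_base: Optional[float] = None, ph: float = 7.4) -> float:
    """logD from logP and pKa values, assuming only the neutral species partitions
    and the acidic and basic sites ionize independently."""
    penalty = 0.0
    if pka_acid is not None:
        # log10(1 + [A-]/[HA]) for the acidic site
        penalty += math.log10(1.0 + 10.0 ** (ph - pka_acid))
    if pka_base is not None:
        # log10(1 + [BH+]/[B]); pka_base is the pKa of the conjugate acid
        penalty += math.log10(1.0 + 10.0 ** (pka_base - ph))
    return log_p - penalty


if __name__ == "__main__":
    print(round(log_d(2.0), 3))                # neutral compound: logD equals logP
    print(round(log_d(2.0, pka_acid=4.4), 3))  # acid-like site, mostly ionized at pH 7.4
    print(round(log_d(2.0, pka_base=9.4), 3))  # amine-like site, mostly protonated at pH 7.4
```

Far from the pKa, logD falls by roughly one log unit per pH unit of separation, so an error in the predicted pKa carries almost directly into logD. This is one reason a pKa expert can help the logD task.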
Affiliation(s)
- Jialu Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P. R. China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P. R. China
- Junmei Wang
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P. R. China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P. R. China
- Shengyu Zhang
- Tencent Quantum Laboratory, Tencent, Shenzhen, 518057, Guangdong, P. R. China
- Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P. R. China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P. R. China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004, Hunan, P. R. China
- Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P. R. China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P. R. China
| |
37
Liao J, Chen H, Wei L, Wei L. GSAML-DTA: An interpretable drug-target binding affinity prediction model based on graph neural networks with self-attention mechanism and mutual information. Comput Biol Med 2022; 150:106145. [PMID: 37859276 DOI: 10.1016/j.compbiomed.2022.106145] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 08/10/2022] [Revised: 08/23/2022] [Accepted: 09/24/2022] [Indexed: 11/03/2022]
Abstract
Identifying drug-target affinity (DTA) is of great practical importance in designing efficacious drugs for known diseases. Recently, numerous deep learning-based computational methods have been developed to predict drug-target affinity and have achieved impressive performance. However, most of them construct the molecule (drug or target) encoder without weighting the features of each node (atom or residue). Besides, they generally combine drug and target representations directly, which may introduce task-irrelevant information. In this study, we develop GSAML-DTA, an interpretable deep learning framework for DTA prediction. GSAML-DTA integrates a self-attention mechanism and graph neural networks (GNNs) to build representations of drugs and target proteins from their structural information. In addition, mutual information is introduced to filter out redundant information and retain relevant information in the combined representations of drugs and targets. Extensive experimental results demonstrate that GSAML-DTA outperforms state-of-the-art methods for DTA prediction on two benchmark datasets. Furthermore, GSAML-DTA can be used to analyze binding atoms and residues, which may support data-driven chemical biology studies. Overall, GSAML-DTA can serve as a powerful and interpretable tool for DTA modelling.
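Weighting node features instead of treating every atom or residue equally can be illustrated with a generic single-head self-attention step followed by an attention-pooling readout. This NumPy sketch is not GSAML-DTA's encoder: it omits the GNN layers and the mutual-information objective, and the function names and the learned pooling vector `w_pool` are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(nodes: np.ndarray, wq: np.ndarray, wk: np.ndarray, wv: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention over node embeddings of shape (n, d)."""
    q, k, v = nodes @ wq, nodes @ wk, nodes @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=1)   # (n, n), each row sums to 1
    return attn @ v                                          # (n, d_v) context-aware node features


def attention_readout(nodes: np.ndarray, w_pool: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Pool nodes with learned per-node weights instead of a plain mean."""
    node_weights = softmax(nodes @ w_pool)                   # (n,) one weight per atom/residue
    return node_weights @ nodes, node_weights                # (d,) graph vector and the weights


rng = np.random.default_rng(1)
n_atoms, dim = 5, 6
atoms = rng.normal(size=(n_atoms, dim))                      # stand-in atom embeddings
updated = self_attention(atoms, *(rng.normal(size=(dim, dim)) for _ in range(3)))
graph_vec, node_weights = attention_readout(updated, rng.normal(size=dim))
```

The returned `node_weights` are the quantities an interpretation step would inspect to see which atoms or residues dominate the pooled representation.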
Affiliation(s)
- Jiaqi Liao
  - School of Software, Shandong University, Jinan, China
- Haoyang Chen
  - School of Software, Shandong University, Jinan, China
- Lesong Wei
  - Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Leyi Wei
  - School of Software, Shandong University, Jinan, China
38
Deep learning methods for molecular representation and property prediction. Drug Discov Today 2022; 27:103373. [PMID: 36167282 DOI: 10.1016/j.drudis.2022.103373] [Citation(s) in RCA: 62] [Impact Index Per Article: 20.7] [Received: 07/07/2022] [Revised: 08/22/2022] [Accepted: 09/21/2022] [Indexed: 01/11/2023]
Abstract
With advances in artificial intelligence (AI) methods, computer-aided drug design (CADD) has developed rapidly in recent years. Effective molecular representation and accurate property prediction are crucial tasks in CADD workflows. In this review, we summarize contemporary applications of deep learning (DL) methods for molecular representation and property prediction. We categorize DL methods according to the format of molecular data (1D, 2D, and 3D). In addition, we discuss some common DL strategies, such as ensemble learning and transfer learning, and analyze interpretability methods for these models. We also highlight the challenges and opportunities of DL methods for molecular representation and property prediction.
39
da Costa APL, Silva JRA, de Molfetta FA. Computational discovery of sulfonamide derivatives as potential inhibitors of the cruzain enzyme from T. cruzi by molecular docking, molecular dynamics and MM/GBSA approaches. Mol Simul 2022. [DOI: 10.1080/08927022.2022.2120625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
Affiliation(s)
- Ana Paula Lima da Costa
  - Laboratório de Modelagem Molecular, Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, Brazil
- José Rogério A. Silva
  - Laboratório de Planejamento e Desenvolvimento de Fármacos, Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, Brazil
- Fábio Alberto de Molfetta
  - Laboratório de Modelagem Molecular, Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, Brazil
40
Wang J, Lou C, Liu G, Li W, Wu Z, Tang Y. Profiling prediction of nuclear receptor modulators with multi-task deep learning methods: toward the virtual screening. Brief Bioinform 2022; 23:6673852. [PMID: 35998896 DOI: 10.1093/bib/bbac351] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/12/2022] [Revised: 07/13/2022] [Accepted: 07/27/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
Nuclear receptors (NRs) are ligand-activated transcription factors that constitute one of the most important classes of targets for drug discovery. Current computational strategies mainly focus on a single target, and the transfer of learned knowledge among NRs has not yet been considered. Herein we propose a novel computational framework named NR-Profiler for predicting potential NR modulators with high affinity and specificity. First, we built a comprehensive NR data set comprising 42 684 interactions between 42 NRs and 31 033 compounds. Then, we used multi-task deep neural network and multi-task graph convolutional neural network architectures to construct multi-task multi-classification models. To improve predictive capability and robustness, we built a consensus model with an area under the receiver operating characteristic curve (AUC) of 0.883. Compared with conventional machine learning and structure-based approaches, the consensus model showed better performance in external validation. Using this consensus model, we demonstrated the practical value of NR-Profiler in virtual screening for NRs. In addition, we designed a selectivity score to quantitatively measure the specificity of NR modulators. Finally, we developed freely available standalone software with which users can make profiling predictions for their compounds of interest. In summary, NR-Profiler provides a useful tool for NR-profiling prediction and is expected to facilitate NR-based drug discovery.
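The consensus step and the AUC metric can be sketched in plain Python. The abstract does not say how NR-Profiler combines its member models, so averaging per-compound probabilities is an assumption, and the toy labels and scores below are invented for illustration only. AUC is computed through its rank interpretation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of positives and negatives (Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def consensus(prob_lists: Sequence[Sequence[float]]) -> list[float]:
    """Average each compound's predicted probability across models (assumed scheme)."""
    n = len(prob_lists[0])
    if any(len(p) != n for p in prob_lists):
        raise ValueError("all models must score the same compounds")
    return [sum(p[i] for p in prob_lists) / len(prob_lists) for i in range(n)]


labels = [1, 1, 0, 0]
model_a = [0.9, 0.4, 0.5, 0.1]   # hypothetical multi-task DNN outputs
model_b = [0.6, 0.8, 0.3, 0.7]   # hypothetical multi-task GCN outputs
print(roc_auc(labels, model_a))                        # 0.75
print(roc_auc(labels, model_b))                        # 0.75
print(roc_auc(labels, consensus([model_a, model_b])))  # 1.0
```

In this toy case each model misranks a different compound, so averaging cancels the errors; on real data a consensus need not help this much, and nothing here reproduces the reported AUC of 0.883.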
Affiliation(s)
- Jiye Wang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Chaofeng Lou
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Guixia Liu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Weihua Li
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Zengrui Wu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Yun Tang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
41
Shen C, Zhang X, Deng Y, Gao J, Wang D, Xu L, Pan P, Hou T, Kang Y. Boosting Protein-Ligand Binding Pose Prediction and Virtual Screening Based on Residue-Atom Distance Likelihood Potential and Graph Transformer. J Med Chem 2022; 65:10691-10706. [PMID: 35917397 DOI: 10.1021/acs.jmedchem.2c00991] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Indexed: 02/08/2023]
Abstract
The past few years have witnessed enormous progress in applying machine learning approaches to the development of protein-ligand scoring functions. However, robust performance and wide applicability of scoring functions remain major challenges for increasing the success rate of docking-based virtual screening. Herein, a novel scoring function named RTMScore was developed by introducing a tailored residue-based graph representation strategy and several graph transformer layers for learning protein and ligand representations, followed by a mixture density network to obtain a residue-atom distance likelihood potential. Our approach was validated on the CASF-2016 benchmark, and the results indicate that RTMScore can outperform almost all other state-of-the-art methods in terms of both docking and screening power. Further evaluation confirms the robustness of our approach, which not only retains its docking power on cross-docked poses but also achieves improved performance as a rescoring tool in larger-scale virtual screening.
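A distance likelihood potential can be illustrated with a one-dimensional Gaussian mixture: a mixture density network predicts mixture weights, means, and widths for each residue-atom pair, and a pose is scored by how likely its observed distances are under those mixtures. In this sketch the mixture parameters are hand-set rather than predicted by a network, and summing log-likelihoods over all pairs is a simplification; RTMScore's actual parameterization, pair selection, and any distance cutoff are not reproduced. The function names are hypothetical.

```python
import math
from typing import Sequence


def mixture_log_density(d: float, weights: Sequence[float],
                        means: Sequence[float], sigmas: Sequence[float]) -> float:
    """log p(d) for a 1-D Gaussian mixture, computed with log-sum-exp for stability."""
    terms = [
        math.log(w) - math.log(s * math.sqrt(2.0 * math.pi)) - 0.5 * ((d - mu) / s) ** 2
        for w, mu, s in zip(weights, means, sigmas)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def pose_score(pairs) -> float:
    """Sum of log-likelihoods over (distance, weights, means, sigmas) tuples; higher = more native-like."""
    return sum(mixture_log_density(d, w, m, s) for d, w, m, s in pairs)


# Hypothetical mixture for one residue-atom pair with modes near 3.5 and 6.0 angstroms.
params = ([0.7, 0.3], [3.5, 6.0], [0.4, 0.8])
close_pose = [(3.6, *params)]
far_pose = [(9.0, *params)]
print(pose_score(close_pose) > pose_score(far_pose))  # True
```

Working in log space keeps the score well behaved when a distance lies far from every mixture mode, where the raw densities would underflow.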
Affiliation(s)
- Chao Shen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China; CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Xujun Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Junbo Gao
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Dong Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Peichen Pan
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
42
Wang MS, Gong Y, Zhuo LS, Shi XX, Tian YG, Huang CK, Huang W, Yang GF. Distribution- and Metabolism-Based Drug Discovery: A Potassium-Competitive Acid Blocker as a Proof of Concept. Research (Wash D C) 2022; 2022:9852518. [PMID: 35958113 PMCID: PMC9343080 DOI: 10.34133/2022/9852518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Accepted: 06/29/2022] [Indexed: 11/06/2022] [Open Access]
Abstract
Conventional methods of drug design must accept side effects as a compromise to achieve sufficient efficacy, because targeting drugs to specific organs remains challenging. Thus, new strategies are needed to design organ-specific drugs that induce little toxicity. Based on characteristic tissue niche-mediated drug distribution (TNMDD) and patterns of drug metabolism into specific intermediates, we propose a strategy of distribution- and metabolism-based drug design (DMBDD). By combining physicochemical property-driven distribution optimization with a well-designed metabolic pathway, we designed SH-337, a candidate potassium-competitive acid blocker (P-CAB). SH-337 showed long-term, stomach-specific distribution and was rapidly cleared from the systemic compartment. Therefore, SH-337 exerted a pharmacological effect comparable to that of FDA-approved vonoprazan but had a 3.3-fold higher no observed adverse effect level (NOAEL). This study contributes a proof-of-concept demonstration of DMBDD and provides a new perspective for the development of highly efficient, organ-specific drugs with low toxicity.
Affiliation(s)
- Ming-Shu Wang
  - Key Laboratory of Pesticide & Chemical Biology of Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, College of Chemistry, Central China Normal University, Wuhan 430079, China
- Yi Gong
  - Key Laboratory of Pesticide & Chemical Biology of Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, College of Chemistry, Central China Normal University, Wuhan 430079, China
- Lin-Sheng Zhuo
  - Key Laboratory of Pesticide & Chemical Biology of Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, College of Chemistry, Central China Normal University, Wuhan 430079, China
- Xing-Xing Shi
  - Key Laboratory of Pesticide & Chemical Biology of Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, College of Chemistry, Central China Normal University, Wuhan 430079, China
- Yan-Guang Tian
  - Key Laboratory of Pesticide & Chemical Biology of Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, College of Chemistry, Central China Normal University, Wuhan 430079, China
- Chang-Kang Huang
  - Nanjing Shuohui Pharmatechnology Co., Ltd., Nanjing 210046, China
- Wei Huang
  - Key Laboratory of Pesticide & Chemical Biology of Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, College of Chemistry, Central China Normal University, Wuhan 430079, China
- Guang-Fu Yang
  - Key Laboratory of Pesticide & Chemical Biology of Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, College of Chemistry, Central China Normal University, Wuhan 430079, China
43
Du H, Jiang D, Gao J, Zhang X, Jiang L, Zeng Y, Wu Z, Shen C, Xu L, Cao D, Hou T, Pan P. Proteome-Wide Profiling of the Covalent-Druggable Cysteines with a Structure-Based Deep Graph Learning Network. Research (Wash D C) 2022; 2022:9873564. [PMID: 35958111 PMCID: PMC9343084 DOI: 10.34133/2022/9873564] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/06/2022] [Accepted: 06/27/2022] [Indexed: 11/06/2022] [Open Access]
Abstract
Covalent ligands have attracted increasing attention due to their unique advantages, such as long residence time, high selectivity, and strong binding affinity. They also show promise for targets where previous efforts to identify noncovalent small-molecule inhibitors have failed. However, our limited knowledge of covalent binding sites has hindered the discovery of novel ligands. Therefore, developing in silico methods to identify covalent binding sites is highly desirable. Here, we propose DeepCoSI, the first structure-based deep graph learning model to identify ligandable covalent sites in proteins. By integrating the characterization of the binding pocket and the interactions between each cysteine and its surrounding environment, DeepCoSI achieves state-of-the-art predictive performance. Validation on two external test sets that mimic real application scenarios shows that DeepCoSI has a strong ability to distinguish ligandable sites from the others. Finally, we profiled the entire set of protein structures in the RCSB Protein Data Bank (PDB) with DeepCoSI to evaluate the ligandability of each cysteine for covalent ligand design, and made the predicted data publicly available on a website.
Affiliation(s)
- Hongyan Du
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Dejun Jiang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Junbo Gao
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Xujun Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Lingxiao Jiang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Yundian Zeng
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Zhenxing Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Chao Shen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004 Hunan, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- Peichen Pan
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
44
Yang Y, Wu Z, Yao X, Kang Y, Hou T, Hsieh CY, Liu H. Exploring Low-Toxicity Chemical Space with Deep Learning for Molecular Generation. J Chem Inf Model 2022; 62:3191-3199. [PMID: 35713712 DOI: 10.1021/acs.jcim.2c00671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Creating a wide range of new compounds that not only have ideal pharmacological properties but also easily pass long-term toxicity evaluation is still a challenging task in current drug discovery. In this study, we developed a conditional generative model by combining a semisupervised variational autoencoder (SSVAE) with an MGA toxicity predictor. Our aim is to generate molecules with low toxicity, good drug-like properties, and structural diversity. For multiobjective optimization, we developed a method that imposes hierarchical constraints on the toxicity space of small molecules to generate drug-like small molecules while minimizing the effect on the diversity of the generated results. The evaluation metrics indicate that the developed model has good effectiveness, novelty, and diversity. The molecules generated by this model are mainly distributed in low-toxicity regions, which suggests that our model can efficiently constrain the generation of toxic structures. In contrast to simply filtering out toxic molecules after generation, the low-toxicity molecular generative model can generate structurally diverse molecules. Our strategy can be used in target-based drug discovery to improve the quality of generated molecules with low-toxicity, drug-like, and highly active properties.
Affiliation(s)
- Yuwei Yang
  - School of Pharmacy, Lanzhou University, Lanzhou 730000, China
- Zhenxing Wu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Xiaojun Yao
  - College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou 730000, China
- Yu Kang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Chang-Yu Hsieh
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518000, China
- Huanxiang Liu
  - School of Pharmacy, Lanzhou University, Lanzhou 730000, China; Faculty of Applied Science, Macao Polytechnic University, Macao, SAR 999078, China
45
Wang M, Hsieh CY, Wang J, Wang D, Weng G, Shen C, Yao X, Bing Z, Li H, Cao D, Hou T. RELATION: A Deep Generative Model for Structure-Based De Novo Drug Design. J Med Chem 2022; 65:9478-9492. [PMID: 35713420 DOI: 10.1021/acs.jmedchem.2c00732] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Indexed: 01/30/2023]
Abstract
Deep learning (DL)-based de novo molecular design has recently gained considerable traction. Many DL-based generative models have been successfully developed to design novel molecules, but most of them are ligand-centric, and the role of the 3D geometries of target binding pockets in molecular generation has not been well exploited. Here, we propose a new 3D-based generative model called RELATION. In the RELATION model, the BiTL algorithm was specifically designed to extract and transfer the desired geometric features of protein-ligand complexes to a latent space for generation. Pharmacophore conditioning and docking-based Bayesian sampling were applied to efficiently navigate the vast chemical space for the design of molecules with desired geometric properties and pharmacophore features. As a proof of concept, the RELATION model was used to design inhibitors for two targets, AKT1 and CDK2. The computational results demonstrated that the RELATION model could efficiently generate novel molecules with favorable binding affinity and pharmacophore features.
Affiliation(s)
- Mingyang Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Chang-Yu Hsieh
  - Tencent, Tencent Quantum Lab, Shenzhen 518057, Guangdong, P. R. China
- Jike Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dong Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Gaoqi Weng
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Chao Shen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Xiaojun Yao
  - Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, Macau Institute for Applied Research in Medicine and Health, State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa 999078, Macau, P. R. China
- Zhitong Bing
  - Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou 730000, P. R. China
- Honglin Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai 200237, P. R. China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
46
Kim H, Park M, Lee I, Nam H. BayeshERG: a robust, reliable and interpretable deep learning model for predicting hERG channel blockers. Brief Bioinform 2022; 23:6609519. [PMID: 35709752 DOI: 10.1093/bib/bbac211] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/19/2022] [Revised: 04/19/2022] [Accepted: 05/06/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
Unintended inhibition of the human ether-à-go-go-related gene (hERG) ion channel by small molecules leads to severe cardiotoxicity. Thus, hERG channel blockade is a significant concern in the development of new drugs. Several computational models, including deep learning models, have been developed to predict hERG channel blockade; however, they lack robustness, reliability and interpretability. Here, we developed a graph-based Bayesian deep learning model for hERG channel blocker prediction, named BayeshERG, which has robust predictive power, high reliability and high-resolution interpretability. First, we applied transfer learning, pre-training on a large data set of 300 000 samples, to increase predictive performance. Second, we implemented a Bayesian neural network with Monte Carlo dropout to calibrate the uncertainty of the prediction. Third, we utilized global multihead attentive pooling to achieve high-resolution structural interpretability for hERG channel blockers and nonblockers. We conducted both internal and external validations for stringent evaluation; in particular, we benchmarked most of the publicly available hERG channel blocker prediction models. We showed that our proposed model outperformed them in both predictive performance and uncertainty calibration. Furthermore, we found that our model learned to focus on the essential substructures of hERG channel blockers via an attention mechanism. Finally, we validated the prediction results of our model by conducting in vitro experiments and confirmed its high validity. In summary, BayeshERG could serve as a versatile tool for discovering hERG channel blockers and helping maximize the possibility of successful drug discovery. The data and source code are available at our GitHub repository (https://github.com/GIST-CSBL/BayeshERG).
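Monte Carlo dropout can be sketched with a tiny NumPy network: dropout stays active at prediction time, the same input is passed through the network many times, and the spread of the sampled probabilities serves as an uncertainty estimate. The two-layer network, its random weights, the stand-in fingerprints, and the function names below are placeholders; BayeshERG applies the idea to a pre-trained graph model, which this sketch does not reproduce.

```python
import numpy as np


def forward_with_dropout(x, w1, b1, w2, b2, drop_p, rng):
    """One stochastic pass: ReLU hidden layer, inverted dropout, sigmoid output."""
    hidden = np.maximum(0.0, x @ w1 + b1)
    keep = rng.random(hidden.shape) >= drop_p      # drop each unit with probability drop_p
    hidden = hidden * keep / (1.0 - drop_p)        # rescale so the expected activation is unchanged
    logits = hidden @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logits))           # blocker probability per molecule


def mc_dropout_predict(x, params, n_samples=100, drop_p=0.2, seed=0):
    """Return the mean probability and its standard deviation over stochastic passes."""
    if not 0.0 <= drop_p < 1.0:
        raise ValueError("drop_p must be in [0, 1)")
    rng = np.random.default_rng(seed)
    samples = np.stack([forward_with_dropout(x, *params, drop_p, rng)
                        for _ in range(n_samples)])  # (n_samples, n_molecules)
    return samples.mean(axis=0), samples.std(axis=0)


rng = np.random.default_rng(42)
n_features, n_hidden = 16, 32
params = (rng.normal(size=(n_features, n_hidden)), np.zeros(n_hidden),
          rng.normal(size=n_hidden) / np.sqrt(n_hidden), 0.0)
molecules = rng.normal(size=(3, n_features))        # stand-in features for 3 molecules
mean_prob, uncertainty = mc_dropout_predict(molecules, params)
```

Molecules whose `uncertainty` is large relative to the others are the natural candidates for caution or experimental follow-up; turning dropout off (`drop_p=0.0`) collapses the spread to zero, which is a quick sanity check on the sampling loop.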
Affiliation(s)
- Hyunho Kim
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-gu, Gwangju, 61005, Republic of Korea
- Minsu Park
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-gu, Gwangju, 61005, Republic of Korea
- Ingoo Lee
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-gu, Gwangju, 61005, Republic of Korea
- Hojung Nam
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-gu, Gwangju, 61005, Republic of Korea
47
Xu M, Yang H, Liu G, Tang Y, Li W. In Silico Prediction of Chemical Aquatic Toxicity by Multiple Machine Learning and Deep Learning Approaches. J Appl Toxicol 2022; 42:1766-1776. [PMID: 35653511 DOI: 10.1002/jat.4354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 05/16/2022] [Accepted: 05/31/2022] [Indexed: 11/08/2022]
Abstract
Fish are among the model animals used to evaluate the adverse effects of chemicals released into the ecosystem. However, the low throughput and relatively high cost of fish testing make it impossible to test all newly manufactured chemicals. Hence, in silico models that prioritize compounds for testing have been widely applied in environmental risk assessment and drug discovery. In this study, we constructed local predictive models for four fish species (bluegill sunfish, rainbow trout, fathead minnow, and sheepshead minnow) and global models using data from all four species. A total of 1874 unique compounds labeled as toxic (LC50 < 10 ppm) or nontoxic were collected from ECOTOX and the literature. Both conventional machine learning methods and a deep learning architecture, the graph convolutional network (GCN), were used to build predictive models. The classification accuracy of the best local model for each fish species was higher than 0.83. For the global models, two strategies, consistency prediction and probability thresholding, were adopted to improve predictive capability at the cost of a narrower applicability domain. For the 63% of compounds within the domain, the accuracy was around 0.97. Comparing the deep learning and machine learning methods, we found that the single-task GCN showed specific performance advantages, whereas the multi-task GCN showed no advantage over the conventional machine learning methods. The data and models are available on GitHub (https://github.com/ChemPredict/ChemicalAquaticToxicity).
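The two applicability-domain strategies can be sketched in plain Python. The abstract does not define them precisely, so the rules below are assumptions: consistency prediction returns a label only when every model agrees after thresholding at 0.5, and probability thresholding returns a label only when the probability falls outside an uncertain middle band (the 0.3 and 0.7 cut-offs are arbitrary). Compounds that receive no label are treated as outside the domain, which is how coverage is traded for accuracy. All function names and data are hypothetical.

```python
from typing import Optional, Sequence


def consistency_predict(prob_lists: Sequence[Sequence[float]], cut: float = 0.5) -> list[Optional[int]]:
    """Label 1/0 only where all models agree; None marks an out-of-domain compound."""
    preds = []
    for probs in zip(*prob_lists):                 # every model's probability for one compound
        votes = {int(p >= cut) for p in probs}
        preds.append(votes.pop() if len(votes) == 1 else None)
    return preds


def threshold_predict(probs: Sequence[float], low: float = 0.3, high: float = 0.7) -> list[Optional[int]]:
    """Label only confident predictions; probabilities inside (low, high) stay unlabeled."""
    return [1 if p >= high else 0 if p <= low else None for p in probs]


def coverage_and_accuracy(preds: Sequence[Optional[int]], labels: Sequence[int]) -> tuple[float, float]:
    """Fraction of compounds in domain, and accuracy on those compounds only."""
    kept = [(p, y) for p, y in zip(preds, labels) if p is not None]
    coverage = len(kept) / len(preds)
    accuracy = sum(p == y for p, y in kept) / len(kept) if kept else float("nan")
    return coverage, accuracy


labels = [1, 0, 0, 1]                              # 1 = toxic (e.g. LC50 < 10 ppm)
model_a = [0.9, 0.6, 0.2, 0.4]
model_b = [0.8, 0.4, 0.1, 0.7]
print(consistency_predict([model_a, model_b]))     # [1, None, 0, None]
print(coverage_and_accuracy(threshold_predict([0.9, 0.55, 0.2, 0.65]), labels))  # (0.5, 1.0)
```

Widening the uncertain band or requiring agreement from more models lowers coverage and, when the confident predictions are the reliable ones, raises accuracy on the compounds that remain, mirroring the trade-off the abstract reports.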
Affiliation(s)
- Minjie Xu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Hongbin Yang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Guixia Liu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Yun Tang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Weihua Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
48
Lou C, Yang H, Wang J, Huang M, Li W, Liu G, Lee PW, Tang Y. IDL-PPBopt: A Strategy for Prediction and Optimization of Human Plasma Protein Binding of Compounds via an Interpretable Deep Learning Method. J Chem Inf Model 2022; 62:2788-2799. [PMID: 35607907 DOI: 10.1021/acs.jcim.2c00297] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 01/29/2023]
Abstract
The prediction and optimization of pharmacokinetic properties are essential in lead optimization. Traditional strategies mainly depend on empirical chemical rules from medicinal chemists. However, as the amount of data grows, it is becoming harder to extract useful medicinal chemistry knowledge manually. To this end, we introduced IDL-PPBopt, a computational strategy for predicting and optimizing the plasma protein binding (PPB) property based on an interpretable deep learning method. First, a curated PPB data set was used to construct an interpretable deep learning model, which showed excellent predictive performance, with a root mean squared error of 0.112 on the entire test set. Then, we designed a detection protocol based on the model and the Wilcoxon test to identify the PPB-related substructures (named privileged substructures, PSubs) of each molecule. In total, 22 general privileged substructures (GPSubs) were identified, which shared common features such as nitrogen-containing groups, diamines with two carbon units, and azetidine. Furthermore, a series of second-level chemical rules for each GPSub were derived through a statistical test and then summarized into substructure pairs. We demonstrated that these substructure pairs were equally applicable outside the training set and accordingly customized structural modification schemes for each GPSub, which provide alternatives for optimizing the PPB property. Therefore, IDL-PPBopt provides a promising scheme for predicting and optimizing the PPB property and could also help with lead optimization for other pharmacokinetic properties.
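The abstract says privileged substructures are detected with the model and a Wilcoxon test, but it does not say which quantities are compared. The sketch below is therefore an assumption. It takes per-atom importance scores from an interpretable model, splits them into atoms inside a candidate substructure and all remaining atoms, and applies a one-sided Wilcoxon rank-sum test. The test uses the normal approximation without tie correction. `rank_sum_greater`, `is_privileged`, and the 0.05 cut-off are hypothetical names and values chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_greater(inside: Sequence[float], outside: Sequence[float]) -> Tuple[float, float]:
    """One-sided Wilcoxon rank-sum test: are scores inside the substructure larger?

    Returns (z, p) using the normal approximation (no tie correction).
    """
    n1, n2 = len(inside), len(outside)
    ranks = average_ranks(list(inside) + list(outside))
    w = sum(ranks[:n1])                       # rank sum of the 'inside' group
    mean = n1 * (n1 + n2 + 1) / 2             # expected rank sum under H0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))     # upper-tail probability of N(0, 1)
    return z, p


def is_privileged(inside: Sequence[float], outside: Sequence[float], alpha: float = 0.05) -> bool:
    """Hypothetical rule: flag the substructure if its atom scores are significantly higher."""
    return rank_sum_greater(inside, outside)[1] < alpha
```

For small molecules an exact test would be more appropriate than the normal approximation. The approximation is used here only to keep the sketch dependency-free.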
Affiliation(s)
- Chaofeng Lou
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Hongbin Yang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Jiye Wang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Mengting Huang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Weihua Li
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Guixia Liu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Philip W Lee
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yun Tang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
49
An Algorithm Framework for Drug-Induced Liver Injury Prediction Based on Genetic Algorithm and Ensemble Learning. Molecules 2022; 27:molecules27103112. [PMID: 35630587 PMCID: PMC9147181 DOI: 10.3390/molecules27103112] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/13/2022] [Revised: 05/05/2022] [Accepted: 05/10/2022] [Indexed: 11/19/2022] Open
Abstract
In drug discovery, drug-induced liver injury (DILI) remains an active research field and one of the most common and important issues in toxicity evaluation. It directly contributes to the high attrition rate of drug candidates. Various computational algorithms based on molecular representations are currently used to predict DILI. A single molecular representation has been found insufficient for toxicity prediction, so fusions of multiple molecular fingerprints have been used as model input. To address the high dimensionality and class imbalance of DILI prediction data, this paper integrates existing datasets and designs a new algorithm framework, Rotation-Ensemble-GA (R-E-GA). The main idea is to rotate the fused high-dimensional molecular representation vector in feature space and then search for a feature subset with better predictive performance. An AdaBoost-type ensemble learning method is then integrated into R-E-GA to improve prediction accuracy. The experimental results show that R-E-GA outperforms other state-of-the-art algorithms, including ensemble learning-based and graph neural network-based methods. In five-fold cross-validation, R-E-GA obtains an ACC of 0.77, an F1 score of 0.769, and an AUC of 0.842.
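The feature-subset search in R-E-GA is driven by a genetic algorithm. The sketch below shows a generic GA over binary feature masks with truncation selection, elitism, one-point crossover, and bit-flip mutation. The rotation step and the AdaBoost learner are not reproduced. The `fitness` callable stands in for whatever score the authors optimize (for example, cross-validated AdaBoost performance on the selected, rotated features), and that choice is an assumption. `ga_select`, its defaults, and the toy target mask in the usage example are hypothetical.

```python
import random
from typing import Callable, List

Mask = List[int]  # 1 = keep feature, 0 = drop feature


def ga_select(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 30,
    generations: int = 100,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Mask:
    """Generic GA for feature-subset selection (requires n_features >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        elite = ranked[: max(2, pop_size // 5)]      # truncation selection
        children = [m[:] for m in elite]             # elitism: best masks survive unchanged
        while len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)       # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


if __name__ == "__main__":
    # Toy fitness: agreement with a hypothetical "ideal" mask.
    target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    best = ga_select(len(target), lambda m: sum(int(x == t) for x, t in zip(m, target)))
    print(best)
```

Because the elite masks are copied forward unchanged, the best fitness in the population never decreases from one generation to the next.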
50
Wu Z, Jiang D, Wang J, Zhang X, Du H, Pan L, Hsieh CY, Cao D, Hou T. Knowledge-based BERT: a method to extract molecular features such as computational chemists. Brief Bioinform 2022; 23:6570013. [PMID: 35438145 DOI: 10.1093/bib/bbac131] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 01/10/2022] [Revised: 03/16/2022] [Accepted: 03/18/2022] [Indexed: 11/12/2022] Open
Abstract
Molecular property prediction models based on machine learning algorithms have become important tools for triaging unpromising lead molecules in the early stages of drug discovery. Compared with the mainstream descriptor- and graph-based methods for molecular property prediction, SMILES-based methods can extract molecular features directly from SMILES without human expert knowledge. However, they require more powerful feature-extraction algorithms and larger amounts of training data, which has made them less popular. Here, we show the great potential of pre-training for improving predictions of important pharmaceutical properties. Using three pre-training tasks based on atom feature prediction, molecular feature prediction, and contrastive learning, we developed a new pre-training method, K-BERT, which can extract chemical information from SMILES as chemists do. Results on 15 pharmaceutical datasets show that K-BERT outperforms well-established descriptor-based (XGBoost) and graph-based (Attentive FP and HRGCN+) models. In addition, we found that the contrastive learning pre-training task enables K-BERT to 'understand' SMILES strings beyond canonical SMILES. Moreover, the general fingerprints generated by K-BERT (K-BERT-FP) show predictive power comparable to MACCS on the 15 pharmaceutical datasets and can also capture molecular size and chirality information that traditional binary fingerprints cannot. Our results illustrate the great potential of K-BERT for practical molecular property prediction in drug discovery.
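The abstract credits a contrastive pre-training task with letting K-BERT handle non-canonical SMILES, but it does not give the loss. The sketch below is a generic InfoNCE-style contrastive loss, swapped in for illustration and not taken from the paper. It assumes that row `i` of `view_a` and row `i` of `view_b` are embeddings of two different SMILES strings for the same molecule (for example, canonical and randomized). The other molecules in the batch act as negatives. The function names, the cosine similarity, and the temperature of 0.1 are assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def info_nce(view_a: List[Sequence[float]], view_b: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: view_a[i] should be most similar to view_b[i].

    For each anchor i, this is cross-entropy over the batch, with the
    matching row as the positive and all other rows as negatives.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / n
```

The loss is small when each molecule's two SMILES embeddings are closer to each other than to any other molecule's embedding. Minimizing it pushes a model toward representations that do not depend on which valid SMILES string was written.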
Affiliation(s)
- Zhenxing Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China; Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dejun Jiang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Jike Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China; National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan 430072, Hubei, P. R. China
- Xujun Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Hongyan Du
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Lurong Pan
  - Global Health Drug Discovery Institute, Beijing 100192, P. R. China
- Chang-Yu Hsieh
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, P. R. China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, P. R. China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China; Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China