1
Seal S, Mahale M, García-Ortegón M, Joshi CK, Hosseini-Gerami L, Beatson A, Greenig M, Shekhar M, Patra A, Weis C, Mehrjou A, Badré A, Paisley B, Lowe R, Singh S, Shah F, Johannesson B, Williams D, Rouquie D, Clevert DA, Schwab P, Richmond N, Nicolaou CA, Gonzalez RJ, Naven R, Schramm C, Vidler LR, Mansouri K, Walters WP, Wilk DD, Spjuth O, Carpenter AE, Bender A. Machine Learning for Toxicity Prediction Using Chemical Structures: Pillars for Success in the Real World. Chem Res Toxicol 2025. [PMID: 40314361 DOI: 10.1021/acs.chemrestox.5c00033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2025]
Abstract
Machine learning (ML) is increasingly valuable for predicting molecular properties and toxicity in drug discovery. However, toxicity-related end points have always been challenging to evaluate experimentally with respect to in vivo translation due to the required resources for human and animal studies; this has impacted data availability in the field. ML can augment or even potentially replace traditional experimental processes depending on the project phase and specific goals of the prediction. For instance, models can be used to select promising compounds for on-target effects or to deselect those with undesirable characteristics (e.g., off-target or ineffective due to unfavorable pharmacokinetics). However, reliance on ML is not without risks, due to biases stemming from nonrepresentative training data, incompatible choice of algorithm to represent the underlying data, or poor model building and validation approaches. This might lead to inaccurate predictions, misinterpretation of the confidence in ML predictions, and ultimately suboptimal decision-making. Hence, understanding the predictive validity of ML models is of utmost importance to enable faster drug development timelines while improving the quality of decisions. This perspective emphasizes the need to enhance the understanding and application of machine learning models in drug discovery, focusing on well-defined data sets for toxicity prediction based on small molecule structures. We focus on five crucial pillars for success with ML-driven molecular property and toxicity prediction: (1) data set selection, (2) structural representations, (3) model algorithm, (4) model validation, and (5) translation of predictions to decision-making. Understanding these key pillars will foster collaboration and coordination between ML researchers and toxicologists, which will help to advance drug discovery and development.
Affiliation(s)
- Srijit Seal
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
  - Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K.
- Manas Mahale
  - Department of Pharmaceutical Chemistry, Bombay College of Pharmacy, Mumbai 400098, India
- Chaitanya K Joshi
  - Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, U.K.
- Alex Beatson
  - Axiom Bio, San Francisco, California 94107, United States
- Matthew Greenig
  - Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K.
- Mrinal Shekhar
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Adrien Badré
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Brianna Paisley
  - Eli Lilly & Company, Indianapolis, Indiana 46285, United States
- Shantanu Singh
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Falgun Shah
  - Non Clinical Drug Safety, Merck Inc., West Point, Pennsylvania 19486, United States
- David Rouquie
  - Toxicology Data Science, Bayer SAS Crop Science Division, Valbonne Sophia-Antipolis 06560, France
- Djork-Arné Clevert
  - Pfizer, Worldwide Research, Development and Medical, Machine Learning & Computational Sciences, Berlin 10922, Germany
- Christos A Nicolaou
  - Computational Drug Design, Digital Science & Innovation, Novo Nordisk US R&D, Lexington, Massachusetts 02421, United States
- Raymond J Gonzalez
  - Non Clinical Drug Safety, Merck Inc., West Point, Pennsylvania 19486, United States
- Russell Naven
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Kamel Mansouri
  - NIH/NIEHS/DTT/NICEATM, Research Triangle Park, North Carolina 27709, United States
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala 751 24, Sweden
  - Phenaros Pharmaceuticals AB, Uppsala 75239, Sweden
- Anne E Carpenter
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Andreas Bender
  - Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K.
  - College of Medicine and Health Sciences, Khalifa University of Science and Technology, Abu Dhabi 127788, United Arab Emirates
2
Chen YQ, Yu T, Song ZQ, Wang CY, Luo JT, Xiao Y, Qiu H, Wang QQ, Jin HM. Application of Large Language Models in Drug-Induced Osteotoxicity Prediction. J Chem Inf Model 2025; 65:3370-3379. [PMID: 40114317 DOI: 10.1021/acs.jcim.5c00275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/22/2025]
Abstract
Drug-induced osteotoxicity refers to the harmful effects certain drugs have on the skeletal system, posing significant safety risks. These toxic effects are a key concern in clinical practice, drug development, and environmental management. However, existing toxicity assessment models lack specialized data sets and algorithms for predicting osteotoxicity. In our study, we collected osteotoxic molecules and employed various large language models, including DeepSeek and ChatGPT, alongside traditional machine learning methods to predict their properties. Among these, the DeepSeek R1 and ChatGPT o3 models achieved ACC values of 0.87 and 0.88, respectively. Our results indicate that machine learning methods can assist in evaluating the impact of harmful substances on bone health during drug development, improving safety protocols, mitigating skeletal side effects, and enhancing treatment outcomes and public safety. Furthermore, it highlights the potential of large language models in predicting molecular toxicity and their significance in the fields of health and chemical sciences.
Affiliation(s)
- Yi-Qi Chen
  - Department of Orthopaedics, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, Wenzhou 32500, China
- Tao Yu
  - Department of Orthopaedics, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, Wenzhou 32500, China
- Zheng-Qi Song
  - Department of Orthopaedics, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, Wenzhou 32500, China
- Chen-Yu Wang
  - Department of Orthopaedics, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, Wenzhou 32500, China
- Jiang-Tao Luo
  - Department of Orthopaedics, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, Wenzhou 32500, China
- Yong Xiao
  - Department of Orthopaedics, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, Wenzhou 32500, China
- Heng Qiu
  - Department of Chemistry, The University of Hong Kong, Hong Kong, SAR 999077, China
- Qing-Qing Wang
  - Department of Orthopaedic Surgery, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou 310016, China
- Hai-Ming Jin
  - Department of Orthopaedics, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, Wenzhou 32500, China
3
Kim S, Yang S, Jung J, Choi J, Kang M, Joo J. Psychedelic Drugs in Mental Disorders: Current Clinical Scope and Deep Learning-Based Advanced Perspectives. Adv Sci (Weinh) 2025; 12:e2413786. [PMID: 40112231 PMCID: PMC12005819 DOI: 10.1002/advs.202413786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2024] [Revised: 02/13/2025] [Indexed: 03/22/2025]
Abstract
Mental disorders are a representative type of brain disorder, including anxiety, major depressive disorder (MDD), and autism spectrum disorder (ASD), that are caused by multiple etiologies, including genetic heterogeneity, epigenetic dysregulation, and aberrant morphological and biochemical conditions. Psychedelic drugs such as psilocybin and lysergic acid diethylamide (LSD) have attracted renewed interest as treatment options and have gradually demonstrated potential therapeutic effects in mental disorders. However, the multifaceted conditions of psychiatric disorders resulting from individuality, complex genetic interplay, and intricate neural circuits impact the systemic pharmacology of psychedelics, which hinders an integrated understanding of their mechanisms and may result in variable therapeutic efficacy. The precise prescription of psychedelic drugs remains unclear, and advanced approaches are needed to optimize drug development. Here, recent studies demonstrating the diverse pharmacological effects of psychedelics in mental disorders are reviewed, and emerging perspectives on structural function, the microbiota-gut-brain axis, and the transcriptome are discussed. Moreover, the applicability of deep learning is highlighted for the development of drugs on the basis of big data. These approaches may provide insight into pharmacological mechanisms and interindividual factors to enhance drug discovery and development for advanced precision medicine.
Affiliation(s)
- Sung‐Hyun Kim
  - Department of Pharmacy, College of Pharmacy, Hanyang University, Ansan, Gyeonggi‐do 15588, Republic of Korea
- Sumin Yang
  - Department of Pharmacy, College of Pharmacy, Hanyang University, Ansan, Gyeonggi‐do 15588, Republic of Korea
- Jeehye Jung
  - Department of Pharmacy, College of Pharmacy, Hanyang University, Ansan, Gyeonggi‐do 15588, Republic of Korea
- Jeonghyeon Choi
  - Department of Pharmacy, College of Pharmacy, Hanyang University, Ansan, Gyeonggi‐do 15588, Republic of Korea
- Mingon Kang
  - Department of Computer Science, University of Nevada, Las Vegas, NV 89154, USA
- Jae‐Yeol Joo
  - Department of Pharmacy, College of Pharmacy, Hanyang University, Ansan, Gyeonggi‐do 15588, Republic of Korea
4
Duy H, Srisongkram T. Bidirectional Long Short-Term Memory (BiLSTM) Neural Networks with Conjoint Fingerprints: Application in Predicting Skin-Sensitizing Agents in Natural Compounds. J Chem Inf Model 2025; 65:3035-3047. [PMID: 40029998 PMCID: PMC11938345 DOI: 10.1021/acs.jcim.5c00032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2025] [Revised: 02/19/2025] [Accepted: 02/20/2025] [Indexed: 03/25/2025]
Abstract
Skin sensitization, or allergic contact dermatitis, represents a critical end point in toxicity assessment, with profound implications for drug safety and regulatory decision-making. This study aims to develop a robust deep-learning-based quantitative structure-activity relationship framework for accurately predicting skin sensitization toxicity, particularly in the context of natural-product-derived compounds. To achieve this, we explored advanced recurrent neural network architectures, including long short-term memory (LSTM), bidirectional LSTM (BiLSTM), gated recurrent unit (GRU), and bidirectional GRU, to model the intricate structure-toxicity relationships inherent in molecular compounds. We aim to optimize and improve predictive performance by training a cohort of 55 models with a diverse set of molecular fingerprints. Notably, the BiLSTM model, which integrates SMILES tokens with RDKit fingerprints, achieved superior predictive performance, underscoring its capability to effectively capture key molecular determinants of skin sensitization. An extensive applicability domain analysis coupled with an in-depth evaluation of feature importance provided new insights into the key molecular attributes that influence sensitization propensity. We further evaluated the BiLSTM model using a natural product data set, where it demonstrated exceptional generalization capabilities. The model achieved an accuracy of 86.5%, a Matthews correlation coefficient of 75.2%, a sensitivity of 100%, an area under the curve of 88%, a specificity of 75%, and an F1-score of 88.8%. Remarkably, the model effectively categorized natural products by discriminating sensitizing from non-sensitizing agents across various natural product subcategories. These results underscore the potential of BiLSTM-based models as powerful in silico tools for modern drug discovery efforts and regulatory assessments, especially in the field of natural products.
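The metrics reported in this abstract (accuracy, Matthews correlation coefficient, sensitivity, specificity, F1-score) can all be derived from a binary confusion matrix. The following is a minimal sketch of those standard definitions; the function name `binary_metrics` and the counts used are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity (recall): fraction of true sensitizers that are detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true non-sensitizers that are correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1: harmonic mean of precision and recall, written in count form.
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not the study's data).
metrics = binary_metrics(tp=40, fp=10, tn=30, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With zero false negatives, sensitivity is 1.0 while specificity and MCC remain lower, which is the same qualitative pattern the abstract describes.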
Affiliation(s)
- Huynh Anh Duy
  - Graduate School in the Program of Research and Development in Pharmaceuticals, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen 40002, Thailand
  - Department of Health Sciences, College of Natural Sciences, Can Tho University, Can Tho 900000, Vietnam
- Tarapong Srisongkram
  - Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen 40002, Thailand
5
Liu J, Li J, Li Z, Dong F, Guo W, Ge W, Patterson TA, Hong H. Developing predictive models for µ opioid receptor binding using machine learning and deep learning techniques. Exp Biol Med (Maywood) 2025; 250:10359. [PMID: 40177220 PMCID: PMC11961360 DOI: 10.3389/ebm.2025.10359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2024] [Accepted: 02/25/2025] [Indexed: 04/05/2025] Open
Abstract
Opioids exert their analgesic effect by binding to the µ opioid receptor (MOR), which initiates a downstream signaling pathway, eventually inhibiting pain transmission in the spinal cord. However, current opioids are addictive, often leading to overdose contributing to the opioid crisis in the United States. Therefore, understanding the structure-activity relationship between MOR and its ligands is essential for predicting MOR binding of chemicals, which could assist in the development of non-addictive or less-addictive opioid analgesics. This study aimed to develop machine learning and deep learning models for predicting MOR binding activity of chemicals. Chemicals with MOR binding activity data were first curated from public databases and the literature. Molecular descriptors of the curated chemicals were calculated using software Mold2. The chemicals were then split into training and external validation datasets. Random forest, k-nearest neighbors, support vector machine, multi-layer perceptron, and long short-term memory models were developed and evaluated using 5-fold cross-validations and external validations, resulting in Matthews correlation coefficients of 0.528-0.654 and 0.408, respectively. Furthermore, prediction confidence and applicability domain analyses highlighted their importance to the models' applicability. Our results suggest that the developed models could be useful for identifying MOR binders, potentially aiding in the development of non-addictive or less-addictive drugs targeting MOR.
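The 5-fold cross-validation mentioned in this abstract partitions the training set into five folds, each used once for validation while the other four are used for fitting. A minimal sketch of the index bookkeeping follows; the helper `k_fold_indices`, the seed, and the sample count are assumptions made for illustration, and the paper's own splitting procedure may differ (for example, it may be stratified by class).

```python
import random


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f_no, fold in enumerate(folds) if f_no != i for j in fold]
        yield train_idx, val_idx


# Hypothetical training-set size; an external validation set would be
# held out separately before any of these folds are formed.
splits = list(k_fold_indices(n_samples=23, k=5))
for fold_no, (train_idx, val_idx) in enumerate(splits):
    print(f"fold {fold_no}: {len(train_idx)} train, {len(val_idx)} validation")
```

Each sample appears in exactly one validation fold, so per-fold scores such as the Matthews correlation coefficient can be averaged into a single cross-validated estimate.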
Affiliation(s)
- Jie Liu
  - U.S. Food and Drug Administration, National Center for Toxicological Research, Jefferson, AR, United States
- Jerry Li
  - Department of Computer Science, Rice University, Houston, TX, United States
- Zoe Li
  - U.S. Food and Drug Administration, National Center for Toxicological Research, Jefferson, AR, United States
- Fan Dong
  - U.S. Food and Drug Administration, National Center for Toxicological Research, Jefferson, AR, United States
- Wenjing Guo
  - U.S. Food and Drug Administration, National Center for Toxicological Research, Jefferson, AR, United States
- Weigong Ge
  - U.S. Food and Drug Administration, National Center for Toxicological Research, Jefferson, AR, United States
- Tucker A. Patterson
  - U.S. Food and Drug Administration, National Center for Toxicological Research, Jefferson, AR, United States
- Huixiao Hong
  - U.S. Food and Drug Administration, National Center for Toxicological Research, Jefferson, AR, United States
6
Duy H, Srisongkram T. Comparative Analysis of Recurrent Neural Networks with Conjoint Fingerprints for Skin Corrosion Prediction. J Chem Inf Model 2025; 65:1305-1317. [PMID: 39835935 PMCID: PMC11815816 DOI: 10.1021/acs.jcim.4c02062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2024] [Revised: 12/28/2024] [Accepted: 01/03/2025] [Indexed: 01/22/2025]
Abstract
Skin corrosion assessment is an essential toxicity end point that addresses safety concerns for topical dosage forms and cosmetic products. Previously, skin corrosion assessments required animal testing; however, differences in skin architecture and ethical concerns regarding animal models have fostered the advancement of alternative methods such as in silico and in vitro models. This study aimed to develop deep learning (DL) models based on recurrent neural networks (RNNs) for classifying skin corrosion of chemical compounds based on chemical language notation, molecular substructure, physicochemical properties, and a combination of these three properties called conjoint fingerprints. Simple RNN, long short-term memory, bidirectional long short-term memory (BiLSTM), gated recurrent unit, and bidirectional gated recurrent unit models, along with 11 molecular features, were employed to generate 55 RNN-based models. Applicability domain and permutation importance analyses were used to improve the trustworthiness and the explainability of the models' predictions, respectively. Our findings indicate that BiLSTM with conjoint features of MACCS keys and physicochemical descriptors is the most effective model, with 84.3% accuracy, 89.8% area under the curve, and 57.6% Matthews correlation coefficient on the external test set. Furthermore, our model accurately predicted the skin corrosion toxicity of all new and unseen compounds beyond our test set, highlighting prominent classification performance compared to existing skin corrosion models. This finding will contribute to the utilization of DL and conjoint characteristics of molecular structure to enhance the model's predictive capability for skin toxicity assessment.
Affiliation(s)
- Huynh Anh Duy
  - Graduate School in the Program of Research and Development in Pharmaceuticals, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen 40002, Thailand
- Tarapong Srisongkram
  - Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen 40002, Thailand
7
Kakar M, Huynh BN, Zlygosteva O, Juvkam IS, Edin N, Tomic O, Futsaether CM, Malinen E. Attention-based Vision Transformer Enables Early Detection of Radiotherapy-Induced Toxicity in Magnetic Resonance Images of a Preclinical Model. Technol Cancer Res Treat 2025; 24:15330338251333018. [PMID: 40183426 PMCID: PMC11970093 DOI: 10.1177/15330338251333018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2025] [Revised: 03/07/2025] [Accepted: 03/20/2025] [Indexed: 04/05/2025] Open
Abstract
Introduction: Early identification of patients at risk for toxicity induced by radiotherapy (RT) is essential for developing personalized treatments and mitigation plans. Preclinical models with relevant endpoints are critical for systematic evaluation of normal tissue responses. This study aims to determine whether attention-based vision transformers can classify MR images of irradiated and control mice, potentially aiding early identification of individuals at risk of developing toxicity. Method: C57BL/6J mice (n = 14) were subjected to 66 Gy of fractionated RT targeting the oral cavity, swallowing muscles, and salivary glands. A control group (n = 15) received no irradiation but was otherwise treated identically. T2-weighted MR images were obtained 3-5 days post-irradiation. Late toxicity in terms of saliva production in individual mice was assessed at day 105 after treatment. A pre-trained vision transformer model (ViT Base 16) was employed to classify the images into control and irradiated groups. Results: The ViT Base 16 model classified the MR images with an accuracy of 69%, with identical overall performance for control and irradiated animals. The ViT model's predictions showed a significant correlation with late toxicity (r = 0.65, p < 0.01). One of the attention maps from the ViT model highlighted the irradiated regions of the animals. Conclusions: Attention-based vision transformers using MRI have the potential to predict individuals at risk of developing early toxicity. This approach may enhance personalized treatment and follow-up strategies in head and neck cancer radiotherapy.
Affiliation(s)
- Manish Kakar
  - Department of Radiation Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
- Bao Ngoc Huynh
  - Department of Radiation Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
  - Faculty of Science and Technology, Norwegian University of Life Sciences, Ås, Norway
- Inga Solgård Juvkam
  - Department of Radiation Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
  - Institute for Oral Biology, Faculty of Dentistry, University of Oslo, Oslo, Norway
- Nina Edin
  - Department of Physics, University of Oslo, Oslo, Norway
- Oliver Tomic
  - Faculty of Science and Technology, Norwegian University of Life Sciences, Ås, Norway
- Eirik Malinen
  - Department of Radiation Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
  - Department of Physics, University of Oslo, Oslo, Norway
8
Wei Y, Qiu T, Ai Y, Zhang Y, Xie J, Zhang D, Luo X, Sun X, Wang X, Qiu J. Advances of computational methods enhance the development of multi-epitope vaccines. Brief Bioinform 2024; 26:bbaf055. [PMID: 39951549 PMCID: PMC11827616 DOI: 10.1093/bib/bbaf055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2024] [Revised: 11/28/2024] [Accepted: 01/27/2025] [Indexed: 02/16/2025] Open
Abstract
Vaccine development is one of the most promising fields, and the multi-epitope vaccine, which does not require laborious culture processes, is an attractive alternative to classical vaccines, with advantages in safety and efficiency. The rapid development of algorithms and the accumulation of immune data have facilitated the advancement of computer-aided vaccine design. Here we systematically review the in silico data and algorithm resources for the different steps of computational vaccine design, including immunogen selection, epitope prediction, vaccine construction, optimization, and evaluation. The performance of available tools for epitope prediction and immunogenicity evaluation was tested and compared on benchmark datasets. Finally, we discuss future research directions for the construction of multi-epitope vaccines.
Affiliation(s)
- Yiwen Wei
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
- Tianyi Qiu
  - Institute of Clinical Science, Zhongshan Hospital; Intelligent Medicine Institute; Shanghai Institute of Infectious Disease and Biosecurity, Shanghai Medical College, Fudan University, No. 180, Fenglin Road, Xuhui District, Shanghai 200032, China
- Yisi Ai
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
- Yuxi Zhang
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
- Junting Xie
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
- Dong Zhang
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
- Xiaochuan Luo
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
- Xiulan Sun
  - State Key Laboratory of Food Science and Technology, School of Food Science and Technology, National Engineering Research Center for Functional Foods, Synergetic Innovation Center of Food Safety and Nutrition, Jiangnan University, Lihu Avenue 1800, Wuxi, Jiangsu 214122, China
- Xin Wang
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
  - Shanghai Collaborative Innovation Center of Energy Therapy for Tumors, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
- Jingxuan Qiu
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
  - Shanghai Collaborative Innovation Center of Energy Therapy for Tumors, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
9
Guo W, Liu J, Dong F, Hong H. Unlocking the potential of AI: Machine learning and deep learning models for predicting carcinogenicity of chemicals. J Environ Sci Health C Toxicol Carcinog 2024; 43:23-50. [PMID: 39228157 DOI: 10.1080/26896583.2024.2396731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
The escalating apprehension surrounding the carcinogenic potential of chemicals emphasizes the imperative need for efficient methods of assessing carcinogenicity. Conventional experimental approaches such as in vitro and in vivo assays, albeit effective, suffer from being costly and time-consuming. In response to this challenge, new alternative methodologies, notably machine learning and deep learning techniques, have attracted attention for their potential in developing carcinogenicity prediction models. This article reviews the progress in predicting carcinogenicity using various machine learning and deep learning algorithms. A comparative analysis on these developed models reveals that support vector machine, random forest, and ensemble learning are commonly preferred for their robustness and effectiveness in predicting chemical carcinogenicity. Conversely, models based on deep learning algorithms, such as feedforward neural network, convolutional neural network, graph convolutional neural network, capsule neural network, and hybrid neural networks, exhibit promising capabilities but are limited by the size of available carcinogenicity datasets. This review provides a comprehensive analysis of current machine learning and deep learning models for carcinogenicity prediction, underscoring the importance of high-quality and large datasets. These observations are anticipated to catalyze future advancements in developing effective and generalizable machine learning and deep learning models for predicting chemical carcinogenicity.
Affiliation(s)
- Wenjing Guo
  - National Center for Toxicological Research (NCTR), U.S. Food & Drug Administration (FDA), Jefferson, AR
- Jie Liu
  - National Center for Toxicological Research (NCTR), U.S. Food & Drug Administration (FDA), Jefferson, AR
- Fan Dong
  - National Center for Toxicological Research (NCTR), U.S. Food & Drug Administration (FDA), Jefferson, AR
- Huixiao Hong
  - National Center for Toxicological Research (NCTR), U.S. Food & Drug Administration (FDA), Jefferson, AR
10
Arab I, Laukens K, Bittremieux W. Semisupervised Learning to Boost hERG, Nav1.5, and Cav1.2 Cardiac Ion Channel Toxicity Prediction by Mining a Large Unlabeled Small Molecule Data Set. J Chem Inf Model 2024; 64:6410-6420. [PMID: 39110924 DOI: 10.1021/acs.jcim.4c01102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Predicting drug toxicity is a critical aspect of ensuring patient safety during the drug design process. Although conventional machine learning techniques have shown some success in this field, the scarcity of annotated toxicity data poses a significant challenge in enhancing models' performance. In this study, we explore the potential of leveraging large unlabeled small molecule data sets using semisupervised learning to improve drug cardiotoxicity predictive performance across three cardiac ion channel targets: the voltage-gated potassium channel (hERG), the voltage-gated sodium channel (Nav1.5), and the voltage-gated calcium channel (Cav1.2). We extensively mined the ChEMBL database, comprising approximately 2 million small molecules, and then employed semisupervised learning to construct robust classification models for this purpose. We achieved a performance boost on highly diverse (i.e., structurally dissimilar) test data sets across all three targets. Using our built models, we screened the whole ChEMBL database and a large set of FDA-approved drugs, identifying several compounds with potential cardiac ion channel activity. To ensure broad accessibility and usability for both technical and nontechnical users, we developed a cross-platform graphical user interface that allows users to make predictions and gain insights into the cardiotoxicity of drugs and other small molecules. The software is made available as open source under the permissive MIT license at https://github.com/issararab/CToxPred2.
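The abstract does not say which semisupervised scheme was used. One common approach is self-training, in which a model fitted on the labeled set assigns pseudo-labels to unlabeled examples it is confident about and is then refitted. The sketch below illustrates that general idea only, with a toy nearest-centroid classifier on 2-D points; the function names, the `margin` threshold, and the data are all hypothetical and do not describe the CToxPred2 implementation.

```python
def centroid(points):
    """Mean of a non-empty list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def dist(a, b):
    """Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """Grow a labeled set by pseudo-labeling confidently classified points.

    labeled:   list of (features, label) with label in {0, 1}
    unlabeled: list of features
    margin:    minimum gap between the two centroid distances to accept a label
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        # Refit the toy classifier: one centroid per class.
        c0 = centroid([x for x, y in labeled if y == 0])
        c1 = centroid([x for x, y in labeled if y == 1])
        confident, remaining = [], []
        for x in pool:
            d0, d1 = dist(x, c0), dist(x, c1)
            if abs(d0 - d1) >= margin:
                confident.append((x, 0 if d0 < d1 else 1))
            else:
                remaining.append(x)  # too ambiguous to pseudo-label
        if not confident:
            break
        labeled.extend(confident)
        pool = remaining
    return labeled, pool


# Toy data: one labeled example per class and three unlabeled points.
seed_set = [((0.0, 0.0), 0), ((4.0, 4.0), 1)]
unlabeled_points = [(0.5, 0.2), (3.8, 4.1), (2.0, 2.0)]
grown, still_unlabeled = self_train(seed_set, unlabeled_points)
print(len(grown), still_unlabeled)
```

The point midway between the two classes is never pseudo-labeled, which shows why a confidence threshold matters: accepting ambiguous points would feed label noise back into training.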
Affiliation(s)
- Issar Arab
  - Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
  - Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
- Kris Laukens
  - Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
  - Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
- Wout Bittremieux
  - Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
  - Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
11
Huang ETC, Yang JS, Liao KYK, Tseng WCW, Lee CK, Gill M, Compas C, See S, Tsai FJ. Predicting blood-brain barrier permeability of molecules with a large language model and machine learning. Sci Rep 2024; 14:15844. [PMID: 38982309 PMCID: PMC11233737 DOI: 10.1038/s41598-024-66897-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Accepted: 07/05/2024] [Indexed: 07/11/2024] Open
Abstract
Predicting the blood-brain barrier (BBB) permeability of small-molecule compounds using a novel artificial intelligence (AI) platform is necessary for drug discovery. AI tools built on machine learning and large language models can improve accuracy and shorten the time needed for new drug development. The primary goal of this research is to develop AI computing models and novel deep learning architectures capable of predicting whether molecules can permeate the human BBB. The in silico (computational) and in vitro (experimental) results were validated by the Natural Products Research Laboratories (NPRL) at China Medical University Hospital (CMUH). The transformer-based MegaMolBART was used as the simplified molecular input line entry system (SMILES) encoder with an XGBoost classifier as an in silico method to check if a molecule could cross the BBB. As a baseline for comparison, we used Morgan (circular) fingerprints, which apply the Morgan algorithm to a set of atomic invariants, as the encoder, also with an XGBoost classifier. BBB permeability was assessed in vitro using three-dimensional (3D) human BBB spheroids (human brain microvascular endothelial cells, brain vascular pericytes, and astrocytes). Using multiple BBB databases, the final in silico transformer and XGBoost model achieved an area under the receiver operating characteristic curve of 0.88 on the held-out test dataset. Temozolomide (TMZ) and 21 randomly selected BBB-permeable compounds (Pred scores = 1, indicating BBB-permeable) from the NPRL penetrated human BBB spheroid cells. No evidence suggests that ferulic acid or five BBB-impermeable compounds (Pred scores < 1.29423E-05, which designate compounds that do not pass through the human BBB) can pass through the spheroid cells of the BBB. Our in vitro validation indicated that the in silico prediction of small-molecule permeation in the BBB model is accurate.
Transformer-based models like MegaMolBART, leveraging the SMILES representations of molecules, show great promise for applications in new drug discovery. These models have the potential to accelerate the development of novel targeted treatments for disorders of the central nervous system.
Affiliation(s)
- Eddie T C Huang
  - NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- Jai-Sing Yang
  - Department of Medical Research, China Medical University Hospital, China Medical University, Taichung, Taiwan
- Ken Y K Liao
  - NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- Warren C W Tseng
  - NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- C K Lee
  - NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- Michelle Gill
  - NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- Colin Compas
  - NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- Simon See
  - NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- Fuu-Jen Tsai
  - School of Chinese Medicine, College of Chinese Medicine, China Medical University, China Medical University Children's Hospital, No. 2, Yude Road, Taichung, 404332, Taiwan
  - China Medical University Children's Hospital, Taichung, Taiwan
12
Liu J, Khan MKH, Guo W, Dong F, Ge W, Zhang C, Gong P, Patterson TA, Hong H. Machine learning and deep learning approaches for enhanced prediction of hERG blockade: a comprehensive QSAR modeling study. Expert Opin Drug Metab Toxicol 2024; 20:665-684. [PMID: 38968091 DOI: 10.1080/17425255.2024.2377593] [Received: 02/27/2024] [Accepted: 06/26/2024] [Indexed: 07/07/2024]
Abstract
BACKGROUND: Cardiotoxicity is a major cause of drug withdrawal. The hERG channel, regulating ion flow, is pivotal for heart and nervous system function. Its blockade is a concern in drug development. Predicting hERG blockade is essential for identifying cardiac safety issues. Various QSAR models exist, but their performance varies. Ongoing improvements show promise, necessitating continued efforts to enhance accuracy using emerging deep learning algorithms in predicting potential hERG blockade. STUDY DESIGN AND METHOD: Using a large training dataset, six individual QSAR models were developed. Additionally, three ensemble models were constructed. All models were evaluated using 10-fold cross-validations and two external datasets. RESULTS: The 10-fold cross-validations resulted in Matthews correlation coefficient (MCC) values from 0.682 to 0.730, surpassing the best-reported model on the same dataset (0.689). External validations yielded MCC values from 0.520 to 0.715 for the first dataset, exceeding those of previously reported models (0-0.599). For the second dataset, MCC values fell between 0.025 and 0.215, aligning with those of reported models (0.112-0.220). CONCLUSIONS: The developed models can assist the pharmaceutical industry and regulatory agencies in predicting hERG blockage activity, thereby enhancing safety assessments and reducing the risk of adverse cardiac events associated with new drug candidates.
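For readers unfamiliar with the metric reported above, the following is a minimal sketch of how the Matthews correlation coefficient can be computed from binary confusion-matrix counts. The function name `mcc_from_counts` and the example counts are illustrative assumptions and are not taken from the study; the convention of returning 0.0 when the denominator is zero is a common choice, not something the abstract specifies.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    tp, tn, fp, fn are the counts of true positives, true negatives,
    false positives, and false negatives (hypothetical inputs here).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Illustrative counts only (e.g. blockers vs. non-blockers in a test set).
example_mcc = mcc_from_counts(tp=80, tn=85, fp=15, fn=20)
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which is why it is often preferred over accuracy on imbalanced toxicity datasets.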
Affiliation(s)
- Jie Liu
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Md Kamrul Hasan Khan
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Wenjing Guo
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Fan Dong
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Weigong Ge
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Chaoyang Zhang
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, USA
- Ping Gong
  - Environmental Laboratory, US Army Engineer Research and Development Center, Vicksburg, MS, USA
- Tucker A Patterson
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Huixiao Hong
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
13
Khan MK, Raza M, Shahbaz M, Hussain I, Khan MF, Xie Z, Shah SSA, Tareen AK, Bashir Z, Khan K. The recent advances in the approach of artificial intelligence (AI) towards drug discovery. Front Chem 2024; 12:1408740. [PMID: 38882215 PMCID: PMC11176507 DOI: 10.3389/fchem.2024.1408740] [Received: 03/29/2024] [Accepted: 04/26/2024] [Indexed: 06/18/2024]
Abstract
Artificial intelligence (AI) has recently emerged as a distinctive force that is playing an important role in the development of medicine. AI is showing the potential for unprecedented advances in accuracy and efficiency, and its application has the potential to revolutionize drug discovery. However, AI also has limitations, and experts should be aware of the associated data access and ethical issues. The use of AI techniques for drug discovery applications has increased considerably over the past few years, including combinatorial QSAR and QSPR, virtual screening, and de novo drug design. The purpose of this survey is to give a general overview of drug discovery based on artificial intelligence and its associated applications. We also highlight the gaps present in traditional drug design methods. In addition, potential strategies and approaches to overcome current challenges are discussed to address the constraints of AI within this field. We hope that this survey contributes to a comprehensive understanding of the potential of AI in drug discovery.
Affiliation(s)
- Mahroza Kanwal Khan
  - College of Chemistry and Environmental Engineering, Shenzhen University, Shenzhen, China
- Mohsin Raza
  - Additive Manufacturing Institute, Shenzhen University, Shenzhen, China
- Muhammad Shahbaz
  - Additive Manufacturing Institute, Shenzhen University, Shenzhen, China
- Iftikhar Hussain
  - Department of Mechanical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR, China
  - A. J. Drexel Nanomaterials Institute and Department of Materials Science and Engineering, Drexel University, Philadelphia, PA, United States
- Muhammad Farooq Khan
  - Department of Electrical Engineering, Sejong University, Seoul, Republic of Korea
- Zhongjian Xie
  - Shenzhen Children's Hospital, Clinical Medical College of Southern University of Science and Technology, Shenzhen, China
- Syed Shoaib Ahmad Shah
  - Department of Chemistry, School of Natural Sciences, National University of Sciences and Technology, Islamabad, Pakistan
- Ayesha Khan Tareen
  - School of Mechanical Engineering, Dongguan University of Technology, Dongguan, China
- Zoobia Bashir
  - College of Chemistry and Environmental Engineering, Shenzhen University, Shenzhen, China
- Karim Khan
  - Additive Manufacturing Institute, Shenzhen University, Shenzhen, China
14
Dong F, Guo W, Liu J, Patterson TA, Hong H. BERT-based language model for accurate drug adverse event extraction from social media: implementation, evaluation, and contributions to pharmacovigilance practices. Front Public Health 2024; 12:1392180. [PMID: 38716250 PMCID: PMC11074401 DOI: 10.3389/fpubh.2024.1392180] [Received: 02/27/2024] [Accepted: 04/11/2024] [Indexed: 05/18/2024]
Abstract
Introduction: Social media platforms serve as a valuable resource for users to share health-related information, aiding in the monitoring of adverse events linked to medications and treatments in drug safety surveillance. However, extracting drug-related adverse events accurately and efficiently from social media poses challenges in both natural language processing research and the pharmacovigilance domain. Method: Recognizing the lack of detailed implementation and evaluation of Bidirectional Encoder Representations from Transformers (BERT)-based models for drug adverse event extraction on social media, we developed a BERT-based language model tailored to identifying drug adverse events in this context. Our model utilized publicly available labeled adverse event data from the ADE-Corpus-V2. Constructing the BERT-based model involved optimizing key hyperparameters, such as the number of training epochs, batch size, and learning rate. Through ten hold-out evaluations on ADE-Corpus-V2 data and external social media datasets, our model consistently demonstrated high accuracy in drug adverse event detection. Result: The hold-out evaluations resulted in average F1 scores of 0.8575, 0.9049, and 0.9813 for detecting words of adverse events, words in adverse events, and words not in adverse events, respectively. External validation using human-labeled adverse event tweet data from SMM4H further substantiated the effectiveness of our model, yielding F1 scores of 0.8127, 0.8068, and 0.9790 for detecting words of adverse events, words in adverse events, and words not in adverse events, respectively. Discussion: This study not only showcases the effectiveness of BERT-based language models in accurately identifying drug-related adverse events in the dynamic landscape of social media data, but also addresses the need for the implementation of a comprehensive study design and evaluation.
By doing so, we contribute to the advancement of pharmacovigilance practices and methodologies in the context of emerging information sources like social media.
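The per-category F1 scores reported above can be illustrated with a minimal sketch of token-level, per-label F1. The tag names `B-ADE`, `I-ADE`, and `O` are an assumption made for illustration (a conventional begin/inside/outside tagging scheme); the abstract does not name its labels, and the function `per_label_f1` and the toy sequences are hypothetical, not the authors' evaluation code.

```python
from collections import Counter


def per_label_f1(gold: list[str], pred: list[str]) -> dict[str, float]:
    """Token-level F1 for each label, given aligned gold and predicted tags."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct tag counts as a true positive for that label
        else:
            fp[p] += 1  # predicted label was wrong for this token
            fn[g] += 1  # gold label was missed for this token
    scores = {}
    for label in set(gold) | set(pred):
        # F1 = 2TP / (2TP + FP + FN); 0.0 if the label never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Toy example with assumed tag names, not data from the study.
gold_tags = ["B-ADE", "I-ADE", "O", "O"]
pred_tags = ["B-ADE", "O", "O", "O"]
toy_scores = per_label_f1(gold_tags, pred_tags)
```

Reporting one F1 per label, as the abstract does, keeps the dominant "not in an adverse event" class from masking weaker performance on the rarer adverse-event tokens.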
Affiliation(s)
- Huixiao Hong
  - National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, United States
15
Hong H, Slikker W. Integrating artificial intelligence with bioinformatics promotes public health. Exp Biol Med (Maywood) 2023; 248:1905-1907. [PMID: 38179798 PMCID: PMC10798184 DOI: 10.1177/15353702231223575] [Indexed: 01/06/2024]
Affiliation(s)
- Huixiao Hong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
- William Slikker
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA