1
Guo W, Liu J, Dong F, Hong H. Unlocking the potential of AI: Machine learning and deep learning models for predicting carcinogenicity of chemicals. Journal of Environmental Science and Health. Part C, Toxicology and Carcinogenesis 2024:1-28. [PMID: 39228157 DOI: 10.1080/26896583.2024.2396731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
The escalating concern over the carcinogenic potential of chemicals underscores the need for efficient methods of assessing carcinogenicity. Conventional experimental approaches such as in vitro and in vivo assays, although effective, are costly and time-consuming. In response, alternative methodologies, notably machine learning and deep learning techniques, have attracted attention for their potential in developing carcinogenicity prediction models. This article reviews the progress in predicting carcinogenicity using various machine learning and deep learning algorithms. A comparative analysis of these models reveals that support vector machine, random forest, and ensemble learning are commonly preferred for their robustness and effectiveness in predicting chemical carcinogenicity. In contrast, models based on deep learning algorithms, such as feedforward neural network, convolutional neural network, graph convolutional neural network, capsule neural network, and hybrid neural networks, exhibit promising capabilities but are limited by the size of available carcinogenicity datasets. This review provides a comprehensive analysis of current machine learning and deep learning models for carcinogenicity prediction, underscoring the importance of large, high-quality datasets. These observations are expected to catalyze future advances in developing effective and generalizable machine learning and deep learning models for predicting chemical carcinogenicity.
Affiliation(s)
- Wenjing Guo
  - National Center for Toxicological Research (NCTR), U.S. Food & Drug Administration (FDA), Jefferson, AR
- Jie Liu
  - National Center for Toxicological Research (NCTR), U.S. Food & Drug Administration (FDA), Jefferson, AR
- Fan Dong
  - National Center for Toxicological Research (NCTR), U.S. Food & Drug Administration (FDA), Jefferson, AR
- Huixiao Hong
  - National Center for Toxicological Research (NCTR), U.S. Food & Drug Administration (FDA), Jefferson, AR
2
Arab I, Laukens K, Bittremieux W. Semisupervised Learning to Boost hERG, Nav1.5, and Cav1.2 Cardiac Ion Channel Toxicity Prediction by Mining a Large Unlabeled Small Molecule Data Set. J Chem Inf Model 2024; 64:6410-6420. [PMID: 39110924 DOI: 10.1021/acs.jcim.4c01102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Predicting drug toxicity is a critical aspect of ensuring patient safety during the drug design process. Although conventional machine learning techniques have shown some success in this field, the scarcity of annotated toxicity data poses a significant challenge in enhancing models' performance. In this study, we explore the potential of leveraging large unlabeled small molecule data sets using semisupervised learning to improve drug cardiotoxicity predictive performance across three cardiac ion channel targets: the voltage-gated potassium channel (hERG), the voltage-gated sodium channel (Nav1.5), and the voltage-gated calcium channel (Cav1.2). We extensively mined the ChEMBL database, comprising approximately 2 million small molecules, and then employed semisupervised learning to construct robust classification models for this purpose. We achieved a performance boost on highly diverse (i.e., structurally dissimilar) test data sets across all three targets. Using our built models, we screened the whole ChEMBL database and a large set of FDA-approved drugs, identifying several compounds with potential cardiac ion channel activity. To ensure broad accessibility and usability for both technical and nontechnical users, we developed a cross-platform graphical user interface that allows users to make predictions and gain insights into the cardiotoxicity of drugs and other small molecules. The software is made available as open source under the permissive MIT license at https://github.com/issararab/CToxPred2.
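The abstract does not say which semisupervised scheme the authors used. As a generic illustration only, the sketch below shows self-training with pseudo-labels, one common way to exploit unlabeled data: a model trained on the labeled set labels the unlabeled pool, and only confident predictions are added to the training set. The nearest-centroid classifier, the `min_margin` threshold, and all function names are assumptions made for this toy example; they are not taken from CToxPred2.

```python
from statistics import mean


def centroids(points, labels):
    """Return the mean feature vector of each class (0 and 1)."""
    result = {}
    for cls in (0, 1):
        members = [p for p, y in zip(points, labels) if y == cls]
        result[cls] = tuple(mean(dim) for dim in zip(*members))
    return result


def predict_with_margin(cents, point):
    """Predict the nearest-centroid class plus a crude confidence margin."""
    dist = {
        cls: sum((a - b) ** 2 for a, b in zip(point, cents[cls])) ** 0.5
        for cls in cents
    }
    label = min(dist, key=dist.get)
    margin = abs(dist[0] - dist[1])  # large margin = clearly closer to one class
    return label, margin


def self_train(labeled, labels, unlabeled, min_margin=1.0, max_rounds=5):
    """Hypothetical self-training loop: grow the training set with confident pseudo-labels."""
    points, ys, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(points, ys)
        scored = [(p, *predict_with_margin(cents, p)) for p in pool]
        confident = [(p, y) for p, y, m in scored if m >= min_margin]
        if not confident:
            break  # nothing left that the model is sure about
        for p, y in confident:
            points.append(p)
            ys.append(y)
        pool = [p for p, _, m in scored if m < min_margin]
    return centroids(points, ys), pool


# Toy usage: two labeled points, three unlabeled ones.
final_centroids, still_unlabeled = self_train(
    [(0.0, 0.0), (4.0, 4.0)], [0, 1],
    [(0.5, 0.5), (3.5, 3.5), (2.0, 2.0)],
)
print(final_centroids, still_unlabeled)
```

In this toy run the ambiguous point halfway between the two classes never clears the margin threshold, so it stays unlabeled rather than being given a noisy pseudo-label.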
Affiliation(s)
- Issar Arab
  - Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
  - Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
- Kris Laukens
  - Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
  - Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
- Wout Bittremieux
  - Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
  - Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
3
Huang ETC, Yang JS, Liao KYK, Tseng WCW, Lee CK, Gill M, Compas C, See S, Tsai FJ. Predicting blood-brain barrier permeability of molecules with a large language model and machine learning. Sci Rep 2024; 14:15844. [PMID: 38982309 PMCID: PMC11233737 DOI: 10.1038/s41598-024-66897-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Accepted: 07/05/2024] [Indexed: 07/11/2024] [Open Access]
Abstract
Predicting the blood-brain barrier (BBB) permeability of small-molecule compounds using a novel artificial intelligence platform is necessary for drug discovery. Machine learning and large language models, used as artificial intelligence (AI) tools, improve accuracy and shorten the time needed for new drug development. The primary goal of this research is to develop AI computing models and novel deep learning architectures capable of predicting whether molecules can permeate the human BBB. The in silico (computational) and in vitro (experimental) results were validated by the Natural Products Research Laboratories (NPRL) at China Medical University Hospital (CMUH). The transformer-based MegaMolBART was used as the simplified molecular input line entry system (SMILES) encoder with an XGBoost classifier as an in silico method to check whether a molecule could cross the BBB. As a baseline for comparison, we used Morgan (circular) fingerprints, which apply the Morgan algorithm to a set of atomic invariants, as the encoder, also with an XGBoost classifier. BBB permeability was assessed in vitro using three-dimensional (3D) human BBB spheroids (human brain microvascular endothelial cells, brain vascular pericytes, and astrocytes). Using multiple BBB databases, the final in silico transformer and XGBoost model achieved an area under the receiver operating characteristic curve of 0.88 on the held-out test dataset. Temozolomide (TMZ) and 21 randomly selected BBB-permeable compounds (Pred scores = 1, indicating BBB-permeable) from the NPRL penetrated human BBB spheroid cells. No evidence suggests that ferulic acid or five BBB-impermeable compounds (Pred scores < 1.29423E-05, which designate compounds that do not pass through the human BBB) can pass through the spheroid cells of the BBB. Our in vitro validation experiments indicated that the in silico prediction of small-molecule permeation in the BBB model is accurate. Transformer-based models like MegaMolBART, leveraging the SMILES representations of molecules, show great promise for applications in new drug discovery. These models have the potential to accelerate the development of novel targeted treatments for disorders of the central nervous system.
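The abstract reports model quality as an area under the receiver operating characteristic curve (AUROC) of 0.88. As a minimal sketch of what that number measures, the function below computes AUROC as the fraction of (permeable, impermeable) pairs in which the permeable compound receives the higher score, with ties counted as half. The function name and the toy labels and scores are illustrative assumptions, not data from the study.

```python
def roc_auc(labels, scores):
    """AUROC via pairwise ranking: share of (positive, negative) pairs ordered correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 1 = BBB-permeable, 0 = impermeable (made-up scores).
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.88 means a randomly chosen permeable compound outranks a randomly chosen impermeable one about 88% of the time.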
Affiliation(s)
- Eddie T C Huang
  - NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- Jai-Sing Yang
  - Department of Medical Research, China Medical University Hospital, China Medical University, Taichung, Taiwan
- Ken Y K Liao
  - NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- Warren C W Tseng
  - NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- C K Lee
  - NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- Michelle Gill
  - NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- Colin Compas
  - NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- Simon See
  - NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, USA
- Fuu-Jen Tsai
  - School of Chinese Medicine, College of Chinese Medicine, China Medical University, China Medical University Children's Hospital, No. 2, Yude Road, Taichung, 404332, Taiwan
  - China Medical University Children's Hospital, Taichung, Taiwan
4
Liu J, Khan MKH, Guo W, Dong F, Ge W, Zhang C, Gong P, Patterson TA, Hong H. Machine learning and deep learning approaches for enhanced prediction of hERG blockade: a comprehensive QSAR modeling study. Expert Opin Drug Metab Toxicol 2024; 20:665-684. [PMID: 38968091 DOI: 10.1080/17425255.2024.2377593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Accepted: 06/26/2024] [Indexed: 07/07/2024]
Abstract
BACKGROUND: Cardiotoxicity is a major cause of drug withdrawal. The hERG channel, regulating ion flow, is pivotal for heart and nervous system function. Its blockade is a concern in drug development. Predicting hERG blockade is essential for identifying cardiac safety issues. Various QSAR models exist, but their performance varies. Ongoing improvements show promise, necessitating continued efforts to enhance accuracy using emerging deep learning algorithms in predicting potential hERG blockade. STUDY DESIGN AND METHOD: Using a large training dataset, six individual QSAR models were developed. Additionally, three ensemble models were constructed. All models were evaluated using 10-fold cross-validations and two external datasets. RESULTS: The 10-fold cross-validations resulted in Matthews correlation coefficient (MCC) values from 0.682 to 0.730, surpassing the best-reported model on the same dataset (0.689). External validations yielded MCC values from 0.520 to 0.715 for the first dataset, exceeding those of previously reported models (0-0.599). For the second dataset, MCC values fell between 0.025 and 0.215, aligning with those of reported models (0.112-0.220). CONCLUSIONS: The developed models can assist the pharmaceutical industry and regulatory agencies in predicting hERG blockade activity, thereby enhancing safety assessments and reducing the risk of adverse cardiac events associated with new drug candidates.
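The results above are reported as Matthews correlation coefficient (MCC) values. For readers unfamiliar with the metric, here is a small sketch of the standard binary MCC computed from confusion-matrix counts, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function name, the convention of returning 0.0 when the denominator is zero, and the example counts are assumptions for illustration, not figures from the study.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary MCC from confusion-matrix counts; ranges from -1 to +1."""
    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return numerator / denominator


# Made-up counts: 90 blockers found, 80 non-blockers found, 20 false alarms, 10 misses.
print(round(matthews_corrcoef(tp=90, tn=80, fp=20, fn=10), 3))  # 0.704
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which makes it less flattering on imbalanced datasets. That is one reason values near 0.7 in cross-validation and values near 0.1 to 0.2 on a hard external set can both appear for the same models.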
Affiliation(s)
- Jie Liu
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Md Kamrul Hasan Khan
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Wenjing Guo
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Fan Dong
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Weigong Ge
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Chaoyang Zhang
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, USA
- Ping Gong
  - Environmental Laboratory, US Army Engineer Research and Development Center, Vicksburg, MS, USA
- Tucker A Patterson
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
- Huixiao Hong
  - National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR, USA
5
Khan MK, Raza M, Shahbaz M, Hussain I, Khan MF, Xie Z, Shah SSA, Tareen AK, Bashir Z, Khan K. The recent advances in the approach of artificial intelligence (AI) towards drug discovery. Front Chem 2024; 12:1408740. [PMID: 38882215 PMCID: PMC11176507 DOI: 10.3389/fchem.2024.1408740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Accepted: 04/26/2024] [Indexed: 06/18/2024] [Open Access]
Abstract
Artificial intelligence (AI) has recently emerged as a distinctive driving force that plays an important role in the development of medicine. AI shows potential for unprecedented gains in accuracy and efficiency, and its intersection with drug discovery has the potential to revolutionize the field. However, AI also has limitations, and experts should be aware of the associated data access and ethical issues. The use of AI techniques for drug discovery applications has increased considerably over the past few years, including combinatorial QSAR and QSPR, virtual screening, and de novo drug design. The purpose of this survey is to give a general overview of AI-based drug discovery and its associated applications. We also highlight the gaps in traditional drug design methods. In addition, potential strategies and approaches to overcome current challenges are discussed to address the constraints of AI within this field. We hope that this survey contributes to a comprehensive understanding of the potential of AI in drug discovery.
Affiliation(s)
- Mahroza Kanwal Khan
  - College of Chemistry and Environmental Engineering, Shenzhen University, Shenzhen, China
- Mohsin Raza
  - Additive Manufacturing Institute, Shenzhen University, Shenzhen, China
- Muhammad Shahbaz
  - Additive Manufacturing Institute, Shenzhen University, Shenzhen, China
- Iftikhar Hussain
  - Department of Mechanical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR, China
  - A. J. Drexel Nanomaterials Institute and Department of Materials Science and Engineering, Drexel University, Philadelphia, PA, United States
- Muhammad Farooq Khan
  - Department of Electrical Engineering, Sejong University, Seoul, Republic of Korea
- Zhongjian Xie
  - Shenzhen Children's Hospital, Clinical Medical College of Southern University of Science and Technology, Shenzhen, China
- Syed Shoaib Ahmad Shah
  - Department of Chemistry, School of Natural Sciences, National University of Sciences and Technology, Islamabad, Pakistan
- Ayesha Khan Tareen
  - School of Mechanical Engineering, Dongguan University of Technology, Dongguan, China
- Zoobia Bashir
  - College of Chemistry and Environmental Engineering, Shenzhen University, Shenzhen, China
- Karim Khan
  - Additive Manufacturing Institute, Shenzhen University, Shenzhen, China
6
Dong F, Guo W, Liu J, Patterson TA, Hong H. BERT-based language model for accurate drug adverse event extraction from social media: implementation, evaluation, and contributions to pharmacovigilance practices. Front Public Health 2024; 12:1392180. [PMID: 38716250 PMCID: PMC11074401 DOI: 10.3389/fpubh.2024.1392180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Accepted: 04/11/2024] [Indexed: 05/18/2024] [Open Access]
Abstract
Introduction: Social media platforms serve as a valuable resource for users to share health-related information, aiding in the monitoring of adverse events linked to medications and treatments in drug safety surveillance. However, extracting drug-related adverse events accurately and efficiently from social media poses challenges in both natural language processing research and the pharmacovigilance domain. Method: Recognizing the lack of detailed implementation and evaluation of Bidirectional Encoder Representations from Transformers (BERT)-based models for drug adverse event extraction on social media, we developed a BERT-based language model tailored to identifying drug adverse events in this context. Our model utilized publicly available labeled adverse event data from the ADE-Corpus-V2. Constructing the BERT-based model involved optimizing key hyperparameters, such as the number of training epochs, batch size, and learning rate. Through ten hold-out evaluations on ADE-Corpus-V2 data and external social media datasets, our model consistently demonstrated high accuracy in drug adverse event detection. Result: The hold-out evaluations resulted in average F1 scores of 0.8575, 0.9049, and 0.9813 for detecting words of adverse events, words in adverse events, and words not in adverse events, respectively. External validation using human-labeled adverse event tweets data from SMM4H further substantiated the effectiveness of our model, yielding F1 scores of 0.8127, 0.8068, and 0.9790 for detecting words of adverse events, words in adverse events, and words not in adverse events, respectively. Discussion: This study not only showcases the effectiveness of BERT-based language models in accurately identifying drug-related adverse events in the dynamic landscape of social media data, but also addresses the need for the implementation of a comprehensive study design and evaluation. By doing so, we contribute to the advancement of pharmacovigilance practices and methodologies in the context of emerging information sources like social media.
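The abstract reports a separate F1 score for each of three word classes. A minimal sketch of how such per-class, token-level F1 scores can be computed is shown below. The tag names `B-AE`, `I-AE`, and `O` are an assumption (a common begin/inside/outside tagging convention); the abstract itself only describes the classes in words, and the toy sequences are invented.

```python
def per_label_f1(gold, pred):
    """Token-level F1 for every label seen in the gold or predicted tags."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never appears
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical tags for a five-token post.
gold = ["O", "B-AE", "I-AE", "O", "O"]
pred = ["O", "B-AE", "O", "O", "B-AE"]
print(per_label_f1(gold, pred))
```

Reporting the three classes separately matters because the "not in adverse event" class dominates most text, so its F1 is typically much higher than that of the adverse-event classes, as the scores in the abstract also show.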
Affiliation(s)
- Huixiao Hong
  - National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, United States
7
Hong H, Slikker W. Integrating artificial intelligence with bioinformatics promotes public health. Exp Biol Med (Maywood) 2023; 248:1905-1907. [PMID: 38179798 PMCID: PMC10798184 DOI: 10.1177/15353702231223575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2024] [Open Access]
Affiliation(s)
- Huixiao Hong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
- William Slikker
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA