101
|
Elbadawi M, Li H, Basit AW, Gaisford S. The role of artificial intelligence in generating original scientific research. Int J Pharm 2024; 652:123741. [PMID: 38181989 DOI: 10.1016/j.ijpharm.2023.123741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 12/20/2023] [Accepted: 12/22/2023] [Indexed: 01/07/2024]
Abstract
Artificial intelligence (AI) is a revolutionary technology that is finding wide application across numerous sectors. Large language models (LLMs) are an emerging subset technology of AI and have been developed to communicate using human languages. At their core, LLMs are trained with vast amounts of information extracted from the internet, including text and images. Their ability to create human-like, expert text in almost any subject means they are increasingly being used as an aid to presentation, particularly in scientific writing. However, we wondered whether LLMs could go further, generating original scientific research and preparing the results for publication. We tasked GPT-4, an LLM, to write an original pharmaceutics manuscript, on a topic that is itself novel. It was able to conceive a research hypothesis, define an experimental protocol, produce photo-realistic images of 3D printed tablets, generate believable analytical data from a range of instruments and write a convincing publication-ready manuscript with evidence of critical interpretation. The model achieved all this in less than 1 h. Moreover, the generated data were multi-modal in nature, including thermal analyses, vibrational spectroscopy and dissolution testing, demonstrating multi-disciplinary expertise in the LLM. One area in which the model failed, however, was in referencing the literature. Since the generated experimental results appeared believable though, we suggest that LLMs could certainly play a role in scientific research but with human input, interpretation and data validation. We discuss the potential benefits and current bottlenecks for realising this ambition here.
Affiliation(s)
- Moe Elbadawi
- UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK.
| | - Hanxiang Li
- UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK
| | - Abdul W Basit
- UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK
| | - Simon Gaisford
- UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK.
| |
|
102
|
Maramraju S, Kowalczewski A, Kaza A, Liu X, Singaraju JP, Albert MV, Ma Z, Yang H. AI-organoid integrated systems for biomedical studies and applications. Bioeng Transl Med 2024; 9:e10641. [PMID: 38435826 PMCID: PMC10905559 DOI: 10.1002/btm2.10641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 12/07/2023] [Accepted: 12/13/2023] [Indexed: 03/05/2024] Open
Abstract
In this review, we explore the growing role of artificial intelligence (AI) in advancing the biomedical applications of human pluripotent stem cell (hPSC)-derived organoids. Stem cell-derived organoids, these miniature organ replicas, have become essential tools for disease modeling, drug discovery, and regenerative medicine. However, analyzing the vast and intricate datasets generated from these organoids can be inefficient and error-prone. AI techniques offer a promising solution to efficiently extract insights and make predictions from diverse data types generated from microscopy images, transcriptomics, metabolomics, and proteomics. This review offers a brief overview of organoid characterization and fundamental concepts in AI while focusing on a comprehensive exploration of AI applications in organoid-based disease modeling and drug evaluation. It provides insights into the future possibilities of AI in enhancing the quality control of organoid fabrication, label-free organoid recognition, and three-dimensional image reconstruction of complex organoid structures. This review presents the challenges and potential solutions in AI-organoid integration, focusing on the establishment of reliable AI model decision-making processes and the standardization of organoid research.
Affiliation(s)
- Sudhiksha Maramraju
- Department of Biomedical Engineering, University of North Texas, Denton, Texas, USA
- Texas Academy of Mathematics and Science, University of North Texas, Denton, Texas, USA
| | - Andrew Kowalczewski
- Department of Biomedical & Chemical Engineering, Syracuse University, Syracuse, New York, USA
- BioInspired Institute for Material and Living Systems, Syracuse University, Syracuse, New York, USA
| | - Anirudh Kaza
- Department of Biomedical Engineering, University of North Texas, Denton, Texas, USA
- Texas Academy of Mathematics and Science, University of North Texas, Denton, Texas, USA
| | - Xiyuan Liu
- Department of Mechanical & Aerospace Engineering, Syracuse University, Syracuse, New York, USA
| | - Jathin Pranav Singaraju
- Department of Biomedical Engineering, University of North Texas, Denton, Texas, USA
- Texas Academy of Mathematics and Science, University of North Texas, Denton, Texas, USA
| | - Mark V. Albert
- Department of Biomedical Engineering, University of North Texas, Denton, Texas, USA
- Department of Computer Science and Engineering, University of North Texas, Denton, Texas, USA
| | - Zhen Ma
- Department of Biomedical & Chemical Engineering, Syracuse University, Syracuse, New York, USA
- BioInspired Institute for Material and Living Systems, Syracuse University, Syracuse, New York, USA
| | - Huaxiao Yang
- Department of Biomedical Engineering, University of North Texas, Denton, Texas, USA
| |
|
103
|
Wang M, Wu Z, Wang J, Weng G, Kang Y, Pan P, Li D, Deng Y, Yao X, Bing Z, Hsieh CY, Hou T. Genetic Algorithm-Based Receptor Ligand: A Genetic Algorithm-Guided Generative Model to Boost the Novelty and Drug-Likeness of Molecules in a Sampling Chemical Space. J Chem Inf Model 2024; 64:1213-1228. [PMID: 38302422 DOI: 10.1021/acs.jcim.3c01964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Deep learning-based de novo molecular design has recently gained significant attention. While numerous DL-based generative models have been successfully developed for designing novel compounds, the majority of the generated molecules lack sufficiently novel scaffolds or high drug-like profiles. The aforementioned issues may not be fully captured by commonly used metrics for the assessment of molecular generative models, such as novelty, diversity, and quantitative estimation of the drug-likeness score. To address these limitations, we proposed a genetic algorithm-guided generative model called GARel (genetic algorithm-based receptor-ligand interaction generator), a novel framework for training a DL-based generative model to produce drug-like molecules with novel scaffolds. To efficiently train the GARel model, we utilized dense net to update the parameters based on molecules with novel scaffolds and drug-like features. To demonstrate the capability of the GARel model, we used it to design inhibitors for three targets: AA2AR, EGFR, and SARS-Cov2. The results indicate that GARel-generated molecules feature more diverse and novel scaffolds and possess more desirable physicochemical properties and favorable docking scores. Compared with other generative models, GARel makes significant progress in balancing novelty and drug-likeness, providing a promising direction for the further development of DL-based de novo design methodology with potential impacts on drug discovery.
Affiliation(s)
- Mingyang Wang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
| | - Zhengjian Wu
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- School of Computer Science, Wuhan University, Wuhan 430072, Hubei, China
| | - Jike Wang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
| | - Gaoqi Weng
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yu Kang
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Peichen Pan
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Dan Li
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
| | - Xiaojun Yao
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, Macau Institute for Applied Research in Medicine and Health, State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa, Macau 999078, China
| | - Zhitong Bing
- Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou, Gansu 730000, China
| | - Chang-Yu Hsieh
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
| |
|
104
|
Qi X, Zhao Y, Qi Z, Hou S, Chen J. Machine Learning Empowering Drug Discovery: Applications, Opportunities and Challenges. Molecules 2024; 29:903. [PMID: 38398653 PMCID: PMC10892089 DOI: 10.3390/molecules29040903] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 01/15/2024] [Revised: 02/08/2024] [Accepted: 02/14/2024] [Indexed: 02/25/2024] Open
Abstract
Drug discovery plays a critical role in advancing human health by developing new medications and treatments to combat diseases. How to accelerate the pace and reduce the costs of new drug discovery has long been a key concern for the pharmaceutical industry. Fortunately, by leveraging advanced algorithms, computational power and biological big data, artificial intelligence (AI) technology, especially machine learning (ML), holds the promise of making the hunt for new drugs more efficient. Recently, the Transformer-based models that have achieved revolutionary breakthroughs in natural language processing have sparked a new era of their applications in drug discovery. Herein, we introduce the latest applications of ML in drug discovery, highlight the potential of advanced Transformer-based ML models, and discuss the future prospects and challenges in the field.
Affiliation(s)
- Xin Qi
- School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou 215011, China
| | - Yuanchun Zhao
- School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou 215011, China
| | - Zhuang Qi
- School of Software, Shandong University, Jinan 250101, China
| | - Siyu Hou
- School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou 215011, China
| | - Jiajia Chen
- School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou 215011, China
| |
|
105
|
Yoshikai Y, Mizuno T, Nemoto S, Kusuhara H. Difficulty in chirality recognition for Transformer architectures learning chemical structures from string representations. Nat Commun 2024; 15:1197. [PMID: 38365821 PMCID: PMC10873378 DOI: 10.1038/s41467-024-45102-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 01/11/2024] [Indexed: 02/18/2024] Open
Abstract
Recent years have seen rapid development of descriptor generation based on representation learning of extremely diverse molecules, especially those that apply natural language processing (NLP) models to SMILES, a literal representation of molecular structure. However, little research has been done on how these models understand chemical structure. To address this black box, we investigated the relationship between the learning progress of SMILES and chemical structure using a representative NLP model, the Transformer. We show that while the Transformer learns partial structures of molecules quickly, it requires extended training to understand overall structures. Consistently, the accuracy of molecular property predictions using descriptors generated from models at different learning steps was similar from the beginning to the end of training. Furthermore, we found that the Transformer requires particularly long training to learn chirality and sometimes stagnates with low performance due to misunderstanding of enantiomers. These findings are expected to deepen the understanding of NLP models in chemistry.
Affiliation(s)
- Yasuhiro Yoshikai
- Laboratory of Molecular Pharmacokinetics, Graduate School of Pharmaceutical Sciences, The University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo, Japan
| | - Tadahaya Mizuno
- Laboratory of Molecular Pharmacokinetics, Graduate School of Pharmaceutical Sciences, The University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo, Japan.
| | - Shumpei Nemoto
- Laboratory of Molecular Pharmacokinetics, Graduate School of Pharmaceutical Sciences, The University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo, Japan
| | - Hiroyuki Kusuhara
- Laboratory of Molecular Pharmacokinetics, Graduate School of Pharmaceutical Sciences, The University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo, Japan
| |
|
106
|
Zhang H, Huang J, Xie J, Huang W, Yang Y, Xu M, Lei J, Chen H. GRELinker: A Graph-Based Generative Model for Molecular Linker Design with Reinforcement and Curriculum Learning. J Chem Inf Model 2024; 64:666-676. [PMID: 38241022 DOI: 10.1021/acs.jcim.3c01700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/13/2024]
Abstract
Fragment-based drug discovery (FBDD) is widely used in drug design. One useful strategy in FBDD is designing linkers for linking fragments to optimize their molecular properties. In the current study, we present a novel generative fragment linking model, GRELinker, which utilizes a gated-graph neural network combined with reinforcement and curriculum learning to generate molecules with desirable attributes. The model has been shown to be efficient in multiple tasks, including controlling log P, optimizing synthesizability or predicted bioactivity of compounds, and generating molecules with high 3D similarity but low 2D similarity to the lead compound. Specifically, our model outperforms the previously reported reinforcement learning (RL) built-in method DRlinker on these benchmark tasks. Moreover, GRELinker has been successfully used in an actual FBDD case to generate optimized molecules with enhanced affinities by employing the docking score as the scoring function in RL. Besides, the implementation of curriculum learning in our framework enables the generation of structurally complex linkers more efficiently. These results demonstrate the benefits and feasibility of GRELinker in linker design for molecular optimization and drug discovery.
Affiliation(s)
- Hao Zhang
- School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
| | - Jinchao Huang
- School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
| | - Junjie Xie
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
| | - Weifeng Huang
- School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
| | - Mingyuan Xu
- Guangzhou National Laboratory, Guangzhou International Bio Island, No. 9 Xin Dao Huan Bei Road, Guangzhou 510005, China
| | - Jinping Lei
- School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
| | - Hongming Chen
- Guangzhou National Laboratory, Guangzhou International Bio Island, No. 9 Xin Dao Huan Bei Road, Guangzhou 510005, China
| |
|
107
|
Tossou P, Wognum C, Craig M, Mary H, Noutahi E. Real-World Molecular Out-Of-Distribution: Specification and Investigation. J Chem Inf Model 2024; 64:697-711. [PMID: 38300258 PMCID: PMC10865358 DOI: 10.1021/acs.jcim.3c01774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 01/09/2024] [Accepted: 01/10/2024] [Indexed: 02/02/2024]
Abstract
This study presents a rigorous framework for investigating molecular out-of-distribution (MOOD) generalization in drug discovery. The concept of MOOD is first clarified through a problem specification that demonstrates how the covariate shifts encountered during real-world deployment can be characterized by the distribution of sample distances to the training set. We find that these shifts can cause performance to drop by up to 60% and uncertainty calibration by up to 40%. This leads us to propose a splitting protocol that aims to close the gap between the deployment and testing. Then, using this protocol, a thorough investigation is conducted to assess the impact of model design, model selection, and data set characteristics on MOOD performance and uncertainty calibration. We find that appropriate representations and algorithms with built-in uncertainty estimation are crucial to improving performance and uncertainty calibration. This study sets itself apart by its exhaustiveness and opens an exciting avenue to benchmark meaningful algorithmic progress in molecular scoring.
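As an illustration of characterizing covariate shift by "the distribution of sample distances to the training set", the following minimal sketch computes, for each sample, the distance to its nearest training example. The Jaccard distance over sets of feature indices, the function names, and the toy data are all assumptions made for this example; the abstract does not state which representation or metric the authors used.

```python
def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """Return 1 - |intersection| / |union|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_to_training_set(sample: frozenset, training: list) -> float:
    """Distance from one sample to its nearest neighbour in the training set."""
    return min(jaccard_distance(sample, t) for t in training)


# Hypothetical feature sets, e.g. indices of the "on" bits of a fingerprint.
train = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5}), frozenset({7, 8, 9})]
test_like = [frozenset({1, 2, 3}), frozenset({7, 8})]
deployment_like = [frozenset({10, 11, 12}), frozenset({1, 20, 21, 22})]

# One distance per sample; comparing the two lists shows how far each
# population sits from the training data.
test_distances = [distance_to_training_set(s, train) for s in test_like]
deploy_distances = [distance_to_training_set(s, train) for s in deployment_like]

print("test-like:      ", [round(d, 3) for d in test_distances])
print("deployment-like:", [round(d, 3) for d in deploy_distances])
```

In this toy setup the deployment-like samples sit farther from the training set than the test-like ones, which is the kind of gap a distance-aware splitting protocol would try to reproduce at evaluation time.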
Affiliation(s)
- Prudencio Tossou
- Valence Labs, Montréal, Québec H2S3G9, Canada
- Department of Computer Science and Software Engineering, Université Laval, Montréal, Québec G1V 0A6, Canada
| | - Cas Wognum
- Valence Labs, Montréal, Québec H2S3G9, Canada
| | | | | | | |
|
108
|
Adebar N, Keupp J, Emenike VN, Kühlborn J, Vom Dahl L, Möckel R, Smiatek J. Scientific Deep Machine Learning Concepts for the Prediction of Concentration Profiles and Chemical Reaction Kinetics: Consideration of Reaction Conditions. J Phys Chem A 2024; 128:929-944. [PMID: 38271617 DOI: 10.1021/acs.jpca.3c06265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2024]
Abstract
Emerging concepts from scientific deep machine learning such as physics-informed neural networks (PINNs) enable a data-driven approach for the study of complex kinetic problems. We present an extended framework that combines the advantages of PINNs with the detailed consideration of experimental parameter variations for the simulation and prediction of chemical reaction kinetics. The approach is based on truncated Taylor series expansions for the underlying fundamental equations, whereby the external variations can be interpreted as perturbations of the kinetic parameters. Accordingly, our method allows for an efficient consideration of experimental parameter settings and their influence on the concentration profiles and reaction kinetics. A particular advantage of our approach, in addition to the consideration of univariate and multivariate parameter variations, is the robust model-based exploration of the parameter space to determine optimal reaction conditions in combination with advanced reaction insights. The benefits of this concept are demonstrated for higher-order chemical reactions including catalytic and oscillatory systems in combination with small amounts of training data. All predicted values show a high level of accuracy, demonstrating the broad applicability and flexibility of our approach.
Affiliation(s)
- Niklas Adebar
- Development NCE, Chemical Development, Boehringer Ingelheim Pharma GmbH & Co. KG, D-55218 Ingelheim (Rhein), Germany
| | - Julian Keupp
- Development NCE, Chemical Development, Boehringer Ingelheim Pharma GmbH & Co. KG, D-55218 Ingelheim (Rhein), Germany
| | - Victor N Emenike
- HP BioP Launch and Innovation, Boehringer Ingelheim Pharma GmbH & Co. KG, D-55218 Ingelheim (Rhein), Germany
| | - Jonas Kühlborn
- Development NCE, Chemical Development, Boehringer Ingelheim Pharma GmbH & Co. KG, D-55218 Ingelheim (Rhein), Germany
| | - Lisa Vom Dahl
- Development NCE, Analytical Development, Boehringer Ingelheim Pharma GmbH & Co. KG, D-55218 Ingelheim (Rhein), Germany
| | - Robert Möckel
- Development NCE, Chemical Development, Boehringer Ingelheim Pharma GmbH & Co. KG, D-55218 Ingelheim (Rhein), Germany
| | - Jens Smiatek
- Institute for Computational Physics, University of Stuttgart, D-70569 Stuttgart, Germany
- Development NCE, Strategy NCEs, Boehringer Ingelheim Pharma GmbH & Co. KG, D-88397 Biberach (Riss), Germany
| |
|
109
|
Zheng L, Shi F, Peng C, Xu M, Fan F, Li Y, Zhang L, Du J, Wang Z, Lin Z, Sun Y, Deng C, Duan X, Wei L, Zhao C, Fang L, Zhang P, Ma S, Lai L, Yang M. Application scenario-oriented molecule generation platform developed for drug discovery. Methods 2024; 222:112-121. [PMID: 38215898 DOI: 10.1016/j.ymeth.2023.12.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 11/22/2023] [Accepted: 12/23/2023] [Indexed: 01/14/2024] Open
Abstract
Design of molecules for candidate compound selection is one of the central challenges in drug discovery due to the complexity of chemical space and the requirement of multi-parameter optimization. Here we present an application scenario-oriented platform (ID4Idea) for molecule generation in different scenarios of drug discovery. This platform utilizes both library- or rule-based and generative algorithms (VAE, RNN, GAN, etc.), in combination with various AI learning types (pre-training, transfer learning, reinforcement learning, active learning, etc.) and input representations (1D SMILES, 2D graph, 3D shape, binding site, pharmacophore, etc.), to enable customized solutions for a given molecular design scenario. Besides the usual generation-followed-by-screening protocol, goal-directed molecule generation can also be conducted towards predefined goals, enhancing the efficiency of hit identification, lead finding, and lead optimization. We demonstrate the effectiveness of the ID4Idea platform through case studies, showcasing customized solutions for different design tasks using various input information, such as binding pockets, pharmacophores, and compound representations. In addition, remaining challenges are discussed to unlock the full potential of AI models in drug discovery and pave the way for the development of novel therapeutics.
Affiliation(s)
- Lianjun Zheng
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Fangjun Shi
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
| | - Chunwang Peng
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Min Xu
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Fangda Fan
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
| | - Yuanpeng Li
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
| | - Lin Zhang
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
| | - Jiewen Du
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
| | - Zonghu Wang
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
| | - Zhixiong Lin
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Yina Sun
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
| | - Chenglong Deng
- Jingtai Zhiyao Technology (Shanghai) Co., Ltd. (XtalPi), No. 207 Huanqiao Road, Pudong New Area, Shanghai 201315, China
| | - Xinli Duan
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
| | - Lin Wei
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | | | - Lei Fang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Peiyu Zhang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Songling Ma
- XtalPi Innovation Center, XtalPi Inc., Beijing, China.
| | - Lipeng Lai
- XtalPi Innovation Center, XtalPi Inc., Beijing, China.
| | - Mingjun Yang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China.
| |
|
110
|
Tropsha A, Isayev O, Varnek A, Schneider G, Cherkasov A. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat Rev Drug Discov 2024; 23:141-155. [PMID: 38066301 DOI: 10.1038/s41573-023-00832-0] [Citation(s) in RCA: 68] [Impact Index Per Article: 68.0] [Accepted: 10/21/2023] [Indexed: 02/08/2024]
Abstract
Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design.
Affiliation(s)
| | | | | | | | - Artem Cherkasov
- University of British Columbia, Vancouver, BC, Canada.
- Photonic Inc., Coquitlam, BC, Canada.
| |
|
111
|
Lamens A, Bajorath J. Generation of Molecular Counterfactuals for Explainable Machine Learning Based on Core-Substituent Recombination. ChemMedChem 2024; 19:e202300586. [PMID: 37983655 DOI: 10.1002/cmdc.202300586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 11/20/2023] [Accepted: 11/20/2023] [Indexed: 11/22/2023]
Abstract
The use of black box machine learning models whose decisions cannot be understood limits the acceptance of predictions in interdisciplinary research and camouflages artificial learning characteristics leading to predictions for other than anticipated reasons. Consequently, there is increasing interest in explainable artificial intelligence to rationalize predictions and uncover potential pitfalls. Among others, relevant approaches include feature attribution methods to identify molecular structures determining predictions and counterfactuals (CFs) or contrastive explanations. CFs are defined as variants of test instances with minimal modifications leading to opposing predictions. In medicinal chemistry, CFs have thus far only been little investigated although they are particularly intuitive from a chemical perspective. We introduce a new methodology for the systematic generation of CFs that is centered on well-defined structural analogues of test compounds. The approach is transparent, computationally straightforward, and shown to provide a wealth of CFs for test sets. The method is made freely available.
Affiliation(s)
- Alec Lamens
- Department of Life Science Informatics and Data Science B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany
| | - Jürgen Bajorath
- Department of Life Science Informatics and Data Science B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany
- Lamarr Institute for Machine Learning and Artificial Intelligence, Rheinische Friedrich-Wilhelms-Universität Bonn, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany
| |
|
112
|
Kutsal M, Ucar F, Kati N. Computational drug discovery on human immunodeficiency virus with a customized long short-term memory variational autoencoder deep-learning architecture. CPT Pharmacometrics Syst Pharmacol 2024; 13:308-316. [PMID: 38010989 PMCID: PMC10864928 DOI: 10.1002/psp4.13085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Revised: 11/01/2023] [Accepted: 11/07/2023] [Indexed: 11/29/2023] Open
Abstract
Despite attempts to control the spread of human immunodeficiency virus (HIV) through the use of anti-HIV medications, the absence of an effective vaccine continues to present a significant obstacle. In addition, the development of drug resistance by HIV underscores the necessity for computational drug discovery methods to identify novel therapies. This investigation specifically focused on employing a long short-term memory (LSTM) variational autoencoder deep-learning architecture for computational drug discovery in relation to HIV. Our data set comprised simplified molecular input line entry system (SMILES)-encoded compounds, which were used to train the LSTM autoencoder. Remarkably, our model achieved a training accuracy of 91%, with a data set containing 1377 compounds. Leveraging the generative model derived from the training phase, we generated potential new drugs for combating HIV and assessed their interaction with the virus using a previously developed artificial intelligence model. Lastly, we verified the drug-likeness of our computationally generated compounds in accordance with Lipinski's rule of five. Overall, our study presents a promising approach to computational drug discovery in the ongoing battle against HIV.
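For readers unfamiliar with the final filtering step, Lipinski's rule of five can be sketched as a simple check on four molecular descriptors. The sketch below assumes the descriptors have already been computed by some cheminformatics tool; the `Descriptors` class, the example values, and the `max_violations` default of one are assumptions for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float       # molecular weight in daltons
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Accept a compound with at most `max_violations` violations.

    Allowing one violation is a common convention and an assumption here.
    """
    return lipinski_violations(d) <= max_violations


# Made-up descriptor values, not real compounds from the study.
small_molecule = Descriptors(mol_weight=310.4, log_p=2.1,
                             h_bond_donors=2, h_bond_acceptors=5)
bulky_molecule = Descriptors(mol_weight=812.9, log_p=6.3,
                             h_bond_donors=7, h_bond_acceptors=14)

print(passes_rule_of_five(small_molecule))  # no violations
print(passes_rule_of_five(bulky_molecule))  # four violations
```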
Affiliation(s)
- Mucahit Kutsal
- Institute of Theoretical Physics and Astrophysics, Quantum Information Technology, University of Gdańsk, Gdańsk, Poland
- Ferhat Ucar
- Faculty of Technology, Software Engineering, Fırat University, Elazig, Turkey
- Nida Kati
- Faculty of Technology, Materials and Metallurgical Engineering, Fırat University, Elazig, Turkey
113
Gonzalez Pepe I, Chatelain Y, Kiar G, Glatard T. Numerical stability of DeepGOPlus inference. PLoS One 2024; 19:e0296725. [PMID: 38285635 PMCID: PMC10824456 DOI: 10.1371/journal.pone.0296725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 12/16/2023] [Indexed: 01/31/2024] Open
Abstract
Convolutional neural networks (CNNs) are currently among the most widely used deep neural network (DNN) architectures available and achieve state-of-the-art performance for many problems. Originally applied to computer vision tasks, CNNs work well with any data with a spatial relationship, besides images, and have been applied to different fields. However, recent works have highlighted numerical stability challenges in DNNs, which also relate to their known sensitivity to noise injection. These challenges can jeopardise their performance and reliability. This paper investigates DeepGOPlus, a CNN that predicts protein function. DeepGOPlus has achieved state-of-the-art performance and can successfully take advantage of and annotate the abundant protein sequences emerging in proteomics. We determine the numerical stability of the model's inference stage by quantifying the numerical uncertainty resulting from perturbations of the underlying floating-point data. In addition, we explore the opportunity to use reduced-precision floating-point formats for DeepGOPlus inference, to reduce memory consumption and latency. This is achieved by instrumenting DeepGOPlus' execution using Monte Carlo Arithmetic, a technique that experimentally quantifies floating-point operation errors, and VPREC, a tool that emulates results with customizable floating-point precision formats. Focus is placed on the inference stage as it is the primary deliverable of the DeepGOPlus model, widely applicable across different environments. All in all, our results show that although the DeepGOPlus CNN is very stable numerically, it can only be selectively implemented with lower-precision floating-point formats. We conclude that predictions obtained from the pre-trained DeepGOPlus model are very reliable numerically, and use existing floating-point formats efficiently.
Affiliation(s)
- Inés Gonzalez Pepe
- Department of Computer Science and Software Engineering, Concordia University, Montreal, Qc, Canada
- Yohan Chatelain
- Department of Computer Science and Software Engineering, Concordia University, Montreal, Qc, Canada
- Gregory Kiar
- Computational Neuroimaging Laboratory, Child Mind Institute, New York, NY, United States of America
- Tristan Glatard
- Department of Computer Science and Software Engineering, Concordia University, Montreal, Qc, Canada
114
Cho J, Singh M, Lo AW. How does news affect biopharma stock prices?: An event study. PLoS One 2024; 19:e0296927. [PMID: 38277362 PMCID: PMC10817120 DOI: 10.1371/journal.pone.0296927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 12/22/2023] [Indexed: 01/28/2024] Open
Abstract
We investigate the impact of information on biopharmaceutical stock prices via an event study encompassing 503,107 news releases from 1,012 companies. We distinguish between pharmaceutical and biotechnology companies, and apply three asset pricing models to estimate their abnormal returns. Acquisition-related news yields the highest positive return, while drug-development setbacks trigger significant negative returns. We also find that biotechnology companies have larger means and standard deviations of abnormal returns, while the abnormal returns of pharmaceutical companies are influenced by more general financial news. To better understand the empirical properties of price movement dynamics, we regress abnormal returns on market capitalization and a sub-industry indicator variable to distinguish biotechnology and pharmaceutical companies, and find that biopharma companies with larger capitalization generally experience lower magnitude of abnormal returns in response to events. Using longer event windows, we show that news related to acquisitions and clinical trials are the sources of potential news leakage. We expect this study to provide valuable insights into how diverse news types affect market perceptions and stock valuations, particularly in the volatile and information-sensitive biopharmaceutical sector, thus aiding stakeholders in making informed investment and strategic decisions.
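The abnormal returns at the centre of an event study can be illustrated with the single-factor market model, in which the abnormal return on day t is the realised return minus the return predicted from the market: AR_t = R_t - (alpha + beta * R_m,t). The abstract does not name the three asset pricing models the authors applied, so the choice of the market model here is an assumption, as are the function names and the toy numbers.

```python
def fit_market_model(stock_returns, market_returns):
    """Estimate alpha and beta by ordinary least squares on an estimation window."""
    n = len(stock_returns)
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((m - mean_m) * (s - mean_s)
              for s, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    beta = cov / var
    alpha = mean_s - beta * mean_m
    return alpha, beta


def abnormal_returns(stock_event, market_event, alpha, beta):
    """Realised return minus the model-predicted return for each event-window day."""
    return [s - (alpha + beta * m) for s, m in zip(stock_event, market_event)]


# Toy estimation window (made-up daily returns, not data from the study).
market = [0.010, -0.020, 0.030, 0.000, 0.015]
stock = [0.01 + 2.0 * m for m in market]
alpha, beta = fit_market_model(stock, market)

# One event day on which the stock gains 5 percentage points beyond the model.
event_ar = abnormal_returns([0.01 + 2.0 * 0.02 + 0.05], [0.02], alpha, beta)
```

Summing the abnormal returns over an event window gives the cumulative abnormal return that event studies typically report.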
Affiliation(s)
- Joonhyuk Cho
- Laboratory for Financial Engineering, MIT, Cambridge, MA, United States of America
- Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, United States of America
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, United States of America
- Manish Singh
- Laboratory for Financial Engineering, MIT, Cambridge, MA, United States of America
- Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, United States of America
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, United States of America
- Andrew W. Lo
- Laboratory for Financial Engineering, MIT, Cambridge, MA, United States of America
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, United States of America
- Operations Research Center, MIT, Cambridge, MA, United States of America
- Sloan School of Management, MIT, Cambridge, MA, United States of America
- Santa Fe Institute, Santa Fe, NM, United States of America
115
Zhu J, Che C, Jiang H, Xu J, Yin J, Zhong Z. SSF-DDI: a deep learning method utilizing drug sequence and substructure features for drug-drug interaction prediction. BMC Bioinformatics 2024; 25:39. [PMID: 38262923 PMCID: PMC10810255 DOI: 10.1186/s12859-024-05654-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 01/12/2024] [Indexed: 01/25/2024] Open
Abstract
BACKGROUND: Drug-drug interactions (DDI) are prevalent in combination therapy, which makes it important to identify and predict potential DDI. While various artificial intelligence methods can predict and identify potential DDI, they often overlook the sequence information of drug molecules and fail to comprehensively consider the contribution of molecular substructures to DDI. RESULTS: In this paper, we proposed a novel model for DDI prediction based on sequence and substructure features (SSF-DDI) to address these issues. Our model integrates drug sequence features and structural features from the drug molecule graph, providing enhanced information for DDI prediction and enabling a more comprehensive and accurate representation of drug molecules. CONCLUSION: The results of experiments and case studies have demonstrated that SSF-DDI significantly outperforms state-of-the-art DDI prediction models across multiple real datasets and settings. SSF-DDI performs better in predicting DDI involving unknown drugs, resulting in a 5.67% improvement in accuracy compared to state-of-the-art methods.
Affiliation(s)
- Jing Zhu
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, Dalian, 116000, China
- Chao Che
- School of Software Engineering, Dalian University, Dalian, 116000, China
- Hao Jiang
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, Dalian, 116000, China
- Jian Xu
- General Surgery, Affiliated Zhongshan Hospital of Dalian University, Dalian, 116000, China
- Jiajun Yin
- General Surgery, Affiliated Zhongshan Hospital of Dalian University, Dalian, 116000, China
- Zhaoqian Zhong
- Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, Dalian, 116000, China.
116
Qian Y, Shi M, Zhang Q. CONSMI: Contrastive Learning in the Simplified Molecular Input Line Entry System Helps Generate Better Molecules. Molecules 2024; 29:495. [PMID: 38276573 PMCID: PMC10821140 DOI: 10.3390/molecules29020495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 01/12/2024] [Accepted: 01/16/2024] [Indexed: 01/27/2024] Open
Abstract
In recent years, the application of deep learning in molecular de novo design has gained significant attention. One successful approach involves using SMILES representations of molecules and treating the generation task as a text generation problem, yielding promising results. However, the generation of more effective and novel molecules remains a key research area. Due to the fact that a molecule can have multiple SMILES representations, it is not sufficient to consider only one of them for molecular generation. To make up for this deficiency, and also motivated by the advancements in contrastive learning in natural language processing, we propose a contrastive learning framework called CONSMI to learn more comprehensive SMILES representations. This framework leverages different SMILES representations of the same molecule as positive examples and other SMILES representations as negative examples for contrastive learning. The experimental results of generation tasks demonstrate that CONSMI significantly enhances the novelty of generated molecules while maintaining a high validity. Moreover, the generated molecules have similar chemical properties compared to the original dataset. Additionally, we find that CONSMI can achieve favorable results in classifier tasks, such as the compound-protein interaction task.
Affiliation(s)
- Qian Zhang
- School of Computer Science and Technology, Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, East China Normal University, 3663 North Zhongshan Road, Putuo District, Shanghai 200062, China; (Y.Q.); (M.S.)
117
Karampuri A, Perugu S. A breast cancer-specific combinational QSAR model development using machine learning and deep learning approaches. Frontiers in Bioinformatics 2024; 3:1328262. [PMID: 38288043 PMCID: PMC10822965 DOI: 10.3389/fbinf.2023.1328262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 12/21/2023] [Indexed: 01/31/2024] Open
Abstract
Breast cancer is the most prevalent and heterogeneous form of cancer affecting women worldwide. Various therapeutic strategies are in practice based on the extent of disease spread, such as surgery, chemotherapy, radiotherapy, and immunotherapy. Combinational therapy is another strategy that has proven to be effective in controlling cancer progression. It involves administering an anchor drug, a well-established primary therapeutic agent with known efficacy for specific targets, together with a library drug, a supplementary drug that enhances the efficacy of the anchor drug and broadens the therapeutic approach. Our work focused on harnessing regression-based machine learning (ML) and deep learning (DL) algorithms to develop a structure-activity relationship between the molecular descriptors of drug pairs and their combined biological activity through a QSAR (quantitative structure-activity relationship) model. Eleven widely used machine learning and deep learning algorithms were used to develop QSAR models. A total of 52 breast cancer cell lines, 25 anchor drugs, and 51 library drugs were considered in developing the QSAR model. It was observed that Deep Neural Networks (DNNs) achieved an impressive R² (coefficient of determination) of 0.94, with an RMSE (root mean square error) value of 0.255, making DNNs the most effective algorithm for developing a structure-activity relationship with strong generalization capabilities. In conclusion, applying combinational therapy alongside ML and DL techniques represents a promising approach to combating breast cancer.
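The two regression metrics reported in this abstract have standard definitions: RMSE is the square root of the mean squared residual, and the coefficient of determination is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes both with the standard library only; the function names and the toy values are illustrative assumptions and are unrelated to the study's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy observed activities and two sets of predictions (made-up values).
observed = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]
mean_only = [2.5, 2.5, 2.5, 2.5]
```

A model that always predicts the mean of the observations scores zero on the coefficient of determination, which is why a value close to one is read as a good fit.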
Affiliation(s)
- Shyam Perugu
- Department of Biotechnology, National Institute of Technology, Warangal, India
118
Arora P, Behera M, Saraf SA, Shukla R. Leveraging Artificial Intelligence for Synergies in Drug Discovery: From Computers to Clinics. Curr Pharm Des 2024; 30:2187-2205. [PMID: 38874046 DOI: 10.2174/0113816128308066240529121148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 03/27/2024] [Accepted: 04/03/2024] [Indexed: 06/15/2024]
Abstract
Over the preceding decade, artificial intelligence (AI) has shown outstanding performance across all areas of science, including pharmaceutical sciences. AI uses machine learning (ML), deep learning (DL), and neural network (NN) approaches for novel algorithm and hypothesis development by training machines in multiple ways. AI-based drug development, from molecule identification to clinical approval, tremendously reduces the cost and time of development compared with conventional methods. The COVID-19 vaccine development and approval by regulatory agencies within 1-2 years is the finest example of drug development. Hence, AI is fast becoming a boon for scientific researchers to streamline their advanced discoveries. AI-based FDA-approved nanomedicines perform well as target-selective, synergistic therapies, recolonize the theragnostic pharmaceutical stream, and significantly improve drug research outcomes. This comprehensive review delves into the fundamental aspects of AI along with its applications in the realm of pharmaceutical life sciences. It explores AI's role in crucial areas such as drug designing, drug discovery and development, traditional Chinese medicine, integration of multi-omics data, as well as investigations into drug repurposing and polypharmacology studies.
Affiliation(s)
- Priyanka Arora
- Department of Pharmaceutics, National Institute of Pharmaceutical Education and Research (NIPER)-Raebareli, Near CRPF Base Camp, Bijnor-Sisendi Road, Sarojini Nagar, Lucknow (UP)-226002, India
- Manaswini Behera
- Department of Pharmaceutics, National Institute of Pharmaceutical Education and Research (NIPER)-Raebareli, Near CRPF Base Camp, Bijnor-Sisendi Road, Sarojini Nagar, Lucknow (UP)-226002, India
- Shubhini A Saraf
- Department of Pharmaceutics, National Institute of Pharmaceutical Education and Research (NIPER)-Raebareli, Near CRPF Base Camp, Bijnor-Sisendi Road, Sarojini Nagar, Lucknow (UP)-226002, India
- Rahul Shukla
- Department of Pharmaceutics, National Institute of Pharmaceutical Education and Research (NIPER)-Raebareli, Near CRPF Base Camp, Bijnor-Sisendi Road, Sarojini Nagar, Lucknow (UP)-226002, India
119
Chen M, Yang J, Tang C, Lu X, Wei Z, Liu Y, Yu P, Li H. Improving ADMET Prediction Accuracy for Candidate Drugs: Factors to Consider in QSPR Modeling Approaches. Curr Top Med Chem 2024; 24:222-242. [PMID: 38083894 DOI: 10.2174/0115680266280005231207105900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 11/02/2023] [Accepted: 11/10/2023] [Indexed: 05/04/2024]
Abstract
Quantitative Structure-Property Relationship (QSPR) employs mathematical and statistical methods to reveal quantitative correlations between the pharmacokinetics of compounds and their molecular structures, as well as their physical and chemical properties. QSPR models have been widely applied in the prediction of drug absorption, distribution, metabolism, excretion, and toxicity (ADMET). However, the accuracy of QSPR models for predicting drug ADMET properties still needs improvement. Therefore, this paper comprehensively reviews the tools employed in various stages of QSPR predictions for drug ADMET. It summarizes commonly used approaches to building QSPR models, systematically analyzing the advantages and limitations of each modeling method to ensure their judicious application. We provide an overview of recent advancements in the application of QSPR models for predicting drug ADMET properties. Furthermore, this review explores the inherent challenges in QSPR modeling while also proposing a range of considerations aimed at enhancing model prediction accuracy. The objective is to enhance the predictive capabilities of QSPR models in the field of drug development and provide valuable reference and guidance for researchers in this domain.
Affiliation(s)
- Meilun Chen
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- Jie Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- Chunhua Tang
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- Xiaoling Lu
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- Zheng Wei
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- Yijie Liu
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- Peng Yu
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- HuanHuan Li
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
120
Ahmadi M, Ayyoubzadeh SM, Ghorbani-Bidkorpeh F. Toxicity prediction of nanoparticles using machine learning approaches. Toxicology 2024; 501:153697. [PMID: 38056590 DOI: 10.1016/j.tox.2023.153697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 11/21/2023] [Accepted: 12/01/2023] [Indexed: 12/08/2023]
Abstract
Nanoparticle toxicity analysis is critical for evaluating the safety of nanomaterials due to their potential harm to the biological system. However, traditional experimental methods for evaluating nanoparticle toxicity are expensive and time-consuming. As an alternative approach, machine learning offers a solution for predicting cellular responses to nanoparticles. This study focuses on developing ML models for nanoparticle toxicity prediction. The training dataset used for building these models includes the physicochemical properties of nanoparticles, exposure conditions, and cellular responses of different cell lines. The impact of each parameter on cell death was assessed using the Gini index. Five classifiers, namely Decision Tree, Random Forest, Support Vector Machine, Naïve Bayes, and Artificial Neural Network, were employed to predict toxicity. The models' performance was compared based on accuracy, sensitivity, specificity, area under the curve, F measure, K-fold validation, and classification error. The Gini index indicated that cell line, exposure dose, and tissue are the most influential factors in cell death. Among the models tested, Random Forest exhibited the highest performance in the given dataset. Other models demonstrated lower performance compared to Random Forest. Researchers can utilize the Random Forest model to predict nanoparticle toxicity, resulting in cost and time savings for toxicity analysis.
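The Gini index used in this abstract to rank parameters is commonly computed as the Gini impurity of a set of class labels, one minus the sum of squared class proportions, and a parameter's influence is then scored by how much splitting on it reduces that impurity. The abstract does not say how the authors computed it, so the sketch below is an assumption throughout, including the function names and the toy labels.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_gain(parent_labels, groups):
    """Impurity reduction obtained by splitting the parent set into `groups`."""
    n = len(parent_labels)
    weighted = sum(len(g) / n * gini_impurity(g) for g in groups)
    return gini_impurity(parent_labels) - weighted


# Toy outcome labels (made up): 1 = cell death observed, 0 = no cell death.
outcomes = [0, 0, 1, 1]
split_by_informative_parameter = [[0, 0], [1, 1]]    # separates the classes
split_by_uninformative_parameter = [[0, 1], [0, 1]]  # leaves classes mixed
```

A parameter whose split yields a larger impurity reduction would be ranked as more influential under this scoring.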
Affiliation(s)
- Mahnaz Ahmadi
- Medical Nanotechnology and Tissue Engineering Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Seyed Mohammad Ayyoubzadeh
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran; Health Information Management Research Center, Tehran University of Medical Sciences, Tehran, Iran.
- Fatemeh Ghorbani-Bidkorpeh
- Department of Pharmaceutics and Pharmaceutical Nanotechnology, School of Pharmacy, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
121
Zhang YH, Zhao P, Gao HL, Zhong ML, Li JY. Screening Targets and Therapeutic Drugs for Alzheimer's Disease Based on Deep Learning Model and Molecular Docking. J Alzheimers Dis 2024; 100:863-878. [PMID: 38995776 DOI: 10.3233/jad-231389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/14/2024]
Abstract
Background: Alzheimer's disease (AD) is a neurodegenerative disorder caused by a complex interplay of various factors. However, a satisfactory cure for AD remains elusive. Pharmacological interventions based on drug targets are considered the most cost-effective therapeutic strategy. Therefore, it is paramount to search for potential drug targets and drugs for AD. Objective: We aimed to provide novel targets and drugs for the treatment of AD employing transcriptomic data of AD and normal control brain tissues from a new perspective. Methods: Our study combined the use of a multi-layer perceptron (MLP) with differential expression analysis, variance assessment and molecular docking to screen targets and drugs for AD. Results: We identified the seven differentially expressed genes (DEGs) with the most significant variation (ANKRD39, CPLX1, FABP3, GABBR2, GNG3, PPM1E, and WDR49) in transcriptomic data from AD brain. A newly built MLP was used to confirm the association between the seven DEGs and AD, establishing these DEGs as potential drug targets. Drug databases and molecular docking results indicated that arbaclofen, baclofen, clozapine, arbaclofen placarbil, BML-259, BRD-K72883421, and YC-1 had high affinity for GABBR2, and FABP3 bound with oleic, palmitic, and stearic acids. Arbaclofen and YC-1 activated GABAB receptor through PI3K/AKT and PKA/CREB pathways, respectively, thereby promoting neuronal anti-apoptotic effect and inhibiting p-tau and Aβ formation. Conclusions: This study provided a new strategy for the identification of targets and drugs for the treatment of AD using deep learning. Seven therapeutic targets and ten drugs were selected by using this method, providing new insight for AD treatment.
Affiliation(s)
- Ya-Hong Zhang
- College of Life and Health Sciences, Northeastern University, Shenyang, China
- Pu Zhao
- College of Life and Health Sciences, Northeastern University, Shenyang, China
- Hui-Ling Gao
- College of Life and Health Sciences, Northeastern University, Shenyang, China
- Man-Li Zhong
- College of Life and Health Sciences, Northeastern University, Shenyang, China
- Jia-Yi Li
- Health Sciences Institute, China Medical University, Shenyang, China
- Department of Experimental Medical Science, Neuronal Plasticity and Repair Unit, Wallenberg Neuroscience Center, Lund University, Lund, Sweden
122
He S, Ye X, Dou L, Sakurai T. FIAMol-AB: A feature fusion and attention-based deep learning method for enhanced antibiotic discovery. Comput Biol Med 2024; 168:107762. [PMID: 38056212 DOI: 10.1016/j.compbiomed.2023.107762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Revised: 10/31/2023] [Accepted: 11/21/2023] [Indexed: 12/08/2023]
Abstract
Antibiotic resistance continues to be a growing concern for global health, accentuating the need for novel antibiotic discoveries. Traditional methodologies in this field have relied heavily on extensive experimental screening, which is often time-consuming and costly. In contrast, computer-assisted drug screening offers rapid, cost-effective solutions. In this work, we propose FIAMol-AB, a deep learning model that combines graph neural networks, text convolutional networks and molecular fingerprint techniques. This method also uses an attention mechanism to fuse multiple forms of information within the model. The experiments show that FIAMol-AB may offer potential advantages in antibiotic discovery tasks over some existing methods. We conducted some analysis based on our model's results, which helps highlight the potential significance of certain features in the model's predictive performance. Compared to different models, ours demonstrates promising results, indicating potential robustness and versatility. This suggests that by integrating multi-view information and attention mechanisms, FIAMol-AB might better learn complex molecular structures, potentially improving the precision and efficiency of antibiotic discovery. We hope our FIAMol-AB can be used as a useful method in the ongoing fight against antibiotic resistance.
Affiliation(s)
- Shida He
- Department of Computer Science, University of Tsukuba, Tsukuba, Ibaraki, 305-8577, Japan
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, Ibaraki, 305-8577, Japan.
- Lijun Dou
- Genomic Medicine Institute, Lerner Research Institute, Cleveland, OH, 44106, USA
- Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba, Ibaraki, 305-8577, Japan
123
He B, Guo J, Tong HHY, To WM. Artificial Intelligence in Drug Discovery: A Bibliometric Analysis and Literature Review. Mini Rev Med Chem 2024; 24:1353-1367. [PMID: 38243944 DOI: 10.2174/0113895575271267231123160503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2023] [Revised: 09/09/2023] [Accepted: 09/11/2023] [Indexed: 01/22/2024]
Abstract
Drug discovery is a complex and iterative process, making it ideal for using artificial intelligence (AI). This paper uses a bibliometric approach to reveal AI's trend and underlying structure in drug discovery (AIDD). A total of 4310 journal articles and reviews indexed in Scopus were analyzed, revealing that AIDD has been rapidly growing over the past two decades, with a significant increase after 2017. The United States, China, and the United Kingdom were the leading countries in research output, with academic institutions, particularly the Chinese Academy of Sciences and the University of Cambridge, being the most productive. In addition, industrial companies, including both pharmaceutical and high-tech ones, also made significant contributions. Additionally, this paper thoroughly discussed the evolution and research frontiers of AIDD, which were uncovered through co-occurrence analyses of keywords using VOSviewer. Our findings highlight that AIDD is an interdisciplinary and promising research field that has the potential to revolutionize drug discovery. The comprehensive overview provided here will be of significant interest to researchers, practitioners, and policy-makers in related fields. The results emphasize the need for continued investment and collaboration in AIDD to accelerate drug discovery, reduce costs, and improve patient outcomes.
Affiliation(s)
- Baoyu He
- Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao, China
- Jingjing Guo
- Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao, China
- Henry H Y Tong
- Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao, China
- Wai Ming To
- Faculty of Business, Macao Polytechnic University, Macao, China
124
Sainz-DeMena D, Pérez MA, García-Aznar JM. Exploring the potential of Physics-Informed Neural Networks to extract vascularization data from DCE-MRI in the presence of diffusion. Med Eng Phys 2024; 123:104092. [PMID: 38365330 DOI: 10.1016/j.medengphy.2023.104092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 11/23/2023] [Accepted: 12/16/2023] [Indexed: 02/18/2024]
Abstract
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is widely used to assess tissue vascularization, particularly in oncological applications. However, the most widely used pharmacokinetic (PK) models do not account for contrast agent (CA) diffusion between neighboring voxels, which can limit the accuracy of the results, especially in cases of heterogeneous tumors. To address this issue, previous works have proposed algorithms that incorporate diffusion phenomena into the formulation. However, these algorithms often face convergence problems due to the ill-posed nature of the problem. In this work, we present a new approach to fitting DCE-MRI data that incorporates CA diffusion by using Physics-Informed Neural Networks (PINNs). PINNs can be trained to fit measured data obtained from DCE-MRI while ensuring the mass conservation equation from the PK model. We compare the performance of PINNs to previous algorithms on different 1D cases inspired by previous works from literature. Results show that PINNs retrieve vascularization parameters more accurately from diffusion-corrected tracer-kinetic models. Furthermore, we demonstrate the robustness of PINNs compared to other traditional algorithms when faced with noisy or incomplete data. Overall, our results suggest that PINNs can be a valuable tool for improving the accuracy of DCE-MRI data analysis, particularly in cases where CA diffusion plays a significant role.
Affiliation(s)
- D Sainz-DeMena
- Department of Mechanical Engineering, Aragon Institute for Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain
- M A Pérez
- Department of Mechanical Engineering, Aragon Institute for Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain
- J M García-Aznar
- Department of Mechanical Engineering, Aragon Institute for Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain.
125
Niazi SK, Mariam Z. Computer-Aided Drug Design and Drug Discovery: A Prospective Analysis. Pharmaceuticals (Basel) 2023; 17:22. [PMID: 38256856 PMCID: PMC10819513 DOI: 10.3390/ph17010022] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 11/10/2023] [Revised: 12/13/2023] [Accepted: 12/20/2023] [Indexed: 01/24/2024] Open
Abstract
In the dynamic landscape of drug discovery, Computer-Aided Drug Design (CADD) emerges as a transformative force, bridging the realms of biology and technology. This paper overviews CADD's historical evolution, categorization into structure-based and ligand-based approaches, and its crucial role in rationalizing and expediting drug discovery. As CADD advances, incorporating diverse biological data and ensuring data privacy become paramount. Challenges persist, demanding the optimization of algorithms and robust ethical frameworks. Integrating Machine Learning and Artificial Intelligence amplifies CADD's predictive capabilities, yet ethical considerations and scalability challenges linger. Collaborative efforts and global initiatives, exemplified by platforms like Open-Source Malaria, underscore the democratization of drug discovery. The convergence of CADD with personalized medicine offers tailored therapeutic solutions, though ethical dilemmas and accessibility concerns must be navigated. Emerging technologies like quantum computing, immersive technologies, and green chemistry promise to redefine the future of CADD. The trajectory of CADD, marked by rapid advancements, anticipates challenges in ensuring accuracy, addressing biases in AI, and incorporating sustainability metrics. This paper concludes by highlighting the need for proactive measures in navigating the ethical, technological, and educational frontiers of CADD to shape a healthier, brighter future in drug discovery.
Affiliation(s)
- Zamara Mariam
- Centre for Health and Life Sciences, Coventry University, Coventry City CV1 5FB, UK
126
Tiwari PC, Pal R, Chaudhary MJ, Nath R. Artificial intelligence revolutionizing drug development: Exploring opportunities and challenges. Drug Dev Res 2023; 84:1652-1663. [PMID: 37712494 DOI: 10.1002/ddr.22115] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 05/16/2023] [Revised: 08/14/2023] [Accepted: 09/04/2023] [Indexed: 09/16/2023]
Abstract
By harnessing artificial intelligence (AI) algorithms and machine learning techniques, the entire drug discovery process stands to undergo a profound transformation, offering a myriad of advantages. Foremost among these is the ability of AI to conduct swift and efficient screenings of expansive compound libraries, significantly augmenting the identification of potential drug candidates. Moreover, AI algorithms can prove instrumental in predicting the efficacy and safety profiles of candidate compounds, thus endowing invaluable insights and reducing reliance on extensive preclinical and clinical testing. This predictive capacity of AI has the potential to streamline the drug development pipeline and enhance the success rate of clinical trials, ultimately resulting in the emergence of more efficacious and safer therapeutic agents. However, the deployment of AI in drug discovery introduces certain challenges that warrant attention. A primary hurdle entails the imperative acquisition of high-quality and diverse data. Furthermore, ensuring the interpretability of AI models assumes critical importance in securing regulatory endorsement and cultivating trust within scientific and medical communities. Addressing ethical considerations, including data privacy and mitigating bias, represents an additional momentous challenge, requiring assiduous navigation. In this review, we provide an intricate and comprehensive overview of the multifaceted challenges intrinsic to conventional drug development paradigms, while simultaneously interrogating the efficacy of AI in effectively surmounting these formidable obstacles.
Affiliation(s)
- Prafulla C Tiwari
- Department of Pharmacology and Therapeutics, King George's Medical University, Lucknow, Uttar Pradesh, India
- Rishi Pal
- Department of Pharmacology and Therapeutics, King George's Medical University, Lucknow, Uttar Pradesh, India
- Manju J Chaudhary
- Department of Physiology, Government Medical College, Kannauj, Uttar Pradesh, India
- Rajendra Nath
- Department of Pharmacology and Therapeutics, King George's Medical University, Lucknow, Uttar Pradesh, India
127
Sousa GHM, Gomes RA, de Oliveira EO, Trossini GHG. Machine learning methods applied for the prediction of biological activities of triple reuptake inhibitors. J Biomol Struct Dyn 2023; 41:10277-10286. [PMID: 36546689 DOI: 10.1080/07391102.2022.2154269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Accepted: 11/25/2022] [Indexed: 12/24/2022]
Abstract
Major depressive disorder (MDD) is characterized by a series of disabling symptoms like anhedonia, depressed mood, lack of motivation for daily tasks and self-extermination thoughts. The monoamine deficiency hypothesis states that depression is mainly caused by a deficiency of monoamine at the synaptic cleft. Thus, major efforts have been made to develop drugs that inhibit serotonin (SERT), norepinephrine (NET) and dopamine (DAT) transporters and increase the availability of these monoamines. Current gold standard treatment of MDD uses drugs that target one or more monoamine transporters. Triple reuptake inhibitors (TRIs) can target SERT, NET, and DAT simultaneously, and are believed to have the potential to be early onset antidepressants. Quantitative structure-activity relationship models were developed using machine learning algorithms in order to predict biological activities of a series of triple reuptake inhibitor compounds that showed in vitro inhibitory activity against multiple targets. The results, using mostly interpretable descriptors, showed that the internal and external predictive ability of the models is adequate, particularly of the DAT and NET by Random Forest and Support Vector Machine models. The current work shows that models developed from relatively simple, chemically interpretable descriptors can predict the activity of TRIs with similar structure in the applicability domain using ML methods. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Renan Augusto Gomes
- Faculdade de Ciências Farmacêuticas, Universidade de São Paulo, São Paulo, SP, Brazil
128
Djoumbou-Feunang Y, Wilmot J, Kinney J, Chanda P, Yu P, Sader A, Sharifi M, Smith S, Ou J, Hu J, Shipp E, Tomandl D, Kumpatla SP. Cheminformatics and artificial intelligence for accelerating agrochemical discovery. Front Chem 2023; 11:1292027. [PMID: 38093816 PMCID: PMC10716421 DOI: 10.3389/fchem.2023.1292027] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/10/2023] [Accepted: 11/09/2023] [Indexed: 10/17/2024] Open
Abstract
The global cost-benefit analysis of pesticide use during the last 30 years has been characterized by a significant increase during the period from 1990 to 2007 followed by a decline. This observation can be attributed to several factors including, but not limited to, pest resistance, lack of novelty with respect to modes of action or classes of chemistry, and regulatory action. Due to current and projected increases of the global population, it is evident that the demand for food, and consequently, the usage of pesticides to improve yields will increase. Addressing these challenges and needs while promoting new crop protection agents through an increasingly stringent regulatory landscape requires the development and integration of infrastructures for innovative, cost- and time-effective discovery and development of novel and sustainable molecules. Significant advances in artificial intelligence (AI) and cheminformatics over the last two decades have improved the decision-making power of research scientists in the discovery of bioactive molecules. AI- and cheminformatics-driven molecule discovery offers the opportunity of moving experiments from the greenhouse to a virtual environment where thousands to billions of molecules can be investigated at a rapid pace, providing unbiased hypotheses for lead generation, optimization, and effective suggestions for compound synthesis and testing. To date, this is illustrated to a far lesser extent in the publicly available agrochemical research literature compared to drug discovery. In this review, we provide an overview of the crop protection discovery pipeline and how traditional, cheminformatics, and AI technologies can help to address the needs and challenges of agrochemical discovery towards rapidly developing novel and more sustainable products.
Affiliation(s)
- Jeremy Wilmot
- Corteva Agriscience, Crop Protection Discovery and Development, Indianapolis, IN, United States
- John Kinney
- Corteva Agriscience, Farming Solutions and Digital, Indianapolis, IN, United States
- Pritam Chanda
- Corteva Agriscience, Farming Solutions and Digital, Indianapolis, IN, United States
- Pulan Yu
- Corteva Agriscience, Crop Protection Discovery and Development, Indianapolis, IN, United States
- Avery Sader
- Corteva Agriscience, Crop Protection Discovery and Development, Indianapolis, IN, United States
- Max Sharifi
- Corteva Agriscience, Regulatory and Stewardship, Indianapolis, IN, United States
- Scott Smith
- Corteva Agriscience, Farming Solutions and Digital, Indianapolis, IN, United States
- Junjun Ou
- Corteva Agriscience, Crop Protection Discovery and Development, Indianapolis, IN, United States
- Jie Hu
- Corteva Agriscience, Farming Solutions and Digital, Indianapolis, IN, United States
- Elizabeth Shipp
- Corteva Agriscience UK Limited, Regulation Innovation Center, Abingdon, United Kingdom
129
Shen C, Luo J, Xia K. Molecular geometric deep learning. Cell Reports Methods 2023; 3:100621. [PMID: 37875121 PMCID: PMC10694498 DOI: 10.1016/j.crmeth.2023.100621] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/19/2023] [Revised: 06/16/2023] [Accepted: 09/28/2023] [Indexed: 10/26/2023]
Abstract
Molecular representation learning plays an important role in molecular property prediction. Existing molecular property prediction models rely on the de facto standard of covalent-bond-based molecular graphs for representing molecular topology at the atomic level and totally ignore the non-covalent interactions within the molecule. In this study, we propose a molecular geometric deep learning model to predict the properties of molecules that aims to comprehensively consider the information of covalent and non-covalent interactions of molecules. The essential idea is to incorporate a more general molecular representation into geometric deep learning (GDL) models. We systematically test molecular GDL (Mol-GDL) on fourteen commonly used benchmark datasets. The results show that Mol-GDL can achieve a better performance than state-of-the-art (SOTA) methods. Extensive tests have demonstrated the important role of non-covalent interactions in molecular property prediction and the effectiveness of Mol-GDL models.
Affiliation(s)
- Cong Shen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410000, China; School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410000, China.
- Kelin Xia
- School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore.
130
Kazakova E, Lane TR, Jones T, Puhl AC, Riabova O, Makarov V, Ekins S. 1-Sulfonyl-3-amino-1H-1,2,4-triazoles as Yellow Fever Virus Inhibitors: Synthesis and Structure-Activity Relationship. ACS Omega 2023; 8:42951-42965. [PMID: 38024733 PMCID: PMC10653066 DOI: 10.1021/acsomega.3c06106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 10/10/2023] [Accepted: 10/12/2023] [Indexed: 12/01/2023]
Abstract
Yellow fever virus (YFV) transmitted by infected mosquitoes causes an acute viral disease for which there are no approved small-molecule therapeutics. Our recently developed machine learning models for YFV inhibitors led to the selection of a new pyrazolesulfonamide derivative RCB16003 with acceptable in vitro activity. We report that the N-phenyl-1-(phenylsulfonyl)-1H-1,2,4-triazol-3-amine class, which was recently identified as active non-nucleoside reverse transcriptase inhibitors against HIV-1, can also be repositioned as inhibitors of yellow fever virus replication. As compared to other Flaviviridae or Togaviridae family viruses tested, both compounds RCB16003 and RCB16007 demonstrate selectivity for YFV over related viruses, with only RCB16007 showing some inhibition of the West Nile virus (EC50 7.9 μM, CC50 17 μM, SI 2.2). We also describe the absorption, distribution, metabolism, and excretion (ADME) in vitro and pharmacokinetics (PK) for RCB16007 in mice. This compound had previously been shown to not inhibit hERG, and we now describe that it has good metabolic stability in mouse and human liver microsomes, low levels of CYP inhibition, high protein binding, and no indication of efflux in Caco-2 cells. A single-dose oral PK study in mice has a T1/2 of 3.4 h and Cmax of 1190 ng/mL, suggesting good availability and stability. We now propose that the N-phenyl-1-(phenylsulfonyl)-1H-1,2,4-triazol-3-amine class may be prioritized for in vivo efficacy testing against YFV.
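The West Nile virus figures quoted in this abstract (EC50 7.9 μM, CC50 17 μM, SI 2.2) are consistent with the conventional definition of the selectivity index as the ratio CC50/EC50. The abstract does not state the formula, so that definition is an assumption here, and the function name is purely illustrative. A minimal arithmetic check:

```python
def selectivity_index(cc50_um: float, ec50_um: float) -> float:
    """Selectivity index, assumed here to be CC50 / EC50 (both in micromolar)."""
    if ec50_um <= 0:
        raise ValueError("EC50 must be positive")
    return cc50_um / ec50_um


# Values reported in the abstract for RCB16007 against West Nile virus
si = selectivity_index(cc50_um=17.0, ec50_um=7.9)
print(round(si, 1))
```

A ratio close to 1 means the compound is cytotoxic at roughly the same concentration at which it inhibits the virus, which is why an SI of about 2 is described only as "some inhibition".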
Affiliation(s)
- Elena Kazakova
- Federal Research Centre “Fundamentals of Biotechnology” of the Russian Academy of Sciences (Research Centre of Biotechnology RAS), 33-2 Leninsky Prospect, 119071 Moscow, Russia
- Thomas R. Lane
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
- Thane Jones
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
- Ana C. Puhl
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
- Olga Riabova
- Federal Research Centre “Fundamentals of Biotechnology” of the Russian Academy of Sciences (Research Centre of Biotechnology RAS), 33-2 Leninsky Prospect, 119071 Moscow, Russia
- Vadim Makarov
- Federal Research Centre “Fundamentals of Biotechnology” of the Russian Academy of Sciences (Research Centre of Biotechnology RAS), 33-2 Leninsky Prospect, 119071 Moscow, Russia
- Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
131
Mastropietro A, Feldmann C, Bajorath J. Calculation of exact Shapley values for explaining support vector machine models using the radial basis function kernel. Sci Rep 2023; 13:19561. [PMID: 37949930 PMCID: PMC10638308 DOI: 10.1038/s41598-023-46930-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/01/2023] [Accepted: 11/07/2023] [Indexed: 11/12/2023] Open
Abstract
Machine learning (ML) algorithms are extensively used in pharmaceutical research. Most ML models have black-box character, thus preventing the interpretation of predictions. However, rationalizing model decisions is of critical importance if predictions should aid in experimental design. Accordingly, in interdisciplinary research, there is growing interest in explaining ML models. Methods devised for this purpose are a part of the explainable artificial intelligence (XAI) spectrum of approaches. In XAI, the Shapley value concept originating from cooperative game theory has become popular for identifying features determining predictions. The Shapley value concept has been adapted as a model-agnostic approach for explaining predictions. Since the computational time required for Shapley value calculations scales exponentially with the number of features used, local approximations such as Shapley additive explanations (SHAP) are usually required in ML. The support vector machine (SVM) algorithm is one of the most popular ML methods in pharmaceutical research and beyond. SVM models are often explained using SHAP. However, there is only limited correlation between SHAP and exact Shapley values, as previously demonstrated for SVM calculations using the Tanimoto kernel, which limits SVM model explanation. Since the Tanimoto kernel is a special kernel function mostly applied for assessing chemical similarity, we have developed the Shapley value-expressed radial basis function (SVERAD), a computationally efficient approach for the calculation of exact Shapley values for SVM models based upon radial basis function kernels that are widely applied in different areas. SVERAD is shown to produce meaningful explanations of SVM predictions.
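To make the abstract's point about exponential scaling concrete, the sketch below computes exact Shapley values by brute-force enumeration of every feature coalition. This is the generic game-theoretic definition, not the SVERAD method described in the paper; `exact_shapley` and `toy_value` are hypothetical names, and the toy value function stands in for a model output.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value, n_features):
    """Exact Shapley values by enumerating all coalitions.

    `value` maps a frozenset of feature indices to a number. The loop visits
    2**(n_features - 1) coalitions per feature, hence the exponential cost.
    """
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Shapley weight for a coalition of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_value(coalition):
    """Hypothetical value function: additive terms plus one 0-1 interaction."""
    contrib = {0: 1.0, 1: 2.0, 2: 0.5}
    total = sum(contrib[p] for p in coalition)
    if 0 in coalition and 1 in coalition:
        total += 1.0
    return total


phi = exact_shapley(toy_value, 3)
print(phi)
```

In this toy case the interaction term is split equally between features 0 and 1, and the values sum to the difference between the full-coalition and empty-coalition outputs. With three features the enumeration is trivial; with a few dozen it is already infeasible, which is the motivation the abstract gives for approximations such as SHAP and for kernel-specific exact methods.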
Affiliation(s)
- Andrea Mastropietro
- Department of Computer, Control and Management Engineering "Antonio Ruberti", Sapienza University of Rome, 00185, Rome, Italy
- Christian Feldmann
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany
- Jürgen Bajorath
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany.
132
Li N, Zhang R, Tang M, Zhao M, Jiang X, Cai X, Ye N, Su K, Peng J, Zhang X, Wu W, Ye H. Recent Progress and Prospects of Small Molecules for NLRP3 Inflammasome Inhibition. J Med Chem 2023; 66:14447-14473. [PMID: 37879043 DOI: 10.1021/acs.jmedchem.3c01370] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Indexed: 10/27/2023]
Abstract
The NLRP3 inflammasome is a multiprotein complex involved in host immune response (which exerts various biological effects by mediating the maturation and secretion of IL-1β and IL-18) and pyroptosis. However, its aberrant activation could cause amplification of inflammatory effects, thereby triggering a range of ailments, including Alzheimer's disease, Parkinson's disease, rheumatoid arthritis, gout, type 2 diabetes mellitus, and cancer. For the past few years, as an attractive anti-inflammatory target, NLRP3-targeting small-molecule inhibitors have been widely reported by both the academic and the industrial communities. In order to deeply understand the advancement of NLRP3 inflammasome inhibitors, we provide comprehensive insights and commentary on drugs currently under clinical investigation, as well as other NLRP3 inflammasome inhibitors from a chemical structure point of view, with an aim to provide new insights for the further development of clinical drugs for NLRP3 inflammasome-mediated diseases.
Affiliation(s)
- Na Li
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Ruijia Zhang
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Minghai Tang
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Min Zhao
- Laboratory of Metabolomics and Drug-Induced Liver Injury, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu 610041, China
- Xueqin Jiang
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Xiaoying Cai
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Neng Ye
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Kaiyue Su
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Jing Peng
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Xinlu Zhang
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Wenshuang Wu
- Division of Thyroid Surgery, Department of General Surgery and Laboratory of Thyroid and Parathyroid Disease, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu 610041, China
- Haoyu Ye
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
133
Yan F, Jiang L, Ye F, Ping J, Bowley TY, Ness SA, Li CI, Marchetti D, Tang J, Guo Y. Deep neural network based tissue deconvolution of circulating tumor cell RNA. J Transl Med 2023; 21:783. [PMID: 37925448 PMCID: PMC10625696 DOI: 10.1186/s12967-023-04663-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 10/25/2023] [Indexed: 11/06/2023] Open
Abstract
Prior research has shown that the deconvolution of cell-free RNA can uncover the tissue origin. The conventional deconvolution approaches rely on constructing a reference tissue-specific gene panel, which cannot capture the inherent variation present in actual data. To address this, we have developed a novel method that utilizes a neural network framework to leverage the entire training dataset. Our approach involved training a model that incorporated 15 distinct tissue types. Through one semi-independent and two complete independent validations, including deconvolution using a semi in silico dataset, deconvolution with a custom normal tissue mixture RNA-seq data, and deconvolution of longitudinal circulating tumor cell RNA-seq (ctcRNA) data from a cancer patient with metastatic tumors, we demonstrate the efficacy and advantages of the deep-learning approach which were exerted by effectively capturing the inherent variability present in the dataset, thus leading to enhanced accuracy. Sensitivity analyses reveal that neural network models are less susceptible to the presence of missing data, making them more suitable for real-world applications. Moreover, by leveraging the concept of organotropism, we applied our approach to trace the migration of circulating tumor cell-derived RNA (ctcRNA) in a cancer patient with metastatic tumors, thereby highlighting the potential clinical significance of early detection of cancer metastasis.
Affiliation(s)
- Fengyao Yan
- Department of Public Health and Sciences, Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL, 33136, USA
- Department of Computer Science, University of South Carolina, Columbia, SC, 29208, USA
- Limin Jiang
- Department of Public Health and Sciences, Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL, 33136, USA
- Fei Ye
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Jie Ping
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Tetiana Y Bowley
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM, 87131, USA
- Scott A Ness
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM, 87131, USA
- Chung-I Li
- Department of Statistics, National Cheng Kung University, Tainan, 701401, Taiwan
- Dario Marchetti
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM, 87131, USA
- Jijun Tang
- Department of Computer Science, University of South Carolina, Columbia, SC, 29208, USA
- Yan Guo
- Department of Public Health and Sciences, Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL, 33136, USA.
134
Huang D, Ye X, Zhang Y, Sakurai T. Collaborative analysis for drug discovery by federated learning on non-IID data. Methods 2023; 219:1-7. [PMID: 37689121 DOI: 10.1016/j.ymeth.2023.09.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/19/2023] [Revised: 08/23/2023] [Accepted: 09/05/2023] [Indexed: 09/11/2023] Open
Abstract
With the increasing availability of large-scale QSAR (Quantitative Structure-Activity Relationship) datasets, collaborative analysis has become a promising approach for drug discovery. Traditional centralized analysis which typically concentrates data on a central server for training faces challenges such as data privacy and security. Distributed analysis such as federated learning offers a solution by enabling collaborative model training without sharing raw data. However, it may fail when the training data in the local devices are non-independent and identically distributed (non-IID). In this paper, we propose a novel framework for collaborative drug discovery using federated learning on non-IID datasets. We address the difficulty of training on non-IID data by globally sharing a small subset of data among all institutions. Our framework allows multiple institutions to jointly train a robust predictive model while preserving the privacy of their individual data. We leverage the federated learning paradigm to distribute the model training process across local devices, eliminating the need for data exchange. The experimental results on 15 benchmark datasets demonstrate that the proposed method achieves competitive predictive accuracy to centralized analysis while respecting data privacy. Moreover, our framework offers benefits such as reduced data transmission and enhanced scalability, making it suitable for large-scale collaborative drug discovery efforts.
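The general idea in this abstract, local training at each institution plus a small globally shared subset, can be sketched with plain federated averaging (FedAvg) on a one-parameter linear model. This is not the authors' framework: FedAvg is swapped in as a standard stand-in, and the function names, toy data, learning rate, and round counts are all assumptions chosen for illustration.

```python
def local_update(w, data, lr=0.05, epochs=5):
    """A few gradient-descent steps on one client's data for the model y = w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(clients, shared, rounds=50):
    """FedAvg: only the weight w travels between clients and server."""
    w = 0.0
    for _ in range(rounds):
        updates = []
        for private in clients:
            # Each client trains on its private data plus the shared subset;
            # the private data itself never leaves the client.
            local = private + shared
            updates.append((local_update(w, local), len(local)))
        total = sum(n for _, n in updates)
        # Server aggregates client weights, weighted by local sample count
        w = sum(w_k * n for w_k, n in updates) / total
    return w


# Hypothetical toy data from y = 3x; each client sees a different x range,
# a simple stand-in for non-IID inputs.
clients = [
    [(x, 3.0 * x) for x in (0.1, 0.2, 0.3)],
    [(x, 3.0 * x) for x in (1.0, 1.5, 2.0)],
]
shared = [(0.5, 1.5), (1.2, 3.6)]

w = federated_average(clients, shared)
print(w)
```

The shared subset gives every client some examples from outside its own input range, which is one plausible reading of how sharing a small amount of data can reduce the drift between local models on non-IID data.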
Affiliation(s)
- Dong Huang
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan.
- Ying Zhang
- Beidahuang Industry Group General Hospital, Harbin, China.
- Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
135
Zhang W, Hu F, Li W, Yin P. Does protein pretrained language model facilitate the prediction of protein-ligand interaction? Methods 2023; 219:8-15. [PMID: 37690736 DOI: 10.1016/j.ymeth.2023.08.016] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/01/2023] [Revised: 08/22/2023] [Accepted: 08/29/2023] [Indexed: 09/12/2023] Open
Abstract
Protein-ligand interaction (PLI) is a critical step for drug discovery. Recently, protein pretrained language models (PLMs) have showcased exceptional performance across a wide range of protein-related tasks. However, a significant heterogeneity exists between the PLM and PLI tasks, leading to a degree of uncertainty. In this study, we propose a method that quantitatively assesses the significance of protein PLMs in PLI prediction. Specifically, we analyze the performance of three widely-used protein PLMs (TAPE, ESM-1b, and ProtTrans) on three PLI tasks (PDBbind, Kinase, and DUD-E). The model with pre-training consistently achieves improved performance and decreased time cost, demonstrating that PLMs enhance both the accuracy and efficiency of PLI prediction. By quantitatively assessing the transferability, the optimal PLM for each PLI task is identified without the need for costly transfer experiments. Additionally, we examine the contributions of PLMs on the distribution of feature space, highlighting the improved discriminability after pre-training. Our findings provide insights into the mechanisms underlying PLMs in PLI prediction and pave the way for the design of more interpretable and accurate PLMs in the future. Code and data are freely available at https://github.com/brian-zZZ/PLM-PLI.
Affiliation(s)
- Weihong Zhang
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
- Fan Hu
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
- Wang Li
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Peng Yin
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
136
Kashyap K, Mahapatra PP, Ahmed S, Buyukbingol E, Siddiqi MI. Identification of Potential Aldose Reductase Inhibitors Using Convolutional Neural Network-Based in Silico Screening. J Chem Inf Model 2023; 63:6261-6282. [PMID: 37788831 DOI: 10.1021/acs.jcim.3c00547] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 10/05/2023]
Abstract
Aldose reductase (ALR2) is a notable enzyme of the polyol pathway responsible for aggravating diabetic neuropathy complications. The first step begins when it catalyzes the reduction of glucose to sorbitol with NADPH as a coenzyme. Elevated concentrations of sorbitol damage the tissues, leading to complications like neuropathy. Though considerable effort has been pushed toward the successful discovery of potent inhibitors, its discovery still remains an elusive task. To this end, we present a 3D convolutional neural network (3D-CNN) based ALR2 inhibitor classification technique by dealing with snapshots of images captured from 3D chemical structures with multiple rotations as input data. The CNN-based architecture was trained on the 360 sets of image data along each axis and further prediction on the Maybridge library by each of the models. Subjecting the retrieved hits to molecular docking leads to the identification of the top 10 molecules with high binding affinity. The hits displayed a better blood-brain barrier penetration (BBB) score (90% with more than four scores) as compared to standard inhibitors (38%), reflecting the superior BBB penetrating efficiency of the hits. Followed by molecular docking, the biological evaluation spotlighted five compounds as promising ALR2 inhibitors and can be considered as a likely prospect for further structural optimization with medicinal chemistry efforts to improve their inhibition efficacy and consolidate them as new ALR2 antagonists in the future. In addition, the study also demonstrated the usefulness of scaffold analysis of the molecules as a method for investigating the significance of structurally diverse compounds in data-driven studies. For reproducibility and accessibility purposes, all of the source codes used in our study are publicly available.
Collapse
Affiliation(s)
- Kushagra Kashyap
- Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow 226031, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
| | - Pinaki Prasad Mahapatra
- Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow 226031, India
| | - Shakil Ahmed
- Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow 226031, India
| | - Erdem Buyukbingol
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Ankara University, 06100 Ankara, Turkey
| | - Mohammad Imran Siddiqi
- Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow 226031, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
| |
|
137
|
Deng J, Yang Z, Wang H, Ojima I, Samaras D, Wang F. A systematic study of key elements underlying molecular property prediction. Nat Commun 2023; 14:6395. [PMID: 37833262 PMCID: PMC10575948 DOI: 10.1038/s41467-023-41948-6] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 10/24/2022] [Accepted: 09/18/2023] [Indexed: 10/15/2023] Open
Abstract
Artificial intelligence (AI) has been widely applied in drug discovery, with molecular property prediction as a major task. Despite booming techniques in molecular representation learning, key elements underlying molecular property prediction remain largely unexplored, which impedes further advancements in this field. Herein, we conduct an extensive evaluation of representative models using various representations on the MoleculeNet datasets, a suite of opioids-related datasets and two additional activity datasets from the literature. To investigate the predictive power in low-data and high-data space, a series of descriptor datasets of varying sizes are also assembled to evaluate the models. In total, we have trained 62,820 models, including 50,220 models on fixed representations, 4200 models on SMILES sequences and 8400 models on molecular graphs. Based on extensive experimentation and rigorous comparison, we show that representation learning models exhibit limited performance in molecular property prediction in most datasets. Besides, multiple key elements underlying molecular property prediction can affect the evaluation results. Furthermore, we show that activity cliffs can significantly impact model prediction. Finally, we explore potential reasons why representation learning models can fail and show that dataset size is essential for representation learning models to excel.
Affiliation(s)
- Jianyuan Deng
- Stony Brook University, Department of Biomedical Informatics, Stony Brook, NY, 11794, USA
| | - Zhibo Yang
- Stony Brook University, Department of Computer Science, Stony Brook, NY, 11794, USA
| | - Hehe Wang
- Stony Brook University, Department of Chemistry, Stony Brook, NY, 11794, USA
| | - Iwao Ojima
- Stony Brook University, Department of Chemistry, Stony Brook, NY, 11794, USA
| | - Dimitris Samaras
- Stony Brook University, Department of Computer Science, Stony Brook, NY, 11794, USA
| | - Fusheng Wang
- Stony Brook University, Department of Biomedical Informatics, Stony Brook, NY, 11794, USA.
- Stony Brook University, Department of Computer Science, Stony Brook, NY, 11794, USA.
| |
|
138
|
Betts R, Dierking I. Machine learning classification of polar sub-phases in liquid crystal MHPOBC. Soft Matter 2023; 19:7502-7512. [PMID: 37646209 DOI: 10.1039/d3sm00902e] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 09/01/2023]
Abstract
Experimental polarising microscopy texture images of the fluid smectic phases and sub-phases of the classic liquid crystal MHPOBC were classified as paraelectric (SmA*), ferroelectric (SmC*), ferrielectric (SmC1/3*), and antiferroelectric (SmCA*) using convolutional neural networks (CNNs). Two neural network architectures were tested: a sequential convolutional neural network with varying numbers of layers and a simplified inception model with varying numbers of inception blocks. Both models are successful in binary classifications between different phases as well as classification between all four phases. Optimised architectures for the multi-phase classification achieved accuracies of (84 ± 2)% and (93 ± 1)% for sequential convolutional and inception networks, respectively. The results of this study contribute to the understanding of how CNNs may be used in classifying liquid crystal phases. The inception model in particular is of sufficient accuracy to allow automated characterization of liquid crystal phase sequences and thus opens a path towards an additional method to determine the phases of novel liquid crystals for applications in electro-optics, photonics or sensors. The outlined procedure of supervised machine learning can be applied to practically all liquid crystal phases and materials, provided that the infrastructure of training data and computational power is available.
Affiliation(s)
- Rebecca Betts
- Department of Physics and Astronomy, University of Manchester, Oxford Road, Manchester M13 9PL, UK.
| | - Ingo Dierking
- Department of Physics and Astronomy, University of Manchester, Oxford Road, Manchester M13 9PL, UK.
| |
|
139
|
AlFaraj Y, Mohapatra S, Shieh P, Husted KEL, Ivanoff DG, Lloyd EM, Cooper JC, Dai Y, Singhal AP, Moore JS, Sottos NR, Gomez-Bombarelli R, Johnson JA. A Model Ensemble Approach Enables Data-Driven Property Prediction for Chemically Deconstructable Thermosets in the Low-Data Regime. ACS Cent Sci 2023; 9:1810-1819. [PMID: 37780353 PMCID: PMC10540282 DOI: 10.1021/acscentsci.3c00502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Indexed: 10/03/2023]
Abstract
Thermosets present sustainability challenges that could potentially be addressed through the design of deconstructable variants with tunable properties; however, the combinatorial space of possible thermoset molecular building blocks (e.g., monomers, cross-linkers, and additives) and manufacturing conditions is vast, and predictive knowledge for how combinations of these molecular components translate to bulk thermoset properties is lacking. Data science could overcome these problems, but computational methods are difficult to apply to multicomponent, amorphous, statistical copolymer materials for which little data exist. Here, leveraging a data set with 101 examples, we introduce a closed-loop experimental, machine learning (ML), and virtual screening strategy to enable predictions of the glass transition temperature (Tg) of polydicyclopentadiene (pDCPD) thermosets containing cleavable bifunctional silyl ether (BSE) comonomers and/or cross-linkers with varied compositions and loadings. Molecular features and formulation variables are used as model inputs, and uncertainty is quantified through model ensembling, which together with heavy regularization helps to avoid overfitting and ultimately achieves predictions within <15 °C for thermosets with compositionally diverse BSEs. This work offers a path to predicting the properties of thermosets based on their molecular building blocks, which may accelerate the discovery of promising plastics, rubbers, and composites with improved functionality and controlled deconstructability.
Affiliation(s)
- Yasmeen S. AlFaraj
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
| | - Somesh Mohapatra
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
| | - Peyton Shieh
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
| | - Keith E. L. Husted
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
| | - Douglass G. Ivanoff
- Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
| | - Evan M. Lloyd
- The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
| | - Julian C. Cooper
- The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
| | - Yutong Dai
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
| | - Avni P. Singhal
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
| | - Jeffrey S. Moore
- Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
| | - Nancy R. Sottos
- Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
- The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States of America
| | - Rafael Gomez-Bombarelli
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
| | - Jeremiah A. Johnson
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States of America
| |
|
140
|
Sauer S, Matter H, Hessler G, Grebner C. Integrating Reaction Schemes, Reagent Databases, and Virtual Libraries into Fragment-Based Design by Reinforcement Learning. J Chem Inf Model 2023; 63:5709-5726. [PMID: 37668352 DOI: 10.1021/acs.jcim.3c00735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/06/2023]
Abstract
Lead optimization supported by artificial intelligence (AI)-based generative models has become increasingly important in drug design. Success factors are reagent availability, novelty, and the optimization of multiple properties. Directed fragment-replacement is particularly attractive, as it mimics medicinal chemistry tactics. Here, we present variations of fragment-based reinforcement learning using an actor-critic model. Novel features include freezing fragments and using reagents as the fragment source. Splitting molecules according to reaction schemes improves synthesizability, while tuning network output probabilities allows us to balance novelty versus diversity. Combining fragment-based optimization with virtual library encodings allows the exploration of large chemical spaces with synthesizable ideas. Collectively, these enhancements influence design toward high-quality molecules with favorable profiles. A validation study using 15 pharmaceutically relevant targets reveals that novel structures are obtained for most cases, which are identical or related to independent validation sets for each target. Hence, these modifications significantly increase the value of fragment-based reinforcement learning for drug design. The code is available on GitHub: https://github.com/Sanofi-Public/IDD-papers-fragrl.
Affiliation(s)
- Susanne Sauer
- Synthetic Molecular Design, Integrated Drug Discovery, Sanofi-Aventis Deutschland GmbH, 65926 Frankfurt am Main, Germany
| | - Hans Matter
- Synthetic Molecular Design, Integrated Drug Discovery, Sanofi-Aventis Deutschland GmbH, 65926 Frankfurt am Main, Germany
| | - Gerhard Hessler
- Synthetic Molecular Design, Integrated Drug Discovery, Sanofi-Aventis Deutschland GmbH, 65926 Frankfurt am Main, Germany
| | - Christoph Grebner
- Synthetic Molecular Design, Integrated Drug Discovery, Sanofi-Aventis Deutschland GmbH, 65926 Frankfurt am Main, Germany
| |
|
141
|
Lee M, Min K. AmorProt: Amino Acid Molecular Fingerprints Repurposing-Based Protein Fingerprint. Biochemistry 2023; 62:2700-2709. [PMID: 37622182 DOI: 10.1021/acs.biochem.3c00253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/26/2023]
Abstract
As protein therapeutics play an important role in almost all medical fields, numerous studies have been conducted on proteins using artificial intelligence. Artificial intelligence has enabled data-driven predictions without the need for expensive experiments. Nevertheless, unlike the various molecular fingerprint algorithms that have been developed, protein fingerprint algorithms have rarely been studied. In this study, we proposed the amino acid molecular fingerprints repurposing-based protein (AmorProt) fingerprint, a protein sequence representation method that effectively uses the molecular fingerprints corresponding to 20 amino acids. Subsequently, the performances of the tree-based machine learning and artificial neural network models were compared using (1) amyloid classification and (2) isoelectric point regression. Finally, the applicability and advantages of the developed platform were demonstrated through a case study and the following experiments: (3) comparison of dataset dependence with feature-based methods, (4) feature importance analysis, and (5) protein space analysis. Consequently, the significantly improved model performance and data-set-independent versatility of the AmorProt fingerprint were verified. The results revealed that the current protein representation method can be applied to various fields related to proteins, such as predicting their fundamental properties or interaction with ligands.
Affiliation(s)
- Myeonghun Lee
- School of Systems Biomedical Science, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Republic of Korea
| | - Kyoungmin Min
- School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Republic of Korea
| |
|
142
|
Mostafa S, Mondal D, Panjvani K, Kochian L, Stavness I. Explainable deep learning in plant phenotyping. Front Artif Intell 2023; 6:1203546. [PMID: 37795496 PMCID: PMC10546035 DOI: 10.3389/frai.2023.1203546] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/10/2023] [Accepted: 08/25/2023] [Indexed: 10/06/2023] Open
Abstract
The increasing human population and variable weather conditions, due to climate change, pose a threat to the world's food security. To improve global food security, we need to provide breeders with tools to develop crop cultivars that are more resilient to extreme weather conditions and provide growers with tools to more effectively manage biotic and abiotic stresses in their crops. Plant phenotyping, the measurement of a plant's structural and functional characteristics, has the potential to inform, improve and accelerate both breeders' selections and growers' management decisions. To improve the speed, reliability and scale of plant phenotyping procedures, many researchers have adopted deep learning methods to estimate phenotypic information from images of plants and crops. Despite the successful results of these image-based phenotyping studies, the representations learned by deep learning models remain difficult to interpret, understand, and explain. For this reason, deep learning models are still considered to be black boxes. Explainable AI (XAI) is a promising approach for opening the deep learning model's black box and providing plant scientists with image-based phenotypic information that is interpretable and trustworthy. Although various fields of study have adopted XAI to advance their understanding of deep learning models, it has yet to be well-studied in the context of plant phenotyping research. In this review article, we reviewed existing XAI studies in plant shoot phenotyping, as well as related domains, to help plant researchers understand the benefits of XAI and make it easier for them to integrate XAI into their future studies. An elucidation of the representations within a deep learning model can help researchers explain the model's decisions, relate the features detected by the model to the underlying plant physiology, and enhance the trustworthiness of image-based phenotypic information used in food production systems.
Affiliation(s)
- Sakib Mostafa
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
| | - Debajyoti Mondal
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
| | - Karim Panjvani
- Global Institute for Food Security, University of Saskatchewan, Saskatoon, SK, Canada
| | - Leon Kochian
- Global Institute for Food Security, University of Saskatchewan, Saskatoon, SK, Canada
| | - Ian Stavness
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
| |
|
143
|
Niazi SK. The Coming of Age of AI/ML in Drug Discovery, Development, Clinical Testing, and Manufacturing: The FDA Perspectives. Drug Des Devel Ther 2023; 17:2691-2725. [PMID: 37701048 PMCID: PMC10493153 DOI: 10.2147/dddt.s424991] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 06/28/2023] [Accepted: 08/24/2023] [Indexed: 09/14/2023] Open
Abstract
Artificial intelligence (AI) and machine learning (ML) represent significant advancements in computing, building on technologies that humanity has developed over millions of years, from the abacus to quantum computers. These tools have reached a pivotal moment in their development. In 2021 alone, the U.S. Food and Drug Administration (FDA) received over 100 product registration submissions that heavily relied on AI/ML for applications such as monitoring and improving human performance in compiling dossiers. To ensure the safe and effective use of AI/ML in drug discovery and manufacturing, the FDA and numerous other U.S. federal agencies have issued continuously updated, stringent guidelines. Intriguingly, these guidelines are often generated or updated with the aid of AI/ML tools themselves. The overarching goal is to expedite drug discovery, enhance the safety profiles of existing drugs, introduce novel treatment modalities, and improve manufacturing compliance and robustness. Recent FDA publications offer an encouraging outlook on the potential of these tools, emphasizing the need for their careful deployment. This has expanded market opportunities for retraining personnel handling these technologies and enabled innovative applications in emerging therapies such as gene editing, CRISPR-Cas9, CAR-T cells, mRNA-based treatments, and personalized medicine. In summary, the maturation of AI/ML technologies is a testament to human ingenuity. Far from being autonomous entities, these are tools created by and for humans, designed to solve complex problems now and in the future. This paper aims to present the status of these technologies, along with examples of their present and future applications.
|
144
|
Sun J, Xu M, Ru J, James-Bott A, Xiong D, Wang X, Cribbs AP. Small molecule-mediated targeting of microRNAs for drug discovery: Experiments, computational techniques, and disease implications. Eur J Med Chem 2023; 257:115500. [PMID: 37262996 PMCID: PMC11554572 DOI: 10.1016/j.ejmech.2023.115500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 05/05/2023] [Accepted: 05/15/2023] [Indexed: 06/03/2023]
Abstract
Small molecules have been providing medical breakthroughs for human diseases for more than a century. Recently, identifying small molecule inhibitors that target microRNAs (miRNAs) has gained importance, despite the challenges posed by labour-intensive screening experiments and the significant efforts required for medicinal chemistry optimization. Numerous experimentally-verified cases have demonstrated the potential of miRNA-targeted small molecule inhibitors for disease treatment. This new approach is grounded in their posttranscriptional regulation of the expression of disease-associated genes. Reversing dysregulated gene expression using this mechanism may help control dysfunctional pathways. Furthermore, the ongoing improvement of algorithms has allowed for the integration of computational strategies built on top of laboratory-based data, facilitating a more precise and rational design and discovery of lead compounds. To complement the use of extensive pharmacogenomics data in prioritising potential drugs, our previous work introduced a computational approach based on only molecular sequences. Moreover, various computational tools for predicting molecular interactions in biological networks using similarity-based inference techniques have been accumulated in established studies. However, there are a limited number of comprehensive reviews covering both computational and experimental drug discovery processes. In this review, we outline a cohesive overview of both biological and computational applications in miRNA-targeted drug discovery, along with their disease implications and clinical significance. Finally, utilizing drug-target interaction (DTI) data from DrugBank, we showcase the effectiveness of deep learning for obtaining the physicochemical characterization of DTIs.
Affiliation(s)
- Jianfeng Sun
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK.
| | - Miaoer Xu
- Department of Biology, Emory University, Atlanta, GA, 30322, USA
| | - Jinlong Ru
- Chair of Prevention of Microbial Diseases, School of Life Sciences Weihenstephan, Technical University of Munich, Freising, 85354, Germany
| | - Anna James-Bott
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
| | - Dapeng Xiong
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
| | - Xia Wang
- College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China.
| | - Adam P Cribbs
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK.
| |
|
145
|
Kanev GK, Zhang Y, Kooistra AJ, Bender A, Leurs R, Bailey D, Würdinger T, de Graaf C, de Esch IJP, Westerman BA. Predicting the target landscape of kinase inhibitors using 3D convolutional neural networks. PLoS Comput Biol 2023; 19:e1011301. [PMID: 37669273 PMCID: PMC10508635 DOI: 10.1371/journal.pcbi.1011301] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/15/2022] [Revised: 09/19/2023] [Accepted: 06/25/2023] [Indexed: 09/07/2023] Open
Abstract
Many therapies in clinical trials are based on single drug-single target relationships. To further extend this concept to multi-target approaches using multi-targeted drugs, we developed a machine learning pipeline to unravel the target landscape of kinase inhibitors. This pipeline, which we call 3D-KINEssence, uses a new type of protein fingerprints (3D FP) based on the structure of kinases generated through a 3D convolutional neural network (3D-CNN). These 3D-CNN kinase fingerprints were matched to molecular Morgan fingerprints to predict the targets of each respective kinase inhibitor based on available bioactivity data. The performance of the pipeline was evaluated on two test sets: a sparse drug-target set where each drug is matched in most cases to a single target and also on a densely-covered drug-target set where each drug is matched to most if not all targets. This latter set is more challenging to train, given its non-exclusive character. Our model's root-mean-square error (RMSE) based on the two datasets was 0.68 and 0.8, respectively. These results indicate that 3D FP can predict the target landscape of kinase inhibitors at around 0.8 log units of bioactivity. Our strategy can be utilized in proteochemometric or chemogenomic workflows by consolidating the target landscape of kinase inhibitors.
Affiliation(s)
- Georgi K. Kanev
- Division of Medicinal Chemistry, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Department of Neurosurgery, Amsterdam University Medical Centers, Cancer Center Amsterdam, Brain Tumor Center Amsterdam, Amsterdam, The Netherlands
| | - Yaran Zhang
- Department of Neurosurgery, Amsterdam University Medical Centers, Cancer Center Amsterdam, Brain Tumor Center Amsterdam, Amsterdam, The Netherlands
| | - Albert J. Kooistra
- Division of Medicinal Chemistry, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
| | - Andreas Bender
- Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
| | - Rob Leurs
- Division of Medicinal Chemistry, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - David Bailey
- The WINDOW consortium, www.window-consortium.org
- IOTA Pharmaceuticals Ltd, St Johns Innovation Centre, Cambridge, United Kingdom
| | - Thomas Würdinger
- Department of Neurosurgery, Amsterdam University Medical Centers, Cancer Center Amsterdam, Brain Tumor Center Amsterdam, Amsterdam, The Netherlands
- The WINDOW consortium, www.window-consortium.org
| | - Chris de Graaf
- Division of Medicinal Chemistry, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - Iwan J. P. de Esch
- Division of Medicinal Chemistry, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - Bart A. Westerman
- Department of Neurosurgery, Amsterdam University Medical Centers, Cancer Center Amsterdam, Brain Tumor Center Amsterdam, Amsterdam, The Netherlands
- The WINDOW consortium, www.window-consortium.org
| |
|
146
|
Kate A, Seth E, Singh A, Chakole CM, Chauhan MK, Singh RK, Maddalwar S, Mishra M. Artificial Intelligence for Computer-Aided Drug Discovery. Drug Res (Stuttg) 2023; 73:369-377. [PMID: 37276884 DOI: 10.1055/a-2076-3359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
The continuous implementation of Artificial Intelligence (AI) in multiple scientific domains and the rapid advancement in computer software and hardware, along with other parameters, have rapidly fuelled this development. The technology can contribute effectively to solving many challenges and constraints in traditional drug development. Traditionally, large-scale chemical libraries are screened to find one promising medicine. In recent years, more reasonable structure-based drug design approaches have avoided the first screening phases while still requiring chemists to design, synthesize, and test a wide range of compounds to produce possible novel medications. The process of turning a promising chemical into a medicinal candidate can be expensive and time-consuming. Additionally, a new medication candidate may still fail in clinical trials even after demonstrating promise in laboratory research. In fact, less than 10% of medication candidates that undergo Phase I trials actually reach the market. As a consequence, the unmatched data processing power of AI systems may expedite and enhance the drug development process in four different ways: by opening up links to novel biological systems, superior or distinctive chemistry, greater success rates, and faster and less expensive innovation trials. Since these technologies may be used to address a variety of discovery scenarios and biological targets, it is essential to comprehend and distinguish between use cases. As a result, we have emphasized how AI may be used in a variety of areas of the pharmaceutical sciences, including in-depth opportunities for drug research and development.
Affiliation(s)
- Aditya Kate
- Amity Institute of Biotechnology, Amity University, Chhattisgarh, India
| | - Ekkita Seth
- Amity Institute of Biotechnology, Amity University, Chhattisgarh, India
| | - Ananya Singh
- Amity Institute of Biotechnology, Amity University, Chhattisgarh, India
| | - Chandrashekhar Mahadeo Chakole
- Bajiraoji Karanjekar College of Pharmacy, Sakoli, Dist-Bhandara, India
- NDDS Research Lab, Delhi Institute of Pharmaceutical Sciences and Research, DPSR-University, New Delhi
| | - Meenakshi Kanwar Chauhan
- NDDS Research Lab, Delhi Institute of Pharmaceutical Sciences and Research, DPSR-University, New Delhi
| | - Ravi Kant Singh
- Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
| | | | - Mohit Mishra
- Amity Institute of Biotechnology, Amity University, Chhattisgarh, India
| |
|
147
|
Santosh KC, GhoshRoy D, Nakarmi S. A Systematic Review on Deep Structured Learning for COVID-19 Screening Using Chest CT from 2020 to 2022. Healthcare (Basel) 2023; 11:2388. [PMID: 37685422 PMCID: PMC10486542 DOI: 10.3390/healthcare11172388] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/12/2023] [Revised: 08/16/2023] [Accepted: 08/22/2023] [Indexed: 09/10/2023] Open
Abstract
The emergence of the COVID-19 pandemic in Wuhan in 2019 led to the discovery of a novel coronavirus. The World Health Organization (WHO) designated it as a global pandemic on 11 March 2020 due to its rapid and widespread transmission. Its impact has had profound implications, particularly in the realm of public health. Extensive scientific endeavors have been directed towards devising effective treatment strategies and vaccines. Within the healthcare and medical imaging domain, the application of artificial intelligence (AI) has brought significant advantages. This study delves into peer-reviewed research articles spanning the years 2020 to 2022, focusing on AI-driven methodologies for the analysis and screening of COVID-19 through chest CT scan data. We assess the efficacy of deep learning algorithms in facilitating decision making processes. Our exploration encompasses various facets, including data collection, systematic contributions, emerging techniques, and encountered challenges. However, the comparison of outcomes between 2020 and 2022 proves intricate due to shifts in dataset magnitudes over time. The initiatives aimed at developing AI-powered tools for the detection, localization, and segmentation of COVID-19 cases are primarily centered on educational and training contexts. We deliberate on their merits and constraints, particularly in the context of necessitating cross-population train/test models. Our analysis encompassed a review of 231 research publications, bolstered by a meta-analysis employing search keywords (COVID-19 OR Coronavirus) AND chest CT AND (deep learning OR artificial intelligence OR medical imaging) on both the PubMed Central Repository and Web of Science platforms.
Affiliation(s)
- KC Santosh
- 2AI: Applied Artificial Intelligence Research Lab, Vermillion, SD 57069, USA
- Debasmita GhoshRoy
- School of Automation, Banasthali Vidyapith, Tonk 304022, Rajasthan, India
- Suprim Nakarmi
- Department of Computer Science, University of South Dakota, Vermillion, SD 57069, USA
|
148
|
Miao Y, Ma H, Huang J. Recent Advances in Toxicity Prediction: Applications of Deep Graph Learning. Chem Res Toxicol 2023; 36:1206-1226. [PMID: 37562046 DOI: 10.1021/acs.chemrestox.2c00384] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 08/12/2023]
Abstract
The development of new drugs is time-consuming and expensive, and as such, accurately predicting the potential toxicity of a drug candidate is crucial in ensuring its safety and efficacy. Recently, deep graph learning has become prevalent in this field due to its computational power and cost efficiency. Many novel deep graph learning methods aid toxicity prediction and further prompt drug development. This review aims to connect fundamental knowledge with burgeoning deep graph learning methods. We first summarize the essential components of deep graph learning models for toxicity prediction, including molecular descriptors, molecular representations, evaluation metrics, validation methods, and data sets. Furthermore, based on various graph-related representations of molecules, we introduce several representative studies and methods for toxicity prediction from the perspective of GNN architectures and graph pretrained models. Compared to other types of models, deep graph models not only advance in higher accuracy and efficiency but also provide more intuitive insights, which is significant in the development of model interpretation and generalization ability. The graph pretrained models are emerging as they can extract prominent features from large-scale unlabeled molecular graph data and improve the performance of downstream toxicity prediction tasks. We hope this survey can serve as a handbook for individuals interested in exploring deep graph learning for toxicity prediction.
Affiliation(s)
- Yuwei Miao
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas 76019, United States
- Hehuan Ma
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas 76019, United States
- Junzhou Huang
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas 76019, United States
|
149
|
Zheng Z, Zhang O, Borgs C, Chayes JT, Yaghi OM. ChatGPT Chemistry Assistant for Text Mining and the Prediction of MOF Synthesis. J Am Chem Soc 2023; 145:18048-18062. [PMID: 37548379 PMCID: PMC11073615 DOI: 10.1021/jacs.3c05819] [Citation(s) in RCA: 91] [Impact Index Per Article: 45.5] [Indexed: 08/08/2023]
Abstract
We use prompt engineering to guide ChatGPT in the automation of text mining of metal-organic framework (MOF) synthesis conditions from diverse formats and styles of the scientific literature. This effectively mitigates ChatGPT's tendency to hallucinate information, an issue that previously made the use of large language models (LLMs) in scientific fields challenging. Our approach involves the development of a workflow implementing three different processes for text mining, programmed by ChatGPT itself. All of them enable parsing, searching, filtering, classification, summarization, and data unification with different trade-offs among labor, speed, and accuracy. We deploy this system to extract 26 257 distinct synthesis parameters pertaining to approximately 800 MOFs sourced from peer-reviewed research articles. This process incorporates our ChemPrompt Engineering strategy to instruct ChatGPT in text mining, resulting in impressive precision, recall, and F1 scores of 90-99%. Furthermore, with the data set built by text mining, we constructed a machine-learning model with over 87% accuracy in predicting MOF experimental crystallization outcomes and preliminarily identifying important factors in MOF crystallization. We also developed a reliable data-grounded MOF chatbot to answer questions about chemical reactions and synthesis procedures. Given that the process of using ChatGPT reliably mines and tabulates diverse MOF synthesis information in a unified format while using only narrative language requiring no coding expertise, we anticipate that our ChatGPT Chemistry Assistant will be very useful across various other chemistry subdisciplines.
Affiliation(s)
- Zhiling Zheng
- Department of Chemistry, University of California, Berkeley, California 94720, United States
- Kavli Energy Nanoscience Institute, University of California, Berkeley, California 94720, United States
- Bakar Institute of Digital Materials for the Planet, College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, United States
- Christian Borgs
- Bakar Institute of Digital Materials for the Planet, College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, United States
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720, United States
- Jennifer T Chayes
- Bakar Institute of Digital Materials for the Planet, College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, United States
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720, United States
- Department of Mathematics, University of California, Berkeley, California 94720, United States
- Department of Statistics, University of California, Berkeley, California 94720, United States
- School of Information, University of California, Berkeley, California 94720, United States
- Omar M Yaghi
- Department of Chemistry, University of California, Berkeley, California 94720, United States
- Kavli Energy Nanoscience Institute, University of California, Berkeley, California 94720, United States
- Bakar Institute of Digital Materials for the Planet, College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, United States
- KACST-UC Berkeley Center of Excellence for Nanomaterials for Clean Energy Applications, King Abdulaziz City for Science and Technology, Riyadh 11442, Saudi Arabia
|
150
|
Elkashlan M, Ahmad RM, Hajar M, Al Jasmi F, Corchado JM, Nasarudin NA, Mohamad MS. A review of SARS-CoV-2 drug repurposing: databases and machine learning models. Front Pharmacol 2023; 14:1182465. [PMID: 37601065 PMCID: PMC10436567 DOI: 10.3389/fphar.2023.1182465] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/21/2023] [Accepted: 07/06/2023] [Indexed: 08/22/2023]
Abstract
The emergence of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) posed a serious worldwide threat and emphasized the urgency to find efficient solutions to combat the spread of the virus. Drug repurposing has attracted more attention than traditional approaches due to its potential for a time- and cost-effective discovery of new applications for the existing FDA-approved drugs. Given the reported success of machine learning (ML) in virtual drug screening, it is warranted as a promising approach to identify potential SARS-CoV-2 inhibitors. The implementation of ML in drug repurposing requires the presence of reliable digital databases for the extraction of the data of interest. Numerous databases archive research data from studies so that it can be used for different purposes. This article reviews two aspects: the frequently used databases in ML-based drug repurposing studies for SARS-CoV-2, and the recent ML models that have been developed for the prospective prediction of potential inhibitors against the new virus. Both types of ML models, deep learning models and conventional ML models, are reviewed in terms of their introduction, methodology, and recent applications in the prospective prediction of SARS-CoV-2 inhibitors. Furthermore, the features and limitations of the databases are provided to guide researchers in choosing suitable databases according to their research interests.
Affiliation(s)
- Marim Elkashlan
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
- Rahaf M Ahmad
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
- Malak Hajar
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
- Fatma Al Jasmi
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
- Division of Metabolic Genetics, Department of Pediatrics, Tawam Hospital, Al Ain, United Arab Emirates
- Juan Manuel Corchado
- Departamento de Informática y Automática, Facultad de Ciencias, Grupo de Investigación BISITE, Instituto de Investigación Biomédica de Salamanca, University of Salamanca, Salamanca, Spain
- Nurul Athirah Nasarudin
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
- Mohd Saberi Mohamad
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
|