1
Yang Z, Wang K, Zhang G, Jiang Y, Zeng R, Qiao J, Li Y, Deng X, Xia Z, Yao R, Zeng X, Zhang L, Zhao Y, Lei J, Chen R. A deep learning model for structure-based bioactivity optimization and its application in the bioactivity optimization of a SARS-CoV-2 main protease inhibitor. Eur J Med Chem 2025; 291:117602. [PMID: 40239482 DOI: 10.1016/j.ejmech.2025.117602] [Received: 11/19/2024] [Revised: 04/02/2025] [Accepted: 04/03/2025] [Indexed: 04/18/2025]
Abstract
Bioactivity optimization is a crucial and technical task in the early stages of drug discovery, traditionally carried out through iterative substituent optimization, a process that is often both time-consuming and expensive. To address this challenge, we present Pocket-StrMod, a deep-learning model tailored for structure-based bioactivity optimization. Pocket-StrMod employs an autoregressive flow-based architecture, optimizing molecules within a specific protein binding pocket while explicitly incorporating chemical expertise. It synchronously optimizes all substituents by generating atoms and covalent bonds at designated sites within a molecular scaffold nestled inside a protein pocket. We applied this model to optimize the bioactivity of Hit1, an inhibitor of the SARS-CoV-2 main protease (Mpro) with initially poor bioactivity (IC50: 34.56 μM). Following two rounds of optimization, six compounds were selected for synthesis and bioactivity testing. This led to the discovery of C5, a potent compound with an IC50 value of 33.6 nM, marking a remarkable 1028-fold improvement over Hit1. Furthermore, C5 demonstrated promising in vitro antiviral activity against SARS-CoV-2. Collectively, these findings underscore the great potential of deep learning in facilitating rapid and cost-effective bioactivity optimization in the early phases of drug development.
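The fold improvement quoted in the abstract can be checked from the two IC50 values once both are expressed in the same unit. The snippet below is a minimal sketch of that arithmetic only; the helper name `fold_improvement` and the unit table are illustrative assumptions and are not taken from the paper.

```python
# Hypothetical helper (not from the paper): ratio of two IC50 values
# after converting both to nanomolar.
UNIT_TO_NM = {"nM": 1.0, "uM": 1e3, "mM": 1e6}


def fold_improvement(before, before_unit, after, after_unit):
    """Return how many times lower the second IC50 is than the first."""
    before_nm = before * UNIT_TO_NM[before_unit]
    after_nm = after * UNIT_TO_NM[after_unit]
    return before_nm / after_nm


# Values as reported in the abstract: Hit1 at 34.56 uM, C5 at 33.6 nM.
ratio = fold_improvement(34.56, "uM", 33.6, "nM")
print(f"{ratio:.1f}-fold")
```

The ratio comes out slightly above 1028, which is consistent with the abstract's figure if that figure was truncated to a whole number.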
Affiliation(s)
- Zhenyu Yang
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Kai Wang
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Guo Zhang
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Yuanyuan Jiang
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Rui Zeng
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Jingxin Qiao
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Yueyue Li
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Xinyue Deng
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Ziyi Xia
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Rui Yao
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Xiaoxi Zeng
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Liyun Zhang
- Lead Generation Unit, HitGen Inc., Tianfu International Bio-Town, Shuangliu District, Chengdu, Sichuan, 610200, China
- Yi Zhao
- Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China.
- Jian Lei
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China; National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China.
- Runsheng Chen
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China; Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
2
Piazza L, Srinivasan S, Tuccinardi T, Bajorath J. Transforming molecular cores, substituents, and combinations into structurally diverse compounds using chemical language models. Eur J Med Chem 2025; 291:117615. [PMID: 40222164 DOI: 10.1016/j.ejmech.2025.117615] [Received: 02/19/2025] [Revised: 03/19/2025] [Accepted: 04/07/2025] [Indexed: 04/15/2025]
Abstract
Transformer-based chemical language models (CLMs) were derived to generate structurally and topologically diverse embeddings of core structure fragments, substituents, or core/substituent combinations in chemically proper compounds, representing a design task that is difficult to address using conventional structure generation methods. To this end, CLM variants were challenged to learn different fragment-to-compound mappings in the absence of structural rules or any other fragment linking or synthetic information. The resulting alternative models were found to have high syntactic fidelity, but displayed notable differences in their ability to generate valid candidate compounds containing test fragments, with a clear preference for a model variant processing core/substituent combinations. However, the majority of valid candidate compounds generated with all models were distinct from training data and structurally novel. In addition, the CLMs exhibited high chemical diversification capacity and often generated structures with new topologies not encountered during training. Furthermore, all models produced large numbers of close structural analogues of known bioactive compounds covering a large target space, thus indicating the relevance of newly generated candidates for pharmaceutical research. As a part of our study, the new methodology and all data are made publicly available.
Affiliation(s)
- Lisa Piazza
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany; Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126, Pisa, Italy
- Sanjana Srinivasan
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany; Lamarr Institute for Machine Learning and Artificial Intelligence, Rheinische Friedrich-Wilhelms-Universität Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany
- Tiziano Tuccinardi
- Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126, Pisa, Italy
- Jürgen Bajorath
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany; Lamarr Institute for Machine Learning and Artificial Intelligence, Rheinische Friedrich-Wilhelms-Universität Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany.
3
Sheikholeslami M, Mazrouei N, Gheisari Y, Fasihi A, Irajpour M, Motahharynia A. DrugGen enhances drug discovery with large language models and reinforcement learning. Sci Rep 2025; 15:13445. [PMID: 40251288 PMCID: PMC12008224 DOI: 10.1038/s41598-025-98629-1] [Received: 01/01/2025] [Accepted: 04/14/2025] [Indexed: 04/20/2025] Open
Abstract
Traditional drug design faces significant challenges due to inherent chemical and biological complexities, often resulting in high failure rates in clinical trials. Deep learning advancements, particularly generative models, offer potential solutions to these challenges. One promising algorithm is DrugGPT, a transformer-based model that generates small molecules for input protein sequences. Although promising, it generates both chemically valid and invalid structures and does not incorporate the features of approved drugs, resulting in time-consuming and inefficient drug discovery. To address these issues, we introduce DrugGen, an enhanced model based on the DrugGPT structure. DrugGen is fine-tuned on approved drug-target interactions and optimized with proximal policy optimization. By giving reward feedback from protein-ligand binding affinity prediction using pre-trained transformers (PLAPT) and a customized invalid structure assessor, DrugGen significantly improves performance. Evaluation across multiple targets demonstrated that DrugGen achieves 100% valid structure generation compared to 95.5% with DrugGPT and produces molecules with higher predicted binding affinities (7.22 [6.30-8.07]) compared to DrugGPT (5.81 [4.97-6.63]) while maintaining diversity and novelty. Docking simulations further validate its ability to generate molecules targeting binding sites effectively. For example, in the case of fatty acid-binding protein 5 (FABP5), DrugGen generated molecules with superior docking scores (FABP5/11, -9.537 and FABP5/5, -8.399) compared to the reference molecule (palmitic acid, -6.177). Beyond lead compound generation, DrugGen also shows potential for drug repositioning and creating novel pharmacophores for existing targets. By producing high-quality small molecules, DrugGen provides a high-performance medium for advancing pharmaceutical research and drug discovery.
Affiliation(s)
- Mahsa Sheikholeslami
- Regenerative Medicine Research Center, Isfahan University of Medical Sciences, Isfahan, 81746 73461, Iran
- Department of Medicinal Chemistry, School of Pharmacy, Isfahan University of Medical Sciences, Isfahan, Iran
- Navid Mazrouei
- Regenerative Medicine Research Center, Isfahan University of Medical Sciences, Isfahan, 81746 73461, Iran
- Yousof Gheisari
- Regenerative Medicine Research Center, Isfahan University of Medical Sciences, Isfahan, 81746 73461, Iran
- Department of Genetics and Molecular Biology, Isfahan University of Medical Sciences, Isfahan, Iran
- Afshin Fasihi
- Department of Medicinal Chemistry, School of Pharmacy, Isfahan University of Medical Sciences, Isfahan, Iran
- Matin Irajpour
- Regenerative Medicine Research Center, Isfahan University of Medical Sciences, Isfahan, 81746 73461, Iran.
- Isfahan Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran.
- Ali Motahharynia
- Regenerative Medicine Research Center, Isfahan University of Medical Sciences, Isfahan, 81746 73461, Iran.
- Isfahan Neuroscience Research Center, Isfahan University of Medical Sciences, Isfahan, Iran.
4
Qin R, Zhang H, Huang W, Shao Z, Lei J. Deep learning-based design and screening of benzimidazole-pyrazine derivatives as adenosine A2B receptor antagonists. J Biomol Struct Dyn 2025; 43:3225-3241. [PMID: 38133953 DOI: 10.1080/07391102.2023.2295974] [Received: 09/16/2023] [Accepted: 12/11/2023] [Indexed: 12/24/2023]
Abstract
The adenosine A2B receptor (A2BAR) is considered a novel potential target for the immunotherapy of cancer, and A2BAR antagonists have an inhibitory effect on tumor growth, proliferation, and metastasis. In our previous studies, we identified a class of benzimidazole-pyrazine scaffolds whose derivatives exhibited the antagonistic effect but lacked subtype selectivity towards A2BAR. In this work, we developed a scaffold-based protocol that incorporates a deep generative model and multilayer virtual screening to design benzimidazole-pyrazine derivatives as potential selective A2BAR antagonists. By utilizing a generative model with reported A2BAR antagonists as the training set, we built a scaffold-focused library of benzimidazole-pyrazine derivatives and applied a virtual screening protocol to discover potential A2BAR antagonists. Finally, five molecules with different Bemis-Murcko scaffolds were identified and exhibited higher binding free energies than the reference molecule 12o. Further computational analysis revealed that the 3-benzyl derivative ABA-1266 presented high selectivity toward A2BAR and showed preferred druggability, offering a starting point for the future development of potent, selective A2BAR antagonists.
Affiliation(s)
- Rui Qin
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Hao Zhang
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
- Weifeng Huang
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
- Zhenglin Shao
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
- Jinping Lei
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
5
Jin X, Zhang H, Xie X, Zhang M, Wang R, Liu H, Wang X, Wang J, Li D, Li Y, Xue W, Li J, He J, Liu Y, Yao J. From Traditional Efficacy to Drug Design: A Review of Astragali Radix. Pharmaceuticals (Basel) 2025; 18:413. [PMID: 40143189 PMCID: PMC11945149 DOI: 10.3390/ph18030413] [Received: 02/13/2025] [Revised: 03/08/2025] [Accepted: 03/11/2025] [Indexed: 03/28/2025] Open
Abstract
Astragali Radix (AR), a traditional Chinese herbal medicine, is derived from the dried roots of Astragalus membranaceus (Fisch.) Bge. var. mongholicus (Bge.) Hsiao (A. membranaceus var. mongholicus, AMM) or Astragalus membranaceus (Fisch.) Bge (A. membranaceus, AM). According to traditional Chinese medicine (TCM) theory, AR is believed to tonify qi, elevate yang, consolidate the body's surface to reduce sweating, promote diuresis and reduce swelling, generate body fluids, and nourish the blood. It has been widely used to treat general weakness and chronic illnesses and to improve overall vitality. Extensive research has identified various medicinal properties of AR, including anti-tumor, antioxidant, cardiovascular-protective, immunomodulatory, anti-inflammatory, anti-diabetic, and neuroprotective effects. With advancements in technology, methods such as computer-aided drug design (CADD) and artificial intelligence (AI) are increasingly being applied to the development of TCM. This review summarizes the progress of research on AR over the past decades, providing a comprehensive overview of its traditional efficacy, botanical characteristics, drug design and distribution, chemical constituents, and phytochemistry. This review aims to enhance researchers' understanding of AR and its pharmaceutical potential, thereby facilitating further development and utilization.
Affiliation(s)
- Xiaojie Jin
- College of Pharmacy, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Provincial Key Laboratory of Molecular Medicine and Prevention Research of Major Diseases, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Key Laboratory of Dunhuang Medicine, Ministry of Education, Gansu University of Traditional Chinese Medicine, Lanzhou 730000, China
- Huijuan Zhang
- College of Pharmacy, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Xiaorong Xie
- College of Pharmacy, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Min Zhang
- College of Pharmacy, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Ruifeng Wang
- Provincial Key Laboratory of Molecular Medicine and Prevention Research of Major Diseases, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Key Laboratory of Dunhuang Medicine, Ministry of Education, Gansu University of Traditional Chinese Medicine, Lanzhou 730000, China
- Hao Liu
- College of Pharmacy, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Xinyu Wang
- College of Pharmacy, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Jiao Wang
- College of Pharmacy, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Dangui Li
- College of Pharmacy, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Yaling Li
- Provincial Key Laboratory of Molecular Medicine and Prevention Research of Major Diseases, Gansu University of Chinese Medicine, Lanzhou 730000, China
- School of Basic Medicine, Gansu University of Traditional Chinese Medicine, Lanzhou 730000, China
- Weiwei Xue
- Innovative Drug Research Centre, School of Pharmaceutical Sciences, Chongqing University, Chongqing 404100, China
- Jintian Li
- Key Laboratory of Dunhuang Medicine, Ministry of Education, Gansu University of Traditional Chinese Medicine, Lanzhou 730000, China
- Jianxin He
- Provincial Key Laboratory of Molecular Medicine and Prevention Research of Major Diseases, Gansu University of Chinese Medicine, Lanzhou 730000, China
- School of Basic Medicine, Gansu University of Traditional Chinese Medicine, Lanzhou 730000, China
- Yongqi Liu
- Provincial Key Laboratory of Molecular Medicine and Prevention Research of Major Diseases, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Key Laboratory of Dunhuang Medicine, Ministry of Education, Gansu University of Traditional Chinese Medicine, Lanzhou 730000, China
- Juan Yao
- College of Pharmacy, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Key Laboratory of Dunhuang Medicine, Ministry of Education, Gansu University of Traditional Chinese Medicine, Lanzhou 730000, China
6
Yang R, Li B, Dong J, Cai Z, Lin H, Wang F, Yang G. Reinforcement learning-based generative artificial intelligence for novel pesticide design. J Adv Res 2025:S2090-1232(25)00128-6. [PMID: 40032026 DOI: 10.1016/j.jare.2025.02.030] [Received: 09/29/2024] [Revised: 02/04/2025] [Accepted: 02/23/2025] [Indexed: 03/05/2025] Open
Abstract
INTRODUCTION: Pesticides play a pivotal role in ensuring food security, and the development of green pesticides is an inevitable trend in global agricultural progress. Although deep learning-based generative models have revolutionized de novo drug design in pharmaceutical research, their application in pesticide research and development remains unexplored. OBJECTIVES: This study aims to pioneer the application of generative artificial intelligence to pesticide design by proposing a reinforcement learning-based framework for obtaining pesticide-like molecules with high binding affinity. METHODS: This framework comprises two key components: PestiGen-G, which systematically explores the pesticide-like chemical space using a character-based generative model coupled with the REINFORCE algorithm; and PestiGen-S, which combines a fragment-based generative model with the Monte Carlo Tree Search algorithm to generate molecules that stably bind to the specific target protein. RESULTS: Experimental results show that the molecules generated by PestiGen have superior pesticide-likeness and binding affinity compared to those generated by existing methods. In addition, we employ an active learning strategy to reduce the false-positive rate of the generated molecules. Finally, through collaboration with domain experts, we successfully designed a novel 4-hydroxyphenylpyruvate dioxygenase inhibitor (YH23768) with favorable enzyme inhibition and herbicidal potency. CONCLUSION: This proof-of-concept study highlights the utility of PestiGen as a valuable tool for pesticide design. The web server based on the model is freely available at https://dpai.ccnu.edu.cn/PestiGen/.
Affiliation(s)
- Ruoqi Yang
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
- Biao Li
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
- Jin Dong
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
- Zhuomei Cai
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
- Hongyan Lin
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
- Fan Wang
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China.
- Guangfu Yang
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China.
7
Le MHN, Nguyen PK, Nguyen TPT, Nguyen HQ, Tam DNH, Huynh HH, Huynh PK, Le NQK. An in-depth review of AI-powered advancements in cancer drug discovery. Biochim Biophys Acta Mol Basis Dis 2025; 1871:167680. [PMID: 39837431 DOI: 10.1016/j.bbadis.2025.167680] [Received: 10/18/2024] [Revised: 01/12/2025] [Accepted: 01/16/2025] [Indexed: 01/23/2025]
Abstract
The convergence of artificial intelligence (AI) and genomics is redefining cancer drug discovery by facilitating the development of personalized and effective therapies. This review examines the transformative role of AI technologies, including deep learning and advanced data analytics, in accelerating key stages of the drug discovery process: target identification, drug design, clinical trial optimization, and drug response prediction. Cutting-edge tools such as DrugnomeAI and PandaOmics have made substantial contributions to therapeutic target identification, while AI's predictive capabilities are driving personalized treatment strategies. Additionally, advancements like AlphaFold highlight AI's capacity to address intricate challenges in drug development. However, the field faces significant challenges, including the management of large-scale genomic datasets and ethical concerns surrounding AI deployment in healthcare. This review underscores the promise of data-centric AI approaches and emphasizes the necessity of continued innovation and interdisciplinary collaboration. Together, AI and genomics are charting a path toward more precise, efficient, and transformative cancer therapeutics.
Affiliation(s)
- Minh Huu Nhat Le
- International Master/Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan; AIBioMed Research Group, Taipei Medical University, Taipei 110, Taiwan
- Phat Ky Nguyen
- International Master/Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan; AIBioMed Research Group, Taipei Medical University, Taipei 110, Taiwan.
- Hien Quang Nguyen
- Cardiovascular Research Department, Methodist Hospital, Merrillville, IN 46410, USA
- Dao Ngoc Hien Tam
- Regulatory Affairs Department, Asia Shine Trading & Service Co. LTD, Viet Nam
- Han Hong Huynh
- International Master Program for Translational Science, College of Medical Science and Technology, Taipei Medical University, Taipei 110, Taiwan
- Phat Kim Huynh
- Department of Industrial and Systems Engineering, North Carolina A&T State University, Greensboro, NC 27411, USA.
- Nguyen Quoc Khanh Le
- AIBioMed Research Group, Taipei Medical University, Taipei 110, Taiwan; In-Service Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan; Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan.
8
Ben Geoffrey AS, Agrawal D, Kulkarni NM, Gunasekaran M. Molecular Glue-Design-Evaluator (MOLDE): An Advanced Method for In-Silico Molecular Glue Design. ACS Omega 2025; 10:6650-6662. [PMID: 40028145 PMCID: PMC11865985 DOI: 10.1021/acsomega.4c08049] [Received: 09/02/2024] [Revised: 01/24/2025] [Accepted: 01/29/2025] [Indexed: 03/05/2025]
Abstract
Protein function modulation using small-molecule binding is an important therapeutic strategy for many diseases. However, many proteins remain undruggable due to the lack of suitable binding pockets for small-molecule binding. Proximity-induced protein degradation using molecular glues has recently been identified as an important strategy to target undruggable proteins. Molecular glues were discovered serendipitously and as such currently lack an established approach for in-silico-driven rational design. In this work, we aim to establish an in-silico method for designing molecular glues. To achieve this, we leverage known molecular glue-mediated ternary complexes and derive a rationale for the in-silico design of molecular glues. Establishing an in-silico rationale for molecular glue design would significantly contribute to the literature and accelerate the discovery of molecular glues for targeting previously undruggable proteins. Our work presented here, named Molecular Glue-Designer-Evaluator (MOLDE), contributes to the growing literature on in-silico approaches to drug design.
Affiliation(s)
- A. S. Ben Geoffrey
- Sravathi AI Technology Pvt. Ltd., 63-B, First Floor, Bommasandra Industrial Area, Bengaluru 560099, Karnataka, India
- Deepak Agrawal
- Sravathi AI Technology Pvt. Ltd., 63-B, First Floor, Bommasandra Industrial Area, Bengaluru 560099, Karnataka, India
- Nagaraj M. Kulkarni
- Sravathi AI Technology Pvt. Ltd., 63-B, First Floor, Bommasandra Industrial Area, Bengaluru 560099, Karnataka, India
- Manonmani Gunasekaran
- Sravathi AI Technology Pvt. Ltd., 63-B, First Floor, Bommasandra Industrial Area, Bengaluru 560099, Karnataka, India
9
Dai J, Zhou Z, Zhao Y, Kong F, Zhai Z, Zhu Z, Cai J, Huang S, Xu Y, Sun T. Combined usage of ligand- and structure-based virtual screening in the artificial intelligence era. Eur J Med Chem 2025; 283:117162. [PMID: 39673863 DOI: 10.1016/j.ejmech.2024.117162] [Received: 09/18/2024] [Revised: 11/27/2024] [Accepted: 12/09/2024] [Indexed: 12/16/2024]
Abstract
Drug design has long sought techniques that save time and cost. Virtual screening, generally classified into ligand-based (LBVS) and structure-based (SBVS) approaches, can identify active compounds in large chemical libraries, reducing both. Because the two approaches have intrinsic flaws but are complementary, continued efforts have been made to combine them to mitigate their limitations. Meanwhile, the emergence of machine learning (ML) offers opportunities to leverage vast amounts of data to address their shortcomings. However, how to merge ML-improved LBVS and SBVS has rarely been discussed. Therefore, this review provides insights into the combined usage of ML-improved LBVS and SBVS, to help medicinal chemists use these joint strategies to improve screening efficiency and to help AI professionals design novel techniques.
Affiliation(s)
- Jingyi Dai
- School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Ziyi Zhou
- School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Yanru Zhao
- School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Fanjing Kong
- School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Zhenwei Zhai
- School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Zhishan Zhu
- School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Jie Cai
- School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Sha Huang
- School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Ying Xu
- Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan, China.
- Tao Sun
- School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China; State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
10
Wang J, Mao J, Li C, Xiang H, Wang X, Wang S, Wang Z, Chen Y, Li Y, No KT, Song T, Zeng X. Interface-aware molecular generative framework for protein-protein interaction modulators. J Cheminform 2024; 16:142. [PMID: 39707457 DOI: 10.1186/s13321-024-00930-0] [Received: 03/15/2024] [Accepted: 11/11/2024] [Indexed: 12/23/2024] Open
Abstract
Protein-protein interactions (PPIs) play a crucial role in numerous biochemical and biological processes. Although several structure-based molecular generative models have been developed, PPI interfaces and compounds targeting PPIs exhibit distinct physicochemical properties compared to traditional binding pockets and small-molecule drugs. As a result, generating compounds that effectively target PPIs, particularly by considering PPI complexes or interface hotspot residues, remains a significant challenge. In this work, we constructed a comprehensive dataset of PPI interfaces with active and inactive compound pairs. Based on this, we propose a novel molecular generative framework tailored to PPI interfaces, named GENiPPI. Our evaluation demonstrates that GENiPPI captures the implicit relationships between the PPI interfaces and the active molecules, and can generate novel compounds that target these interfaces. Moreover, GENiPPI can generate structurally diverse novel compounds even when only a limited number of known PPI interface modulators is available. To the best of our knowledge, this is the first exploration of a structure-based molecular generative model focused on PPI interfaces, which could facilitate the design of PPI modulators. The PPI interface-based molecular generative model enriches the existing landscape of structure-based (pocket/interface) molecular generative models.
SCIENTIFIC CONTRIBUTION: This study introduces GENiPPI, a protein-protein interaction (PPI) interface-aware molecular generative framework. The framework first employs Graph Attention Networks to capture atomic-level interaction features at the protein complex interface. Subsequently, Convolutional Neural Networks extract compound representations in voxel and electron density spaces. These features are integrated into a Conditional Wasserstein Generative Adversarial Network, which trains the model to generate compound representations targeting PPI interfaces. GENiPPI effectively captures the relationship between PPI interfaces and active/inactive compounds. Furthermore, in few-shot molecular generation, GENiPPI successfully generates compounds comparable to known disruptors. GENiPPI provides an efficient tool for structure-based design of PPI modulators.
Affiliation(s)
- Jianmin Wang
  - Department of Integrative Biotechnology, Yonsei University, Incheon, 21983, Republic of Korea
- Jiashun Mao
  - Department of Integrative Biotechnology, Yonsei University, Incheon, 21983, Republic of Korea
- Chunyan Li
  - School of Informatics, Yunnan Normal University, Kunming, China
- Hongxin Xiang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
- Xun Wang
  - School of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, Shandong, China
  - High Performance Computer Research Center, University of Chinese Academy of Sciences, Beijing, 100190, China
- Shuang Wang
  - School of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, Shandong, China
- Zixu Wang
  - Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Yangyang Chen
  - Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Yuquan Li
  - College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, China
- Kyoung Tai No
  - Department of Integrative Biotechnology, Yonsei University, Incheon, 21983, Republic of Korea
- Tao Song
  - School of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, Shandong, China
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China

11
Madushanka A, Moura RT, Kraka E. QM40, Realistic Quantum Mechanical Dataset for Machine Learning in Molecular Science. Sci Data 2024; 11:1376. [PMID: 39695146 DOI: 10.1038/s41597-024-04206-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2024] [Accepted: 12/02/2024] [Indexed: 12/20/2024] Open
Abstract
The growing popularity of machine learning (ML) and deep learning (DL) in scientific fields is hindered by the scarcity of high-quality datasets. While quantum mechanical (QM) predictions using DL techniques such as graph neural networks (GNNs) and generative models are gaining traction, insufficient training data remains a bottleneck. The QM40 dataset addresses this challenge by representing 88% of the FDA-approved drug chemical space. It includes molecules containing 10 to 40 atoms and composed of elements commonly found in drug molecular structures (C, O, N, S, F, Cl). QM40 offers valuable resources for researchers which include the core QM40 main dataset, containing 16 key quantum mechanical parameters for 162,954 molecules calculated using the B3LYP/6-31G(2df,p) level of theory in Gaussian16, ensuring consistency with established datasets like QM9 and Alchemy. This compatibility allows for future concatenation of QM40 with these datasets. In addition to other valuable information, the QM40 dataset offers the initial and optimized Cartesian coordinates, Mulliken charges, and detailed bond information, including local vibrational mode force constants, which serve as indicators of bond strength. QM40 can be used to benchmark both existing and new methods for predicting QM calculations using ML and DL techniques.
Affiliation(s)
- Ayesh Madushanka
  - Southern Methodist University Department of Chemistry, Dallas, TX, USA
- Renaldo T Moura
  - Southern Methodist University Department of Chemistry, Dallas, TX, USA
  - Department of Chemistry and Physics, Center of Agrarian Sciences, Federal University of Paraiba, Areia, PB, 58397-000, Brazil
- Elfi Kraka
  - Southern Methodist University Department of Chemistry, Dallas, TX, USA

12
Lu Z, Han J, Ji Y, Li B, Zhang A. Computational design of CDK1 inhibitors with enhanced target affinity and drug-likeness using deep-learning framework. Heliyon 2024; 10:e40345. [PMID: 39748968 PMCID: PMC11693894 DOI: 10.1016/j.heliyon.2024.e40345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2024] [Revised: 09/20/2024] [Accepted: 11/11/2024] [Indexed: 01/04/2025] Open
Abstract
Cyclin Dependent Kinase 1 (CDK1) plays a crucial role in cell cycle regulation, and dysregulation of its activity has been implicated in various cancers. Although several CDK1 inhibitors are currently in clinical trials, none have yet been approved for therapeutic use. This research utilized deep learning techniques, specifically Recurrent Neural Networks with Long Short-Term Memory (LSTM), to generate potential CDK1 inhibitors. Molecular docking, evaluation of molecular properties, and molecular dynamics simulations were conducted to identify the most promising candidates. The results showed that the generated ligands exhibited substantial improvements in target affinity and drug-likeness. Molecular docking results showed that the generated ligands had an average binding affinity of -10.65 ± 0.877 kcal/mol towards CDK1. The Quantitative Estimate of Drug-likeness (QED) values for the generated ligands averaged 0.733 ± 0.10, significantly higher than the 0.547 ± 0.15 observed for known CDK1 inhibitors (p < 0.001). Molecular dynamics simulations further confirmed the stability and favorable interactions of the selected ligands with the CDK1 complex. The identification of novel CDK1 inhibitors with improved binding affinities and drug-likeness properties could potentially fill the gap in the ongoing development of CDK inhibitors. However, it is imperative to note that extensive experimental validation is required prior to advancing these generated ligands to subsequent stages of drug development.
Affiliation(s)
- Zuokun Lu
  - Food and Pharmacy College, Xuchang University, Xuchang, 461000, Henan, China
  - Key Laboratory of Biomarker-Based Rapid Detection Technology for Food Safety of Henan Province, Xuchang University, Xuchang, 461000, Henan, China
- Jiayuan Han
  - Food and Pharmacy College, Xuchang University, Xuchang, 461000, Henan, China
- Yibo Ji
  - Food and Pharmacy College, Xuchang University, Xuchang, 461000, Henan, China
- Bingrui Li
  - Food and Pharmacy College, Xuchang University, Xuchang, 461000, Henan, China
- Aili Zhang
  - Food and Pharmacy College, Xuchang University, Xuchang, 461000, Henan, China

13
Bernatavicius A, Šícho M, Janssen APA, Hassen AK, Preuss M, van Westen GJP. AlphaFold Meets De Novo Drug Design: Leveraging Structural Protein Information in Multitarget Molecular Generative Models. J Chem Inf Model 2024; 64:8113-8122. [PMID: 39475544 PMCID: PMC11558674 DOI: 10.1021/acs.jcim.4c00309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 10/15/2024] [Accepted: 10/15/2024] [Indexed: 11/12/2024]
Abstract
Recent advancements in deep learning and generative models have significantly expanded the applications of virtual screening for drug-like compounds. Here, we introduce a multitarget transformer model, PCMol, that leverages the latent protein embeddings derived from AlphaFold2 as a means of conditioning a de novo generative model on different targets. Incorporating rich protein representations allows the model to capture their structural relationships, enabling the chemical space interpolation of active compounds and target-side generalization to new proteins based on embedding similarities. In this work, we benchmark against other existing target-conditioned transformer models to illustrate the validity of using AlphaFold protein representations over raw amino acid sequences. We show that low-dimensional projections of these protein embeddings cluster appropriately based on target families and that model performance declines when these representations are intentionally corrupted. We also show that the PCMol model generates diverse, potentially active molecules for a wide array of proteins, including those with sparse ligand bioactivity data. The generated compounds display higher similarity to known active ligands of held-out targets and have comparable molecular docking scores while maintaining novelty. Additionally, we demonstrate the important role of data augmentation in bolstering the performance of generative models in low-data regimes. Software package and AlphaFold protein embeddings are freely available at https://github.com/CDDLeiden/PCMol.
Affiliation(s)
- Andrius Bernatavicius
  - Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands
  - Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333CA Leiden, The Netherlands
- Martin Šícho
  - Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands
  - CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28 Prague, Czech Republic
- Antonius P. A. Janssen
  - Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands
  - Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands
- Alan Kai Hassen
  - Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333CA Leiden, The Netherlands
- Mike Preuss
  - Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333CA Leiden, The Netherlands
- Gerard J. P. van Westen
  - Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands

14
Zhu Y, Fang Y, Huang W, Zhang W, Chen F, Dong J, Zeng W. AI-driven precision subcellular navigation with fluorescent probes. J Mater Chem B 2024; 12:11054-11062. [PMID: 39392117 DOI: 10.1039/d4tb01835d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Precise navigation within intricate biological systems is pivotal for comprehending cellular functions and diagnosing diseases. Fluorescent molecular probes, designed to target specific biological molecules, are indispensable tools for this endeavor. This paper delves into the revolutionary potential of artificial intelligence (AI) in crafting highly precise and effective fluorescent probes. We will discuss how AI can be employed to: design new subcellular dyes by optimizing physicochemical properties; design prospective subcellular targeting probes based on specific receptors; quantitatively explore the potential chemical laws of fluorescent molecules to optimize the optical properties of fluorescent probes; optimize the comprehensive properties of the probe and guide the construction of multifunctional targeting probes. Additionally, we showcase recent AI-driven advancements in probe development and their successful biomedical applications, while addressing challenges and outlining future directions towards transforming subcellular research, diagnostics, and drug discovery.
Affiliation(s)
- Yingli Zhu
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, P. R. China
- Yanpeng Fang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, P. R. China
- Wenzhi Huang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, P. R. China
- Weiheng Zhang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, P. R. China
- Fei Chen
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, P. R. China
- Jie Dong
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, P. R. China
- Wenbin Zeng
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, P. R. China

15
Li T, Chen X, Tong W. Bridging organ transcriptomics for advancing multiple organ toxicity assessment with a generative AI approach. NPJ Digit Med 2024; 7:310. [PMID: 39501092 PMCID: PMC11538515 DOI: 10.1038/s41746-024-01317-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Accepted: 10/25/2024] [Indexed: 11/08/2024] Open
Abstract
Translational research in toxicology has significantly benefited from transcriptomic profiling, particularly in drug safety. However, its application has predominantly focused on limited organs, notably the liver, due to resource constraints. This paper presents TransTox, an innovative AI model using a generative adversarial network (GAN) method to facilitate the bidirectional translation of transcriptomic profiles between the liver and kidney under drug treatment. TransTox demonstrates robust performance, validated across independent datasets and laboratories. First, the concordance between real experimental data and synthetic data generated by TransTox was demonstrated in characterizing toxicity mechanisms compared to real experimental settings. Second, TransTox proved valuable in gene expression predictive models, where synthetic data could be used to develop gene expression predictive models or serve as "digital twins" for diagnostic applications. The TransTox approach holds the potential for multi-organ toxicity assessment with AI and advancing the field of precision toxicology.
Affiliation(s)
- Ting Li
  - FDA National Center for Toxicological Research, Jefferson, AR, USA
- Xi Chen
  - FDA National Center for Toxicological Research, Jefferson, AR, USA
- Weida Tong
  - FDA National Center for Toxicological Research, Jefferson, AR, USA

16
Pan H, Cheng M, Li Z, Sun X, Han C. Multidisciplinary structural optimization of polysaccharides preventing alcohol-induced liver disease with computer-aided molecular design. Int J Biol Macromol 2024; 282:137088. [PMID: 39486738 DOI: 10.1016/j.ijbiomac.2024.137088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2024] [Revised: 10/27/2024] [Accepted: 10/29/2024] [Indexed: 11/04/2024]
Abstract
Here, we optimized the active units of polysaccharides and investigated the conformational relationship between the polysaccharides and alcoholic liver disease (ALD) at the molecular level. We used data mining to screen polysaccharide structural parameters for ALD (PSP-ALD). Most polysaccharides active against ALD comprised glucose (Glc), mannose (Man), galactose (Gal), arabinose (Ara), and rhamnose (Rha). Additionally, they mainly contained (1 → 6)-, (1 → 3)-, and (1 → 4)-glycosidic linkages. Polysaccharides against ALD have a wide molecular weight distribution (2.1 × 10³ Da to 9.6 × 10⁷ Da). Based on the PSP-ALD analysis, six commercially available oligosaccharides were selected and their structures were built. After molecular docking, the binding affinities between stachyose and the key ALD targets were stronger, indicating that stachyose may be a polysaccharide-active unit against ALD (PAU-ALD). Furthermore, histological examination of liver tissue combined with serum levels of alanine aminotransferase (ALT), aspartate aminotransferase (AST), and triglycerides (TG) showed that stachyose had a significant protective effect against ALD in mice. In summary, we optimized a PAU-ALD and developed a method for studying the structure-activity relationship between polysaccharides and ALD at the molecular level, which provides a new research direction for the development and utilization of polysaccharides and their clinical applications in ALD.
Affiliation(s)
- Hongyu Pan
  - School of Pharmacy, Shandong University of Traditional Chinese Medicine, Jinan 250355, China
- Mengtao Cheng
  - School of Pharmacy, Shandong University of Traditional Chinese Medicine, Jinan 250355, China
- Zhenxing Li
  - School of Pharmacy, Shandong University of Traditional Chinese Medicine, Jinan 250355, China
- Xiaomei Sun
  - School of Pharmacy, Shandong University of Traditional Chinese Medicine, Jinan 250355, China
- Chunchao Han
  - School of Pharmacy, Shandong University of Traditional Chinese Medicine, Jinan 250355, China

17
Petković M, Menkovski V. Description Generation Using Variational Auto-Encoders for Precursor microRNA. Entropy (Basel, Switzerland) 2024; 26:921. [PMID: 39593866 PMCID: PMC11592592 DOI: 10.3390/e26110921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2024] [Revised: 10/14/2024] [Accepted: 10/25/2024] [Indexed: 11/28/2024]
Abstract
Micro RNAs (miRNA) are a type of non-coding RNA involved in gene regulation and can be associated with diseases such as cancer, cardiovascular, and neurological diseases. As such, identifying the entire genome of miRNA can be of great relevance. Since experimental methods for novel precursor miRNA (pre-miRNA) detection are complex and expensive, computational detection using Machine Learning (ML) could be useful. Existing ML methods are often complex black boxes that do not create an interpretable structural description of pre-miRNA. In this paper, we propose a novel framework that makes use of generative modeling through Variational Auto-Encoders to uncover the generative factors of pre-miRNA. After training the VAE, the pre-miRNA description is developed using a decision tree on the lower dimensional latent space. Applying the framework to miRNA classification, we obtain a high reconstruction and classification performance while also developing an accurate miRNA description.
Affiliation(s)
- Marko Petković
  - Department of Applied Physics and Science Education, Eindhoven University of Technology, 5612AZ Eindhoven, The Netherlands
  - Eindhoven Artificial Intelligence Systems Institute, 5612AZ Eindhoven, The Netherlands
- Vlado Menkovski
  - Eindhoven Artificial Intelligence Systems Institute, 5612AZ Eindhoven, The Netherlands
  - Department of Mathematics and Computer Science, Eindhoven University of Technology, 5612AZ Eindhoven, The Netherlands

18
Romanelli V, Annunziata D, Cerchia C, Cerciello D, Piccialli F, Lavecchia A. Enhancing De Novo Drug Design across Multiple Therapeutic Targets with CVAE Generative Models. ACS Omega 2024; 9:43963-43976. [PMID: 39493989 PMCID: PMC11525747 DOI: 10.1021/acsomega.4c08027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2024] [Revised: 09/25/2024] [Accepted: 09/30/2024] [Indexed: 11/05/2024]
Abstract
Drug discovery is a costly and time-consuming process, necessitating innovative strategies to enhance efficiency across different stages, from initial hit identification to final market approval. Recent advancements in deep learning (DL), particularly in de novo drug design, show promise. Generative models, a subclass of DL algorithms, have significantly accelerated the de novo drug design process by exploring vast areas of chemical space. Here, we introduce a Conditional Variational Autoencoder (CVAE) generative model tailored for de novo molecular design tasks, utilizing both SMILES and SELFIES as molecular representations. Our computational framework successfully generates molecules with specific property profiles validated through metrics such as uniqueness, validity, novelty, quantitative estimate of drug-likeness (QED), and synthetic accessibility (SA). We evaluated our model's efficacy in generating novel molecules capable of binding to three therapeutic molecular targets: CDK2, PPARγ, and DPP-IV. Comparison with state-of-the-art frameworks demonstrated our model's ability to achieve higher structural diversity while maintaining the molecular property ranges observed in the training set molecules. This proposed model stands as a valuable resource for advancing de novo molecular design capabilities.
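The validity, uniqueness, and novelty metrics named in this abstract can be illustrated with a minimal sketch. The definitions below follow conventions commonly used in molecular generation benchmarks and are an assumption here, not necessarily the definitions used by the authors. The function names `generation_metrics` and `toy_is_valid` are hypothetical, the strings are assumed to be already canonicalized, and the parenthesis check is only a stand-in for a real chemistry-aware validator.

```python
from typing import Callable, Dict, Iterable


def toy_is_valid(smiles: str) -> bool:
    """Hypothetical stand-in validator: non-empty with balanced parentheses.

    A real pipeline would parse the string with a cheminformatics toolkit.
    """
    return bool(smiles) and smiles.count("(") == smiles.count(")")


def generation_metrics(
    generated: Iterable[str],
    training: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Compute validity, uniqueness, and novelty as fractions in [0, 1].

    validity   = valid strings / all generated strings
    uniqueness = distinct valid strings / valid strings
    novelty    = distinct valid strings not in training / distinct valid strings
    """
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


if __name__ == "__main__":
    sample = ["CCO", "CCO", "c1ccccc1", "C(C", "CCN"]  # illustrative strings only
    print(generation_metrics(sample, ["CCO"], toy_is_valid))
```

With the illustrative sample, four of five strings pass the toy check, three of those four are distinct, and two of the three distinct strings are absent from the training list. Passing the validator as an argument keeps the sketch free of external dependencies while allowing a stricter check to be substituted.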
Affiliation(s)
- Virgilio Romanelli
  - Department of Pharmacy, “Drug Discovery Laboratory”, University of Naples Federico II, Naples 80131, Italy
- Daniela Annunziata
  - Department of Mathematics and Applications “R. Caccioppoli”, University of Naples Federico II, Naples 80126, Italy
- Carmen Cerchia
  - Department of Pharmacy, “Drug Discovery Laboratory”, University of Naples Federico II, Naples 80131, Italy
- Donato Cerciello
  - Department of Mathematics and Applications “R. Caccioppoli”, University of Naples Federico II, Naples 80126, Italy
- Francesco Piccialli
  - Department of Mathematics and Applications “R. Caccioppoli”, University of Naples Federico II, Naples 80126, Italy
- Antonio Lavecchia
  - Department of Pharmacy, “Drug Discovery Laboratory”, University of Naples Federico II, Naples 80131, Italy

19
Wang K, Huang Y, Wang Y, You Q, Wang L. Recent advances from computer-aided drug design to artificial intelligence drug design. RSC Med Chem 2024; 15:d4md00522h. [PMID: 39493228 PMCID: PMC11523840 DOI: 10.1039/d4md00522h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Accepted: 10/09/2024] [Indexed: 11/05/2024] Open
Abstract
Computer-aided drug design (CADD), a cornerstone of modern drug discovery, can predict how a molecular structure relates to its activity and interacts with its target using structure-based and ligand-based methods. Fueled by ever-increasing data availability and continuous model optimization, artificial intelligence drug design (AIDD), as an enhanced iteration of CADD, has thrived in the past decade. AIDD demonstrates unprecedented opportunities in protein folding, property prediction, and molecular generation. It can also facilitate target identification, high-throughput screening (HTS), and synthetic route prediction. With AIDD involved, the process of drug discovery is greatly accelerated. Notably, AIDD offers the potential to explore uncharted territories of chemical space beyond current knowledge. In this perspective, we began by briefly outlining the main workflows and components of CADD. Then through showcasing exemplary cases driven by AIDD in recent years, we describe the evolving role of artificial intelligence (AI) in drug discovery from three distinct stages, that is, chemical library screening, linker generation, and de novo molecular generation. In this process, we attempted to draw comparisons between the features of CADD and AIDD.
Affiliation(s)
- Keran Wang
  - State Key Laboratory of Natural Medicines and Jiangsu Key Laboratory of Drug Design and Optimization, China Pharmaceutical University, Nanjing 210009, China
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Yanwen Huang
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing 100191, China
- Yan Wang
  - Department of Urology, Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
- Qidong You
  - State Key Laboratory of Natural Medicines and Jiangsu Key Laboratory of Drug Design and Optimization, China Pharmaceutical University, Nanjing 210009, China
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Lei Wang
  - State Key Laboratory of Natural Medicines and Jiangsu Key Laboratory of Drug Design and Optimization, China Pharmaceutical University, Nanjing 210009, China
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing 210009, China

20
Bhattacharya D, Cassady HJ, Hickner MA, Reinhart WF. Large Language Models as Molecular Design Engines. J Chem Inf Model 2024. [PMID: 39231030 DOI: 10.1021/acs.jcim.4c01396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/06/2024]
Abstract
The design of small molecules is crucial for technological applications ranging from drug discovery to energy storage. Due to the vast design space available to modern synthetic chemistry, the community has increasingly sought to use data-driven and machine learning approaches to navigate this space. Although generative machine learning methods have recently shown potential for computational molecular design, their use is hindered by complex training procedures, and they often fail to generate valid and unique molecules. In this context, pretrained Large Language Models (LLMs) have emerged as potential tools for molecular design, as they appear to be capable of creating and modifying molecules based on simple instructions provided through natural language prompts. In this work, we show that the Claude 3 Opus LLM can read, write, and modify molecules according to prompts, with an impressive rate of 97% valid and unique molecules. By quantifying these modifications in a low-dimensional latent space, we systematically evaluate the model's behavior under different prompting conditions. Notably, the model is able to perform guided molecular generation when asked to manipulate the electronic structure of molecules using simple, natural-language prompts. Our findings highlight the potential of LLMs as powerful and versatile molecular design engines.
Affiliation(s)
- Debjyoti Bhattacharya
  - Materials Science and Engineering, Pennsylvania State University, University Park, Pennsylvania 16802, United States
- Harrison J Cassady
  - Department of Chemical Engineering and Material Science, Michigan State University, East Lansing, Michigan 48824, United States
- Michael A Hickner
  - Department of Chemical Engineering and Material Science, Michigan State University, East Lansing, Michigan 48824, United States
- Wesley F Reinhart
  - Materials Science and Engineering, Pennsylvania State University, University Park, Pennsylvania 16802, United States
  - Institute for Computational and Data Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, United States

21
Lavecchia A. Navigating the frontier of drug-like chemical space with cutting-edge generative AI models. Drug Discov Today 2024; 29:104133. [PMID: 39103144 DOI: 10.1016/j.drudis.2024.104133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 07/20/2024] [Accepted: 07/31/2024] [Indexed: 08/07/2024]
Abstract
Deep generative models (GMs) have transformed the exploration of drug-like chemical space (CS) by generating novel molecules through complex, nontransparent processes, bypassing direct structural similarity. This review examines five key architectures for CS exploration: recurrent neural networks (RNNs), variational autoencoders (VAEs), generative adversarial networks (GANs), normalizing flows (NF), and Transformers. It discusses molecular representation choices, training strategies for focused CS exploration, evaluation criteria for CS coverage, and related challenges. Future directions include refining models, exploring new notations, improving benchmarks, and enhancing interpretability to better understand biologically relevant molecular properties.
Affiliation(s)
- Antonio Lavecchia
  - 'Drug Discovery' Laboratory, Department of Pharmacy, University of Naples Federico II, I-80131 Naples, Italy

22
Lavecchia A. Advancing drug discovery with deep attention neural networks. Drug Discov Today 2024; 29:104067. [PMID: 38925473 DOI: 10.1016/j.drudis.2024.104067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 06/10/2024] [Accepted: 06/19/2024] [Indexed: 06/28/2024]
Abstract
In the dynamic field of drug discovery, deep attention neural networks are revolutionizing our approach to complex data. This review explores the attention mechanism and its extended architectures, including graph attention networks (GATs), transformers, bidirectional encoder representations from transformers (BERT), generative pre-trained transformers (GPTs) and bidirectional and auto-regressive transformers (BART). Delving into their core principles and multifaceted applications, we uncover their pivotal roles in catalyzing de novo drug design, predicting intricate molecular properties and deciphering elusive drug-target interactions. Despite challenges, these attention-based architectures hold unparalleled promise to drive transformative breakthroughs and accelerate progress in pharmaceutical research.
Affiliation(s)
- Antonio Lavecchia
  - Drug Discovery Laboratory, Department of Pharmacy, University of Napoli Federico II, I-80131 Naples, Italy

23
Shen C, Song J, Hsieh CY, Cao D, Kang Y, Ye W, Wu Z, Wang J, Zhang O, Zhang X, Zeng H, Cai H, Chen Y, Chen L, Luo H, Zhao X, Jian T, Chen T, Jiang D, Wang M, Ye Q, Wu J, Du H, Shi H, Deng Y, Hou T. DrugFlow: An AI-Driven One-Stop Platform for Innovative Drug Discovery. J Chem Inf Model 2024; 64:5381-5391. [PMID: 38920405 DOI: 10.1021/acs.jcim.4c00621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/27/2024]
Abstract
Artificial intelligence (AI)-aided drug design has demonstrated unprecedented effects on modern drug discovery, but there is still an urgent need for user-friendly interfaces that bridge the gap between these sophisticated tools and scientists, particularly those who are less computer savvy. Herein, we present DrugFlow, an AI-driven one-stop platform that offers a clean, convenient, and cloud-based interface to streamline early drug discovery workflows. By seamlessly integrating a range of innovative AI algorithms, covering molecular docking, quantitative structure-activity relationship modeling, molecular generation, ADMET (absorption, distribution, metabolism, excretion and toxicity) prediction, and virtual screening, DrugFlow can offer effective AI solutions for almost all crucial stages in early drug discovery, including hit identification and hit/lead optimization. We hope that the platform can provide sufficiently valuable guidance to aid real-world drug design and discovery. The platform is available at https://drugflow.com.
Collapse
Affiliation(s)
- Chao Shen
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jianfei Song
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Chang-Yu Hsieh
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Wenling Ye
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Odin Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hao Zeng
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Heng Cai
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Yu Chen
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Linkang Chen
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Hao Luo
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Xinda Zhao
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Tianye Jian
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Tong Chen
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Mingyang Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Qing Ye
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jialu Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hongyan Du
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hui Shi
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Yafeng Deng
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Department of Automation, Tsinghua University, Beijing 100084, China
- Tingjun Hou
- Hangzhou Carbonsilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
24
Choudhary R, Mahadevan R. FOCUS on NOD2: Advancing IBD Drug Discovery with a User-Informed Machine Learning Framework. ACS Med Chem Lett 2024; 15:1057-1070. [PMID: 39015268 PMCID: PMC11247655 DOI: 10.1021/acsmedchemlett.4c00148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 05/17/2024] [Accepted: 06/03/2024] [Indexed: 07/18/2024] Open
Abstract
In this study, we introduce the Framework for Optimized Customizable User-Informed Synthesis (FOCUS), a generative machine learning model tailored for drug discovery. FOCUS integrates domain expertise and uses Proximal Policy Optimization (PPO) to guide Monte Carlo Tree Search (MCTS) to efficiently explore chemical space. It generates SMILES representations of potential drug candidates, optimizing for druggability and binding efficacy to NOD2, PEP, and MCT1 receptors. The model is highly interpretable, allowing for user feedback and expert-driven adjustments based on detailed cycle reports. Employing tools like SHAP and LIME, FOCUS provides a transparent analysis of decision-making processes, emphasizing features such as docking scores and interaction fingerprints. Comparative studies with Muramyl Dipeptide (MDP) demonstrate improved interaction profiles. FOCUS merges advanced machine learning with expert insight, accelerating the drug discovery pipeline.
Affiliation(s)
- Ruhi Choudhary
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
- Radhakrishnan Mahadevan
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
25
Ahmad S, Raza K. An extensive review on lung cancer therapeutics using machine learning techniques: state-of-the-art and perspectives. J Drug Target 2024; 32:635-646. [PMID: 38662768 DOI: 10.1080/1061186x.2024.2347358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2024] [Accepted: 04/18/2024] [Indexed: 05/07/2024]
Abstract
There are over 100 types of human cancer, accounting for millions of deaths every year. Lung cancer alone claims over 1.8 million lives per year and is expected to surpass 3.2 million by 2050, which underscores the urgent need for rapid drug development and repurposing initiatives. The application of AI emerges as a pivotal solution to developing anti-cancer therapeutics. This state-of-the-art review aims to explore the various applications of AI in lung cancer therapeutics. Predictive models can analyse large datasets, including clinical data, genetic information, and treatment outcomes, for novel drug design and to generate personalised treatment recommendations, potentially optimising therapeutic strategies, enhancing treatment efficacy, and minimising adverse effects. A thorough literature review study was conducted based on articles indexed in PubMed and Scopus. We compiled the use of various machine learning approaches, including CNN, RNN, GAN, VAEs, and other AI techniques, enhancing efficiency with accuracy exceeding 95%, which is validated through a computer-aided drug design process. AI can revolutionise lung cancer therapeutics, streamlining processes and saving biological scientists' time and effort; however, further research is needed to overcome challenges and fully unlock AI's potential in lung cancer therapeutics.
Affiliation(s)
- Shaban Ahmad
- Department of Computer Science, Jamia Millia Islamia, New Delhi, India
- Khalid Raza
- Department of Computer Science, Jamia Millia Islamia, New Delhi, India
26
Duan Y, Yang X, Zeng X, Wang W, Deng Y, Cao D. Enhancing Molecular Property Prediction through Task-Oriented Transfer Learning: Integrating Universal Structural Insights and Domain-Specific Knowledge. J Med Chem 2024; 67:9575-9586. [PMID: 38748846 DOI: 10.1021/acs.jmedchem.4c00692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2024]
Abstract
Precisely predicting molecular properties is crucial in drug discovery, but the scarcity of labeled data poses a challenge for applying deep learning methods. While large-scale self-supervised pretraining has proven an effective solution, it often neglects domain-specific knowledge. To tackle this issue, we introduce Task-Oriented Multilevel Learning based on BERT (TOML-BERT), a dual-level pretraining framework that considers both structural patterns and domain knowledge of molecules. TOML-BERT achieved state-of-the-art prediction performance on 10 pharmaceutical datasets. It has the capability to mine contextual information within molecular structures and extract domain knowledge from massive pseudo-labeled data. The dual-level pretraining accomplished significant positive transfer, with its two components making complementary contributions. Interpretive analysis elucidated that the effectiveness of the dual-level pretraining lies in the prior learning of a task-related molecular representation. Overall, TOML-BERT demonstrates the potential of combining multiple pretraining tasks to extract task-oriented knowledge, advancing molecular property prediction in drug discovery.
Affiliation(s)
- Yanjing Duan
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Xixi Yang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410013, P. R. China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410013, P. R. China
- Wenxuan Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Youchao Deng
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P. R. China
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
27
Zhang K, Tang Y, Yu H, Yang J, Tao L, Xiang P. Discovery of lupus nephritis targeted inhibitors based on De novo molecular design: comprehensive application of vinardo scoring, ADMET analysis, and molecular dynamics simulation. J Biomol Struct Dyn 2024:1-14. [PMID: 38501728 DOI: 10.1080/07391102.2024.2329293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2023] [Accepted: 03/06/2024] [Indexed: 03/20/2024]
Abstract
Lupus Nephritis (LN) is an autoimmune disease affecting the kidneys, and conventional drug studies have limitations due to its imprecise and complex pathogenesis. Therefore, the aim of this study was to design a novel Lupus Nephritis-targeted drug with good clinical potential, high potency, and selectivity using a computer-assisted approach. NIK is a serine/threonine protein kinase that is gaining attention as a drug target for Lupus Nephritis. We used bioinformatics, homology modelling and sequence comparison analysis, small-molecule ab initio design, ADMET analysis, molecular docking, molecular dynamics simulation, and MM/PBSA analysis to design and explore the selectivity and efficiency of a novel Lupus Nephritis-targeting drug, ClImYnib, and a classical NIK inhibitor, NIK SMI1. We used bioinformatics techniques to determine the correlation between lupus nephritis and the NF-κB signaling pathway. De novo drug design was used to create a NIK-targeted inhibitor, ClImYnib, with lower toxicity, after which we used molecular dynamics to simulate NIK SMI1 against ClImYnib, and the simulation results showed that ClImYnib had better selectivity and efficiency. Our research delves into the molecular mechanism of protein ligands, and we have designed and validated an excellent NIK inhibitor using multiple computational simulation methods. More importantly, it provides an approach to the targeted design of small molecules.
Affiliation(s)
- Kaiyuan Zhang
- School of Clinical Medicine, Bengbu Medical College, China
- Yingkai Tang
- Department of Anatomy, School of Basic Medicine, Bengbu Medical College, China
- Haiyue Yu
- School of Clinical Medicine, Bengbu Medical College, China
- Jingtao Yang
- School of Clinical Medicine, Bengbu Medical College, China
- Lu Tao
- Central Laboratory, The First Affiliated Hospital of Bengbu Medical College, Bengbu, Anhui, China
- Ping Xiang
- Central Laboratory, The First Affiliated Hospital of Bengbu Medical College, Bengbu, Anhui, China
28
Kong Y, Zhou C, Tan D, Xu X, Li Z, Cheng J. Discovery of Potential Neonicotinoid Insecticides by an Artificial Intelligence Generative Model and Structure-Based Virtual Screening. J Agric Food Chem 2024; 72:5145-5152. [PMID: 38419506 DOI: 10.1021/acs.jafc.3c06895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2024]
Abstract
The identification of neonicotinoid insecticides bearing novel scaffolds is of great importance for pesticide discovery. Here, artificial intelligence-based tools and a virtual screening strategy were integrated to discover potential leads of neonicotinoid insecticides. A deep generative model was successfully constructed using a recurrent neural network combined with transfer learning. The model evaluation showed that the pretrained model could accurately grasp the SMILES grammar of drug-like molecules and generate potential neonicotinoid compounds after transfer learning. The generated molecules were evaluated by hierarchical virtual screening, hits were subjected to a similarity search, and the most similar structures were purchased for the bioassay. Compounds A2 and A5 displayed 52.5 and 50.3% mortality rates against Aphis craccivora at 100 mg/L, respectively. The docking study indicated that these two compounds have similar binding modes to neonicotinoids, which were verified by further molecular dynamics simulations.
Affiliation(s)
- Yijin Kong
- Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Cong Zhou
- Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Du Tan
- Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Xiaoyong Xu
- Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Zhong Li
- Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Jiagao Cheng
- Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
29
Zhang H, Huang J, Xie J, Huang W, Yang Y, Xu M, Lei J, Chen H. GRELinker: A Graph-Based Generative Model for Molecular Linker Design with Reinforcement and Curriculum Learning. J Chem Inf Model 2024; 64:666-676. [PMID: 38241022 DOI: 10.1021/acs.jcim.3c01700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/13/2024]
Abstract
Fragment-based drug discovery (FBDD) is widely used in drug design. One useful strategy in FBDD is designing linkers for linking fragments to optimize their molecular properties. In the current study, we present a novel generative fragment linking model, GRELinker, which utilizes a gated-graph neural network combined with reinforcement and curriculum learning to generate molecules with desirable attributes. The model has been shown to be efficient in multiple tasks, including controlling log P, optimizing synthesizability or predicted bioactivity of compounds, and generating molecules with high 3D similarity but low 2D similarity to the lead compound. Specifically, our model outperforms the previously reported reinforcement learning (RL) built-in method DRlinker on these benchmark tasks. Moreover, GRELinker has been successfully used in an actual FBDD case to generate optimized molecules with enhanced affinities by employing the docking score as the scoring function in RL. Besides, the implementation of curriculum learning in our framework enables the generation of structurally complex linkers more efficiently. These results demonstrate the benefits and feasibility of GRELinker in linker design for molecular optimization and drug discovery.
Affiliation(s)
- Hao Zhang
- School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
- Jinchao Huang
- School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
- Junjie Xie
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Weifeng Huang
- School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
- Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Mingyuan Xu
- Guangzhou National Laboratory, Guangzhou International Bio Island, No. 9 Xin Dao Huan Bei Road, Guangzhou 510005, China
- Jinping Lei
- School of Pharmaceutical Science, Sun Yat-sen University, Guangzhou 510006, China
- Hongming Chen
- Guangzhou National Laboratory, Guangzhou International Bio Island, No. 9 Xin Dao Huan Bei Road, Guangzhou 510005, China
30
Shen T, Guo J, Han Z, Zhang G, Liu Q, Si X, Wang D, Wu S, Xia J. AutoMolDesigner for Antibiotic Discovery: An AI-Based Open-Source Software for Automated Design of Small-Molecule Antibiotics. J Chem Inf Model 2024; 64:575-583. [PMID: 38265916 DOI: 10.1021/acs.jcim.3c01562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]
Abstract
Discovery of small-molecule antibiotics with novel chemotypes serves as one of the essential strategies to address antibiotic resistance. Although a considerable number of computational tools committed to molecular design have been reported, there is a deficit in holistic and efficient tools specifically developed for small-molecule antibiotic discovery. To address this issue, we report AutoMolDesigner, a computational modeling software dedicated to small-molecule antibiotic design. It is a generalized framework comprising two functional modules, i.e., generative-deep-learning-enabled molecular generation and automated machine-learning-based antibacterial activity/property prediction, wherein individually trained models and curated datasets are out-of-the-box for whole-cell-based antibiotic screening and design. It is open-source, thus allowing for the incorporation of new features for flexible use. Unlike most software programs based on Linux and command lines, this application equipped with a Qt-based graphical user interface can be run on personal computers with multiple operating systems, making it much easier to use for experimental scientists. The software and related materials are freely available at GitHub (https://github.com/taoshen99/AutoMolDesigner) and Zenodo (https://zenodo.org/record/10097899).
Affiliation(s)
- Tao Shen
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Jiale Guo
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Zunsheng Han
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Gao Zhang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Qingxin Liu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- School of Pharmacy, Jiangsu Ocean University, Lianyungang, Jiangsu 222005, China
- Xinxin Si
- School of Pharmacy, Jiangsu Ocean University, Lianyungang, Jiangsu 222005, China
- Dongmei Wang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Song Wu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Jie Xia
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
31
Tropsha A, Isayev O, Varnek A, Schneider G, Cherkasov A. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat Rev Drug Discov 2024; 23:141-155. [PMID: 38066301 DOI: 10.1038/s41573-023-00832-0] [Citation(s) in RCA: 68] [Impact Index Per Article: 68.0] [Accepted: 10/21/2023] [Indexed: 02/08/2024]
Abstract
Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design.
Affiliation(s)
- Artem Cherkasov
- University of British Columbia, Vancouver, BC, Canada.
- Photonic Inc., Coquitlam, BC, Canada.
32
Bajorath J. Chemical language models for molecular design. Mol Inform 2024; 43:e202300288. [PMID: 38010610 DOI: 10.1002/minf.202300288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/22/2023] [Accepted: 11/23/2023] [Indexed: 11/29/2023]
Abstract
In drug discovery, chemical language models (CLMs) originating from natural language processing offer new opportunities for molecular design. CLMs have been developed using recurrent neural network (RNN) or transformer architectures. For the predictive performance of RNN-based encoder-decoder frameworks and transformers, attention mechanisms play a central role. Among others, emerging application areas for CLMs include constrained generative modeling and the prediction of chemical reactions or drug-target interactions. Since CLMs are applicable to any compound or target data that can be presented in a sequential format and tokenized, mappings of different types of sequences can be learned. For example, active compounds can be predicted from protein sequence motifs. Novel off-the-beaten-path applications can also be considered. For example, analogue series from medicinal chemistry can be perceived and represented as chemical sequences and extended with new compounds using CLMs. Herein, methodological features of CLMs and different applications are discussed.
Affiliation(s)
- Jürgen Bajorath
- Department of Life Science Informatics, Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany
- Lamarr Institute for Machine Learning and Artificial Intelligence, Rheinische Friedrich-Wilhelms-Universität Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany
33
Haga CL, Yang XD, Gheit IS, Phinney DG. Graph neural networks for the identification of novel inhibitors of a small RNA. SLAS Discov 2023; 28:402-409. [PMID: 37839522 DOI: 10.1016/j.slasd.2023.10.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/01/2023] [Revised: 09/16/2023] [Accepted: 10/11/2023] [Indexed: 10/17/2023]
Abstract
MicroRNAs (miRNAs) play a crucial role in post-transcriptional gene regulation and have been implicated in various diseases, including cancers and lung disease. In recent years, Graph Neural Networks (GNNs) have emerged as powerful tools for analyzing graph-structured data, making them well-suited for the analysis of molecular structures. In this work, we explore the application of GNNs in ligand-based drug screening for small molecules targeting miR-21. By representing a known dataset of small molecules targeting miR-21 as graphs, GNNs can learn complex relationships between their structures and activities, enabling the prediction of potential miRNA-targeting small molecules by capturing the structural features and similarity between known miRNA-targeting compounds. The use of GNNs in miRNA-targeting drug screening holds promise for the discovery of novel therapeutic agents and provides a computational framework for efficient screening of large chemical libraries.
Affiliation(s)
- Christopher L Haga
- Department of Molecular Medicine, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation & Technology, Jupiter, FL, 33458, USA.
- Xue D Yang
- Department of Molecular Medicine, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation & Technology, Jupiter, FL, 33458, USA
- Ibrahim S Gheit
- Department of Molecular Medicine, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation & Technology, Jupiter, FL, 33458, USA
- Donald G Phinney
- Department of Molecular Medicine, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation & Technology, Jupiter, FL, 33458, USA
34
Zhang Y, Liu C, Liu M, Liu T, Lin H, Huang CB, Ning L. Attention is all you need: utilizing attention in AI-enabled drug discovery. Brief Bioinform 2023; 25:bbad467. [PMID: 38189543 PMCID: PMC10772984 DOI: 10.1093/bib/bbad467] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 09/30/2023] [Revised: 11/03/2023] [Accepted: 11/25/2023] [Indexed: 01/09/2024] Open
Abstract
Recently, attention mechanisms and their derived models have gained significant traction in drug development due to their outstanding performance and interpretability in handling complex data structures. This review offers an in-depth exploration of the principles underlying attention-based models and their advantages in drug discovery. We further elaborate on their applications in various aspects of drug development, from molecular screening and target binding to property prediction and molecule generation. Finally, we discuss the current challenges faced in the application of attention mechanisms and Artificial Intelligence technologies, including data quality, model interpretability and computational resource constraints, along with future directions for research. Given the accelerating pace of technological advancement, we believe that attention-based models will have an increasingly prominent role in future drug discovery. We anticipate that these models will usher in revolutionary breakthroughs in the pharmaceutical domain, significantly accelerating the pace of drug development.
Affiliation(s)
- Yang Zhang
- Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Caiqi Liu
- Department of Gastrointestinal Medical Oncology, Harbin Medical University Cancer Hospital, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
- Key Laboratory of Molecular Oncology of Heilongjiang Province, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
- Mujiexin Liu
- Chongqing Key Laboratory of Sichuan-Chongqing Co-construction for Diagnosis and Treatment of Infectious Diseases Integrated Traditional Chinese and Western Medicine, College of Medical Technology, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Tianyuan Liu
- Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Japan
- Hao Lin
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Cheng-Bing Huang
- School of Computer Science and Technology, Aba Teachers University, Aba, China
- Lin Ning
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
35
Wang S, Wang L, Li F, Bai F. DeepSA: a deep-learning driven predictor of compound synthesis accessibility. J Cheminform 2023; 15:103. [PMID: 37919805 PMCID: PMC10621138 DOI: 10.1186/s13321-023-00771-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 10/20/2023] [Indexed: 11/04/2023] Open
Abstract
With the continuous development of artificial intelligence technology, more and more computational models for generating new molecules are being developed. However, we are often confronted with the question of whether these compounds are easy or difficult to synthesize, which refers to the synthetic accessibility of compounds. In this study, a deep-learning-based computational model called DeepSA was proposed to predict the synthesis accessibility of compounds, providing a useful tool for choosing molecules. DeepSA is a chemical language model that was developed by training on a dataset of 3,593,053 molecules using various natural language processing (NLP) algorithms, offering advantages over state-of-the-art methods and having a much higher area under the receiver operating characteristic curve (AUROC), i.e., 89.6%, in discriminating those molecules that are difficult to synthesize. This helps users select less expensive molecules for synthesis, reducing the time and cost required for drug discovery and development. Interestingly, a comparison of DeepSA with a Graph Attention-based method shows that using SMILES alone can also efficiently visualize and extract a compound's informative features. DeepSA is available online on our group's web server (https://bailab.siais.shanghaitech.edu.cn/services/deepsa/), and the code is available at https://github.com/Shihang-Wang-58/DeepSA.
Affiliation(s)
- Shihang Wang
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China
| | - Lin Wang
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China
| | - Fenglei Li
- School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China
| | - Fang Bai
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China.
- School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China.
- Shanghai Clinical Research and Trial Center, Shanghai, 201210, China.
| |
|
36
|
Bae B, Bae H, Nam H. LOGICS: Learning optimal generative distribution for designing de novo chemical structures. J Cheminform 2023; 15:77. [PMID: 37674239 PMCID: PMC10483765 DOI: 10.1186/s13321-023-00747-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 08/23/2023] [Indexed: 09/08/2023] Open
Abstract
In recent years, the field of computational drug design has made significant strides in the development of artificial intelligence (AI) models for the generation of de novo chemical compounds with desired properties and biological activities, such as enhanced binding affinity to target proteins. These high-affinity compounds have the potential to be developed into more potent therapeutics for a broad spectrum of diseases. Due to the lack of data required for the training of deep generative models, however, some of these approaches have fine-tuned their molecular generators using data obtained from a separate predictor. While these studies show that generative models can produce structures with the desired target properties, it remains unclear whether the diversity of the generated structures and the span of their chemical space align with the distribution of the intended target molecules. In this study, we present LOGICS, a novel generative framework for Learning Optimal Generative distribution Iteratively for designing target-focused Chemical Structures. We address the exploration-exploitation dilemma, which weighs the choice between exploring new options and exploiting current knowledge. To tackle this issue, we incorporate experience memory and employ a layered tournament selection approach to refine the fine-tuning process. The proposed method was applied to the binding affinity optimization of two target proteins of different protein classes, κ-opioid receptors and PIK3CA, and the quality and the distribution of the generated molecules were evaluated. The results showed that LOGICS outperforms competing state-of-the-art models and generates more diverse de novo chemical structures with optimized properties. The source code is available at the GitHub repository (https://github.com/GIST-CSBL/LOGICS).
Affiliation(s)
- Bongsung Bae
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea
| | - Haelee Bae
- AI Graduate School, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea
| | - Hojung Nam
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea.
- AI Graduate School, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea.
- Center for AI-Applied High Efficiency Drug Discovery (AHEDD), Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea.
| |
|
37
|
Han R, Yoon H, Kim G, Lee H, Lee Y. Revolutionizing Medicinal Chemistry: The Application of Artificial Intelligence (AI) in Early Drug Discovery. Pharmaceuticals (Basel) 2023; 16:1259. [PMID: 37765069 PMCID: PMC10537003 DOI: 10.3390/ph16091259] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 07/27/2023] [Revised: 08/24/2023] [Accepted: 09/04/2023] [Indexed: 09/29/2023] Open
Abstract
Artificial intelligence (AI) has permeated various sectors, including the pharmaceutical industry and research, where it has been utilized to efficiently identify new chemical entities with desirable properties. The application of AI algorithms to drug discovery presents both remarkable opportunities and challenges. This review article focuses on the transformative role of AI in medicinal chemistry. We delve into the applications of machine learning and deep learning techniques in drug screening and design, discussing their potential to expedite the early drug discovery process. In particular, we provide a comprehensive overview of the use of AI algorithms in predicting protein structures, drug-target interactions, and molecular properties such as drug toxicity. While AI has accelerated the drug discovery process, data quality issues and technological constraints remain challenges. Nonetheless, new relationships and methods have been unveiled, demonstrating AI's expanding potential in predicting and understanding drug interactions and properties. For its full potential to be realized, interdisciplinary collaboration is essential. This review underscores AI's growing influence on the future trajectory of medicinal chemistry and stresses the importance of ongoing synergies between computational and domain experts.
Affiliation(s)
- Yoonji Lee
- College of Pharmacy, Chung-Ang University, Seoul 06974, Republic of Korea
| |
|
38
|
Salas-Estrada L, Provasi D, Qiu X, Kaniskan HÜ, Huang XP, DiBerto JF, Lamim Ribeiro JM, Jin J, Roth BL, Filizola M. De Novo Design of κ-Opioid Receptor Antagonists Using a Generative Deep-Learning Framework. J Chem Inf Model 2023; 63:5056-5065. [PMID: 37555591 PMCID: PMC10466374 DOI: 10.1021/acs.jcim.3c00651] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/29/2023] [Indexed: 08/10/2023]
Abstract
Likely effective pharmacological interventions for the treatment of opioid addiction include attempts to attenuate brain reward deficits during periods of abstinence. Pharmacological blockade of the κ-opioid receptor (KOR) has been shown to abolish brain reward deficits in rodents during withdrawal, as well as to reduce the escalation of opioid use in rats with extended access to opioids. Although KOR antagonists represent promising candidates for the treatment of opioid addiction, very few potent selective KOR antagonists are known to date and most of them exhibit significant safety concerns. Here, we used a generative deep-learning framework for the de novo design of chemotypes with putative KOR antagonistic activity. Molecules generated by models trained with this framework were prioritized for chemical synthesis based on their predicted optimal interactions with the receptor. Our models and proposed training protocol were experimentally validated by binding and functional assays.
Affiliation(s)
- Leslie Salas-Estrada
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, United States
| | - Davide Provasi
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, United States
| | - Xing Qiu
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, United States
| | - Husnu Ümit Kaniskan
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, United States
| | - Xi-Ping Huang
- National Institute of Mental Health, Psychoactive Drug Screening Program, Department of Pharmacology, University of North Carolina School of Medicine, Chapel Hill, North Carolina 27599, United States
| | - Jeffrey F. DiBerto
- National Institute of Mental Health, Psychoactive Drug Screening Program, Department of Pharmacology, University of North Carolina School of Medicine, Chapel Hill, North Carolina 27599, United States
| | - João Marcelo Lamim Ribeiro
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, United States
| | - Jian Jin
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, United States
- Mount Sinai Center for Therapeutics Discovery, Departments of Oncological Sciences and Neuroscience, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York 10029, United States
| | - Bryan L. Roth
- National Institute of Mental Health, Psychoactive Drug Screening Program, Department of Pharmacology, University of North Carolina School of Medicine, Chapel Hill, North Carolina 27599, United States
- Division of Chemical Biology and Medicinal Chemistry, University of North Carolina at Chapel Hill Eshelman School of Pharmacy, Chapel Hill, North Carolina 27599, United States
| | - Marta Filizola
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, United States
| |
|
39
|
Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023; 123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 73] [Impact Index Per Article: 36.5] [Indexed: 07/01/2023]
Abstract
Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, because big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), Generative Adversarial Network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
Affiliation(s)
- Bozheng Dou
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Zailiang Zhu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Ekaterina Merkurjev
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Lu Ke
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Long Chen
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Jian Jiang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Yueying Zhu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Jie Liu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Bengong Zhang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
| |
|
40
|
Cascella M, Scarpati G, Bignami EG, Cuomo A, Vittori A, Di Gennaro P, Crispo A, Coluccia S. Utilizing an artificial intelligence framework (conditional generative adversarial network) to enhance telemedicine strategies for cancer pain management. Journal of Anesthesia, Analgesia and Critical Care 2023; 3:19. [PMID: 37386680 PMCID: PMC10280947 DOI: 10.1186/s44158-023-00104-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/24/2023] [Accepted: 06/13/2023] [Indexed: 07/01/2023]
Abstract
BACKGROUND The utilization of artificial intelligence (AI) in healthcare has significant potential to revolutionize the delivery of medical services, particularly in the field of telemedicine. In this article, we investigate the capabilities of a specific deep learning model, a generative adversarial network (GAN), and explore its potential for enhancing the telemedicine approach to cancer pain management. MATERIALS AND METHODS We implemented a structured dataset comprising demographic and clinical variables from 226 patients and 489 telemedicine visits for cancer pain management. The deep learning model, specifically a conditional GAN, was employed to generate synthetic samples that closely resemble real individuals in terms of their characteristics. Subsequently, four machine learning (ML) algorithms were used to assess the variables associated with a higher number of remote visits. RESULTS The generated dataset exhibits a distribution comparable to the reference dataset for all considered variables, including age, number of visits, tumor type, performance status, characteristics of metastasis, opioid dosage, and type of pain. Among the algorithms tested, random forest demonstrated the highest performance in predicting a higher number of remote visits, achieving an accuracy of 0.8 on the test data. The simulations based on ML indicated that individuals who are younger than 45 years old, and those experiencing breakthrough cancer pain, may require an increased number of telemedicine-based clinical evaluations. CONCLUSION As the advancement of healthcare processes relies on scientific evidence, AI techniques such as GANs can play a vital role in bridging knowledge gaps and accelerating the integration of telemedicine into clinical practice. Nonetheless, it is crucial to carefully address the limitations of these approaches.
Affiliation(s)
- Marco Cascella
- Department of Anesthesia and Critical Care, Istituto Nazionale Tumori-IRCCS, Fondazione Pascale, 80100, Naples, Italy.
| | - Giuliana Scarpati
- Department of Medicine, Surgery and Dentistry "Scuola Medica Salernitana," University of Salerno, 84084, Baronissi, SA, Italy
| | - Elena Giovanna Bignami
- Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Viale Gramsci 14, 43126, Parma, Italy
| | - Arturo Cuomo
- Department of Anesthesia and Critical Care, Istituto Nazionale Tumori-IRCCS, Fondazione Pascale, 80100, Naples, Italy
| | - Alessandro Vittori
- Department of Anesthesia and Critical Care, ARCO Roma, Ospedale Pediatrico Bambino Gesù IRCCS, Rome, Italy
| | - Piergiacomo Di Gennaro
- Epidemiology and Biostatistics Unit, Istituto Nazionale Tumori-IRCCS, Fondazione Pascale, 80100, Naples, Italy
| | - Anna Crispo
- Epidemiology and Biostatistics Unit, Istituto Nazionale Tumori-IRCCS, Fondazione Pascale, 80100, Naples, Italy
| | - Sergio Coluccia
- Epidemiology and Biostatistics Unit, Istituto Nazionale Tumori-IRCCS, Fondazione Pascale, 80100, Naples, Italy
| |
|
41
|
Yoshimori A, Bajorath J. Motif2Mol: Prediction of New Active Compounds Based on Sequence Motifs of Ligand Binding Sites in Proteins Using a Biochemical Language Model. Biomolecules 2023; 13:biom13050833. [PMID: 37238703 DOI: 10.3390/biom13050833] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/16/2023] [Revised: 05/05/2023] [Accepted: 05/12/2023] [Indexed: 05/28/2023] Open
Abstract
In drug design, the prediction of new active compounds from protein sequence data has only been attempted in a few studies thus far. This prediction task is principally challenging because global protein sequence similarity has strong evolutionary and structural implications, but is often only vaguely related to ligand binding. Deep language models adapted from natural language processing offer new opportunities to attempt such predictions via machine translation by directly relating amino acid sequences and chemical structures to each other based on textual molecular representations. Herein, we introduce a biochemical language model with transformer architecture for the prediction of new active compounds from sequence motifs of ligand binding sites. In a proof-of-concept application on inhibitors of more than 200 human kinases, the Motif2Mol model revealed promising learning characteristics and an unprecedented ability to consistently reproduce known inhibitors of different kinases.
Affiliation(s)
- Atsushi Yoshimori
- Institute for Theoretical Medicine, Inc., 26-1 Muraoka-Higashi 2-Chome, Fujisawa 251-0012, Japan
| | - Jürgen Bajorath
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, 53115 Bonn, Germany
| |
|
42
|
Chen L, Shen Q, Lou J. Magicmol: a light-weighted pipeline for drug-like molecule evolution and quick chemical space exploration. BMC Bioinformatics 2023; 24:173. [PMID: 37101113 PMCID: PMC10132416 DOI: 10.1186/s12859-023-05286-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2022] [Accepted: 04/13/2023] [Indexed: 04/28/2023] Open
Abstract
The flourishing of machine learning and deep learning methods has boosted the development of cheminformatics, especially in drug discovery and new material exploration. Lower time and space costs make it possible for scientists to search the enormous chemical space. Recently, some work has combined reinforcement learning strategies with recurrent neural network (RNN)-based models to optimize the properties of generated small molecules, which notably improved a batch of critical factors for these candidates. However, a common problem among these RNN-based methods is that several generated molecules are difficult to synthesize despite having better desired properties such as binding affinity. Nevertheless, RNN-based frameworks reproduce the molecule distribution of the training set better than other categories of models during molecule exploration tasks. Thus, to optimize the whole exploration process and make it contribute to the optimization of specified molecules, we devised a light-weighted pipeline called Magicmol; this pipeline has a re-mastered RNN network and utilizes the SELFIES representation instead of SMILES. Our backbone model achieved extraordinary performance while reducing the training cost; moreover, we devised reward truncation strategies to eliminate the model collapse problem. Additionally, adopting the SELFIES representation made it possible to combine STONED-SELFIES as a post-processing procedure for specified molecule optimization and quick chemical space exploration.
Affiliation(s)
- Lin Chen
- Yangtze Delta Region (Huzhou) Institute of Intelligent Transportation, Huzhou University, Huzhou, China
- Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, China
| | - Qing Shen
- Yangtze Delta Region (Huzhou) Institute of Intelligent Transportation, Huzhou University, Huzhou, China
- School of Electronic Information, Huzhou College, Huzhou, China
| | - Jungang Lou
- Yangtze Delta Region (Huzhou) Institute of Intelligent Transportation, Huzhou University, Huzhou, China.
- Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, China.
| |
|
43
|
Salas-Estrada L, Provasi D, Qui X, Kaniskan HÜ, Huang XP, DiBerto JF, Ribeiro JML, Jin J, Roth BL, Filizola M. De Novo Design of κ-Opioid Receptor Antagonists Using a Generative Deep Learning Framework. bioRxiv 2023:2023.04.25.537995. [PMID: 37162828 PMCID: PMC10168226 DOI: 10.1101/2023.04.25.537995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Likely effective pharmacological interventions for the treatment of opioid addiction include attempts to attenuate brain reward deficits during periods of abstinence. Pharmacological blockade of the κ-opioid receptor (KOR) has been shown to abolish brain reward deficits in rodents during withdrawal, as well as to reduce the escalation of opioid use in rats with extended access to opioids. Although KOR antagonists represent promising candidates for the treatment of opioid addiction, very few potent selective KOR antagonists are known to date and most of them exhibit significant safety concerns. Here, we used a generative deep learning framework for the de novo design of chemotypes with putative KOR antagonistic activity. Molecules generated by models trained with this framework were prioritized for chemical synthesis based on their predicted optimal interactions with the receptor. Our models and proposed training protocol were experimentally validated by binding and functional assays.
|
44
|
Yedla P, Babalghith AO, Andra VV, Syed R. PROTACs in the Management of Prostate Cancer. Molecules 2023; 28:molecules28093698. [PMID: 37175108 PMCID: PMC10179857 DOI: 10.3390/molecules28093698] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 03/08/2023] [Revised: 04/15/2023] [Accepted: 04/17/2023] [Indexed: 05/15/2023] Open
Abstract
Cancer treatments with targeted therapy have gained immense interest due to their low levels of toxicity and high selectivity. Proteolysis-Targeting Chimeras (PROTACs) have drawn special attention in the development of cancer therapeutics owing to their unique mechanism of action, their ability to target undruggable proteins, and their focused target engagement. PROTACs selectively degrade the target protein through the ubiquitin-proteasome system, which describes a different mode of action compared to conventional small-molecule inhibitors or even antibodies. Among different cancer types, prostate cancer (PC) is the most prevalent non-cutaneous cancer in men. Genetic alterations and the overexpression of several genes, such as FOXA1, AR, PTEN, RB1, TP53, etc., suppress the immune response, resulting in drug resistance to conventional drugs in prostate cancer. Since the progression of ARV-110 (PROTAC for PC) into clinical phases, the focus of research has quickly shifted to protein degraders targeting prostate cancer. The present review highlights an overview of PROTACs in prostate cancer and their superiority over conventional inhibitors. We also delve into the underlying pathophysiology of the disease and explain the structural design and linkerology strategies for PROTAC molecules. Additionally, we touch on the various targets for PROTAC in prostate cancer, including the androgen receptor (AR) and other critical oncoproteins, and discuss the future prospects and challenges in this field.
Affiliation(s)
- Poornachandra Yedla
- Department of Pharmacogenomics, Institute of Translational Research, Asian Healthcare Foundation, Asian Institute of Gastroenterology Hospitals, Gachibowli, Hyderabad 500082, India
| | - Ahmed O Babalghith
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, Makkah 21955, Saudi Arabia
| | - Vindhya Vasini Andra
- Department of Medical Oncology, Omega Hospitals, Gachibowli, Hyderabad 500032, India
| | - Riyaz Syed
- Department of Chemiinformatics, Centella Scientific, JHUB, Jawaharlal Nehru Technological University, Hyderabad 500085, India
| |
|
45
|
Siemers FM, Bajorath J. Differences in learning characteristics between support vector machine and random forest models for compound classification revealed by Shapley value analysis. Sci Rep 2023; 13:5983. [PMID: 37045972 PMCID: PMC10097675 DOI: 10.1038/s41598-023-33215-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Accepted: 04/09/2023] [Indexed: 04/14/2023] Open
Abstract
The random forest (RF) and support vector machine (SVM) methods are mainstays in molecular machine learning (ML) and compound property prediction. We have explored in detail how binary classification models derived using these algorithms arrive at their predictions. To these ends, approaches from explainable artificial intelligence (XAI) are applicable such as the Shapley value concept originating from game theory that we adapted and further extended for our analysis. In large-scale activity-based compound classification using models derived from training sets of increasing size, RF and SVM with the Tanimoto kernel produced very similar predictions that could hardly be distinguished. However, Shapley value analysis revealed that their learning characteristics systematically differed and that chemically intuitive explanations of accurate RF and SVM predictions had different origins.
Affiliation(s)
- Friederike Maite Siemers
- B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Department of Life Science Informatics and Data Science, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany
| | - Jürgen Bajorath
- B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Department of Life Science Informatics and Data Science, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany.
| |
|
46
|
Liu X, Zhang W, Tong X, Zhong F, Li Z, Xiong Z, Xiong J, Wu X, Fu Z, Tan X, Liu Z, Zhang S, Jiang H, Li X, Zheng M. MolFilterGAN: a progressively augmented generative adversarial network for triaging AI-designed molecules. J Cheminform 2023; 15:42. [PMID: 37031191 PMCID: PMC10082991 DOI: 10.1186/s13321-023-00711-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Accepted: 03/14/2023] [Indexed: 04/10/2023] Open
Abstract
Artificial intelligence (AI)-based molecular design methods, especially deep generative models for generating novel molecule structures, have gratified our imagination to explore unknown chemical space without relying on brute-force exploration. However, whether designed by AI or human experts, the molecules need to be accessibly synthesized and biologically evaluated, and the trial-and-error process remains a resource-intensive endeavor. Therefore, AI-based drug design methods face a major challenge of how to prioritize the molecular structures with potential for subsequent drug development. This study indicates that common filtering approaches based on traditional screening metrics fail to differentiate AI-designed molecules. To address this issue, we propose a novel molecular filtering method, MolFilterGAN, based on a progressively augmented generative adversarial network. Comparative analysis shows that MolFilterGAN outperforms conventional screening approaches based on drug-likeness or synthetic ability metrics. Retrospective analysis of AI-designed discoidin domain receptor 1 (DDR1) inhibitors shows that MolFilterGAN significantly increases the efficiency of molecular triaging. Further evaluation of MolFilterGAN on eight external ligand sets suggests that MolFilterGAN is useful in triaging or enriching bioactive compounds across a wide range of target types. These results highlight the importance of MolFilterGAN in evaluating molecules integrally and further accelerating molecular discovery, especially when combined with advanced AI generative models.
Affiliation(s)
- Xiaohong Liu
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- AlphaMa Inc., No. 108, Yuxin Road, Suzhou Industrial Park, Suzhou, 215128, China
| | - Wei Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Xiaochu Tong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Feisheng Zhong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Zhaojun Li
- AlphaMa Inc., No. 108, Yuxin Road, Suzhou Industrial Park, Suzhou, 215128, China
| | - Zhaoping Xiong
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Jiacheng Xiong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Xiaolong Wu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
| | - Zunyun Fu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Xiaoqin Tan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- ByteDance AI Lab, No. 1999 Yishan Road, Shanghai, 201103, China
| | - Zhiguo Liu
- AlphaMa Inc., No. 108, Yuxin Road, Suzhou Industrial Park, Suzhou, 215128, China
| | - Sulin Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Hualiang Jiang
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China.
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China.
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, 310024, Hangzhou, China.
| |
|
47
|
Chen Y, Wang Z, Wang L, Wang J, Li P, Cao D, Zeng X, Ye X, Sakurai T. Deep generative model for drug design from protein target sequence. J Cheminform 2023; 15:38. [PMID: 36978179 PMCID: PMC10052801 DOI: 10.1186/s13321-023-00702-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 07/31/2022] [Accepted: 02/18/2023] [Indexed: 03/30/2023] Open
Abstract
Drug discovery for a protein target is a laborious and costly process. Deep learning (DL) methods have been applied to drug discovery and have successfully generated novel molecular structures, and they can substantially reduce development time and costs. However, most of them rely on prior knowledge, either by drawing on the structure and properties of known molecules to generate similar candidate molecules or by extracting information on the binding sites of protein pockets to obtain molecules that can bind to them. In this paper, DeepTarget, an end-to-end DL model, was proposed to generate novel molecules relying solely on the amino acid sequence of the target protein, reducing the heavy reliance on prior knowledge. DeepTarget includes three modules: Amino Acid Sequence Embedding (AASE), Structural Feature Inference (SFI), and Molecule Generation (MG). AASE generates embeddings from the amino acid sequence of the target protein. SFI infers the potential structural features of the synthesized molecule, and MG seeks to construct the eventual molecule. The validity of the generated molecules was demonstrated by a benchmark platform of molecular generation models. The interaction between the generated molecules and the target proteins was also verified on the basis of two metrics, drug-target affinity and molecular docking. The results of the experiments indicated the efficacy of the model for direct molecule generation conditioned solely on the amino acid sequence.
Affiliation(s)
- Yangyang Chen
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan.
| | - Zixu Wang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
| | - Lei Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, China
| | - Jianmin Wang
- The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon, 21983, Republic of Korea
- Bioinformatics and Molecular Design Research Center (BMDRC), Incheon, 21983, Republic of Korea
| | - Pengyong Li
- School of Computer Science and Technology, Xidian University, Xian, 710071, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, China.
| | - Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, People's Republic of China.
| | - Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan.
| | - Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
| |
|
48
|
Charles S, Edgar MP, Mahapatra RK. Artificial intelligence based virtual screening study for competitive and allosteric inhibitors of the SARS-CoV-2 main protease. J Biomol Struct Dyn 2023; 41:15286-15304. [PMID: 36943715 DOI: 10.1080/07391102.2023.2188419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Accepted: 02/27/2023] [Indexed: 03/23/2023]
Abstract
SARS-CoV-2 is a highly contagious and dangerous coronavirus that first appeared in late 2019, causing COVID-19, a pandemic of acute respiratory illnesses that is still a threat to health and general public safety. We performed deep docking studies of 800 M unique compounds in both the active and allosteric sites of the SARS-CoV-2 Main Protease (Mpro), and the 15 M and 13 M virtual hits obtained were further taken for conventional docking and molecular dynamics (MD) studies. The best XP Glide docking scores obtained were -14.242 and -12.059 kcal/mol by CHEMBL591669 and the highest binding affinities were -10.5 kcal/mol (from 444215) and -11.2 kcal/mol (from NPC95421) for active and allosteric sites, respectively. Some hits can bind both sites, making them a great area of concern. Re-docking of 8 random allosteric complexes in the active site shows a significant reduction in docking scores with a t-test P value of 2.532 × 10⁻¹¹ at 95% confidence. Some specific interactions have higher elevations in docking scores. MD studies on 15 complexes show that single-ligand systems are stable as compared to double-ligand systems, and the allosteric binders identified are shown to modulate the active site binding as evidenced by the changes in the interaction patterns and stability of ligands and active site residues. When an allosteric complex was docked to the second monomer to check for homodimer formation, the validated homodimer could not be re-established, further supporting the potential of the identified allosteric binders. These findings could be important in developing novel therapeutics against SARS-CoV-2. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Ssemuyiga Charles
- School of Biotechnology, KIIT Deemed to be University, Bhubaneswar, Odisha, India
- Department of Microbiology, Biotechnology and Plant Sciences, School of Biological Sciences, Makerere University, Kampala, Uganda
| | - Mulumba Pius Edgar
- School of Biotechnology, KIIT Deemed to be University, Bhubaneswar, Odisha, India
| | | |
|
49
|
Tysinger EP, Rai BK, Sinitskiy AV. Can We Quickly Learn to "Translate" Bioactive Molecules with Transformer Models? J Chem Inf Model 2023; 63:1734-1744. [PMID: 36914216 DOI: 10.1021/acs.jcim.2c01618] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 03/16/2023]
Abstract
Meaningful exploration of the chemical space of druglike molecules in drug design is a highly challenging task due to a combinatorial explosion of possible modifications of molecules. In this work, we address this problem with transformer models, a type of machine learning (ML) model originally developed for machine translation. By training transformer models on pairs of similar bioactive molecules from the public ChEMBL data set, we enable them to learn medicinal-chemistry-meaningful, context-dependent transformations of molecules, including those absent from the training set. By retrospective analysis of the performance of transformer models on ChEMBL subsets of ligands binding to COX2, DRD2, or HERG protein targets, we demonstrate that the models can generate structures identical or highly similar to most active ligands, even though the models had not seen any ligands active against the corresponding protein target during training. Our work demonstrates that human experts working on hit expansion in drug design can easily and quickly employ transformer models, originally developed to translate texts from one natural language to another, to "translate" from known molecules active against a given protein target to novel molecules active against the same target.
Affiliation(s)
- Emma P Tysinger
- Machine Learning and Computational Sciences, Pfizer Worldwide Research, Development, and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
| | - Brajesh K Rai
- Machine Learning and Computational Sciences, Pfizer Worldwide Research, Development, and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
| | - Anton V Sinitskiy
- Machine Learning and Computational Sciences, Pfizer Worldwide Research, Development, and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
| |
|
50
|
Yu Y, Huang J, He H, Han J, Ye G, Xu T, Sun X, Chen X, Ren X, Li C, Li H, Huang W, Liu Y, Wang X, Gao Y, Cheng N, Guo N, Chen X, Feng J, Hua Y, Liu C, Zhu G, Xie Z, Yao L, Zhong W, Chen X, Liu W, Li H. Accelerated Discovery of Macrocyclic CDK2 Inhibitor QR-6401 by Generative Models and Structure-Based Drug Design. ACS Med Chem Lett 2023; 14:297-304. [PMID: 36923916 PMCID: PMC10009793 DOI: 10.1021/acsmedchemlett.2c00515] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 12/08/2022] [Accepted: 01/19/2023] [Indexed: 02/11/2023] Open
Abstract
Selective CDK2 inhibitors have the potential to provide effective therapeutics for CDK2-dependent cancers and for combating drug resistance due to high cyclin E1 (CCNE1) expression intrinsically or CCNE1 amplification induced by treatment of CDK4/6 inhibitors. Generative models that take advantage of deep learning are being increasingly integrated into early drug discovery for hit identification and lead optimization. Here we report the discovery of a highly potent and selective macrocyclic CDK2 inhibitor QR-6401 (23) accelerated by the application of generative models and structure-based drug design (SBDD). QR-6401 (23) demonstrated robust antitumor efficacy in an OVCAR3 ovarian cancer xenograft model via oral administration.
Affiliation(s)
- Yang Yu
- Tencent AI Lab, Tencent, Shenzhen 518057, China
| | | | - Hu He
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Jing Han
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Geyan Ye
- Tencent AI Lab, Tencent, Shenzhen 518057, China
| | - Tingyang Xu
- Tencent AI Lab, Tencent, Shenzhen 518057, China
| | | | - Xiumei Chen
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Xiaoming Ren
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Chunlai Li
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Huijuan Li
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Wei Huang
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Yangyang Liu
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Xinjuan Wang
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Yongzhi Gao
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Nianhe Cheng
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Na Guo
- BioDuro-Sundia, Shanghai, 200131, China
| | - Xibo Chen
- BioDuro-Sundia, Shanghai, 200131, China
| | | | - Yuxia Hua
- BioDuro-Sundia, Beijing, 102200, China
| | - Chong Liu
- BioDuro-Sundia, Beijing, 102200, China
| | - Guoyun Zhu
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Zhi Xie
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Lili Yao
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Wenge Zhong
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Xinde Chen
- Tencent AI Lab, Tencent, Shenzhen 518057, China
| | - Wei Liu
- Tencent AI Lab, Tencent, Shenzhen 518057, China
| | - Hailong Li
- Regor Therapeutics Group, Shanghai, 201210, China
| |
|