1
Xia X, Zhang Y, Zeng X, Zhang X, Zheng C, Su Y. Artificial Intelligence in Molecular Optimization: Current Paradigms and Future Frontiers. Int J Mol Sci 2025; 26:4878. [PMID: 40430017 PMCID: PMC12112088 DOI: 10.3390/ijms26104878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2025] [Revised: 05/07/2025] [Accepted: 05/14/2025] [Indexed: 05/29/2025] Open
Abstract
Molecular optimization plays a pivotal role in many domains since it holds promise for improving the properties of lead molecules. The advent of artificial intelligence (AI)-driven molecular optimization has revolutionized lead optimization workflows, which have significantly accelerated the development of drug candidates. However, AI models are also confronted with new challenges in practical molecular optimization, such as high-dimensional chemical space and data sparsity issues. This paper initially highlights the inherent benefits of molecular optimization in terms of optimizing the properties and maintaining the structural similarity of lead molecules, thereby highlighting its critical role in drug discovery. The next section systematically categorizes and analyzes existing AI-aided molecular optimization methods, comprising iterative search in discrete chemical space, end-to-end generation in continuous latent space, and iterative search in continuous latent space methods. Finally, we discuss the key challenges in AI-aided molecular optimization methods, including molecular representations, dataset selection, the properties to be optimized, and optimization algorithms, while proposing potential solutions and future research directions. In summary, this review provides a comprehensive analysis of existing representative AI-aided molecular optimization methods, thereby offering guidance for future research directions.
Affiliation(s)
- Xin Xia
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China;
- Yajie Zhang
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei 230601, China; (Y.Z.); (X.Z.); (C.Z.)
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Lushan Road, Changsha 410012, China;
- Xingyi Zhang
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei 230601, China; (Y.Z.); (X.Z.); (C.Z.)
- Chunhou Zheng
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei 230601, China; (Y.Z.); (X.Z.); (C.Z.)
- Yansen Su
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China;
2
Lv W, Jia X, Tang B, Ma C, Fang X, Jin X, Niu Z, Han X. In silico modeling of targeted protein degradation. Eur J Med Chem 2025; 289:117432. [PMID: 40015161 DOI: 10.1016/j.ejmech.2025.117432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2024] [Revised: 02/18/2025] [Accepted: 02/19/2025] [Indexed: 03/01/2025]
Abstract
Targeted protein degradation (TPD) techniques, particularly proteolysis-targeting chimeras (PROTAC) and molecular glue degraders (MGD), have offered novel strategies in drug discovery. With the rapid advancement of computer-aided drug design (CADD) and artificial intelligence-driven drug discovery (AIDD) in the biomedical field, a major focus has become how to effectively integrate these technologies into the TPD drug discovery pipeline to accelerate development, shorten timelines, and reduce costs. Currently, the main research directions for applying CADD and AIDD in TPD include: 1) ternary complex modeling; 2) linker generation; 3) strategies to predict degrader targets, activities and ADME/T properties; 4) in silico degrader design and discovery. Models developed in these areas play a crucial role in target identification, drug design, and optimization at various stages of the discovery process. However, the limited size and quality of datasets related to TPD present challenges, leaving room for further improvement in these models. TPD involves the complex ubiquitin-proteasome system, with numerous factors influencing outcomes. Most current models adopt a static perspective to interpret and predict relevant tasks. In the future, it may be necessary to shift toward dynamic approaches that better capture the intricate relationships among these components. Furthermore, incorporating new and diverse chemical spaces will enhance the precision design and application of TPD agents.
Affiliation(s)
- Wenxing Lv
- Cancer Institute (Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education) of the Second Affiliated Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, 310029, China; Hangzhou Institute of Advanced Technology, Hangzhou, 310000, China.
- Xiaojuan Jia
- Cancer Institute (Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education) of the Second Affiliated Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, 310029, China.
- Bowen Tang
- College of Life Sciences, Zhejiang University, Hangzhou, 310058, China; Guangzhou New Block Technology Co., Ltd., Guangzhou, 510000, China.
- Chao Ma
- Guangzhou New Block Technology Co., Ltd., Guangzhou, 510000, China.
- Xiaopeng Fang
- Hangzhou Institute of Advanced Technology, Hangzhou, 310000, China.
- Xurui Jin
- MindRank AI, Hangzhou, 310000, China.
- Zhangming Niu
- MindRank AI, Hangzhou, 310000, China; National Heart and Lung Institute, Imperial College London, London, SW7 2AZ, UK.
- Xin Han
- Cancer Institute (Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education) of the Second Affiliated Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, 310029, China; State Key Laboratory for Chemistry and Molecular Engineering of Medicinal Resources (Guangxi Normal University), Guilin, 541004, China.
3
Shahmohammadi A, Dalvand S, Molaei A, Mousavi-Khoshdel SM, Yazdanfar N, Hasanzadeh M. Transition metal phosphide/molybdenum disulfide heterostructures towards advanced electrochemical energy storage: recent progress and challenges. RSC Adv 2025; 15:13397-13430. [PMID: 40297000 PMCID: PMC12035537 DOI: 10.1039/d5ra01184a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2025] [Accepted: 04/08/2025] [Indexed: 04/30/2025] Open
Abstract
Transition metal phosphide @ molybdenum disulfide (TMP@MoS2) heterostructures, consisting of TMP as the core main catalytic body and MoS2 as the outer shell, can address three major problems in the field of renewable energy storage and catalysis: lack of resources, cost factors, and low cycling stability. The heterostructures synergistically combine the excellent conductivity and electrochemical performance of transition metal phosphides with the structural robustness and catalytic activity of molybdenum disulfide, which holds great promise for clean energy. This review addresses the advantages of TMP@MoS2 materials and their synthesis methods (e.g., hydrothermal routes and chemical vapor deposition) with regard to scalability and cost. Their electrochemical energy storage and catalytic functions, e.g., the hydrogen and oxygen evolution reactions (HER and OER), are also extensively explored. Their potential within battery and supercapacitor technologies is also assessed against leading performance metrics. Challenges toward industry-scale scalability, longevity, and environmental sustainability are also addressed, as are optimization and large-scale deployment strategies.
Affiliation(s)
- Ali Shahmohammadi
- Faculty of Chemistry, Kharazmi University, 43 South Mofatteh Avenue, Tehran, Iran
- Samad Dalvand
- Iranian Research & Development Center for Chemical Industries (IRDCI), Academic Center for Education, Culture and Research (ACECR), Karaj, Iran
- Amirhossein Molaei
- Faculty of Petroleum and Natural Gas Engineering, Sahand University of Technology, Tabriz, Iran
- Najmeh Yazdanfar
- Iranian Research & Development Center for Chemical Industries (IRDCI), Academic Center for Education, Culture and Research (ACECR), Karaj, Iran
- Mohammad Hasanzadeh
- Pharmaceutical Analysis Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
4
Zhang H, Wang S, Li N, Xu Y, Huang Z, Zhang Y, Li J, Zuo Y, Li M, Li R, Yang B. Druggability Studies of Benzene Sulfonamide Substituted Diarylamide (E3) as a Novel Diuretic. Biomedicines 2025; 13:992. [PMID: 40299675 PMCID: PMC12024912 DOI: 10.3390/biomedicines13040992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2025] [Revised: 04/13/2025] [Accepted: 04/16/2025] [Indexed: 05/01/2025] Open
Abstract
Background/Objectives: Urea transporters (UTs) play an important role in the urine-concentrating mechanism and have been regarded as a novel drug target for developing salt-sparing diuretics. Our previous studies found that diarylamides 1H and 25a are specific UT inhibitors and have oral diuretic activity. However, these compounds necessitate further optimization and comprehensive druggability studies. Methods: The optimal compound was identified through structural optimization. Experiments were conducted to investigate its UT inhibitory activity and evaluate its diuretic effect. Furthermore, disease models were utilized to assess the compound's efficacy in treating hyponatremia. Pharmacokinetic studies were performed to examine its metabolic stability, and toxicity tests were conducted to evaluate its safety. Results: Based on the chemical structure of compound 25a, we synthesized a novel diarylamide compound, E3, by introducing a benzenesulfonamide group into its side chain. E3 exhibited dose-dependent inhibition of UT at the nanomolar level and demonstrated oral diuretic activity without causing electrolyte excretion disorders in both mice and rats. Experiments on UT-B-/- and UT-A1-/- mice indicated that E3 enhances the diuretic effect primarily by inhibiting UT-A1 more effectively than UT-B. Furthermore, E3 displayed good metabolic stability and favorable pharmacokinetic characteristics. E3 significantly ameliorated hyponatremia through diuresis in a rat model. Importantly, E3 did not induce acute oral toxicity, subacute oral toxicity, genotoxicity, or cardiotoxicity. Conclusions: Our study confirms that E3 exerts a diuretic effect by specifically inhibiting UTs and has good druggability, which offers potential for E3 to be developed into a new diuretic for the treatment of hyponatremia.
Affiliation(s)
- Hang Zhang
- Department of Pharmacology, School of Basic Medical Sciences, Peking University, Beijing 100191, China; (H.Z.); (S.W.); (N.L.); (Z.H.); (M.L.)
- Shuyuan Wang
- Department of Pharmacology, School of Basic Medical Sciences, Peking University, Beijing 100191, China; (H.Z.); (S.W.); (N.L.); (Z.H.); (M.L.)
- Nannan Li
- Department of Pharmacology, School of Basic Medical Sciences, Peking University, Beijing 100191, China; (H.Z.); (S.W.); (N.L.); (Z.H.); (M.L.)
- Yue Xu
- Division of Pharmaceutics and Pharmacology, College of Pharmacy, The Ohio State University, Columbus, OH 43210, USA;
- Zhizhen Huang
- Department of Pharmacology, School of Basic Medical Sciences, Peking University, Beijing 100191, China; (H.Z.); (S.W.); (N.L.); (Z.H.); (M.L.)
- Yukun Zhang
- Chongqing Key Laboratory of Development and Utilization of Genuine Medicinal Materials in Three Gorges Reservoir Area, Chongqing 404120, China;
- Jing Li
- The State Key Laboratory of Anti-Infective Drug Development, Sunshine Lake Pharma Co., Ltd., Dongguan 523871, China; (J.L.); (Y.Z.)
- Yinglin Zuo
- The State Key Laboratory of Anti-Infective Drug Development, Sunshine Lake Pharma Co., Ltd., Dongguan 523871, China; (J.L.); (Y.Z.)
- Min Li
- Department of Pharmacology, School of Basic Medical Sciences, Peking University, Beijing 100191, China; (H.Z.); (S.W.); (N.L.); (Z.H.); (M.L.)
- Runtao Li
- School of Pharmaceutical Sciences, Peking University, Beijing 100191, China;
- Baoxue Yang
- Department of Pharmacology, School of Basic Medical Sciences, Peking University, Beijing 100191, China; (H.Z.); (S.W.); (N.L.); (Z.H.); (M.L.)
5
Chen A, Peng X, Shen T, Zheng L, Wu D, Wang S. Discovery, design, and engineering of enzymes based on molecular retrobiosynthesis. MLIFE 2025; 4:107-125. [PMID: 40313979 PMCID: PMC12042125 DOI: 10.1002/mlf2.70009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2024] [Revised: 02/06/2025] [Accepted: 02/13/2025] [Indexed: 05/03/2025]
Abstract
Biosynthesis, a process utilizing biological systems to synthesize chemical compounds, has emerged as a revolutionary solution to 21st-century challenges due to its environmental sustainability, scalability, and high stereoselectivity and regioselectivity. Recent advancements in artificial intelligence (AI) are accelerating biosynthesis by enabling intelligent design, construction, and optimization of enzymatic reactions and biological systems. We first introduce the molecular retrosynthesis route planning in biochemical pathway design, including single-step retrosynthesis algorithms and AI-based chemical retrosynthesis route design tools. We highlight the advantages and challenges of large language models in addressing the sparsity of chemical data. Furthermore, we review enzyme discovery methods based on sequence and structure alignment techniques. Breakthroughs in AI-based structural prediction methods are expected to significantly improve the accuracy of enzyme discovery. We also summarize methods for de novo enzyme generation for nonnatural or orphan reactions, focusing on AI-based enzyme functional annotation and enzyme discovery techniques based on reaction or small molecule similarity. Turning to enzyme engineering, we discuss strategies to improve enzyme thermostability, solubility, and activity, as well as the applications of AI in these fields. The shift from traditional experiment-driven models to data-driven and computationally driven intelligent models is already underway. Finally, we present potential challenges and provide a perspective on future research directions. We envision expanded applications of biocatalysis in drug development, green chemistry, and complex molecule synthesis.
Affiliation(s)
- Ancheng Chen
- Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Xiangda Peng
- Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Tao Shen
- Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Dong Wu
- Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Sheng Wang
- Shanghai Zelixir Biotech Company Ltd., Shanghai, China
6
Cassady H, Martin E, Liu Y, Bhattacharya D, Rochow MF, Dyer BA, Reinhart WF, Cooper VR, Hickner MA. Database of Nonaqueous Proton-Conducting Materials. ACS APPLIED MATERIALS & INTERFACES 2025; 17:16901-16908. [PMID: 40059360 PMCID: PMC11931497 DOI: 10.1021/acsami.4c22618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2024] [Revised: 02/24/2025] [Accepted: 02/24/2025] [Indexed: 03/21/2025]
Abstract
This work presents the assembly of 48 papers, representing 74 different compounds and blends, into a machine-readable database of nonaqueous proton-conducting materials. SMILES was used to encode the chemical structures of the molecules, and we tabulated the reported proton conductivity, proton diffusion coefficient, and material composition for a total of 3152 data points. The data spans a broad range of temperatures ranging from -70 to 260 °C. To explore this landscape of nonaqueous proton conductors, DFT was used to calculate the proton affinity of 18 unique proton carriers. The results were then compared to the activation energy derived from fitting experimental data to the Arrhenius equation. It was found that while the widely recognized positive correlation between the activation energy and proton affinity may hold among closely related molecules, this correlation does not necessarily apply across a broader range of molecules. This work serves as an example of the potential analyses that can be conducted using literature data combined with emerging research tools in computation and data science to address specific materials design problems.
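The abstract mentions deriving activation energies by fitting experimental data to the Arrhenius equation. The following is a minimal sketch of such a fit, assuming the simple form sigma = A * exp(-Ea / (k_B * T)) and a linear least-squares fit of ln(sigma) against 1/T. The function name `fit_arrhenius`, the choice of units, and the synthetic data are illustrative assumptions and are not taken from the paper or its database.

```python
import numpy as np

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def fit_arrhenius(temps_c, sigma):
    """Fit ln(sigma) = ln(A) - Ea / (k_B * T) by linear least squares.

    temps_c: temperatures in degrees Celsius
    sigma:   positive conductivities (any consistent unit)
    Returns (Ea in eV, prefactor A in the unit of sigma).
    """
    t_kelvin = np.asarray(temps_c, dtype=float) + 273.15
    y = np.log(np.asarray(sigma, dtype=float))
    # Straight-line fit of ln(sigma) against 1/T: slope = -Ea / k_B
    slope, intercept = np.polyfit(1.0 / t_kelvin, y, 1)
    return -slope * K_B_EV, float(np.exp(intercept))


# Synthetic, noise-free data (hypothetical values, for illustration only)
temps_c = np.array([25.0, 60.0, 100.0, 140.0, 180.0])
true_ea, true_a = 0.30, 50.0
sigma = true_a * np.exp(-true_ea / (K_B_EV * (temps_c + 273.15)))

ea_fit, a_fit = fit_arrhenius(temps_c, sigma)
print(f"Ea = {ea_fit:.3f} eV, A = {a_fit:.2f}")
```

Some proton-conduction studies fit sigma*T rather than sigma; which form the authors used is not stated in the abstract, so the plain form here is only one possible choice.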
Affiliation(s)
- Harrison J. Cassady
- Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, Michigan 48824-1312, United States
- Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley 94720-8099, California, United States
- Emeline Martin
- Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, Michigan 48824-1312, United States
- Department of Chemical Engineering, University of Michigan, Ann Arbor, Michigan 48109-1382, United States
- Yifan Liu
- Materials Science and Technology Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-2008, United States
- Debjyoti Bhattacharya
- Materials Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802, United States
- Maria F. Rochow
- Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, Michigan 48824-1312, United States
- Brock A. Dyer
- Department of Physics and Astronomy, Ursinus College, Collegeville, Pennsylvania 19426, United States
- Wesley F. Reinhart
- Materials Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802, United States
- Institute for Computational and Data Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, United States
- Valentino R. Cooper
- Materials Science and Technology Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-2008, United States
- Michael A. Hickner
- Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, Michigan 48824-1312, United States
7
Tavakoli M, Chiu YTT, Carlton AM, Van Vranken D, Baldi P. Chemically Informed Deep Learning for Interpretable Radical Reaction Prediction. J Chem Inf Model 2025; 65:1228-1242. [PMID: 39871741 PMCID: PMC11815866 DOI: 10.1021/acs.jcim.4c01901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2024] [Revised: 01/14/2025] [Accepted: 01/15/2025] [Indexed: 01/29/2025]
Abstract
Organic radical reactions are crucial in many areas of chemistry, including synthetic, biological, and atmospheric chemistry. We develop a predictive framework based on the interaction of molecular orbitals that operates on mechanistic-level radical reactions. Given our chemistry-aware model, all predictions are provided with different levels of interpretability. Our models are trained and evaluated using the RMechDB database of radical reaction steps. Our model predicts the correct orbital interaction and products for 96% of the test reactions in RMechDB. By chaining these predictions, we perform a pathway search capable of identifying all intermediates and byproducts of a radical reaction. We test the pathway search on two classes of problems in atmospheric and polymerization chemistry. RMechRP is publicly available online at https://deeprxn.ics.uci.edu/rmechrp/.
Affiliation(s)
- Mohammadamin Tavakoli
- Department of Computer Science, University of California, Irvine, Irvine, California 92697, United States
- Yin Ting T. Chiu
- Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
- Ann Marie Carlton
- Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
- David Van Vranken
- Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
- Pierre Baldi
- Department of Computer Science, University of California, Irvine, Irvine, California 92697, United States
8
Ramos MC, Collison CJ, White AD. A review of large language models and autonomous agents in chemistry. Chem Sci 2025; 16:2514-2572. [PMID: 39829984 PMCID: PMC11739813 DOI: 10.1039/d4sc03921a] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/15/2024] [Accepted: 12/03/2024] [Indexed: 01/22/2025] Open
Abstract
Large language models (LLMs) have emerged as powerful tools in chemistry, significantly impacting molecule design, property prediction, and synthesis optimization. This review highlights LLM capabilities in these domains and their potential to accelerate scientific discovery through automation. We also review LLM-based autonomous agents: LLMs with a broader set of tools to interact with their surrounding environment. These agents perform diverse tasks such as paper scraping, interfacing with automated laboratories, and synthesis planning. As agents are an emerging topic, we extend the scope of our review of agents beyond chemistry and discuss across any scientific domains. This review covers the recent history, current capabilities, and design of LLMs and autonomous agents, addressing specific challenges, opportunities, and future directions in chemistry. Key challenges include data quality and integration, model interpretability, and the need for standard benchmarks, while future directions point towards more sophisticated multi-modal agents and enhanced collaboration between agents and experimental methods. Due to the quick pace of this field, a repository has been built to keep track of the latest studies: https://github.com/ur-whitelab/LLMs-in-science.
Affiliation(s)
- Mayk Caldas Ramos
- FutureHouse Inc., San Francisco, CA, USA
- Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
- Christopher J Collison
- School of Chemistry and Materials Science, Rochester Institute of Technology, Rochester, NY, USA
- Andrew D White
- FutureHouse Inc., San Francisco, CA, USA
- Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
9
Xia M, Zhang Y, Song H, Jia Y, Yang M. Predicting Rate Constants of Hydrogen Abstraction Reactions between OH/HO2 and Alkanes by Machine Learning Models. J Phys Chem A 2025; 129:309-316. [PMID: 39696780 DOI: 10.1021/acs.jpca.4c07426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/20/2024]
Abstract
The hydrogen abstraction reactions by small radicals from fuel molecules play an important role in the oxidation of fuels. However, experimental measurements and/or theoretical calculations of their rate constants under combustion conditions are very challenging due to their high reactivity. Machine learning offers a promising approach to predicting thermal rate constants. In this work, three machine learning methods, XGB, FNN, and XGB-FNN hybrid algorithms, were employed to train and predict the rate constants of the hydrogen abstraction reactions between alkanes and OH/HO2. Six descriptors were selected according to the Pearson correlation coefficients, the importance of descriptors, and the clustering heat map. It was proven that the XGB-FNN model is the most robust. The constructed XGB-FNN model achieved an average deviation of 89.13% for the alkanes + OH reactions and 190.93% for the alkanes + HO2 reactions on their respective prediction sets. The model was also used to predict the rate constants of the reactions involving larger alkanes, demonstrating its extrapolation capability. Furthermore, the model has the ability to distinguish the reactivity of the reactions with the hydrogen atom abstracted at different sites of alkane.
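The abstract states that descriptors were screened partly by Pearson correlation coefficients. One common way to use such coefficients, sketched below under assumptions, is to drop any descriptor that is nearly collinear with one already kept. The function `drop_correlated`, the 0.9 threshold, and the descriptor names `d1` to `d3` are hypothetical; the paper's actual selection procedure also used descriptor importance and a clustering heat map, which this sketch does not cover.

```python
import numpy as np


def drop_correlated(X, names, threshold=0.9):
    """Greedily keep descriptors whose |Pearson r| with every
    previously kept descriptor is below `threshold`."""
    corr = np.corrcoef(X, rowvar=False)  # columns are descriptors
    keep = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) < threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]


# Synthetic descriptors (illustration only): d2 is almost a copy of d1
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, 2.0 * a + 0.01 * rng.normal(size=200), b])

selected = drop_correlated(X, ["d1", "d2", "d3"])
print(selected)
```

The greedy pass depends on column order: the first descriptor of a correlated pair is the one retained, so in practice the columns would be ordered by some prior ranking.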
Affiliation(s)
- Min Xia
- College of Physical Science and Technology, Central China Normal University, Wuhan 430079, China
- State Key Laboratory of Magnetic Resonance Spectroscopy and Imaging, National Center for Magnetic Resonance in Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China
- Yu Zhang
- State Key Laboratory of Magnetic Resonance Spectroscopy and Imaging, National Center for Magnetic Resonance in Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China
- Hongwei Song
- State Key Laboratory of Magnetic Resonance Spectroscopy and Imaging, National Center for Magnetic Resonance in Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China
- Ya Jia
- College of Physical Science and Technology, Central China Normal University, Wuhan 430079, China
- School of Life Sciences, Central China Normal University, Wuhan 430079, China
- Minghui Yang
- State Key Laboratory of Magnetic Resonance Spectroscopy and Imaging, National Center for Magnetic Resonance in Wuhan, Wuhan Institute of Physics and Mathematics, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China
- Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China
10
Zhang X, Gao H, Qi Y, Li Y, Wang R. Generation of Rational Drug-like Molecular Structures Through a Multiple-Objective Reinforcement Learning Framework. Molecules 2024; 30:18. [PMID: 39795076 PMCID: PMC11721775 DOI: 10.3390/molecules30010018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2024] [Revised: 12/04/2024] [Accepted: 12/15/2024] [Indexed: 01/13/2025] Open
Abstract
As an appealing approach for discovering novel leads, the key advantage of de novo drug design lies in its ability to explore a much broader dimension of chemical space, without being confined to the knowledge of existing compounds. So far, many generative models have been described in the literature, which have completely redefined the concept of de novo drug design. However, many of them lack practical value for real-world drug discovery. In this work, we have developed a graph-based generative model within a reinforcement learning framework, namely, METEOR (Molecular Exploration Through multiplE-Objective Reinforcement). The backend agent of METEOR is based on the well-established GCPN model. To ensure the overall quality of the generated molecular graphs, we implemented a set of rules to identify and exclude undesired substructures. Importantly, METEOR is designed to conduct multi-objective optimization, i.e., simultaneously optimizing binding affinity, drug-likeness, and synthetic accessibility of the generated molecules under the guidance of a special reward function. We demonstrate in a specific test case that without prior knowledge of true binders to the chosen target protein, METEOR generated molecules with superior properties compared to those in the ZINC 250k data set. In conclusion, we have demonstrated the potential of METEOR as a practical tool for generating rational drug-like molecules in the early phase of drug discovery.
Affiliation(s)
- Yan Li
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, China
- Renxiao Wang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, China
11
Ashraf SN, Blackwell JH, Holdgate GA, Lucas SCC, Solovyeva A, Storer RI, Whitehurst BC. Hit me with your best shot: Integrated hit discovery for the next generation of drug targets. Drug Discov Today 2024; 29:104143. [PMID: 39173704 DOI: 10.1016/j.drudis.2024.104143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Revised: 08/07/2024] [Accepted: 08/16/2024] [Indexed: 08/24/2024]
Abstract
Identification of high-quality hit chemical matter is of vital importance to the success of drug discovery campaigns. However, this goal is becoming ever harder to achieve as the targets entering the portfolios of pharmaceutical and biotechnology companies are increasingly trending towards novel and traditionally challenging to drug. This demand has fuelled the development and adoption of numerous new screening approaches, whereby the contemporary hit identification toolbox comprises a growing number of orthogonal and complementary technologies including high-throughput screening, fragment-based ligand design, affinity screening (affinity-selection mass spectrometry, differential scanning fluorimetry, DNA-encoded library screening), as well as increasingly sophisticated computational predictive approaches. Herein we describe how an integrated strategy for hit discovery, whereby multiple hit identification techniques are tactically applied, selected in the context of target suitability and resource priority, represents an optimal and often essential approach to maximise the likelihood of identifying quality starting points from which to develop the next generation of medicines.
Affiliation(s)
- S Neha Ashraf
- Hit Discovery, Discovery Science, AstraZeneca R&D, Cambridge CB2 0AA, UK
- J Henry Blackwell
- Hit Discovery, Discovery Science, AstraZeneca R&D, Cambridge CB2 0AA, UK
- Simon C C Lucas
- Hit Discovery, Discovery Science, AstraZeneca R&D, Cambridge CB2 0AA, UK
- Alisa Solovyeva
- Hit Discovery, Discovery Science, AstraZeneca R&D, Gothenburg SE-431 83, Sweden
- R Ian Storer
- Hit Discovery, Discovery Science, AstraZeneca R&D, Cambridge CB2 0AA, UK.
12
König C, Vellido A. Understanding predictions of drug profiles using explainable machine learning models. BioData Min 2024; 17:25. [PMID: 39090651 PMCID: PMC11293102 DOI: 10.1186/s13040-024-00378-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2024] [Accepted: 07/26/2024] [Indexed: 08/04/2024] Open
Abstract
PURPOSE: The analysis of absorption, distribution, metabolism, and excretion (ADME) molecular properties is of relevance to drug design, as they directly influence the drug's effectiveness at its target location. This study concerns their prediction using explainable machine learning (ML) models. The aim of the study is to find which molecular features are relevant to the prediction of the different ADME properties and to measure their impact on the predictive model. METHODS: The relative relevance of individual features for ADME activity is gauged by estimating feature importance in ML models' predictions. Feature importance is calculated using feature permutation, and the individual impact of features is measured by SHapley Additive exPlanations (SHAP). RESULTS: The study reveals the relevance of specific molecular descriptors for each ADME property and quantifies their impact on the ADME property prediction. CONCLUSION: The reported research illustrates how explainable ML models can provide detailed insights into the individual contributions of molecular features to the final prediction of an ADME property, as an effort to support experts in the process of drug candidate selection through a better understanding of the impact of molecular features.
Affiliation(s)
- Caroline König
  - Intelligent Data Science and Artificial Intelligence (IDEAI-UPC) Research Centre, Universitat Politècnica de Catalunya (UPC Barcelona Tech), Jordi Girona 1-3, Barcelona, 08034, Catalonia, Spain
  - Department of Computer Science, Universitat Politècnica de Catalunya (UPC Barcelona Tech), Jordi Girona 1-3, Barcelona, 08034, Catalonia, Spain
- Alfredo Vellido
  - Intelligent Data Science and Artificial Intelligence (IDEAI-UPC) Research Centre, Universitat Politècnica de Catalunya (UPC Barcelona Tech), Jordi Girona 1-3, Barcelona, 08034, Catalonia, Spain
  - Department of Computer Science, Universitat Politècnica de Catalunya (UPC Barcelona Tech), Jordi Girona 1-3, Barcelona, 08034, Catalonia, Spain
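The König and Vellido abstract states that feature importance is calculated using feature permutation. The sketch below illustrates that general idea only and is not the authors' code: the toy data, the three unnamed descriptors, the function names, and the fixed linear `toy_model` are all assumptions made for illustration. A feature's importance is estimated as the mean increase in prediction error after its column is shuffled; SHAP values are not computed here.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Return, per feature, the mean increase in MSE after shuffling that column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving the other columns untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy data: three made-up descriptors. The target depends
# strongly on the first, weakly on the second, and not at all on the third.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [3.0 * a + 0.5 * b for a, b, _ in X]


def toy_model(row):
    """Stand-in for a trained model (an assumption, not a fitted ADME predictor)."""
    return 3.0 * row[0] + 0.5 * row[1]


scores = permutation_importance(toy_model, X, y)
for name, score in zip(["descriptor_1", "descriptor_2", "descriptor_3"], scores):
    print(f"{name}: {score:.4f}")
```

Because the stand-in model ignores the third descriptor, shuffling it leaves the error unchanged, whereas shuffling the first descriptor degrades predictions the most. With a real trained model, the same loop would be run on held-out data.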
13
Ahmad S, Raza K. An extensive review on lung cancer therapeutics using machine learning techniques: state-of-the-art and perspectives. J Drug Target 2024; 32:635-646. [PMID: 38662768 DOI: 10.1080/1061186x.2024.2347358] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 02/10/2024] [Accepted: 04/18/2024] [Indexed: 05/07/2024]
Abstract
There are over 100 types of human cancer, accounting for millions of deaths every year. Lung cancer alone claims over 1.8 million lives per year and is expected to surpass 3.2 million by 2050, which underscores the urgent need for rapid drug development and repurposing initiatives. The application of AI emerges as a pivotal solution to developing anti-cancer therapeutics. This state-of-the-art review aims to explore the various applications of AI in lung cancer therapeutics. Predictive models can analyse large datasets, including clinical data, genetic information, and treatment outcomes, for novel drug design and to generate personalised treatment recommendations, potentially optimising therapeutic strategies, enhancing treatment efficacy, and minimising adverse effects. A thorough literature review was conducted based on articles indexed in PubMed and Scopus. We compiled the use of various machine learning approaches, including CNN, RNN, GAN, VAEs, and other AI techniques, enhancing efficiency with accuracy exceeding 95%, which is validated through a computer-aided drug design process. AI can revolutionise lung cancer therapeutics, streamlining processes and saving biological scientists' time and effort; however, further research is needed to overcome challenges and fully unlock AI's potential in lung cancer therapeutics.
Affiliation(s)
- Shaban Ahmad
  - Department of Computer Science, Jamia Millia Islamia, New Delhi, India
- Khalid Raza
  - Department of Computer Science, Jamia Millia Islamia, New Delhi, India
14
Singh RK, Nayak NP, Behl T, Arora R, Anwer MK, Gulati M, Bungau SG, Brisc MC. Exploring the Intersection of Geophysics and Diagnostic Imaging in the Health Sciences. Diagnostics (Basel) 2024; 14:139. [PMID: 38248016 PMCID: PMC11154438 DOI: 10.3390/diagnostics14020139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 01/03/2024] [Accepted: 01/05/2024] [Indexed: 01/23/2024] Open
Abstract
This paper emphasizes the transformational potential of merging geophysics with the health sciences to develop diagnostic imaging approaches. Improvements in diagnostic imaging technology have transformed the health sciences by enabling earlier and more precise disease identification, individualized therapy, and improved patient care. This review article examines the connection between geophysics and diagnostic imaging in the field of health sciences. Geophysics, which is typically used to explore Earth's subsurface, has found new uses for its methodology in the medical field, providing innovative solutions to pressing medical problems. The article examines different geophysical techniques, such as electrical imaging and seismic imaging, and their corresponding imaging techniques used in the health sciences, such as tomography, magnetic resonance imaging, and ultrasound imaging. The examination includes the description, similarities, differences, and challenges associated with these techniques and how modified geophysical techniques can be used in imaging methods in the health sciences. The progression of each method from geophysics to medical imaging and its contributions to illness diagnosis, treatment planning, and monitoring are highlighted. The utilization of geophysical data analysis techniques, such as signal processing and inversion, in image processing in the health sciences is also briefly explained, along with different mathematical and computational tools in geophysics and how they can be implemented for image processing in the health sciences. The key findings include the development of machine learning and artificial intelligence in geophysics-driven medical imaging, demonstrating the revolutionary effects of data-driven methods on precision, speed, and predictive modeling.
Affiliation(s)
- Rahul Kumar Singh
  - Energy Cluster, University of Petroleum and Energy Studies, Dehradun 248007, Uttarakhand, India
- Nirlipta Priyadarshini Nayak
  - Energy Cluster, University of Petroleum and Energy Studies, Dehradun 248007, Uttarakhand, India
- Tapan Behl
  - Amity School of Pharmaceutical Sciences, Amity University, Mohali 140306, Punjab, India
- Rashmi Arora
  - Chitkara College of Pharmacy, Chitkara University, Rajpura 140401, Punjab, India
- Md. Khalid Anwer
  - Department of Pharmaceutics, College of Pharmacy, Prince Sattam Bin Abdulaziz University, Alkharj 11942, Saudi Arabia
- Monica Gulati
  - School of Pharmaceutical Sciences, Lovely Professional University, Phagwara 1444411, Punjab, India
  - Australian Research Centre in Complementary and Integrative Medicine, Faculty of Health, University of Technology Sydney, Ultimo, NSW 20227, Australia
- Simona Gabriela Bungau
  - Department of Pharmacy, Faculty of Medicine and Pharmacy, University of Oradea, 410028 Oradea, Romania
  - Doctoral School of Biological and Biomedical Sciences, University of Oradea, 410087 Oradea, Romania
- Mihaela Cristina Brisc
  - Department of Medical Disciplines, Faculty of Medicine and Pharmacy, University of Oradea, 410073 Oradea, Romania
15
Xue X, Sun H, Yang M, Liu X, Hu HY, Deng Y, Wang X. Advances in the Application of Artificial Intelligence-Based Spectral Data Interpretation: A Perspective. Anal Chem 2023; 95:13733-13745. [PMID: 37688541 DOI: 10.1021/acs.analchem.3c02540] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 09/11/2023]
Abstract
The interpretation of spectral data, including mass, nuclear magnetic resonance, infrared, and ultraviolet-visible spectra, is critical for obtaining molecular structural information. The development of advanced sensing technology has multiplied the amount of available spectral data. Chemical experts must use basic principles corresponding to the spectral information generated by molecular fragments and functional groups. This is a time-consuming process that requires a solid professional knowledge base. In recent years, the rapid development of computer science and its applications in cheminformatics and the emergence of computer-aided expert systems have greatly reduced the difficulty in analyzing large quantities of data. For expert systems, however, the problem-solving strategy must be known in advance or extracted by human experts and translated into algorithms. Gratifyingly, the development of artificial intelligence (AI) methods has shown great promise for solving such problems. Traditional algorithms, including the latest neural network algorithms, have shown great potential for both extracting useful information and processing massive quantities of data. This Perspective highlights recent innovations covering all of the emerging AI-based spectral interpretation techniques. In addition, the main limitations and current obstacles are presented, and the corresponding directions for further research are proposed. Moreover, this Perspective gives the authors' personal outlook on the development and future applications of spectral interpretation.
Affiliation(s)
- Xi Xue
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Hanyu Sun
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Minjian Yang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Xue Liu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Hai-Yu Hu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
  - Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaojian Wang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
16
Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023; 123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 87] [Impact Index Per Article: 43.5] [Indexed: 07/01/2023]
Abstract
Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, while big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), generative adversarial network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
Affiliation(s)
- Bozheng Dou
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Zailiang Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Ekaterina Merkurjev
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Lu Ke
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Long Chen
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jian Jiang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yueying Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jie Liu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Bengong Zhang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
17
Zhou L, Wang Y, Peng L, Li Z, Luo X. Identifying potential drug-target interactions based on ensemble deep learning. Front Aging Neurosci 2023; 15:1176400. [PMID: 37396659 PMCID: PMC10309650 DOI: 10.3389/fnagi.2023.1176400] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/28/2023] [Accepted: 05/10/2023] [Indexed: 07/04/2023] Open
Abstract
Introduction: Drug-target interaction (DTI) prediction is an important step in drug research and development. Experimental methods are time-consuming and laborious. Methods: In this study, we developed a novel DTI prediction method called EnGDD by combining initial feature acquisition, dimensional reduction, and DTI classification based on a gradient boosting neural network, a deep neural network, and Deep Forest. Results: EnGDD was compared with seven state-of-the-art DTI prediction methods (BLM-NII, NRLMF, WNNGIP, NEDTP, DTi2Vec, RoFDT, and MolTrans) on the nuclear receptor, GPCR, ion channel, and enzyme datasets under cross validations on drugs, targets, and drug-target pairs, respectively. EnGDD computed the best recall, accuracy, F1-score, AUC, and AUPR under the majority of conditions, demonstrating its powerful DTI identification performance. EnGDD predicted that D00182 and hsa2099, D07871 and hsa1813, DB00599 and hsa2562, and D00002 and hsa10935 have higher interaction probabilities among unknown drug-target pairs and may be potential DTIs on the four datasets, respectively. In particular, D00002 (Nadide) was identified to interact with hsa10935 (mitochondrial peroxiredoxin 3), whose up-regulation might be used to treat neurodegenerative diseases. Finally, EnGDD was used to find possible drug targets for Parkinson's disease and Alzheimer's disease after confirming its DTI identification performance. The results show that D01277, D04641, and D08969 may be applied to the treatment of Parkinson's disease through targeting hsa1813 (dopamine receptor D2), and that D02173, D02558, and D03822 may provide clues for treating patients with Alzheimer's disease through targeting hsa5743 (prostaglandin-endoperoxide synthase 2). The above prediction results need further biomedical validation. Discussion: We anticipate that our proposed EnGDD model can help discover potential therapeutic clues for various diseases, including neurodegenerative diseases.
Affiliation(s)
- Liqian Zhou
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Yuzhuang Wang
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Zejun Li
  - School of Computer Science, Hunan Institute of Technology, Hengyang, China
- Xueming Luo
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
18
Yang M, Sun H, Liu X, Xue X, Deng Y, Wang X. CMGN: a conditional molecular generation net to design target-specific molecules with desired properties. Brief Bioinform 2023:7165252. [PMID: 37193672 DOI: 10.1093/bib/bbad185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2022] [Revised: 04/06/2023] [Accepted: 04/23/2023] [Indexed: 05/18/2023] Open
Abstract
The rational design of chemical entities with desired properties for a specific target is a long-standing challenge in drug design. Generative neural networks have emerged as a powerful approach to sample novel molecules with specific properties, termed as inverse drug design. However, generating molecules with biological activity against certain targets and predefined drug properties still remains challenging. Here, we propose a conditional molecular generation net (CMGN), the backbone of which is a bidirectional and autoregressive transformer. CMGN applies large-scale pretraining for molecular understanding and navigates the chemical space for specified targets by fine-tuning with corresponding datasets. Additionally, fragments and properties were trained to recover molecules to learn the structure-properties relationships. Our model crisscrosses the chemical space for specific targets and properties that control fragment-growth processes. Case studies demonstrated the advantages and utility of our model in fragment-to-lead processes and multi-objective lead optimization. The results presented in this paper illustrate that CMGN has the potential to accelerate the drug discovery process.
Affiliation(s)
- Minjian Yang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Department of Medicinal Chemistry, Beijing Key Laboratory of Active Substances Discovery and Druggability Evaluation, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Hanyu Sun
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Xue Liu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Xi Xue
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd., China
- Xiaojian Wang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Department of Medicinal Chemistry, Beijing Key Laboratory of Active Substances Discovery and Druggability Evaluation, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
19
Desmedt E, Smets D, Woller T, Alonso M, De Vleeschouwer F. Designing hexaphyrins for high-potential NLO switches: the synergy of core-modifications and meso-substitutions. Phys Chem Chem Phys 2023. [PMID: 37162298 DOI: 10.1039/d3cp01240a] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/11/2023]
Abstract
Due to the enormous size of the chemical compound space, usually only small regions are traversed with traditional direct molecular design approaches, making the discovery of novel functionalized molecules for nonlinear optical (NLO) applications challenging. By applying inverse molecular design algorithms, we aim to efficiently explore larger regions of the compound space in search of promising hexaphyrin-based molecular switches as measured by their first-hyperpolarizability (βHRS) contrast. We focus on the 28R → 30R switch with a functionalization pattern allowing for centrosymmetric OFF states yielding zero βHRS response. This switch is particularly challenging, as full meso-substitution with a single type of functional group or core-modifications result in almost no contrast enhancement. We carried out four inverse design procedures during which two sets of core-modifications and three sets of meso-substitution sites were systematically optimized. All four optimal switches are characterized by a mix of meso-substitutions and core-modifications, of which the best performing switch yields a 10-fold improvement over the parent macrocycle. Throughout the inverse design procedures, we collected and analyzed a database biased towards high NLO contrasts that contains 277 different patterns for hexaphyrin-based switches. We derived three design rules to obtain highly functional 28R → 30R NLO switches: (I) A combination of two strong EWGs and one EDG is the ideal recipe for increasing the NLO contrast, though their position also plays an important role. (II) The type of core-modification is less important when only the diagonal positions are core-modified. Switches with four core-modifications show a clear preference for oxygen. (III) Keeping centrosymmetry in the OFF state remains highly beneficial given the investigated functionalization pattern. Finally, we have demonstrated that combining meso-substitutions with core-modifications can synergistically improve the NLO contrast.
Affiliation(s)
- Eline Desmedt
  - Department of General Chemistry Algemene Chemie (ALGC), Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussel, Belgium
- David Smets
  - Department of General Chemistry Algemene Chemie (ALGC), Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussel, Belgium
- Tatiana Woller
  - Department of General Chemistry Algemene Chemie (ALGC), Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussel, Belgium
- Mercedes Alonso
  - Department of General Chemistry Algemene Chemie (ALGC), Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussel, Belgium
- Freija De Vleeschouwer
  - Department of General Chemistry Algemene Chemie (ALGC), Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussel, Belgium
20
Yao L, Yang M, Song J, Yang Z, Sun H, Shi H, Liu X, Ji X, Deng Y, Wang X. Conditional Molecular Generation Net Enables Automated Structure Elucidation Based on 13C NMR Spectra and Prior Knowledge. Anal Chem 2023; 95:5393-5401. [PMID: 36926883 DOI: 10.1021/acs.analchem.2c05817] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 03/18/2023]
Abstract
Structure elucidation of unknown compounds based on nuclear magnetic resonance (NMR) remains a challenging problem in both synthetic organic and natural product chemistry. Library matching has been an efficient method to assist structure elucidation. However, it is limited by the coverage of libraries. In addition, prior knowledge such as molecular fragments is neglected. To solve the problem, we propose a conditional molecular generation net (CMGNet) to allow input of multiple sources of information. CMGNet not only uses 13C NMR spectrum data as input but molecular formulas and fragments of molecules are also employed as input conditions. Our model applies large-scale pretraining for molecular understanding and fine-tuning on two NMR spectral data sets of different granularity levels to accommodate structure elucidation tasks. CMGNet generates structures based on 13C NMR data, molecular formula, and fragment information, with a recovery rate of 94.17% in the top 10 recommendations. In addition, the generative model performed well in the generation of various classes of compounds and in the structural revision task. CMGNet has a deep understanding of molecular connectivities from 13C NMR, molecular formula, and fragments, paving the way for a new paradigm of deep learning-assisted inverse problem-solving.
Affiliation(s)
- Lin Yao
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
- Minjian Yang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Jianfei Song
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
- Zhuo Yang
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
- Hanyu Sun
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Hui Shi
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
- Xue Liu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Xiangyang Ji
  - Department of Automation, Tsinghua University, Beijing 100084, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
  - Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaojian Wang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
21
Shen SC, Khare E, Lee NA, Saad MK, Kaplan DL, Buehler MJ. Computational Design and Manufacturing of Sustainable Materials through First-Principles and Materiomics. Chem Rev 2023; 123:2242-2275. [PMID: 36603542 DOI: 10.1021/acs.chemrev.2c00479] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 01/07/2023]
Abstract
Engineered materials are ubiquitous throughout society and are critical to the development of modern technology, yet many current material systems are inexorably tied to widespread deterioration of ecological processes. Next-generation material systems can address goals of environmental sustainability by providing alternatives to fossil fuel-based materials and by reducing destructive extraction processes, energy costs, and accumulation of solid waste. However, development of sustainable materials faces several key challenges including investigation, processing, and architecting of new feedstocks that are often relatively mechanically weak, complex, and difficult to characterize or standardize. In this review paper, we outline a framework for examining sustainability in material systems and discuss how recent developments in modeling, machine learning, and other computational tools can aid the discovery of novel sustainable materials. We consider these through the lens of materiomics, an approach that considers material systems holistically by incorporating perspectives of all relevant scales, beginning with first-principles approaches and extending through the macroscale to consider sustainable material design from the bottom-up. We follow with an examination of how computational methods are currently applied to select examples of sustainable material development, with particular emphasis on bioinspired and biobased materials, and conclude with perspectives on opportunities and open challenges.
Affiliation(s)
- Sabrina C Shen
  - Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Avenue 1-165, Cambridge, Massachusetts 02139, United States
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Eesha Khare
  - Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Avenue 1-165, Cambridge, Massachusetts 02139, United States
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Nicolas A Lee
  - Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Avenue 1-165, Cambridge, Massachusetts 02139, United States
  - School of Architecture and Planning, Media Lab, Massachusetts Institute of Technology, 75 Amherst Street, Cambridge, Massachusetts 02139, United States
- Michael K Saad
  - Department of Biomedical Engineering, Tufts University, 4 Colby Street, Medford, Massachusetts 02155, United States
- David L Kaplan
  - Department of Biomedical Engineering, Tufts University, 4 Colby Street, Medford, Massachusetts 02155, United States
- Markus J Buehler
  - Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Avenue 1-165, Cambridge, Massachusetts 02139, United States
  - Center for Computational Science and Engineering, Schwarzman College of Computing, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
| |
|
22
|
Fromer JC, Coley CW. Computer-aided multi-objective optimization in small molecule discovery. PATTERNS (NEW YORK, N.Y.) 2023; 4:100678. [PMID: 36873904 PMCID: PMC9982302 DOI: 10.1016/j.patter.2023.100678] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Indexed: 02/12/2023]
Abstract
Molecular discovery is a multi-objective optimization problem that requires identifying a molecule or set of molecules that balance multiple, often competing, properties. Multi-objective molecular design is commonly addressed by combining properties of interest into a single objective function using scalarization, which imposes assumptions about relative importance and uncovers little about the trade-offs between objectives. In contrast to scalarization, Pareto optimization does not require knowledge of relative importance and reveals the trade-offs between objectives. However, it introduces additional considerations in algorithm design. In this review, we describe pool-based and de novo generative approaches to multi-objective molecular discovery with a focus on Pareto optimization algorithms. We show how pool-based molecular discovery is a relatively direct extension of multi-objective Bayesian optimization and how the plethora of different generative models extend from single-objective to multi-objective optimization in similar ways using non-dominated sorting in the reward function (reinforcement learning) or to select molecules for retraining (distribution learning) or propagation (genetic algorithms). Finally, we discuss some remaining challenges and opportunities in the field, emphasizing the opportunity to adopt Bayesian optimization techniques into multi-objective de novo design.
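The contrast the abstract draws between scalarization and Pareto optimization can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the candidate scores are made-up (potency, solubility) pairs, both objectives are maximized, and the helper names `dominates`, `pareto_front`, and `scalarize` are hypothetical rather than taken from the reviewed paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Sequence[Sequence[float]]) -> List[int]:
    """Indices of points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def scalarize(point: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted-sum scalarization: collapses all objectives into one score."""
    return sum(w * x for w, x in zip(weights, point))


# Hypothetical (potency, solubility) scores for four candidate molecules.
candidates = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5)]

# Pareto optimization keeps every non-dominated trade-off.
front = pareto_front(candidates)

# Scalarization returns one winner, and the winner depends on the weights.
best_equal = max(
    range(len(candidates)), key=lambda i: scalarize(candidates[i], (0.5, 0.5))
)
best_potency = max(
    range(len(candidates)), key=lambda i: scalarize(candidates[i], (0.9, 0.1))
)
```

In this toy set the non-dominated filter keeps three candidates and so exposes the trade-off, whereas each weighted sum selects a single candidate that changes with the chosen weights. The pairwise comparison is quadratic in the number of candidates, which is acceptable for a sketch but not for large pools.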
Affiliation(s)
- Jenna C Fromer
  - Department of Chemical Engineering, MIT, Cambridge, MA 02139, USA
- Connor W Coley
  - Department of Chemical Engineering, MIT, Cambridge, MA 02139, USA; Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139, USA
|
23
|
Sridharan B, Mehta S, Pathak Y, Priyakumar UD. Deep Reinforcement Learning for Molecular Inverse Problem of Nuclear Magnetic Resonance Spectra to Molecular Structure. J Phys Chem Lett 2022; 13:4924-4933. [PMID: 35635003 DOI: 10.1021/acs.jpclett.2c00624] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/15/2023]
Abstract
Spectroscopy is the study of how matter interacts with electromagnetic radiation. The spectra of any molecule are highly information-rich, yet the inverse relation of spectra to the corresponding molecular structure is still an unsolved problem. Nuclear magnetic resonance (NMR) spectroscopy is one such critical technique in the scientist's toolkit for characterizing molecules. In this work, a novel machine learning framework is proposed that attempts to solve this inverse problem by navigating the chemical space to find the correct structure given an NMR spectrum. The proposed framework uses a combination of online Monte Carlo tree search (MCTS) and a set of graph convolution networks to build a molecule iteratively. Our method can predict the structure of the molecule ∼80% of the time in its top 3 guesses for molecules with <10 heavy atoms. We believe that the proposed framework is a significant step in solving the inverse design problem of NMR spectra.
Affiliation(s)
- Bhuvanesh Sridharan
  - Centre for Computational Natural Science and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- Sarvesh Mehta
  - Centre for Computational Natural Science and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- Yashaswi Pathak
  - Centre for Computational Natural Science and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- U Deva Priyakumar
  - Centre for Computational Natural Science and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
|