1
Xia X, Liu Y, Zheng C, Zhang X, Wu Q, Gao X, Zeng X, Su Y. Evolutionary Multiobjective Molecule Optimization in an Implicit Chemical Space. J Chem Inf Model 2024; 64:5161-5174. [PMID: 38870455 PMCID: PMC11235097 DOI: 10.1021/acs.jcim.4c00031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 05/08/2024] [Accepted: 05/13/2024] [Indexed: 06/15/2024]
Abstract
Optimization techniques play a pivotal role in advancing drug development, serving as the foundation of numerous generative methods tailored to efficiently design optimized molecules derived from existing lead compounds. However, existing methods often encounter difficulties in generating diverse, novel, and high-property molecules that simultaneously optimize multiple drug properties. To overcome this bottleneck, we propose a multiobjective molecule optimization framework (MOMO). MOMO employs a specially designed Pareto-based multiproperty evaluation strategy at the molecular sequence level to guide the evolutionary search in an implicit chemical space. A comparative analysis of MOMO with five state-of-the-art methods across two benchmark multiproperty molecule optimization tasks reveals that MOMO markedly outperforms them in terms of diversity, novelty, and optimized properties. The practical applicability of MOMO in drug discovery has also been validated on four challenging tasks in the real-world discovery problem. These results suggest that MOMO can provide a useful tool to facilitate molecule optimization problems with multiple properties.
Affiliation(s)
- Xin Xia
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei 230088, Anhui, China
- Yiping Liu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
- Chunhou Zheng
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Xingyi Zhang
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Qingwen Wu
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Xin Gao
  - Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
- Yansen Su
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei 230088, Anhui, China
2
Fromer JC, Coley CW. Computer-aided multi-objective optimization in small molecule discovery. Patterns (New York, N.Y.) 2023; 4:100678. [PMID: 36873904 PMCID: PMC9982302 DOI: 10.1016/j.patter.2023.100678] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Indexed: 02/12/2023]
Abstract
Molecular discovery is a multi-objective optimization problem that requires identifying a molecule or set of molecules that balance multiple, often competing, properties. Multi-objective molecular design is commonly addressed by combining properties of interest into a single objective function using scalarization, which imposes assumptions about relative importance and uncovers little about the trade-offs between objectives. In contrast to scalarization, Pareto optimization does not require knowledge of relative importance and reveals the trade-offs between objectives. However, it introduces additional considerations in algorithm design. In this review, we describe pool-based and de novo generative approaches to multi-objective molecular discovery with a focus on Pareto optimization algorithms. We show how pool-based molecular discovery is a relatively direct extension of multi-objective Bayesian optimization and how the plethora of different generative models extend from single-objective to multi-objective optimization in similar ways using non-dominated sorting in the reward function (reinforcement learning) or to select molecules for retraining (distribution learning) or propagation (genetic algorithms). Finally, we discuss some remaining challenges and opportunities in the field, emphasizing the opportunity to adopt Bayesian optimization techniques into multi-objective de novo design.
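The contrast this abstract draws between scalarization and Pareto optimization can be illustrated with a minimal sketch. The code below is not taken from the review: the function names, the two-property scores, and the weights are all hypothetical, and every objective is assumed to be maximized.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and
    strictly better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates."""
    return [
        p
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def scalarize(point: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted-sum scalarization: collapse all objectives into one score."""
    return sum(w * x for w, x in zip(weights, point))


# Hypothetical (property_1, property_2) scores for four candidate molecules.
candidates = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5)]

# Pareto view: every non-dominated trade-off is retained.
front = pareto_front(candidates)

# Scalarized view: a single winner that depends on the chosen weights.
best_equal = max(candidates, key=lambda p: scalarize(p, (0.5, 0.5)))
best_skewed = max(candidates, key=lambda p: scalarize(p, (0.9, 0.1)))
```

In this toy set the Pareto front keeps three candidates, while each weight vector picks exactly one of them, which is the sense in which scalarization encodes an assumption about relative importance.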
Affiliation(s)
- Jenna C Fromer
  - Department of Chemical Engineering, MIT, Cambridge, MA 02139, USA
- Connor W Coley
  - Department of Chemical Engineering, MIT, Cambridge, MA 02139, USA
  - Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139, USA
3
Sundin I, Voronov A, Xiao H, Papadopoulos K, Bjerrum EJ, Heinonen M, Patronov A, Kaski S, Engkvist O. Human-in-the-loop assisted de novo molecular design. J Cheminform 2022; 14:86. [PMID: 36578043 PMCID: PMC9795720 DOI: 10.1186/s13321-022-00667-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/30/2022] [Accepted: 12/03/2022] [Indexed: 12/29/2022]
Abstract
A de novo molecular design workflow can be used together with technologies such as reinforcement learning to navigate the chemical space. A bottleneck in the workflow that remains to be solved is how to integrate human feedback in the exploration of the chemical space to optimize molecules. A human drug designer still needs to design the goal, expressed as a scoring function for the molecules that captures the designer's implicit knowledge about the optimization task. Little support for this task exists and, consequently, a chemist usually resorts to iteratively building the objective function of multi-parameter optimization (MPO) in de novo design. We propose a principled approach to use human-in-the-loop machine learning to help the chemist to adapt the MPO scoring function to better match their goal. An advantage is that the method can learn the scoring function directly from the user's feedback while they browse the output of the molecule generator, instead of the current manual tuning of the scoring function with trial and error. The proposed method uses a probabilistic model that captures the user's idea and uncertainty about the scoring function, and it uses active learning to interact with the user. We present two case studies for this: In the first use-case, the parameters of an MPO are learned, and in the second use-case a non-parametric component of the scoring function to capture human domain knowledge is developed. The results show the effectiveness of the methods in two simulated example cases with an oracle, achieving significant improvement in less than 200 feedback queries, for the goals of a high QED score and identifying potent molecules for the DRD2 receptor, respectively. We further demonstrate the performance gains with a medicinal chemist interacting with the system.
Affiliation(s)
- Iiris Sundin
  - Department of Computer Science, Aalto University, Espoo, Finland
- Alexey Voronov
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Haoping Xiao
  - Department of Computer Science, Aalto University, Espoo, Finland
- Kostas Papadopoulos
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
  - Present address: Odyssey Therapeutics, Cambridge, MA, USA
- Esben Jannik Bjerrum
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
  - Present address: Odyssey Therapeutics, Cambridge, MA, USA
- Markus Heinonen
  - Department of Computer Science, Aalto University, Espoo, Finland
- Atanas Patronov
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
  - Present address: Odyssey Therapeutics, Cambridge, MA, USA
- Samuel Kaski
  - Department of Computer Science, Aalto University, Espoo, Finland
  - Department of Computer Science, University of Manchester, Manchester, UK
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
  - Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
4
Interpretable Machine Learning Models for Molecular Design of Tyrosine Kinase Inhibitors Using Variational Autoencoders and Perturbation-Based Approach of Chemical Space Exploration. Int J Mol Sci 2022; 23:ijms231911262. [PMID: 36232566 PMCID: PMC9569663 DOI: 10.3390/ijms231911262] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/29/2022] [Revised: 09/21/2022] [Accepted: 09/21/2022] [Indexed: 11/17/2022]
Abstract
In the current study, we introduce an integrative machine learning strategy for the autonomous molecular design of protein kinase inhibitors using variational autoencoders and a novel cluster-based perturbation approach for exploration of the chemical latent space. The proposed strategy combines autoencoder-based embedding of small molecules with a cluster-based perturbation approach for efficient navigation of the latent space and a feature-based kinase inhibition likelihood classifier that guides optimization of the molecular properties and targeted molecular design. In the proposed generative approach, molecules sharing similar structures tend to cluster in the latent space, and interpolating between two molecules in the latent space enables smooth changes in the molecular structures and properties. The results demonstrated that the proposed strategy can efficiently explore the latent space of small molecules and kinase inhibitors along interpretable directions to guide the generation of novel family-specific kinase molecules that display a significant scaffold diversity and optimal biochemical properties. Through assessment of the latent-based and chemical feature-based binary and multiclass classifiers, we developed a robust probabilistic evaluator of kinase inhibition likelihood that is specifically tailored to guide the molecular design of novel SRC kinase molecules. The generated molecules originating from LCK and ABL1 kinase inhibitors yielded ~40% of novel and valid SRC kinase compounds with high kinase inhibition likelihood probability values (p > 0.75) and high similarity (Tanimoto coefficient > 0.6) to the known SRC inhibitors. 
By combining the molecular perturbation design with the kinase inhibition likelihood analysis and similarity assessments, we showed that the proposed molecular design strategy can produce novel valid molecules and transform known inhibitors of different kinase families into potential chemical probes of the SRC kinase with excellent physicochemical profiles and high similarity to the known SRC kinase drugs. The results of our study suggest that task-specific manipulation of a biased latent space may be an important direction for more effective task-oriented and target-specific autonomous chemical design models.
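The similarity threshold quoted in this abstract (Tanimoto coefficient > 0.6) refers to a standard set-overlap measure. The sketch below computes it on fingerprints represented as sets of "on" bit indices. The bit sets are hypothetical, the abstract does not say which fingerprint type the authors used, and returning 0.0 for two empty fingerprints is a convention chosen here for illustration.

```python
from typing import Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) coefficient of two fingerprints given as sets of
    'on' bit indices: |a & b| / |a | b|."""
    union = a | b
    if not union:
        # Assumed convention: two empty fingerprints score 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical fingerprints for a generated molecule and a known inhibitor.
generated = {1, 4, 7, 9, 15}
reference = {1, 4, 7, 9, 20}

similarity = tanimoto(generated, reference)  # 4 shared bits out of 6 distinct bits
passes_threshold = similarity > 0.6          # threshold quoted in the abstract
```

In practice a cheminformatics toolkit would generate the fingerprints from molecular structures; plain sets are used here only to keep the arithmetic visible.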
5
Bolcato G, Heid E, Boström J. On the Value of Using 3D Shape and Electrostatic Similarities in Deep Generative Methods. J Chem Inf Model 2022; 62:1388-1398. [PMID: 35271260 PMCID: PMC8965872 DOI: 10.1021/acs.jcim.1c01535] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 11/28/2022]
Abstract
Multiparameter optimization, the heart of drug design, is still an open challenge. Thus, improved methods for automated compound design with multiple controlled properties are desired. Here, we present a significant extension to our previously described fragment-based reinforcement learning method (DeepFMPO) for the generation of novel molecules with optimal properties. As before, the generative process outputs optimized molecules similar to the input structures, now with the improved feature of replacing parts of these molecules with fragments of similar three-dimensional (3D) shape and electrostatics. We developed and benchmarked a new python package, ESP-Sim, for the comparison of the electrostatic potential and the molecular shape, allowing the calculation of high-quality partial charges (e.g., RESP with B3LYP/6-31G**) obtained using the quantum chemistry program Psi4. By performing comparisons of 3D fragments, we can simulate 3D properties while overcoming the notoriously difficult step of accurately describing bioactive conformations. The new improved generative (DeepFMPO v3D) method is demonstrated with a scaffold-hopping exercise identifying CDK2 bioisosteres. The code is open-source and freely available.
Affiliation(s)
- Giovanni Bolcato
  - Molecular Modeling Section, University of Padova, 35131 Padova, Italy
- Esther Heid
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, 02139 Massachusetts, United States
- Jonas Boström
  - Medicinal Chemistry, Early CVRM, BioPharmaceuticals R&D, AstraZeneca, 431 50 Mölndal, Sweden
6
Abstract
Artificial intelligence (AI) tools find increasing application in drug discovery supporting every stage of the Design-Make-Test-Analyse (DMTA) cycle. The main focus of this chapter is the application in molecular generation with the aid of deep neural networks (DNN). We present a historical overview of the main advances in the field. We analyze the concepts of distribution and goal-directed learning and then highlight some of the recent applications of generative models in drug design with a focus into research work from the biopharmaceutical industry. We present in some more detail REINVENT which is an open-source software developed within our group in AstraZeneca and the main platform for AI molecular design support for a number of medicinal chemistry projects in the company and we also demonstrate some of our work in library design. Finally, we present some of the main challenges in the application of AI in Drug Discovery and different approaches to respond to these challenges which define areas for current and future work.
7
Frye L, Bhat S, Akinsanya K, Abel R. From computer-aided drug discovery to computer-driven drug discovery. Drug Discovery Today. Technologies 2021; 39:111-117. [PMID: 34906321 DOI: 10.1016/j.ddtec.2021.08.001] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 05/03/2021] [Revised: 07/06/2021] [Accepted: 08/02/2021] [Indexed: 12/16/2022]
Abstract
Computational chemistry and structure-based design have traditionally been viewed as a subset of tools that could aid acceleration of the drug discovery process, but were not commonly regarded as a driving force in small molecule drug discovery. In the last decade however, there have been dramatic advances in the field, including (1) development of physics-based computational approaches to accurately predict a broad variety of endpoints from potency to solubility, (2) improvements in artificial intelligence and deep learning methods and (3) dramatic increases in computational power with the advent of GPUs and cloud computing, resulting in the ability to explore and accurately profile vast amounts of drug-like chemical space in silico. There have also been simultaneous advancements in structural biology such as cryogenic electron microscopy (cryo-EM) and computational protein-structure prediction, allowing for access to many more high-resolution 3D structures of novel drug-receptor complexes. The convergence of these breakthroughs has positioned structurally-enabled computational methods to be a driving force behind the discovery of novel small molecule therapeutics. This review will give a broad overview of the synergies in recent advances in the fields of computational chemistry, machine learning and structural biology, in particular in the areas of hit identification, hit-to-lead, and lead optimization.
Affiliation(s)
- Leah Frye
  - Schrödinger Inc., 120 West 45th Street, 17th Floor, New York, NY 10036-4041, United States
- Sathesh Bhat
  - Schrödinger Inc., 120 West 45th Street, 17th Floor, New York, NY 10036-4041, United States
- Karen Akinsanya
  - Schrödinger Inc., 120 West 45th Street, 17th Floor, New York, NY 10036-4041, United States
- Robert Abel
  - Schrödinger Inc., 120 West 45th Street, 17th Floor, New York, NY 10036-4041, United States