1
Zahoránszky-Kőhalmi G, Wan KK, Godfrey AG. Hilbert-curve assisted structure embedding method. J Cheminform 2024; 16:87. [PMID: 39075547 PMCID: PMC11285582 DOI: 10.1186/s13321-024-00850-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Accepted: 04/30/2024] [Indexed: 07/31/2024]
Abstract
MOTIVATION: Chemical space embedding methods are widely used in research settings for dimensionality reduction, clustering and effective visualization. The maps generated by the embedding process can give medicinal chemists valuable insight into the relationships between structural, physicochemical and biological properties of compounds. However, these maps are known to be difficult to interpret, and the "landscape" on the map is prone to "rearrangement" when different sets of compounds are embedded. RESULTS: In this study we present the Hilbert-Curve Assisted Space Embedding (HCASE) method, which was designed to create maps by organizing structures according to a logic familiar to medicinal chemists. First, a chemical space is created with the help of a set of "reference scaffolds". These scaffolds are sorted according to the medicinal-chemistry-inspired Scaffold-Key algorithm found in prior art. Next, the ordered scaffolds are mapped to a line, which is folded into a higher-dimensional (here: 2D) space. The intricately folded line is referred to as a pseudo-Hilbert-Curve. A compound is embedded by locating its most similar reference scaffold on the pseudo-Hilbert-Curve and assuming the respective position. Through a series of experiments, we demonstrate the properties of the maps generated by the HCASE method. The embedded compounds came from the DrugBank and CANVASS libraries, and the chemical spaces were defined by scaffolds extracted from the ChEMBL database. SCIENTIFIC CONTRIBUTION: The novelty of the HCASE method lies in generating robust and intuitive chemical space embeddings that reflect a medicinal chemist's reasoning, and in the precedential use of a space-filling (Hilbert) curve in the process. AVAILABILITY: https://github.com/ncats/hcase.
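The embedding step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (that is in the repository linked above). It assumes that the rank of a reference scaffold is used directly as the index along a standard Hilbert curve, and that similarity is a Tanimoto coefficient over hypothetical feature sets; the names `hilbert_d2xy`, `tanimoto` and `embed` are illustrative only.

```python
def hilbert_d2xy(order, d):
    """Map a curve index d (0 <= d < 4**order) to (x, y) on a 2**order grid.

    Standard iterative index-to-coordinate conversion for the Hilbert curve.
    """
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so consecutive indices stay adjacent.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two feature sets (assumed metric)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def embed(compound_features, ordered_scaffolds, order):
    """Place a compound at the curve position of its most similar scaffold.

    ordered_scaffolds is assumed to be already sorted (for example by a
    scaffold key) and to contain at most 4**order entries.
    """
    best_rank = max(
        range(len(ordered_scaffolds)),
        key=lambda i: tanimoto(compound_features, ordered_scaffolds[i]),
    )
    return hilbert_d2xy(order, best_rank)


# Toy, purely hypothetical feature sets standing in for reference scaffolds.
scaffolds = [{"a"}, {"a", "b"}, {"b", "c"}, {"c", "d"}]
position = embed({"b", "c", "x"}, scaffolds, order=1)
```

Because neighbouring indices on a Hilbert curve are also neighbours on the grid, scaffolds that are adjacent in the sorted order end up adjacent on the map, which is the property the method relies on.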
Affiliation(s)
- Gergely Zahoránszky-Kőhalmi
  - National Center for Advancing Translational Sciences (NCATS/NIH), 9800 Medical Center Dr., Rockville, MD, 20850, USA
- Kanny K Wan
  - National Center for Advancing Translational Sciences (NCATS/NIH), 9800 Medical Center Dr., Rockville, MD, 20850, USA
- Alexander G Godfrey
  - National Center for Advancing Translational Sciences (NCATS/NIH), 9800 Medical Center Dr., Rockville, MD, 20850, USA
2
Samanipour S, Barron LP, van Herwerden D, Praetorius A, Thomas KV, O’Brien JW. Exploring the Chemical Space of the Exposome: How Far Have We Gone? JACS Au 2024; 4:2412-2425. [PMID: 39055136 PMCID: PMC11267556 DOI: 10.1021/jacsau.4c00220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Revised: 05/29/2024] [Accepted: 05/31/2024] [Indexed: 07/27/2024]
Abstract
Around two-thirds of chronic human disease cannot be explained by genetics alone. The Lancet Commission on Pollution and Health estimates that 16% of global premature deaths are linked to pollution. Additionally, it is now thought that humankind has surpassed the safe planetary operating space for introducing human-made chemicals into the Earth System. Direct and indirect exposure to a myriad of chemicals, known and unknown, poses a significant threat to biodiversity and human health, from vaccine efficacy to the rise of antimicrobial resistance as well as autoimmune diseases and mental health disorders. The exposome chemical space remains largely uncharted due to the sheer number of possible chemical structures, estimated at over 10^60 unique forms. Conventional methods have cataloged only a fraction of the exposome, overlooking transformation products and often yielding uncertain results. In this Perspective, we have reviewed the latest efforts in mapping the exposome chemical space and its subspaces. We also provide our view on how the integration of data-driven approaches might be able to bridge the identified gaps.
Affiliation(s)
- Saer Samanipour
  - Van’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
  - UvA Data Science Center, University of Amsterdam, Amsterdam 1090 GD, The Netherlands
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Cornwall Street, Woolloongabba, Queensland 4102, Australia
- Leon Patrick Barron
  - Van’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
  - MRC Centre for Environment and Health, Environmental Research Group, School of Public Health, Faculty of Medicine, Imperial College London, London W12 0BZ, United Kingdom
- Denice van Herwerden
  - Van’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
- Antonia Praetorius
  - Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
- Kevin V. Thomas
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Cornwall Street, Woolloongabba, Queensland 4102, Australia
- Jake William O’Brien
  - Van’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Cornwall Street, Woolloongabba, Queensland 4102, Australia
3
Llompart P, Minoletti C, Baybekov S, Horvath D, Marcou G, Varnek A. Will we ever be able to accurately predict solubility? Sci Data 2024; 11:303. [PMID: 38499581 PMCID: PMC10948805 DOI: 10.1038/s41597-024-03105-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 02/29/2024] [Indexed: 03/20/2024]
Abstract
Accurate prediction of thermodynamic solubility by machine learning remains a challenge. Recent models often display good performance, but their reliability may be deceiving when they are used prospectively. This study investigates the origins of these discrepancies along three directions: a historical perspective, an analysis of the aqueous solubility dataverse, and data quality. We investigated over 20 years of published solubility datasets and models, highlighting overlooked datasets and the overlaps between popular sets. We benchmarked recently published models on a novel curated solubility dataset and report poor performance. We also propose a workflow to curate aqueous solubility data, aiming at producing models useful to bench chemists. Our results demonstrate that some state-of-the-art models are not ready for public usage because they lack a well-defined applicability domain and overlook historical data sources. We report the impact of factors influencing the utility of the models: interlaboratory standard deviation, ionic state of the solute and data sources. The models obtained herein and the quality-assessed datasets are publicly available.
Affiliation(s)
- P Llompart
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
  - IDD/CADD, Sanofi, Vitry-Sur-Seine, France
- S Baybekov
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- D Horvath
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- G Marcou
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- A Varnek
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
4
Caminero Gomes Soares A, Marques Sousa GH, Calil RL, Goulart Trossini GH. Absorption matters: A closer look at popular oral bioavailability rules for drug approvals. Mol Inform 2023; 42:e202300115. [PMID: 37550251 DOI: 10.1002/minf.202300115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 07/10/2023] [Accepted: 08/07/2023] [Indexed: 08/09/2023]
Abstract
This study examines how two popular drug-likeness concepts used in early development, Lipinski's Rule of Five (Ro5) and Veber's Rules, may have affected the profiles of FDA-approved drugs since 1997. Our findings suggest that when all criteria are applied, relevant compounds may be excluded, which highlights the harm of employing these rules blindly. Of all oral drugs in the period used for this analysis, around 66% conform to the Ro5 and 85% to Veber's Rules. Molecular weight and calculated LogP showed low consistent values over time, apart from being the two least followed rules, challenging their relevance. On the other hand, the hydrogen-bond-related rules and the number of rotatable bonds are amongst the most followed criteria and show exceptional consistency over time. Furthermore, our analysis indicates that topological polar surface area and total count of hydrogen bonds cannot be used as interchangeable parameters, contrary to the original proposal. This research enhances the comprehension of the profiles of drugs that were FDA approved in the post-Lipinski period. Medicinal chemists could use these heuristics as a limited guide to direct their exploration of the oral bioavailability chemical space, but they must also steer the wheel to break these rules and explore different regions when necessary.
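As a rough illustration of the two rule sets discussed in this abstract, the following Python sketch lists which criteria a compound violates. The thresholds are the commonly cited ones (Ro5: molecular weight at most 500, calculated LogP at most 5, at most 5 hydrogen-bond donors and 10 acceptors; Veber: at most 10 rotatable bonds and a polar surface area of at most 140 Å²). The `Descriptors` class, the function names and the example values are hypothetical and are not taken from the study; descriptor values are assumed to be precomputed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float      # g/mol
    clogp: float           # calculated LogP
    h_donors: int          # hydrogen-bond donors
    h_acceptors: int       # hydrogen-bond acceptors
    rotatable_bonds: int
    tpsa: float            # topological polar surface area, in square angstroms


def ro5_violations(d: Descriptors) -> list[str]:
    """Return the Rule of Five criteria that the compound violates."""
    checks = [
        (d.mol_weight > 500, "molecular weight > 500"),
        (d.clogp > 5, "cLogP > 5"),
        (d.h_donors > 5, "H-bond donors > 5"),
        (d.h_acceptors > 10, "H-bond acceptors > 10"),
    ]
    return [label for failed, label in checks if failed]


def veber_violations(d: Descriptors) -> list[str]:
    """Return the Veber criteria (TPSA variant) that the compound violates."""
    checks = [
        (d.rotatable_bonds > 10, "rotatable bonds > 10"),
        (d.tpsa > 140, "TPSA > 140"),
    ]
    return [label for failed, label in checks if failed]


# Illustrative values only, not a real compound.
example = Descriptors(mol_weight=620.0, clogp=4.2, h_donors=2,
                      h_acceptors=8, rotatable_bonds=7, tpsa=95.0)
ro5 = ro5_violations(example)
veber = veber_violations(example)
```

Returning the list of violated criteria, rather than a single pass/fail flag, leaves the decision about how many violations to tolerate to the caller, in keeping with the abstract's caution against applying the rules blindly.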
Affiliation(s)
- Artur Caminero Gomes Soares
  - School of Pharmaceutical Sciences, University of São Paulo, Department of Pharmacy, Laboratório de Integração entre Técnicas Experimentais e Computacionais (LITEC), Av. Prof. Lineu Prestes, 580, São Paulo, SP, Brazil
- Gustavo Henrique Marques Sousa
  - School of Pharmaceutical Sciences, University of São Paulo, Department of Pharmacy, Laboratório de Integração entre Técnicas Experimentais e Computacionais (LITEC), Av. Prof. Lineu Prestes, 580, São Paulo, SP, Brazil
- Raisa Ludmila Calil
  - School of Pharmaceutical Sciences, University of São Paulo, Department of Pharmacy, Laboratório de Integração entre Técnicas Experimentais e Computacionais (LITEC), Av. Prof. Lineu Prestes, 580, São Paulo, SP, Brazil
- Gustavo Henrique Goulart Trossini
  - School of Pharmaceutical Sciences, University of São Paulo, Department of Pharmacy, Laboratório de Integração entre Técnicas Experimentais e Computacionais (LITEC), Av. Prof. Lineu Prestes, 580, São Paulo, SP, Brazil
5
Sala D, Batebi H, Ledwitch K, Hildebrand PW, Meiler J. Targeting in silico GPCR conformations with ultra-large library screening for hit discovery. Trends Pharmacol Sci 2023; 44:150-161. [PMID: 36669974 PMCID: PMC9974811 DOI: 10.1016/j.tips.2022.12.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/21/2022] [Revised: 12/23/2022] [Accepted: 12/27/2022] [Indexed: 01/20/2023]
Abstract
The use of deep machine learning (ML) in protein structure prediction has made it possible to easily access a large number of annotated conformations that can potentially compensate for missing experimental structures in structure-based drug discovery (SBDD). However, it is still unclear whether the accuracy of these predicted conformations is sufficient for screening chemical compounds that will effectively interact with a protein target for pharmacological purposes. In this opinion article, we examine the potential benefits and limitations of using state-annotated conformations for ultra-large library screening (ULLS) in light of the growing size of ultra-large libraries (ULLs). We believe that targeting different conformational states of common drug targets like G-protein-coupled receptors (GPCRs), which can regulate human physiology by switching between different conformations, can offer multiple advantages.
Affiliation(s)
- D Sala
  - Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany
- H Batebi
  - Institute of Medical Physics and Biophysics, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany
- K Ledwitch
  - Center for Structural Biology, Vanderbilt University, Nashville, TN 37240, USA
  - Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
- P W Hildebrand
  - Institute of Medical Physics and Biophysics, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany
- J Meiler
  - Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany
  - Center for Structural Biology, Vanderbilt University, Nashville, TN 37240, USA
  - Department of Chemistry, Vanderbilt University, Nashville, TN 37235, USA
6
Manen-Freixa L, Borrell JI, Teixidó J, Estrada-Tejedor R. Deconstructing Markush: Improving the R&D Efficiency Using Library Selection in Early Drug Discovery. Pharmaceuticals (Basel) 2022; 15:1159. [PMID: 36145380 PMCID: PMC9503783 DOI: 10.3390/ph15091159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 09/02/2022] [Accepted: 09/14/2022] [Indexed: 11/16/2022]
Abstract
Most product patents claim a large number of compounds based on a Markush structure. However, the identification and optimization of new principal active ingredients is frequently driven by a simple Free-Wilson approach, leading to a highly focused study that involves only the chemical space near a hit compound. This raises the question: do the tested compounds described in patents really reflect the full molecular diversity described in the Markush structure? In this study, we contrast the performance of rational selection with that of conventional approaches in seven real-case patents, assessing their ability to describe the patent's chemical space. The results demonstrate that integrating computer-aided library selection methods into the early stages of the drug discovery process would boost the identification of new potential hits across the chemical space.
7
Trapotsi MA, Hosseini-Gerami L, Bender A. Computational analyses of mechanism of action (MoA): data, methods and integration. RSC Chem Biol 2022; 3:170-200. [PMID: 35360890 PMCID: PMC8827085 DOI: 10.1039/d1cb00069a] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 03/30/2021] [Accepted: 12/09/2021] [Indexed: 12/15/2022]
Abstract
The elucidation of a compound's Mechanism of Action (MoA) is a challenging task in the drug discovery process, but it is important in order to rationalise phenotypic findings and to anticipate potential side-effects. Bioinformatic approaches, advances in machine learning techniques and the increasing deposition of high-throughput data in public databases have significantly contributed to recent advances in the field, but it is not straightforward to decide which data and methods are most suitable to use in a given case. In this review, we focus on these methods and data and their applications in generating MoA hypotheses for subsequent experimental validation. We discuss compound-specific data such as -omics, cell morphology and bioactivity data, as well as commonly used supplementary prior knowledge such as network and pathway data, and provide information on databases where this data can be accessed. In terms of methodologies, we discuss both well-established methods (connectivity mapping, pathway enrichment) as well as more developing methods (neural networks and multi-omics integration). Finally, we review case studies where the MoA of a compound was successfully suggested from computational analysis by incorporating multiple data modalities and/or methodologies. Our aim for this review is to provide researchers with insights into the benefits and drawbacks of both the data and methods in terms of level of understanding, biases and interpretation - and to highlight future avenues of investigation which we foresee will improve the field of MoA elucidation, including greater public access to -omics data and methodologies which are capable of data integration.
Affiliation(s)
- Maria-Anna Trapotsi
  - Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, UK
- Layla Hosseini-Gerami
  - Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, UK
- Andreas Bender
  - Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, UK
8
Kerstjens A, De Winter H. LEADD: Lamarckian evolutionary algorithm for de novo drug design. J Cheminform 2022; 14:3. [PMID: 35033209 PMCID: PMC8760751 DOI: 10.1186/s13321-022-00582-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/11/2021] [Accepted: 12/30/2021] [Indexed: 11/10/2022]
Abstract
Given an objective function that predicts key properties of a molecule, goal-directed de novo molecular design is a useful tool to identify molecules that maximize or minimize said objective function. Nonetheless, a common drawback of these methods is that they tend to design synthetically unfeasible molecules. In this paper we describe a Lamarckian evolutionary algorithm for de novo drug design (LEADD). LEADD attempts to strike a balance between optimization power, synthetic accessibility of designed molecules and computational efficiency. To increase the likelihood of designing synthetically accessible molecules, LEADD represents molecules as graphs of molecular fragments, and limits the bonds that can be formed between them through knowledge-based pairwise atom type compatibility rules. A reference library of drug-like molecules is used to extract fragments, fragment preferences and compatibility rules. A novel set of genetic operators that enforce these rules in a computationally efficient manner is presented. To sample chemical space more efficiently we also explore a Lamarckian evolutionary mechanism that adapts the reproductive behavior of molecules. LEADD has been compared to both standard virtual screening and a comparable evolutionary algorithm using a standardized benchmark suite and was shown to be able to identify fitter molecules more efficiently. Moreover, the designed molecules are predicted to be easier to synthesize than those designed by other evolutionary algorithms.
Affiliation(s)
- Alan Kerstjens
  - Department of Pharmaceutical Sciences, Faculty of Pharmaceutical, Biomedical and Veterinary Sciences, University of Antwerp, Universiteitsplein 1A, 2610, Wilrijk, Belgium
- Hans De Winter
  - Department of Pharmaceutical Sciences, Faculty of Pharmaceutical, Biomedical and Veterinary Sciences, University of Antwerp, Universiteitsplein 1A, 2610, Wilrijk, Belgium
9
Zabolotna Y, Volochnyuk DM, Ryabukhin SV, Horvath D, Gavrilenko KS, Marcou G, Moroz YS, Oksiuta O, Varnek A. A Close-up Look at the Chemical Space of Commercially Available Building Blocks for Medicinal Chemistry. J Chem Inf Model 2021; 62:2171-2185. [PMID: 34928600 DOI: 10.1021/acs.jcim.1c00811] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Indexed: 12/14/2022]
Abstract
The ability to efficiently synthesize desired compounds can be a limiting factor for chemical space exploration in drug discovery. This ability is conditioned not only by the existence of well-studied synthetic protocols but also by the availability of corresponding reagents, so-called building blocks (BBs). In this work, we present a detailed analysis of the chemical space of 400 000 purchasable BBs. The chemical space was defined by corresponding synthons, that is, the fragments contributed to the final molecules upon reaction. They allow an analysis of BB physicochemical properties and diversity, unbiased by the leaving and protective groups in actual reagents. The main classes of BBs were analyzed in terms of their availability, rule-of-two-defined quality, and diversity. Available BBs were eventually compared to a reference set of biologically relevant synthons derived from ChEMBL fragmentation, in order to illustrate how well they cover the actual medicinal chemistry needs. This was performed on a newly constructed universal generative topographic map of synthon chemical space that enables visualization of both libraries and analysis of their overlapped and library-specific regions.
Affiliation(s)
- Yuliana Zabolotna
  - University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Dmitriy M Volochnyuk
  - Institute of Organic Chemistry, National Academy of Sciences of Ukraine, Murmanska Street 5, Kyiv 02660, Ukraine
  - Enamine Ltd., 78 Chervonotkatska str., 02660 Kiev, Ukraine
- Sergey V Ryabukhin
  - The Institute of High Technologies, Kyiv National Taras Shevchenko University, 64 Volodymyrska Street, Kyiv 01601, Ukraine
  - Enamine Ltd., 78 Chervonotkatska str., 02660 Kiev, Ukraine
- Dragos Horvath
  - University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Konstantin S Gavrilenko
  - Research-And-Education ChemBioCenter, National Taras Shevchenko University of Kyiv, Chervonotkatska str., 61, 03022 Kiev, Ukraine
  - Enamine Ltd., 78 Chervonotkatska str., 02660 Kiev, Ukraine
- Gilles Marcou
  - University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Yurii S Moroz
  - Research-And-Education ChemBioCenter, National Taras Shevchenko University of Kyiv, Chervonotkatska str., 61, 03022 Kiev, Ukraine
  - Chemspace, Chervonotkatska Street 78, 02094 Kyiv, Ukraine
- Oleksandr Oksiuta
  - Institute of Organic Chemistry, National Academy of Sciences of Ukraine, Murmanska Street 5, Kyiv 02660, Ukraine
  - Chemspace, Chervonotkatska Street 78, 02094 Kyiv, Ukraine
- Alexandre Varnek
  - University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
10
Gantzer P, Creton B, Nieto-Draghi C. Comparisons of Molecular Structure Generation Methods Based on Fragment Assemblies and Genetic Graphs. J Chem Inf Model 2021; 61:4245-4258. [PMID: 34405674 DOI: 10.1021/acs.jcim.1c00803] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
Quantitative structure-property relationships (QSPRs) have helped in predicting molecular properties for several decades, while the automatic design of new molecular structures is still emerging. The choice of algorithm to generate molecules is not obvious and depends on several factors, such as the desired chemical diversity (according to an initial dataset's content) and the level of construction (the use of atoms, fragments, or pattern-based methods). In this paper, we address the problem of molecular structure generation by revisiting two approaches: fragment-based methods (FMs) and genetic-based methods (GMs). We define a set of indices to compare generation methods on a specific task. The new indices describe the explored data space (coverage), compare how the data space is explored (representativeness), and quantify the ratio of molecules satisfying requirements (generation specificity), without using a database of real chemicals as a reference. These indices were employed to compare generations of molecules fulfilling the desired property criterion, evaluated by QSPR.
Affiliation(s)
- Philippe Gantzer
  - IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852 Rueil-Malmaison, France
- Benoit Creton
  - IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852 Rueil-Malmaison, France
- Carlos Nieto-Draghi
  - IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852 Rueil-Malmaison, France
11
Zabolotna Y, Ertl P, Horvath D, Bonachera F, Marcou G, Varnek A. NP Navigator: A New Look at the Natural Product Chemical Space. Mol Inform 2021; 40:e2100068. [PMID: 34170632 DOI: 10.1002/minf.202100068] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/15/2021] [Accepted: 05/15/2021] [Indexed: 11/08/2022]
Abstract
Natural products (NPs), having been evolutionarily selected over millions of years to bind to biological macromolecules, remained an important source of inspiration for medicinal chemists even after the advent of efficient drug discovery technologies such as combinatorial chemistry and high-throughput screening. Thus, there is a strong demand for efficient and user-friendly computational tools for analyzing large libraries of NPs. In this context, we introduce NP Navigator, a freely available, intuitive online tool for visualization of and navigation through the chemical space of NPs and NP-like molecules. It is based on a hierarchical ensemble of generative topographic maps, featuring NPs from the COlleCtion of Open NatUral producTs (COCONUT), bioactive compounds from ChEMBL and commercially available molecules from ZINC. NP Navigator allows efficient analysis of different aspects of NPs: chemotype distribution, physicochemical properties, biological activity and commercial availability. The latter concerns not only purchasable NPs but also their close analogs, which can be considered synthetic mimetics of NPs, or pseudo-NPs.
Affiliation(s)
- Yuliana Zabolotna
  - University of Strasbourg, Laboratory of Chemoinformatics, 4, rue B. Pascal, 67081, Strasbourg, France
- Peter Ertl
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4056, Basel, Switzerland
- Dragos Horvath
  - University of Strasbourg, Laboratory of Chemoinformatics, 4, rue B. Pascal, 67081, Strasbourg, France
- Fanny Bonachera
  - University of Strasbourg, Laboratory of Chemoinformatics, 4, rue B. Pascal, 67081, Strasbourg, France
- Gilles Marcou
  - University of Strasbourg, Laboratory of Chemoinformatics, 4, rue B. Pascal, 67081, Strasbourg, France
- Alexandre Varnek
  - University of Strasbourg, Laboratory of Chemoinformatics, 4, rue B. Pascal, 67081, Strasbourg, France
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Sapporo, Kita-ku, 001-0021 Sapporo, Japan
12
Medina-Franco JL, Sánchez-Cruz N, López-López E, Díaz-Eufracio BI. Progress on open chemoinformatic tools for expanding and exploring the chemical space. J Comput Aided Mol Des 2021; 36:341-354. [PMID: 34143323 PMCID: PMC8211976 DOI: 10.1007/s10822-021-00399-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/10/2021] [Accepted: 06/14/2021] [Indexed: 01/10/2023]
Abstract
The concept of chemical space is a cornerstone in chemoinformatics, and it has broad conceptual and practical applicability in many areas of chemistry, including drug design and discovery. One of the most considerable impacts is in the study of structure-property relationships where the property can be a biological activity or any other characteristic of interest to a particular chemistry discipline. The chemical space is highly dependent on the molecular representation that is also a cornerstone concept in computational chemistry. Herein, we discuss the recent progress on chemoinformatic tools developed to expand and characterize the chemical space of compound data sets using different types of molecular representations, generate visual representations of such spaces, and explore structure-property relationships in the context of chemical spaces. We emphasize the development of methods and freely available tools focusing on drug discovery applications. We also comment on the general advantages and shortcomings of using freely available and easy-to-use tools and discuss the value of using such open resources for research, education, and scientific dissemination.
Affiliation(s)
- José L Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Norberto Sánchez-Cruz
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Edgar López-López
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
  - Departamento de Química y Programa de Posgrado en Farmacología, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Apartado 14-740, 07000, Mexico City, Mexico
- Bárbara I Díaz-Eufracio
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
13
Shrivastava AD, Kell DB. FragNet, a Contrastive Learning-Based Transformer Model for Clustering, Interpreting, Visualizing, and Navigating Chemical Space. Molecules 2021; 26:2065. [PMID: 33916824 PMCID: PMC8038408 DOI: 10.3390/molecules26072065] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 03/01/2021] [Revised: 03/29/2021] [Accepted: 04/01/2021] [Indexed: 12/12/2022]
Abstract
The question of molecular similarity is core in cheminformatics and is usually assessed via a pairwise comparison based on vectors of properties or molecular fingerprints. We recently exploited variational autoencoders to embed 6M molecules in a chemical space, such that their (Euclidean) distance within the latent space so formed could be assessed within the framework of the entire molecular set. However, the standard objective function used did not seek to manipulate the latent space so as to cluster the molecules based on any perceived similarity. Using a set of some 160,000 molecules of biological relevance, we here bring together three modern elements of deep learning to create a novel and disentangled latent space, viz. transformers, contrastive learning, and an embedded autoencoder. The effective dimensionality of the latent space was varied such that clear separation of individual types of molecules could be observed within individual dimensions of the latent space. The capacity of the network was such that many dimensions were not populated at all. As before, we assessed the utility of the representation by comparing clozapine with its near neighbors, and we also did the same for various antibiotics related to flucloxacillin. Transformers, especially when, as here, coupled with contrastive learning, effectively provide one-shot learning and lead to a successful and disentangled representation of molecular latent spaces that at once uses the entire training set in its construction while allowing "similar" molecules to cluster together in an effective and interpretable way.
Affiliation(s)
- Aditya Divyakant Shrivastava
- Department of Computer Science and Engineering, Nirma University, Ahmedabad 382481, India;
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St., Liverpool L69 7ZB, UK
| | - Douglas B. Kell
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St., Liverpool L69 7ZB, UK
- Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs Lyngby, Denmark
- Mellizyme Ltd., Liverpool Science Park, IC1, 131 Mount Pleasant, Liverpool L3 5TF, UK
|
14
|
Torres MDT, Cao J, Franco OL, Lu TK, de la Fuente-Nunez C. Synthetic Biology and Computer-Based Frameworks for Antimicrobial Peptide Discovery. ACS NANO 2021; 15:2143-2164. [PMID: 33538585 PMCID: PMC8734659 DOI: 10.1021/acsnano.0c09509] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/09/2023]
Abstract
Antibiotic resistance is one of the greatest challenges of our time. This global health problem originated from a paucity of truly effective antibiotic classes and an increased incidence of multi-drug-resistant bacterial isolates in hospitals worldwide. Indeed, it has been recently estimated that 10 million people will die annually from drug-resistant infections by the year 2050. Therefore, the need to develop out-of-the-box strategies to combat antibiotic resistance is urgent. The biological world has provided natural templates, called antimicrobial peptides (AMPs), which exhibit multiple intrinsic medical properties including the targeting of bacteria. AMPs can be used as scaffolds and, via engineering, can be reconfigured for optimized potency and targetability toward drug-resistant pathogens. Here, we review the recent development of tools for the discovery, design, and production of AMPs and propose that the future of peptide drug discovery will involve the convergence of computational and synthetic biology principles.
Affiliation(s)
- Marcelo D T Torres
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
| | - Jicong Cao
- Synthetic Biology Group, MIT Synthetic Biology Center, Department of Biological Engineering and Electrical Engineering and Computer Science, Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Octavio L Franco
- Centro de Análises Proteômicas e Bioquímicas, Universidade Católica de Brasília, Brasília, DF 70790160, Brazil
- S-inova Biotech, Universidade Católica Dom Bosco, Campo Grande, MS 79117010, Brazil
| | - Timothy K Lu
- Synthetic Biology Group, MIT Synthetic Biology Center, Department of Biological Engineering and Electrical Engineering and Computer Science, Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
|
15
|
Zabolotna Y, Lin A, Horvath D, Marcou G, Volochnyuk DM, Varnek A. Chemography: Searching for Hidden Treasures. J Chem Inf Model 2020; 61:179-188. [PMID: 33334102 DOI: 10.1021/acs.jcim.0c00936] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]
Abstract
The days when medicinal chemistry was limited to a few series of compounds of therapeutic interest are long gone. Nowadays, no human may succeed to acquire a complete overview of more than a billion existing or feasible compounds within which the potential "blockbuster drugs" are well hidden and yet only a few mouse clicks away. To reach these "hidden treasures", we adapted the generative topographic mapping method to enable efficient navigation through the chemical space, from a global overview to a structural pattern detection, covering, for the first time, the complete ZINC library of purchasable compounds, relative to 1.6 million biologically relevant ChEMBL molecules. About 40 000 hierarchical maps of the chemical space were constructed. Structural motifs inherent to only one library were identified. Roughly 20 000 off-market ChEMBL compound families represent incentives to enrich commercial catalogs. Alternatively, 125 000 ZINC-specific compound classes, absent in structure-activity bases, are novel paths to explore in medicinal chemistry. The complete list of these chemotypes can be downloaded using the link https://forms.gle/B6bUJj82t9EfmttV6.
Affiliation(s)
- Yuliana Zabolotna
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081 France
| | - Arkadii Lin
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081 France
| | - Dragos Horvath
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081 France
| | - Gilles Marcou
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081 France
| | - Dmitriy M Volochnyuk
- Institute of Organic Chemistry National Academy of Sciences of Ukraine, Murmanska Street 5, Kyiv 02660, Ukraine.,Enamine Ltd., Chervonotkatska Street 78, Kyiv 02094, Ukraine
| | - Alexandre Varnek
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081 France
|
16
|
Lin A, Baskin II, Marcou G, Horvath D, Beck B, Varnek A. Parallel Generative Topographic Mapping: An Efficient Approach for Big Data Handling. Mol Inform 2020; 39:e2000009. [PMID: 32347666 PMCID: PMC7757192 DOI: 10.1002/minf.202000009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2020] [Accepted: 04/10/2020] [Indexed: 11/12/2022]
Abstract
Generative Topographic Mapping (GTM) can be efficiently used to visualize, analyze and model large chemical data. The GTM manifold needs to span the chemical space deemed relevant for a given problem. Therefore, the Frame set (FS) of compounds used for the manifold construction must cover a given chemical space well. Intuitively, the FS size must rise with the size and diversity of the target library. At the same time, the GTM training can be very slow or even become technically impossible at FS sizes of the order of 10^5 compounds, which is a very small number compared to today's commercially accessible compounds, and, especially, to the theoretically feasible molecules. In order to solve this problem, we propose a Parallel GTM algorithm based on the merging of "intermediate" manifolds constructed in parallel for different subsets of molecules. An ensemble of these subsets forms a FS for the "final" manifold. In order to assess the efficiency of the new algorithm, 80 GTMs were built on the FSs of different sizes ranging from 10 to 1.8 M compounds selected from the ChEMBL database. Each GTM was challenged to build classification models for up to 712 biological activities (depending on the FS size). With the novel parallel GTM procedure, we could thus cover the entire spectrum of possible FS sizes, whereas previous studies were forced to rely on the working hypothesis that FS sizes of a few thousand compounds are sufficient to describe the ChEMBL chemical space. In fact, this study formally proves this to be true: a FS containing only 5000 randomly picked compounds is sufficient to represent the entire ChEMBL collection (1.8 M molecules), in the sense that a further increase of FS compound numbers has no beneficial impact on the predictive propensity of the above-mentioned 712 activity classification models. Parallel GTM may, however, be required to generate maps based on very large FS, that might improve chemical space cartography of big commercial and virtual libraries, approaching billions of compounds.
Affiliation(s)
- Arkadii Lin
- University of Strasbourg, Laboratory of Chemoinformatics, Faculty of Chemistry, 4, Blaise Pascal str., 67081 Strasbourg, France
| | - Igor I. Baskin
- Faculty of Physics, Lomonosov Moscow State University, 1/2, Leninskie Gory str., 119991 Moscow, Russia
| | - Gilles Marcou
- University of Strasbourg, Laboratory of Chemoinformatics, Faculty of Chemistry, 4, Blaise Pascal str., 67081 Strasbourg, France
| | - Dragos Horvath
- University of Strasbourg, Laboratory of Chemoinformatics, Faculty of Chemistry, 4, Blaise Pascal str., 67081 Strasbourg, France
| | - Bernd Beck
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, 65, Birkendorfer str., 88397 Biberach an der Riss, Germany
| | - Alexandre Varnek
- University of Strasbourg, Laboratory of Chemoinformatics, Faculty of Chemistry, 4, Blaise Pascal str., 67081 Strasbourg, France
|
17
|
Mercer DK, Torres MDT, Duay SS, Lovie E, Simpson L, von Köckritz-Blickwede M, de la Fuente-Nunez C, O'Neil DA, Angeles-Boza AM. Antimicrobial Susceptibility Testing of Antimicrobial Peptides to Better Predict Efficacy. Front Cell Infect Microbiol 2020; 10:326. [PMID: 32733816 PMCID: PMC7358464 DOI: 10.3389/fcimb.2020.00326] [Citation(s) in RCA: 77] [Impact Index Per Article: 15.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2020] [Accepted: 05/29/2020] [Indexed: 12/11/2022] Open
Abstract
During the development of antimicrobial peptides (AMP) as potential therapeutics, antimicrobial susceptibility testing (AST) stands as an essential part of the process in identification and optimisation of candidate AMP. Standard methods for AST, developed almost 60 years ago for testing conventional antibiotics, are not necessarily fit for purpose when it comes to determining the susceptibility of microorganisms to AMP. Without careful consideration of the parameters comprising AST there is a risk of failing to identify novel antimicrobials at a time when antimicrobial resistance (AMR) is leading the planet toward a post-antibiotic era. More physiologically/clinically relevant AST will allow better determination of the preclinical activity of drug candidates and allow the identification of lead compounds. An important consideration is the efficacy of AMP in biological matrices replicating sites of infection, e.g., blood/plasma/serum, lung bronchiolar lavage fluid/sputum, urine, biofilms, etc., as this will likely be more predictive of clinical efficacy. Additionally, specific AST for different target microorganisms may help to better predict efficacy of AMP in specific infections. In this manuscript, we describe what we believe are the key considerations for AST of AMP and hope that this information can better guide the preclinical development of AMP toward becoming a new generation of urgently needed antimicrobials.
Affiliation(s)
| | - Marcelo D. T. Torres
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, Penn Institute for Computational Science, and Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, United States
| | - Searle S. Duay
- Department of Chemistry, Institute of Materials Science, University of Connecticut, Storrs, CT, United States
| | - Emma Lovie
- NovaBiotics Ltd, Aberdeen, United Kingdom
| | | | | | - Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, Penn Institute for Computational Science, and Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, United States
| | | | - Alfredo M. Angeles-Boza
- Department of Chemistry, Institute of Materials Science, University of Connecticut, Storrs, CT, United States
|
18
|
Rauer C, Bereau T. Hydration free energies from kernel-based machine learning: Compound-database bias. J Chem Phys 2020; 153:014101. [DOI: 10.1063/5.0012230] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Affiliation(s)
- Clemens Rauer
- Max Planck Institute for Polymer Research, 55128 Mainz, Germany
| | - Tristan Bereau
- Max Planck Institute for Polymer Research, 55128 Mainz, Germany
- Van ’t Hoff Institute for Molecular Sciences and Informatics Institute, University of Amsterdam, Amsterdam 1098 XH, The Netherlands
|
19
|
Hoffmann C, Centi A, Menichetti R, Bereau T. Molecular dynamics trajectories for 630 coarse-grained drug-membrane permeations. Sci Data 2020; 7:51. [PMID: 32054852 PMCID: PMC7018832 DOI: 10.1038/s41597-020-0391-0] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2019] [Accepted: 01/22/2020] [Indexed: 02/07/2023] Open
Abstract
The permeation of small-molecule drugs across a phospholipid membrane bears much interest both in the pharmaceutical sciences and in physical chemistry. Connecting the chemistry of the drug and the lipids to the resulting thermodynamic properties remains of immediate importance. Here we report molecular dynamics (MD) simulation trajectories using the coarse-grained (CG) Martini force field. A wide, representative coverage of chemistry is provided: across solutes (exhaustively enumerating all 105 CG dimers) and across six phospholipids. For each combination, umbrella-sampling simulations provide detailed structural information of the solute at all depths from the bilayer midplane to bulk water, allowing a precise reconstruction of the potential of mean force. Overall, the present database contains trajectories from 15,120 MD simulations. This database may serve the further identification of structure-property relationships between compound chemistry and drug permeability.
Affiliation(s)
| | - Alessia Centi
- Max Planck Institute for Polymer Research, 55128, Mainz, Germany
| | - Roberto Menichetti
- Max Planck Institute for Polymer Research, 55128, Mainz, Germany
- Physics Department, University of Trento, 38123, Trento, Italy
- INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, 38123, Trento, Italy
| | - Tristan Bereau
- Max Planck Institute for Polymer Research, 55128, Mainz, Germany.
|
20
|
Capecchi A, Zhang A, Reymond JL. Populating Chemical Space with Peptides Using a Genetic Algorithm. J Chem Inf Model 2020; 60:121-132. [PMID: 31868369 DOI: 10.1021/acs.jcim.9b01014] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
In drug discovery, one uses chemical space as a concept to organize molecules according to their structures and properties. One often would like to generate new possible molecules at a specific location in the chemical space marked by a molecule of interest. Herein, we report the peptide design genetic algorithm (PDGA, code available at https://github.com/reymond-group/PeptideDesignGA), a computational tool capable of producing peptide sequences of various topologies (linear, cyclic/polycyclic, or dendritic) in proximity of any molecule of interest in a chemical space defined by macromolecule extended atom-pair fingerprint (MXFP), an atom-pair fingerprint describing molecular shape and pharmacophores. We show that the PDGA generates high-similarity analogues of bioactive peptides with diverse peptide chain topologies and of nonpeptide target molecules. We illustrate the chemical space accessible by the PDGA with an interactive 3D map of the MXFP property space available at http://faerun.gdb.tools/. The PDGA should be generally useful to generate peptides at any location in the chemical space.
Affiliation(s)
- Alice Capecchi
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Alain Zhang
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
|
21
|
Horvath D, Marcou G, Varnek A. Generative topographic mapping in drug design. DRUG DISCOVERY TODAY. TECHNOLOGIES 2019; 32-33:99-107. [PMID: 33386101 DOI: 10.1016/j.ddtec.2020.06.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/07/2020] [Revised: 06/10/2020] [Accepted: 06/18/2020] [Indexed: 06/12/2023]
Abstract
This is a review article on Generative Topographic Mapping (GTM), a non-linear dimensionality reduction technique producing generative 2D maps of high-dimensional vector spaces, and its specific applications in Drug Design (chemical space cartography, compound library design and analysis, virtual screening, pharmacological profiling, de novo drug design, conformational space & docking interaction cartography, etc.). Written by chemoinformaticians for potential users among medicinal chemists and biologists, the article purposely avoids all underlying mathematics. First, the GTM concept is intuitively explained, based on the strong analogies with the rather popular Self-Organizing Maps (SOMs), which are well established library analysis tools. GTM is basically a fuzzy-logics-based generalization of SOMs. In the second part of the review, some published GTM applications in drug design are briefly revisited.
Affiliation(s)
- Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France.
| | - Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France.
|
22
|
Grygorenko OO, Volochnyuk DM, Ryabukhin SV, Judd DB. The Symbiotic Relationship Between Drug Discovery and Organic Chemistry. Chemistry 2019; 26:1196-1237. [PMID: 31429510 DOI: 10.1002/chem.201903232] [Citation(s) in RCA: 102] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2019] [Revised: 08/19/2019] [Indexed: 12/20/2022]
Abstract
All pharmaceutical products contain organic molecules; the source may be a natural product or a fully synthetic molecule, or a combination of both. Thus, it follows that organic chemistry underpins both existing and upcoming pharmaceutical products. The reverse relationship has also affected organic synthesis, changing its landscape towards increasingly complex targets. This Review article sets out to give a concise appraisal of this symbiotic relationship between organic chemistry and drug discovery, along with a discussion of the design concepts and highlighting key milestones along the journey. In particular, criteria for a high-quality compound library design enabling efficient virtual navigation of chemical space, as well as rise and fall of concepts for its synthetic exploration (such as combinatorial chemistry; diversity-, biology-, lead-, or fragment-oriented syntheses; and DNA-encoded libraries) are critically surveyed.
Affiliation(s)
- Oleksandr O Grygorenko
- Enamine Ltd., Chervonotkatska Street 78, Kiev, 02094, Ukraine.,Taras Shevchenko National University of Kiev, Volodymyrska Street 60, Kiev, 01601, Ukraine
| | - Dmitriy M Volochnyuk
- Enamine Ltd., Chervonotkatska Street 78, Kiev, 02094, Ukraine.,Taras Shevchenko National University of Kiev, Volodymyrska Street 60, Kiev, 01601, Ukraine.,Institute of Organic Chemistry, National Academy of Sciences of Ukraine, Murmanska Street 5, Kiev, 02660, Ukraine
| | - Sergey V Ryabukhin
- Enamine Ltd., Chervonotkatska Street 78, Kiev, 02094, Ukraine.,Taras Shevchenko National University of Kiev, Volodymyrska Street 60, Kiev, 01601, Ukraine
| | - Duncan B Judd
- Awridian Ltd., Stevenage Bioscience Catalyst, Gunnelswood Road, Stevenage, Herts, SG1 2FX, UK
|
23
|
Lin A, Beck B, Horvath D, Marcou G, Varnek A. Diversifying chemical libraries with generative topographic mapping. J Comput Aided Mol Des 2019; 34:805-815. [DOI: 10.1007/s10822-019-00215-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2019] [Accepted: 07/15/2019] [Indexed: 01/28/2023]
|
24
|
Horvath D, Marcou G, Varnek A. Generative Topographic Mapping of the Docking Conformational Space. Molecules 2019; 24:2269. [PMID: 31216756 PMCID: PMC6631714 DOI: 10.3390/molecules24122269] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2019] [Revised: 06/14/2019] [Accepted: 06/15/2019] [Indexed: 12/21/2022] Open
Abstract
Following previous efforts to render the Conformational Space (CS) of flexible compounds by Generative Topographic Mapping (GTM), this polyvalent mapping technique is here adapted to the docking problem. Contact fingerprints (CF) characterize ligands from the perspective of the binding site by monitoring protein atoms that are “touched” by those of the ligand. A “Contact” (CF) map was built by GTM-driven dimensionality reduction of the CF vector space. Alternatively, a “Hybrid” (Hy) map used a composite descriptor of CFs concatenated with ligand fragment descriptors. These maps indirectly represent the active site and integrate the binding information of multiple ligands. The concept is illustrated by a docking study into the ATP-binding site of CDK2, using the S4MPLE program to generate thousands of poses for each ligand. Both maps were challenged to (1) Discriminate native from non-native ligand poses, e.g., create RMSD-landscapes “colored” by the conformer ensemble of ligands of known binding modes in order to highlight “native” map zones (poses with RMSD to PDB structures < 2Å). Then, projection of poses of other ligands on such landscapes might serve to predict those falling in native zones as being well-docked. (2) Distinguish ligands–characterized by their ensemble of conformers–by their potency, e.g., testing the hypotheses whether zones privileged by potent binders are clearly separated from the ones preferred by decoys on the maps. Hybrid maps were better in both challenges and outperformed the classical energy and individual contact satisfaction scores in discriminating ligands by potency. Moreover, the intuitive visualization and analysis of docking CS may, as already mentioned, have several applications–from highlighting of key contacts to monitoring docking calculation convergence.
Affiliation(s)
- Dragos Horvath
- Laboratoire de Chemoinformatique, UMR7140 CNRS/Univ. of Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg, France.
| | - Gilles Marcou
- Laboratoire de Chemoinformatique, UMR7140 CNRS/Univ. of Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg, France.
| | - Alexandre Varnek
- Laboratoire de Chemoinformatique, UMR7140 CNRS/Univ. of Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg, France.
|
25
|
Awale M, Sirockin F, Stiefl N, Reymond JL. Medicinal Chemistry Aware Database GDBMedChem. Mol Inform 2019; 38:e1900031. [PMID: 31169974 DOI: 10.1002/minf.201900031] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2019] [Accepted: 05/21/2019] [Indexed: 12/17/2022]
Abstract
The generated database GDB17 enumerates 166.4 billion possible molecules up to 17 atoms of C, N, O, S and halogens following simple chemical stability and synthetic feasibility rules; however, medicinal chemistry criteria are not taken into account. Here we applied rules inspired by medicinal chemistry to exclude problematic functional groups and complex molecules from GDB17, and sampled the resulting subset uniformly across molecular size, stereochemistry and polarity to form GDBMedChem as a compact collection of 10 million small molecules. This collection has reduced complexity and better synthetic accessibility than the entire GDB17 but retains higher sp3-carbon fraction and natural product likeness scores compared to known drugs. GDBMedChem molecules are more diverse and very different from known molecules in terms of substructures and represent an unprecedented source of diversity for drug design. GDBMedChem is available for 3D-visualization, similarity searching and for download at http://gdb.unibe.ch.
Affiliation(s)
- Mahendra Awale
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Finton Sirockin
- Novartis Institutes for Biomedical Research, Basel, Switzerland
| | - Nikolaus Stiefl
- Novartis Institutes for Biomedical Research, Basel, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
|
26
|
Lin A, Horvath D, Marcou G, Beck B, Varnek A. Multi-task generative topographic mapping in virtual screening. J Comput Aided Mol Des 2019; 33:331-343. [PMID: 30739238 DOI: 10.1007/s10822-019-00188-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2018] [Accepted: 02/02/2019] [Indexed: 12/16/2022]
Abstract
The previously reported procedure to generate "universal" Generative Topographic Maps (GTMs) of the drug-like chemical space is in practice a multi-task learning process, in which both operational GTM parameters (example: map grid size) and hyperparameters (key example: the molecular descriptor space to be used) are chosen by an evolutionary process in order to fit/select "universal" GTM manifolds. After selection (a one-time task aimed at optimizing the compromise in terms of neighborhood behavior compliance, over a large pool of various biological targets), for any further use the manifolds are ready to provide "fit-free" predictive models. Using any structure-activity set (irrespective of whether the associated target served at the map fitting stage or not), the generation or "coloring" of a property landscape enables predicting the property for any external molecule, with zero additional fittable parameters involved. While previous works have signaled the excellent behavior of such models in aggressive three-fold cross-validation assessments of their predictive power, the present work wished to explore their behavior in Virtual Screening (VS), here simulated using external DUD ligand and decoy series that are fully disjoint from the ChEMBL-extracted landscape coloring sets. Beyond the rather robust results of the universal GTM manifolds in this challenge, it could be shown that the descriptor spaces selected by the evolutionary multi-task learner were intrinsically able to serve as an excellent support for many other VS procedures, starting from parameter-free similarity searching, to local (target-specific) GTM models, to parameter-rich, nonlinear Random Forest and Neural Network approaches.
Affiliation(s)
- Arkadii Lin
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081, Strasbourg, France.,Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorferstrasse 65, 88397, Biberach an der Riss, Germany
| | - Dragos Horvath
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081, Strasbourg, France
| | - Gilles Marcou
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081, Strasbourg, France
| | - Bernd Beck
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorferstrasse 65, 88397, Biberach an der Riss, Germany
| | - Alexandre Varnek
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081, Strasbourg, France.
|
27
|
Casciuc I, Zabolotna Y, Horvath D, Marcou G, Bajorath J, Varnek A. Virtual Screening with Generative Topographic Maps: How Many Maps Are Required? J Chem Inf Model 2018; 59:564-572. [DOI: 10.1021/acs.jcim.8b00650] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Iuri Casciuc
- Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut LeBel 4, rue B. Pascal 67081 Strasbourg, France
| | - Yuliana Zabolotna
- Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut LeBel 4, rue B. Pascal 67081 Strasbourg, France
| | - Dragos Horvath
- Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut LeBel 4, rue B. Pascal 67081 Strasbourg, France
| | - Gilles Marcou
- Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut LeBel 4, rue B. Pascal 67081 Strasbourg, France
| | - Jürgen Bajorath
- B-IT, Limes, Unit Chem. Biol. & Med. Chem., University of Bonn, 53115 Bonn, Germany
| | - Alexandre Varnek
- Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut LeBel 4, rue B. Pascal 67081 Strasbourg, France
|
28
|
Abstract
Advances in computer processing speed and storage capacity have enabled researchers to generate virtual chemical libraries containing billions of molecules. While these numbers appear large, they are only a small fraction of the number of organic molecules that could potentially be synthesized. This review provides an overview of recent advances in the generation and use of virtual chemical libraries in medicinal chemistry. We also consider the practical implications of these libraries in drug discovery programs and highlight a number of current and future challenges.
Affiliation(s)
- W Patrick Walters
- Relay Therapeutics, 215 First Street, Cambridge, Massachusetts 02142, United States
29
Chen H, Kogej T, Engkvist O. Cheminformatics in Drug Discovery, an Industrial Perspective. Mol Inform 2018; 37:e1800041. [PMID: 29774657 DOI: 10.1002/minf.201800041] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 04/02/2018] [Accepted: 04/23/2018] [Indexed: 12/12/2022]
Abstract
Cheminformatics has established itself as a core discipline within large scale drug discovery operations. It would be impossible to handle the amount of data generated today in a small molecule drug discovery project without persons skilled in cheminformatics. In addition, due to increased emphasis on "Big Data", machine learning and artificial intelligence, not only in society in general, but also in drug discovery, it is expected that the cheminformatics field will be even more important in the future. Traditional areas like virtual screening, library design and high-throughput screening analysis are highlighted in this review. Applying machine learning in drug discovery is an area that has become very important. Applications of machine learning in early drug discovery have been extended from predicting ADME properties and target activity to tasks like de novo molecular design and prediction of chemical reactions.
Affiliation(s)
- Hongming Chen
- Hit Discovery, Discovery Sciences, Innovative Medicines and Early Development Biotech Unit, AstraZeneca R&D Gothenburg, 431 83, Mölndal, Sweden
- Thierry Kogej
- Hit Discovery, Discovery Sciences, Innovative Medicines and Early Development Biotech Unit, AstraZeneca R&D Gothenburg, 431 83, Mölndal, Sweden
- Ola Engkvist
- Hit Discovery, Discovery Sciences, Innovative Medicines and Early Development Biotech Unit, AstraZeneca R&D Gothenburg, 431 83, Mölndal, Sweden