1
Stacey J, Canault B, Pickett SD, Gillet VJ. Visualising lead optimisation series using reduced graphs. J Cheminform 2025; 17:60. [PMID: 40275393 PMCID: PMC12023594 DOI: 10.1186/s13321-025-01002-7] [Received: 10/08/2024] [Accepted: 03/30/2025] [Indexed: 04/26/2025]
Abstract
The typical way in which lead optimisation (LO) series are represented in the medicinal chemistry literature is as Markush structures and associated R-group tables. The Markush structure shows a central core or molecular scaffold that is common to the series with R groups that indicate the points of variability that have been explored in the series. The associated R-group table shows the substituent combinations that exist in individual molecules in the series together with properties of those compounds. This format provides an intuitive way of visualising any structure-activity relationship (SAR) that is present. Automated approaches that attempt to reproduce this well-understood format, such as the SAR map, are based on maximum common substructure approaches and do not take account of small changes that may be made to the core structure itself or of the situation where more than one core exists in the data. Here we describe an automated approach to represent LO series that is based on reduced graph descriptions of molecules. A publicly available LO dataset from a drug discovery programme at GSK is analysed to show how the method can group together compounds from the same series even when there are small substructural differences within the core of the series while also being able to identify different related compound series. The resulting visualisation is useful in identifying areas where series are under-explored and for mapping design ideas onto the current dataset. The code to generate the visualisations is released into the public domain to promote further research in this area.
Scientific contribution: We describe a software tool for analysing lead optimisation series using reduced graph representations of molecules. The representation allows compounds that have similar but not identical chemical scaffolds to be grouped together and is, therefore, an advance on methods that are based on the more traditional Markush structure and SAR tables. The software is a useful addition to the med chem toolbox as it can provide a holistic view of lead optimisation data by representing what might otherwise be seen as separate series as a single series of compounds.
Affiliation(s)
- Jessica Stacey
- Information School, University of Sheffield, The Wave, 2 Whitham Road, Sheffield, S10 2AH, UK
- Baptiste Canault
- GlaxoSmithKline, Gunnels Wood Road, Stevenage, Herts, SG1 2NY, UK
- Valerie J Gillet
- Information School, University of Sheffield, The Wave, 2 Whitham Road, Sheffield, S10 2AH, UK
2
López-Pérez K, Avellaneda-Tamayo JF, Chen L, López-López E, Juárez-Mercado KE, Medina-Franco JL, Miranda-Quintana RA. Molecular similarity: Theory, applications, and perspectives. Artificial Intelligence Chemistry 2024; 2:100077. [PMID: 40124654 PMCID: PMC11928018 DOI: 10.1016/j.aichem.2024.100077] [Indexed: 03/25/2025]
Abstract
Molecular similarity pervades much of our understanding and rationalization of chemistry. This has become particularly evident in the current data-intensive era of chemical research, with similarity measures serving as the backbone of many supervised and unsupervised machine learning (ML) procedures. Here, we present a discussion on the role of molecular similarity in drug design, chemical space exploration, chemical "art" generation, molecular representations, and many other areas. We also discuss more recent topics in molecular similarity, such as the ability to efficiently compare large molecular libraries.
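The review treats similarity measures as the backbone of many ML procedures. The most common concrete instance is the Tanimoto coefficient on binary fingerprints. The sketch below is a generic Python illustration, not code from the review. It assumes each fingerprint is given as the set of its "on"-bit indices, and the toy bit sets are invented. The `mean_pairwise_similarity` helper (a hypothetical name) visits every pair of molecules, so its cost grows quadratically with library size. That cost is what makes efficient comparison of large libraries a topic in its own right.

```python
from itertools import combinations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints,
    each represented as the set of indices of its 'on' bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(fp_a & fp_b)
    return common / (len(fp_a) + len(fp_b) - common)


def mean_pairwise_similarity(fps: list[set[int]]) -> float:
    """Average Tanimoto similarity over all unordered pairs (N*(N-1)/2 pairs)."""
    pairs = list(combinations(fps, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Toy fingerprints: hypothetical bit indices, not real molecules
library = [{1, 4, 7, 9}, {1, 4, 8}, {2, 3, 7, 9}]
print(round(tanimoto(library[0], library[1]), 3))
print(round(mean_pairwise_similarity(library), 3))
```

With these toy sets, the first two fingerprints share 2 of 5 distinct bits (similarity 0.4). The library average is about 0.244.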
Affiliation(s)
- Kenneth López-Pérez
- Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL 32611, USA
- Juan F. Avellaneda-Tamayo
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
- Lexin Chen
- Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL 32611, USA
- Edgar López-López
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
- Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute, Section 14-740, Mexico City 07000, Mexico
- K. Eurídice Juárez-Mercado
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
- José L. Medina-Franco
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
3
Li Y, Peng C, Chi F, Huang Z, Yuan M, Zhou X, Jiang C. The iPhylo suite: an interactive platform for building and annotating biological and chemical taxonomic trees. Brief Bioinform 2024; 26:bbae679. [PMID: 39737565 DOI: 10.1093/bib/bbae679] [Received: 08/02/2024] [Revised: 11/06/2024] [Accepted: 12/13/2024] [Indexed: 01/01/2025]
Abstract
Accurate and rapid taxonomic classifications are essential for systematically exploring organisms and metabolites in diverse environments. Many tools have been developed for biological taxonomic trees, but limitations apply, and a streamlined method for constructing chemical taxonomic trees is notably absent. We present the iPhylo suite (https://www.iphylo.net/), a comprehensive, automated, and interactive platform for biological and chemical taxonomic analysis. The iPhylo suite features web-based modules for the interactive construction and annotation of taxonomic trees and a stand-alone command-line interface (CLI) for local operation or deployment on high-performance computing (HPC) clusters. iPhylo supports National Center for Biotechnology Information (NCBI) taxonomy for biologicals and ChemOnt and NPClassifier for chemical classifications. The iPhylo visualization module, fully implemented in R, allows users to save progress locally and customize the underlying R code. Finally, the CLI module facilitates analysis across all hierarchical relational databases. We showcase the iPhylo suite's capabilities for visualizing environmental microbiomes, analyzing gut microbial metabolite synthesis preferences, and discovering novel correlations between microbiome and metabolome in humans and environment. Overall, the iPhylo suite is distinguished by its unified and interactive framework for in-depth taxonomic and integrative analyses of biological and chemical features and beyond.
Affiliation(s)
- Yueer Li
- MOE Key Laboratory of Biosystems Homeostasis & Protection, and Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, 866 Yuhangtang Road, Xihu District, Hangzhou, Zhejiang 310030, China
- Chen Peng
- MOE Key Laboratory of Biosystems Homeostasis & Protection, and Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, 866 Yuhangtang Road, Xihu District, Hangzhou, Zhejiang 310030, China
- Fei Chi
- Innovation Center of Yangtze River Delta, Zhejiang University, 828 Zhongxing Road, Jiashan County, Jiaxing, Zhejiang 314103, China
- Zinuo Huang
- MOE Key Laboratory of Biosystems Homeostasis & Protection, and Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, 866 Yuhangtang Road, Xihu District, Hangzhou, Zhejiang 310030, China
- Mengyi Yuan
- MOE Key Laboratory of Biosystems Homeostasis & Protection, and Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, 866 Yuhangtang Road, Xihu District, Hangzhou, Zhejiang 310030, China
- Xin Zhou
- Department of Genetics, Stanford University, Stanford, 291 Campus Drive, Santa Clara County, CA 94305, United States
- Chao Jiang
- MOE Key Laboratory of Biosystems Homeostasis & Protection, and Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, 866 Yuhangtang Road, Xihu District, Hangzhou, Zhejiang 310030, China
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, First Affiliated Hospital, Zhejiang University School of Medicine, 79 Qingchun Road, Shangcheng District, Hangzhou, Zhejiang 310009, China
- Center for Life Sciences, Shaoxing Institute, Zhejiang University, 8 Nanbin East Road, Shangyu District, Shaoxing, Zhejiang 321000, China
4
Zabolotna Y, Bonachera F, Horvath D, Lin A, Marcou G, Klimchuk O, Varnek A. Chemspace Atlas: Multiscale Chemography of Ultralarge Libraries for Drug Discovery. J Chem Inf Model 2022; 62:4537-4548. [DOI: 10.1021/acs.jcim.2c00509] [Indexed: 11/30/2022]
Affiliation(s)
- Yuliana Zabolotna
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Fanny Bonachera
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Dragos Horvath
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Arkadii Lin
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Gilles Marcou
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Olga Klimchuk
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Alexandre Varnek
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
5
Chong LC, Gandhi G, Lee JM, Yeo WWY, Choi SB. Drug Discovery of Spinal Muscular Atrophy (SMA) from the Computational Perspective: A Comprehensive Review. Int J Mol Sci 2021; 22:8962. [PMID: 34445667 PMCID: PMC8396480 DOI: 10.3390/ijms22168962] [Received: 01/15/2021] [Accepted: 01/27/2021] [Indexed: 01/02/2023]
Abstract
Spinal muscular atrophy (SMA), one of the leading inherited causes of child mortality, is a rare neuromuscular disease arising from loss-of-function mutations of the survival motor neuron 1 (SMN1) gene, which encodes the SMN protein. When neurons lack the SMN protein, patients suffer from muscle weakness and atrophy and, in severe cases, respiratory failure and death. Several therapeutic approaches have shown promise in human testing, and three medications have been approved by the U.S. Food and Drug Administration (FDA) to date. Despite the promise of these approved therapies, they have crucial limitations, one of the most important being cost. The FDA-approved drugs are high-priced and rank among the most expensive treatments in the world; their price remains far beyond what is affordable and may place a heavy burden on patients. The growth of biomedical data and advances in computational approaches have opened new possibilities for SMA therapeutic development. This article highlights the present status of computationally aided approaches, including in silico drug repurposing, network-driven drug discovery and artificial intelligence (AI)-assisted drug discovery, and discusses future prospects.
Affiliation(s)
- Li Chuin Chong
- Centre for Bioinformatics, School of Data Sciences, Perdana University, Suite 9.2, 9th Floor, Wisma Chase Perdana, Changkat Semantan, Kuala Lumpur 50490, Malaysia
- Gayatri Gandhi
- Perdana University Graduate School of Medicine, Perdana University, Suite 9.2, 9th Floor, Wisma Chase Perdana, Changkat Semantan, Kuala Lumpur 50490, Malaysia
- Jian Ming Lee
- Centre for Bioinformatics, School of Data Sciences, Perdana University, Suite 9.2, 9th Floor, Wisma Chase Perdana, Changkat Semantan, Kuala Lumpur 50490, Malaysia
- Wendy Wai Yeng Yeo
- Perdana University Graduate School of Medicine, Perdana University, Suite 9.2, 9th Floor, Wisma Chase Perdana, Changkat Semantan, Kuala Lumpur 50490, Malaysia
- Sy-Bing Choi
- Centre for Bioinformatics, School of Data Sciences, Perdana University, Suite 9.2, 9th Floor, Wisma Chase Perdana, Changkat Semantan, Kuala Lumpur 50490, Malaysia
6
A machine-learning-based alloy design platform that enables both forward and inverse predictions for thermo-mechanically controlled processed (TMCP) steel alloys. Sci Rep 2021; 11:11012. [PMID: 34040040 PMCID: PMC8155048 DOI: 10.1038/s41598-021-90237-z] [Received: 04/01/2021] [Accepted: 05/10/2021] [Indexed: 11/29/2022]
Abstract
Predicting mechanical properties such as yield strength (YS) and ultimate tensile strength (UTS) is an intricate undertaking in practice, notwithstanding a plethora of well-established theoretical and empirical models. A data-driven approach should be a fundamental part of making YS/UTS predictions. For this study, we collected 16 descriptors (attributes) that encode compositional and processing information, together with the corresponding YS/UTS values, for 5473 thermo-mechanically controlled processed (TMCP) steel alloys. We set up an integrated machine-learning (ML) platform consisting of 16 ML algorithms to predict YS/UTS from the descriptors. The integrated ML platform includes regularization-based linear regression algorithms, ensemble ML algorithms, and several non-linear ML algorithms. Despite the dirty nature of most real-world industrial data, we obtained acceptable holdout test results (R² > 0.6 and MSE < 0.01) for seven non-linear ML algorithms. The seven fully trained non-linear ML models were then used for ‘inverse design (prediction)’ based on an elitist-reinforced, non-dominated sorting genetic algorithm (NSGA-II). NSGA-II enabled us to predict solutions that exhibit desirable YS/UTS values for each ML algorithm. In addition, the NSGA-II-driven solutions in the 16-dimensional input feature space were visualized using the holographic research strategy (HRS) to systematically compare and analyze the inverse-predicted solutions for each ML algorithm.
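The inverse-design step relies on NSGA-II. The core operation of NSGA-II is sorting candidate solutions into successive non-dominated (Pareto) fronts. The Python sketch below shows only that sorting step and is not the study's implementation. It assumes that both objectives (YS and UTS) are to be maximised. It omits crowding distance, selection, crossover and mutation. The candidate values and helper names are hypothetical.

```python
def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """True if a dominates b when every objective is maximised: a is no worse
    than b in all objectives and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_fronts(points: list[tuple[float, ...]]) -> list[list[int]]:
    """Partition point indices into successive Pareto fronts.

    Uses a naive repeated sweep (cubic in len(points) in the worst case);
    NSGA-II's fast non-dominated sort reaches the same fronts with extra
    bookkeeping."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        # A point joins the current front if nothing still remaining dominates it
        front = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


# Hypothetical (YS, UTS) predictions in MPa for five candidate alloys
candidates = [(450, 560), (480, 540), (430, 600), (440, 550), (470, 530)]
print(non_dominated_fronts(candidates))  # [[0, 1, 2], [3, 4]]
```

With these toy values, alloys 0, 1 and 2 form the first front because no other candidate beats any of them on both strengths at once. Alloys 3 and 4 are each dominated by a first-front candidate.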
7
Probst D, Reymond JL. Visualization of very large high-dimensional data sets as minimum spanning trees. J Cheminform 2020; 12:12. [PMID: 33431043 PMCID: PMC7015965 DOI: 10.1186/s13321-020-0416-x] [Received: 01/09/2020] [Accepted: 02/04/2020] [Indexed: 01/10/2023]
Abstract
The chemical sciences are producing an unprecedented amount of large, high-dimensional data sets containing chemical structures and associated properties. However, there are currently no algorithms to visualize such data while preserving both global and local features with a sufficient level of detail to allow for human inspection and interpretation. Here, we propose a solution to this problem with a new data visualization method, TMAP, capable of representing data sets of up to millions of data points and arbitrarily high dimensionality as a two-dimensional tree (http://tmap.gdb.tools). Visualizations based on TMAP are better suited than t-SNE or UMAP for the exploration and interpretation of large data sets due to their tree-like nature, increased local and global neighborhood and structure preservation, and the transparency of the methods the algorithm is based on. We apply TMAP to the most widely used chemistry data sets, including databases of molecules such as ChEMBL, FDB17, the Natural Products Atlas and DSSTox, as well as to the MoleculeNet benchmark collection of data sets. We also show its broad applicability with further examples from biology, particle physics, and literature.
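The abstract describes representing data as a two-dimensional tree, and the title names minimum spanning trees (MSTs). The sketch below is a generic illustration of the MST idea only, not TMAP's implementation. It applies Kruskal's algorithm with a union-find structure to a complete graph of Jaccard distances between a few toy fingerprints. For the millions of points TMAP targets, building every pairwise edge as done here would be infeasible. The fingerprints and function names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a: set[int], b: set[int]) -> float:
    """1 - Tanimoto similarity for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return 1.0 - len(a & b) / union if union else 0.0


def kruskal_mst(points: list[set[int]]) -> list[tuple[int, int, float]]:
    """Kruskal's algorithm on the complete pairwise-distance graph.
    Returns the MST edges as (i, j, distance) in the order they were added."""
    parent = list(range(len(points)))

    def find(x: int) -> int:
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Sort all edges by distance (ties broken by node indices)
    edges = sorted(
        (jaccard_distance(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    tree = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge links two separate components, so keep it
            parent[ri] = rj
            tree.append((i, j, d))
            if len(tree) == len(points) - 1:
                break
    return tree


fps = [{1, 2, 3}, {1, 2, 4}, {1, 2, 3, 5}, {7, 8, 9}]
for i, j, d in kruskal_mst(fps):
    print(i, j, round(d, 3))
```

Here the tree links 0–2 (0.25), 0–1 (0.5) and 0–3 (1.0). The unrelated fingerprint 3 hangs off the tree on its own branch.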
Affiliation(s)
- Daniel Probst
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
8
Meyer JG, Liu S, Miller IJ, Coon JJ, Gitter A. Learning Drug Functions from Chemical Structures with Convolutional Neural Networks and Random Forests. J Chem Inf Model 2019; 59:4438-4449. [PMID: 31518132 PMCID: PMC6819987 DOI: 10.1021/acs.jcim.9b00236] [Received: 03/18/2019] [Indexed: 02/08/2023]
Abstract
Empirical testing of chemicals for drug efficacy costs many billions of dollars every year. The ability to predict the action of molecules in silico would greatly increase the speed and decrease the cost of prioritizing drug leads. Here, we asked whether drug function, defined as MeSH "therapeutic use" classes, can be predicted from only a chemical structure. We evaluated two chemical-structure-derived drug classification methods, chemical images with convolutional neural networks and molecular fingerprints with random forests, both of which outperformed previous predictions that used drug-induced transcriptomic changes as chemical representations. This suggests that the structure of a chemical contains at least as much information about its therapeutic use as the transcriptional cellular response to that chemical. Furthermore, because training data based on chemical structure is not limited to a small set of molecules for which transcriptomic measurements are available, our strategy can leverage more training data to significantly improve predictive accuracy to 83-88%. Finally, we explore use of these models for prediction of side effects and drug-repurposing opportunities and demonstrate the effectiveness of this modeling strategy for multilabel classification.
Affiliation(s)
- Jesse G. Meyer
- Department of Chemistry, Department of Biomolecular Chemistry, National Center for Quantitative Biology of Complex Systems, Department of Computer Sciences, Morgridge Institute for Research, DOE Great Lakes Bioenergy Research Center, and Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, Wisconsin 53706, United States
- Shengchao Liu
- Department of Chemistry, Department of Biomolecular Chemistry, National Center for Quantitative Biology of Complex Systems, Department of Computer Sciences, Morgridge Institute for Research, DOE Great Lakes Bioenergy Research Center, and Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, Wisconsin 53706, United States
- Ian J. Miller
- Department of Chemistry, Department of Biomolecular Chemistry, National Center for Quantitative Biology of Complex Systems, Department of Computer Sciences, Morgridge Institute for Research, DOE Great Lakes Bioenergy Research Center, and Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, Wisconsin 53706, United States
- Joshua J. Coon
- Department of Chemistry, Department of Biomolecular Chemistry, National Center for Quantitative Biology of Complex Systems, Department of Computer Sciences, Morgridge Institute for Research, DOE Great Lakes Bioenergy Research Center, and Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, Wisconsin 53706, United States
- Anthony Gitter
- Department of Chemistry, Department of Biomolecular Chemistry, National Center for Quantitative Biology of Complex Systems, Department of Computer Sciences, Morgridge Institute for Research, DOE Great Lakes Bioenergy Research Center, and Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, Wisconsin 53706, United States
9
Ravikumar B, Alam Z, Peddinti G, Aittokallio T. C-SPADE: a web-tool for interactive analysis and visualization of drug screening experiments through compound-specific bioactivity dendrograms. Nucleic Acids Res 2017; 45:W495-W500. [PMID: 28472495 PMCID: PMC5570255 DOI: 10.1093/nar/gkx384] [Received: 02/27/2017] [Accepted: 04/25/2017] [Indexed: 12/20/2022]
Abstract
The advent of the polypharmacology paradigm in drug discovery calls for novel chemoinformatic tools for analyzing compounds’ multi-targeting activities. Such tools should provide an intuitive representation of the chemical space by capturing and visualizing underlying patterns of compound similarities linked to their polypharmacological effects. Most existing compound-centric chemoinformatics tools lack the interactive options and user interfaces that are critical for the real-time needs of chemical biologists carrying out compound screening experiments. Toward that end, we introduce C-SPADE, an open-source exploratory web-tool for interactive analysis and visualization of drug profiling assays (biochemical, cell-based or cell-free) using compound-centric similarity clustering. C-SPADE allows users to visually map the chemical diversity of a screening panel, explore investigational compounds in terms of their similarity to the screening panel, perform polypharmacological analyses and guide drug-target interaction predictions. C-SPADE requires only the raw drug profiling data as input; it automatically retrieves the structural information and constructs the compound clusters in real time, thereby reducing the time required for manual analysis in drug development or repurposing applications. The web-tool provides a customizable visual workspace that can be downloaded as a figure or Newick tree file, or shared as a hyperlink with other users. C-SPADE is freely available at http://cspade.fimm.fi/.
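C-SPADE produces a compound-similarity dendrogram that can be exported as a Newick tree file. The abstract does not say which clustering algorithm is used. The sketch below therefore assumes average-linkage (UPGMA) agglomerative clustering on a precomputed, symmetric distance matrix, for example 1 − Tanimoto. It emits the resulting topology as Newick text without branch lengths. The function name, labels and distances are hypothetical, and this is not C-SPADE's code.

```python
def upgma_newick(labels: list[str], dist: list[list[float]]) -> str:
    """Average-linkage (UPGMA) agglomerative clustering of a symmetric
    distance matrix, returned as a Newick string (topology only)."""
    # Active clusters: id -> (Newick text, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    # Pairwise cluster distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(labels)
    while len(clusters) > 1:
        # Merge the closest pair of clusters
        (a, b), _ = min(d.items(), key=lambda kv: kv[1])
        (na, sa), (nb, sb) = clusters.pop(a), clusters.pop(b)
        # Size-weighted average distance from the merged cluster to each other one
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            d[(c, next_id)] = (sa * dac + sb * dbc) / (sa + sb)
        # Drop distances that involve the two clusters just merged
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters[next_id] = (f"({na},{nb})", sa + sb)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


names = ["A", "B", "C", "D"]
matrix = [
    [0.0, 0.2, 0.6, 0.9],
    [0.2, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
]
print(upgma_newick(names, matrix))  # ((A,B),(C,D));
```

A production tool would also write branch lengths (half the merge height in UPGMA) and quote labels containing Newick special characters. Both are omitted here for brevity.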
Affiliation(s)
- Balaguru Ravikumar
- Institute for Molecular Medicine Finland, FIMM, University of Helsinki, Helsinki, Finland
- Zaid Alam
- Institute for Molecular Medicine Finland, FIMM, University of Helsinki, Helsinki, Finland
- Gopal Peddinti
- Institute for Molecular Medicine Finland, FIMM, University of Helsinki, Helsinki, Finland
- Tero Aittokallio
- Institute for Molecular Medicine Finland, FIMM, University of Helsinki, Helsinki, Finland
- Department of Mathematics and Statistics, University of Turku, Turku, Finland
10
Métivier JP, Cuissart B, Bureau R, Lepailleur A. The Pharmacophore Network: A Computational Method for Exploring Structure–Activity Relationships from a Large Chemical Data Set. J Med Chem 2018; 61:3551-3564. [DOI: 10.1021/acs.jmedchem.7b01890] [Indexed: 01/27/2023]
Affiliation(s)
- Jean-Philippe Métivier
- Centre d’Etudes et de Recherche sur le Médicament de Normandie, Normandie Univ, UNICAEN, CERMN, 14000 Caen, France
- Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen, Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC, 14000 Caen, France
- Bertrand Cuissart
- Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen, Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC, 14000 Caen, France
- Ronan Bureau
- Centre d’Etudes et de Recherche sur le Médicament de Normandie, Normandie Univ, UNICAEN, CERMN, 14000 Caen, France
- Alban Lepailleur
- Centre d’Etudes et de Recherche sur le Médicament de Normandie, Normandie Univ, UNICAEN, CERMN, 14000 Caen, France
11
Smith RD, Lu J, Carlson HA. Are there physicochemical differences between allosteric and competitive ligands? PLoS Comput Biol 2017; 13:e1005813. [PMID: 29125840 PMCID: PMC5699844 DOI: 10.1371/journal.pcbi.1005813] [Received: 05/12/2017] [Revised: 11/22/2017] [Accepted: 10/05/2017] [Indexed: 01/04/2023]
Abstract
Previous studies have compared the physicochemical properties of allosteric compounds to non-allosteric compounds. Those studies have found that allosteric compounds tend to be smaller, more rigid, more hydrophobic, and more drug-like than non-allosteric compounds. However, previous studies have not properly corrected for the fact that some protein targets have much more data than other systems. This generates concern regarding the possible skew that can be introduced by the inherent bias in the available data. Hence, this study aims to determine how robust the previous findings are to the addition of newer data. This study utilizes the Allosteric Database (ASD v3.0) and ChEMBL v20 to systematically obtain large datasets of both allosteric and competitive ligands. This dataset contains 70,219 and 9,511 unique ligands for the allosteric and competitive sets, respectively. Physically relevant compound descriptors were computed to examine the differences in their chemical properties. Particular attention was given to removing redundancy in the data and normalizing across ligand diversity and varied protein targets. The resulting distributions only show that allosteric ligands tend to be more aromatic and rigid and do not confirm the increase in hydrophobicity or difference in drug-likeness. These results are robust across different normalization schemes.
Affiliation(s)
- Richard D. Smith
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, United States of America
- Jing Lu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States of America
- Heather A. Carlson
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, United States of America
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States of America
12
Perez de Souza L, Naake T, Tohge T, Fernie AR. From chromatogram to analyte to metabolite. How to pick horses for courses from the massive web resources for mass spectral plant metabolomics. Gigascience 2017; 6:1-20. [PMID: 28520864 PMCID: PMC5499862 DOI: 10.1093/gigascience/gix037] [Received: 02/24/2017] [Revised: 05/08/2017] [Accepted: 05/12/2017] [Indexed: 01/19/2023]
Abstract
The grand challenge currently facing metabolomics is the expansion of the coverage of the metabolome from a minor percentage of the metabolic complement of the cell toward the level of coverage afforded by other post-genomic technologies such as transcriptomics and proteomics. In plants, this problem is exacerbated by the sheer diversity of chemicals that constitute the metabolome, with the number of metabolites in the plant kingdom generally considered to be in excess of 200 000. In this review, we focus on web resources that can be exploited in order to improve analyte and ultimately metabolite identification and quantification. There is a wide range of available software that aids not only in this but also in the related area of peak alignment; however, for the uninitiated, choosing which program to use is a daunting task. For this reason, we provide an overview of the pros and cons of the software as well as comments regarding the level of programming skills required to effectively exploit their basic functions. In addition, the torrent of available genome and transcriptome sequences that followed the advent of next-generation sequencing has opened up further valuable resources for metabolite identification. All things considered, we posit that only via a continued communal sharing of information such as that deposited in the databases described within the article are we likely to be able to make significant headway toward improving our coverage of the plant metabolome.
Affiliation(s)
- Leonardo Perez de Souza
- Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
- Thomas Naake
- Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
- Takayuki Tohge
- Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
- Alisdair R Fernie
- Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
13
Awale M, Probst D, Reymond JL. WebMolCS: A Web-Based Interface for Visualizing Molecules in Three-Dimensional Chemical Spaces. J Chem Inf Model 2017; 57:643-649. [PMID: 28316236 DOI: 10.1021/acs.jcim.6b00690] [Indexed: 11/28/2022]
Abstract
The concept of chemical space provides a convenient framework to analyze large collections of molecules by placing them in property spaces where distances represent similarities. Here we report webMolCS, a new type of web-based interface visualizing up to 5000 user-defined molecules in six different three-dimensional (3D) chemical spaces obtained by principal component analysis or similarity mapping of multidimensional property spaces describing composition (MQN: 42D molecular quantum numbers, SMIfp: 34D SMILES fingerprint), shapes and pharmacophores (APfp: 20D atom pair fingerprint, Xfp: 55D category extended atom pair fingerprint), and substructures (Sfp: 1024D binary substructure fingerprint, ECfp4: 1024D extended connectivity fingerprint). Each molecule is shown as a sphere, and its structure appears on mouse over. The sphere is color-coded by similarity to the first compound in the list, by the list rank, or by a user-defined value, which reveals the relationship between any property encoded by these values and structural similarities. WebMolCS is freely available at www.gdb.unibe.ch.
Affiliation(s)
- Mahendra Awale
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
- Daniel Probst
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland