1
Hoover AJ, Spale M, Lahue B, Bitton DA. Matcher: An Open-Source Application for Translating Large Structure/Property Data Sets into Insights for Drug Design. J Chem Inf Model 2023; 63:1852-1857. [PMID: 36977316 DOI: 10.1021/acs.jcim.3c00015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023]
Abstract
To solve recurring problems in drug discovery, matched molecular pair (MMP) analysis is used to understand relationships between chemical structure and function. For the MMP analysis of large data sets (>10,000 compounds), available tools lack flexible search and visualization functionality and require computational expertise. Here, we present Matcher, an open-source application for MMP analysis, with novel search algorithms and fully automated querying-to-visualization that requires no programming expertise. Matcher enables unprecedented control over the search and clustering of MMP transformations based on both variable fragment and constant environment structure, which is critical for disentangling relevant and irrelevant data to a given problem. Users can exert such control through a built-in chemical sketcher and with a few mouse clicks can navigate between resulting MMP transformations, statistics, property distribution graphs, and structures with raw experimental data, for confident and accelerated decision making. Matcher can be used with any collection of structure/property data; here, we demonstrate usage with a public ChEMBL data set of about 20,000 small molecules with CYP3A4 and/or hERG inhibition data. Users can reproduce all examples demonstrated herein via unique links within Matcher's interface, a functionality that anyone can use to preserve and share their own analyses. Matcher and all its dependencies are open-source, can be used for free, and are available with containerized deployment from code at https://github.com/Merck/Matcher. Matcher makes large structure/property data sets more transparent than ever before and accelerates the data-driven solution of common problems in drug discovery.
Affiliation(s)
- Andrew J Hoover
  - Computational and Structural Chemistry, Merck & Co., Inc., Boston, Massachusetts 02115, United States
- Martin Spale
  - R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
- Brian Lahue
  - Computational and Structural Chemistry, Merck & Co., Inc., Boston, Massachusetts 02115, United States
- Danny A Bitton
  - R&D Informatics Solutions, MSD Czech Republic s.r.o., Prague 150 00, Czech Republic
2
Gurvic D, Leach AG, Zachariae U. Data-Driven Derivation of Molecular Substructures That Enhance Drug Activity in Gram-Negative Bacteria. J Med Chem 2022; 65:6088-6099. [PMID: 35427114 PMCID: PMC9059115 DOI: 10.1021/acs.jmedchem.1c01984] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/18/2021] [Indexed: 11/28/2022]
Abstract
The complex cell envelope of Gram-negative bacteria creates a formidable barrier to antibiotic influx. Reduced drug uptake impedes drug development and contributes to a wide range of drug-resistant bacterial infections, including those caused by extremely resistant species prioritized by the World Health Organization. To develop new and efficient treatments, a better understanding of the molecular features governing Gram-negative permeability is essential. Here, we present a data-driven approach, using matched molecular pair analysis and machine learning on minimal inhibitory concentration data from Gram-positive and Gram-negative bacteria to uncover chemical features that influence Gram-negative bioactivity. We find recurring chemical moieties, of a wider range than previously known, that consistently improve activity and suggest that this insight can be used to optimize compounds for increased Gram-negative uptake. Our findings may help to expand the chemical space of broad-spectrum antibiotics and aid the search for new antibiotic compound classes.
Affiliation(s)
- Dominik Gurvic
  - Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, United Kingdom
- Andrew G. Leach
  - Division of Pharmacy and Optometry, University of Manchester, Oxford Road, Manchester M13 9PL, United Kingdom
  - Medchemica Limited, Mereside, Alderley Park, Macclesfield, SK10 4TG, United Kingdom
- Ulrich Zachariae
  - Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, United Kingdom
3
Yang ZY, Fu L, Lu AP, Liu S, Hou TJ, Cao DS. Semi-automated workflow for molecular pair analysis and QSAR-assisted transformation space expansion. J Cheminform 2021; 13:86. [PMID: 34774096 PMCID: PMC8590336 DOI: 10.1186/s13321-021-00564-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/08/2021] [Accepted: 10/30/2021] [Indexed: 12/01/2022] Open
Abstract
In the process of drug discovery, the optimization of lead compounds has always been a challenge for pharmaceutical chemists. Matched molecular pair analysis (MMPA), a promising tool to efficiently extract and summarize the relationship between structural transformation and property change, is suitable for local structural optimization tasks. In particular, the integration of MMPA with QSAR modeling can further strengthen the utility of MMPA in molecular optimization navigation. In this study, a new semi-automated procedure based on KNIME was developed to support MMPA on both large- and small-scale datasets, including molecular preparation, QSAR model construction, applicability domain evaluation, and MMP calculation and application. Two examples, covering regression and classification tasks, were provided to give a better understanding of the importance of MMPA; they also show the reliability and utility of this MMPA-by-QSAR pipeline.
Affiliation(s)
- Zi-Yi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China
  - Hunan Key Laboratory of Diagnostic and Therapeutic Drug Research for Chronic Diseases, Changsha, 410013, Hunan, China
- Li Fu
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China
  - Hunan Key Laboratory of Diagnostic and Therapeutic Drug Research for Chronic Diseases, Changsha, 410013, Hunan, China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, 999077, SAR, People's Republic of China
- Shao Liu
  - Department of Pharmacy, Xiangya Hospital, Central South University, Changsha, 410008, Hunan, People's Republic of China
- Ting-Jun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, People's Republic of China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China
  - Hunan Key Laboratory of Diagnostic and Therapeutic Drug Research for Chronic Diseases, Changsha, 410013, Hunan, China
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, 999077, SAR, People's Republic of China
4
Naveja JJ, Vogt M. Automatic Identification of Analogue Series from Large Compound Data Sets: Methods and Applications. Molecules 2021; 26:5291. [PMID: 34500724 PMCID: PMC8433811 DOI: 10.3390/molecules26175291] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/06/2021] [Revised: 08/27/2021] [Accepted: 08/28/2021] [Indexed: 01/21/2023] Open
Abstract
Analogue series play a key role in drug discovery. They arise naturally in lead optimization efforts where analogues are explored based on one or a few core structures. However, it is much harder to accurately identify and extract pairs or series of analogue molecules in large compound databases with no predefined core structures. This methodological review outlines the most common and recent methodological developments to automatically identify analogue series in large libraries. Initial approaches focused on using predefined rules to extract scaffold structures, such as the popular Bemis-Murcko scaffold. Later on, the matched molecular pair concept led to efficient algorithms to identify similar compounds sharing a common core structure by exploring many putative scaffolds for each compound. Further developments of these ideas yielded, on the one hand, approaches for hierarchical scaffold decomposition and, on the other hand, algorithms for the extraction of analogue series based on single-site modifications (so-called matched molecular series) by exploring potential scaffold structures based on systematic molecule fragmentation. Eventually, further development of these approaches resulted in methods for extracting analogue series defined by a single core structure with several substitution sites that allow convenient representations, such as R-group tables. These methods enable the efficient analysis of large data sets with hundreds of thousands or even millions of compounds and have spawned many related methodological developments.
Affiliation(s)
- José J. Naveja
  - Instituto de Química, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Martin Vogt
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5-6, 53115 Bonn, Germany
5
Can we accelerate medicinal chemistry by augmenting the chemist with Big Data and artificial intelligence? Drug Discov Today 2018; 23:1373-1384. [DOI: 10.1016/j.drudis.2018.03.011] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 01/12/2018] [Revised: 02/27/2018] [Accepted: 03/20/2018] [Indexed: 12/18/2022]
6
Dalke A, Hert J, Kramer C. mmpdb: An Open-Source Matched Molecular Pair Platform for Large Multiproperty Data Sets. J Chem Inf Model 2018; 58:902-910. [DOI: 10.1021/acs.jcim.8b00173] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Indexed: 11/28/2022]
Affiliation(s)
- Andrew Dalke
  - Andrew Dalke Scientific AB, SE-461 30 Trollhättan, Sweden
- Jérôme Hert
  - Roche Pharma Research and Early Development, Roche Innovation Center, CH-4070 Basel, Switzerland
- Christian Kramer
  - Roche Pharma Research and Early Development, Roche Innovation Center, CH-4070 Basel, Switzerland
7
Ehmki ESR, Rarey M. Exploring Structure-Activity Relationships with Three-Dimensional Matched Molecular Pairs: A Review. ChemMedChem 2018; 13:482-489. [PMID: 29211343 DOI: 10.1002/cmdc.201700628] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 10/09/2017] [Revised: 11/27/2017] [Indexed: 11/10/2022]
Abstract
A matched molecular pair (MMP) consists of two small molecules that differ by a few atoms only. The minor structural difference between the molecules allows a detailed analysis of changes in properties. Three-dimensional (3D) MMPs extend the concept of chemical similarity by spatial similarity. Conformations must be generated, and superimpositions have to be calculated. The additional complexity and uncertainty, as well as the smaller amount of available experimental data, substantially complicate the derivation of models. Nonetheless, there are some benefits that make the transition worthwhile. The 3D concept gives detailed insight into mechanisms behind several methods classically used by the 2D MMP approach. It can help to analyze disrupted series of structure-activity relationships or extend the 2D MMP concept with scaffold hopping. One of the most powerful features is the high-confidence structure-activity relationship transfer between series of analogues. Several research groups have approached the problem from different directions. The models vary especially in the 3D similarity measure used and in the complexity of the descriptor selected or designed. Nonetheless, all approaches have increased the amount of information available by incorporating 3D structural information.
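As a rough illustration of the basic 2D idea behind a matched molecular pair (not the 3D methods this review covers, and not any cited tool's implementation), the sketch below groups compounds that share a constant part and pairs those whose variable fragments differ. It assumes the compounds have already been fragmented; the record layout, the function names `find_pairs` and `summarize`, the SMILES-like strings, and the property values are all hypothetical and invented for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical pre-fragmented records:
# (compound_id, constant_part, variable_fragment, property_value).
# All strings and numbers here are made up for illustration only.
RECORDS = [
    ("cpd1", "[*]c1ccccc1", "[*]Cl", 5.2),
    ("cpd2", "[*]c1ccccc1", "[*]F", 5.9),
    ("cpd3", "[*]c1ccccc1", "[*]OC", 6.4),
    ("cpd4", "[*]C1CCNCC1", "[*]Cl", 4.8),
    ("cpd5", "[*]C1CCNCC1", "[*]F", 5.6),
]


def find_pairs(records):
    """Pair compounds that share a constant part but differ in the variable fragment."""
    by_constant = defaultdict(list)
    for cid, constant, variable, value in records:
        by_constant[constant].append((cid, variable, value))

    pairs = []
    for constant, members in by_constant.items():
        for (id_a, var_a, val_a), (id_b, var_b, val_b) in combinations(members, 2):
            if var_a == var_b:
                continue  # identical fragments do not form a transformation
            pairs.append({
                "constant": constant,
                "transformation": (var_a, var_b),
                "compounds": (id_a, id_b),
                "delta": val_b - val_a,  # property change going from A to B
            })
    return pairs


def summarize(pairs):
    """Average the property change of each transformation over all constant parts."""
    deltas = defaultdict(list)
    for pair in pairs:
        deltas[pair["transformation"]].append(pair["delta"])
    return {t: sum(values) / len(values) for t, values in deltas.items()}


if __name__ == "__main__":
    for transformation, mean_delta in summarize(find_pairs(RECORDS)).items():
        print(transformation, round(mean_delta, 2))
```

In this toy data the chlorine-to-fluorine change appears in two different constant environments, so its mean property change is averaged over two pairs. Real tools also have to generate the fragmentations and canonicalize the direction of each transformation, which this sketch leaves out.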
Affiliation(s)
- Emanuel S R Ehmki
  - Center for Bioinformatics, Universität Hamburg, Bundesstraße 43, 20146, Hamburg, Germany
- Matthias Rarey
  - Center for Bioinformatics, Universität Hamburg, Bundesstraße 43, 20146, Hamburg, Germany