1
Bagdad Y, Miteva MA. Recent Applications of Artificial Intelligence in Discovery of New Antibacterial Agents. Adv Appl Bioinform Chem 2024; 17:139-157. [PMID: 39650228 PMCID: PMC11624680 DOI: 10.2147/aabc.s484321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2024] [Accepted: 10/25/2024] [Indexed: 12/11/2024] Open
Abstract
Antimicrobial resistance (AMR) is a major challenge for global public health today, compromising the effectiveness of treatments against many bacterial infections. In recent decades, artificial intelligence (AI) has emerged as a promising technology for identifying and developing new antibacterial agents. This review focuses on AI methodologies applied to the discovery of new antibacterial candidates. We summarize case studies in which AI was used to identify small molecules and peptides that show antimicrobial activity and are effective against resistant pathogenic bacteria. We also discuss the challenges and opportunities offered by AI, highlighting the importance of progress in AI for identifying promising new antibacterial drug candidates to combat AMR.
Affiliation(s)
- Youcef Bagdad
  - Université Paris Cité, CNRS UMR 8038 CiTCoM, Inserm U1268 MCTR, Paris, France
- Maria A Miteva
  - Université Paris Cité, CNRS UMR 8038 CiTCoM, Inserm U1268 MCTR, Paris, France
2
Linciano P, Quotadamo A, Luciani R, Santucci M, Zorn KM, Foil DH, Lane TR, Cordeiro da Silva A, Santarem N, B Moraes C, Freitas-Junior L, Wittig U, Mueller W, Tonelli M, Ferrari S, Venturelli A, Gul S, Kuzikov M, Ellinger B, Reinshagen J, Ekins S, Costi MP. High-Throughput Phenotypic Screening and Machine Learning Methods Enabled the Selection of Broad-Spectrum Low-Toxicity Antitrypanosomatidic Agents. J Med Chem 2023; 66:15230-15255. [PMID: 37921561 PMCID: PMC10683024 DOI: 10.1021/acs.jmedchem.3c01322] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/21/2023] [Revised: 10/14/2023] [Accepted: 10/18/2023] [Indexed: 11/04/2023]
Abstract
Broad-spectrum anti-infective chemotherapy agents with activity against Trypanosomes, Leishmania, and Mycobacterium tuberculosis species were identified from a high-throughput phenotypic screening program of the 456 compounds belonging to the Ty-Box, an in-house industry database. Compound characterization using machine learning approaches enabled the identification and synthesis of 44 compounds with broad-spectrum antiparasitic activity against Trypanosoma brucei, Leishmania infantum, and Trypanosoma cruzi and minimal toxicity. In vitro studies confirmed the model predictions and identified compound 40, which emerged as a new lead featuring an innovative N-(5-pyrimidinyl)benzenesulfonamide scaffold, promising low-micromolar activity against two parasites, and low toxicity. Given the volume and complexity of the data generated by the diverse high-throughput screening assays performed on the Ty-Box library, the chemoinformatic and machine learning tools enabled the selection of compounds eligible for further evaluation of their biological and toxicological activities and aided the decision-making process toward the design and optimization of the identified lead.
Affiliation(s)
- Pasquale Linciano
  - Department of Life Sciences, University of Modena and Reggio Emilia, Via Campi 103, 41125 Modena, Italy
- Antonio Quotadamo
  - Department of Life Sciences, University of Modena and Reggio Emilia, Via Campi 103, 41125 Modena, Italy
- Rosaria Luciani
  - Department of Life Sciences, University of Modena and Reggio Emilia, Via Campi 103, 41125 Modena, Italy
- Matteo Santucci
  - Department of Life Sciences, University of Modena and Reggio Emilia, Via Campi 103, 41125 Modena, Italy
- Kimberley M. Zorn
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
- Daniel H. Foil
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
- Thomas R. Lane
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
- Anabela Cordeiro da Silva
  - Institute for Molecular and Cell Biology, 4150-180 Porto, Portugal
  - Instituto de Investigaçao e Inovaçao em Saúde, Universidade do Porto and Institute for Molecular and Cell Biology, 4150-180 Porto, Portugal
- Nuno Santarem
  - Institute for Molecular and Cell Biology, 4150-180 Porto, Portugal
  - Instituto de Investigaçao e Inovaçao em Saúde, Universidade do Porto and Institute for Molecular and Cell Biology, 4150-180 Porto, Portugal
- Carolina B Moraes
  - Brazilian Biosciences National Laboratory (LNBio), Brazilian Center for Research in Energy and Materials (CNPEM), 13083-970 Campinas, São Paulo, Brazil
- Lucio Freitas-Junior
  - Brazilian Biosciences National Laboratory (LNBio), Brazilian Center for Research in Energy and Materials (CNPEM), 13083-970 Campinas, São Paulo, Brazil
- Ulrike Wittig
  - Scientific Databases and Visualization Group and Molecular and Cellular Modelling Group, Heidelberg Institute for Theoretical Studies (HITS), D-69118 Heidelberg, Germany
- Wolfgang Mueller
  - Scientific Databases and Visualization Group and Molecular and Cellular Modelling Group, Heidelberg Institute for Theoretical Studies (HITS), D-69118 Heidelberg, Germany
- Michele Tonelli
  - Department of Pharmacy, University of Genoa, Viale Benedetto XV n.3, 16132 Genoa, Italy
- Stefania Ferrari
  - Department of Life Sciences, University of Modena and Reggio Emilia, Via Campi 103, 41125 Modena, Italy
- Alberto Venturelli
  - Department of Life Sciences, University of Modena and Reggio Emilia, Via Campi 103, 41125 Modena, Italy
  - TYDOCK PHARMA S.r.l., Strada Gherbella 294/b, 41126 Modena, Italy
- Sheraz Gul
  - Fraunhofer Translational Medicine and Pharmacology, Schnackenburgallee 114, D-22525 Hamburg, Germany
  - Fraunhofer Cluster of Excellence Immune-Mediated Diseases CIMD, Schnackenburgallee 114, D-22525 Hamburg, Germany
- Maria Kuzikov
  - Fraunhofer Translational Medicine and Pharmacology, Schnackenburgallee 114, D-22525 Hamburg, Germany
  - Fraunhofer Cluster of Excellence Immune-Mediated Diseases CIMD, Schnackenburgallee 114, D-22525 Hamburg, Germany
- Bernhard Ellinger
  - Fraunhofer Translational Medicine and Pharmacology, Schnackenburgallee 114, D-22525 Hamburg, Germany
  - Fraunhofer Cluster of Excellence Immune-Mediated Diseases CIMD, Schnackenburgallee 114, D-22525 Hamburg, Germany
- Jeanette Reinshagen
  - Fraunhofer Translational Medicine and Pharmacology, Schnackenburgallee 114, D-22525 Hamburg, Germany
  - Fraunhofer Cluster of Excellence Immune-Mediated Diseases CIMD, Schnackenburgallee 114, D-22525 Hamburg, Germany
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
- Maria Paola Costi
  - Department of Life Sciences, University of Modena and Reggio Emilia, Via Campi 103, 41125 Modena, Italy
3
Naidu A, Nayak SS, Lulu S S, Sundararajan V. Advances in computational frameworks in the fight against TB: The way forward. Front Pharmacol 2023; 14:1152915. [PMID: 37077815 PMCID: PMC10106641 DOI: 10.3389/fphar.2023.1152915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Accepted: 03/20/2023] [Indexed: 04/05/2023] Open
Abstract
Around 1.6 million people lost their lives to tuberculosis (TB) in 2021, according to WHO estimates. Although an intensive treatment plan exists against the causal agent, Mycobacterium tuberculosis, the evolution of multidrug-resistant strains of the pathogen puts a large share of the global population at risk. A vaccine that can induce long-term protection is still in development, with many candidates currently in different phases of clinical trials. The COVID-19 pandemic has further aggravated the situation by affecting early TB diagnosis and treatment. Yet WHO remains committed to its "End TB" strategy and aims to substantially reduce TB incidence and deaths by 2035. Such an ambitious goal requires a multi-sectoral approach that would greatly benefit from the latest computational advancements. To highlight the progress of these tools against TB, in this review we summarize recent studies that have used advanced computational tools and algorithms for early TB diagnosis, anti-mycobacterial drug discovery, and the design of next-generation TB vaccines. Finally, we give insight into other computational tools and machine learning approaches that have been successfully applied in biomedical research and discuss their prospects and applications against TB.
Affiliation(s)
- Vino Sundararajan
  - Department of Biotechnology, School of Bio Sciences and Technology, VIT University, Vellore, India
4
Harigua-Souiai E, Oualha R, Souiai O, Abdeljaoued-Tej I, Guizani I. Applied Machine Learning Toward Drug Discovery Enhancement: Leishmaniases as a Case Study. Bioinform Biol Insights 2022; 16:11779322221090349. [PMID: 35478992 PMCID: PMC9036323 DOI: 10.1177/11779322221090349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2021] [Accepted: 03/04/2022] [Indexed: 11/25/2022] Open
Abstract
Drug discovery (DD) research is a complex field with a high attrition rate. Machine learning (ML) approaches combined with chemoinformatics provide valuable input to this field. Herein, we focused on implementing multiple ML algorithms that learn from different molecular fingerprints (FPs) of 65 057 molecules that have been identified as active or inactive against Leishmania major promastigotes. We sought to build a classifier able to predict whether a given molecule has the potential to be anti-leishmanial. Using the RDKit library, we calculated 5 molecular FPs of the molecules. Then, we implemented 4 ML algorithms that we trained and tested for their ability to classify the molecules into active/inactive classes based on their chemical structure, as encoded by the molecular FPs. The best performers were random forest (RF) and support vector machine (SVM), while atom-pair and topological torsion FPs were the best embedding functions. Both models were further assessed on different stratification levels of the dataset and showed stable performance. Finally, we used them to predict the potential of molecules within the Food and Drug Administration (FDA)-approved drugs collection to present anti-Leishmania effects. We ranked these drugs according to their predicted anti-leishmanial probability and found, in total, seven anti-Leishmania agents previously described in the literature within the top 10 of each model. This validates the robustness of the approach, the choice of algorithms and FPs, and the importance of the dataset size and content. We further submitted these molecules to reverse docking experiments on 3D crystal structures of seven well-studied Leishmania drug targets and could predict the molecular targets for 4 drugs. The results bring novel insights into anti-Leishmania compounds.
Affiliation(s)
- Emna Harigua-Souiai
  - Laboratory of Molecular Epidemiology and Experimental Pathology-LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Rafeh Oualha
  - Laboratory of Molecular Epidemiology and Experimental Pathology-LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Oussama Souiai
  - Laboratory of Bioinformatics, BioMathematics and BioStatistics LR20IPT09, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Ines Abdeljaoued-Tej
  - Laboratory of Bioinformatics, BioMathematics and BioStatistics LR20IPT09, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
  - Engineering School of Statistics and Information Analysis, University of Carthage, Ariana, Tunisia
- Ikram Guizani
  - Laboratory of Molecular Epidemiology and Experimental Pathology-LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
5
Lane TR, Urbina F, Rank L, Gerlach J, Riabova O, Lepioshkin A, Kazakova E, Vocat A, Tkachenko V, Cole S, Makarov V, Ekins S. Machine Learning Models for Mycobacterium tuberculosis In Vitro Activity: Prediction and Target Visualization. Mol Pharm 2022; 19:674-689. [PMID: 34964633 PMCID: PMC9121329 DOI: 10.1021/acs.molpharmaceut.1c00791] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/28/2022]
Abstract
Tuberculosis (TB) is a major global health challenge, with approximately 1.4 million deaths per year. There is still a need to develop novel treatments for patients infected with Mycobacterium tuberculosis (Mtb). There have been many large-scale phenotypic screens that have led to the identification of thousands of new compounds. Yet, there is very limited investment in TB drug discovery which points to the need for new methods to increase the efficiency of drug discovery against Mtb. We have used machine learning approaches to learn from the public Mtb data, resulting in many data sets and models with robust enrichment and hit rates leading to the discovery of new active compounds. Recently, we have curated predominantly small-molecule Mtb data and developed new machine learning classification models with 18 886 molecules at different activity cutoffs. We now describe the further validation of these Bayesian models using a library of over 1000 molecules synthesized as part of EU-funded New Medicines for TB and More Medicines for TB programs. We highlight molecular features which are enriched in these active compounds. In addition, we provide new regression and classification models that can be used for scoring compound libraries or used to design new molecules. We have also visualized these molecules in the context of known molecular targets and identified clusters in chemical property space, which may aid in future target identification efforts. Finally, we are also making these data sets publicly available, representing a significant increase to the available Mtb inhibition data in the public domain.
Affiliation(s)
- Thomas R. Lane
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
- Fabio Urbina
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
- Laura Rank
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
- Jacob Gerlach
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
- Olga Riabova
  - Research Center of Biotechnology RAS, 119071 Moscow, Russia
- Elena Kazakova
  - Research Center of Biotechnology RAS, 119071 Moscow, Russia
- Anthony Vocat
  - Global Health Institute, Ecole Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
- Valery Tkachenko
  - Science Data Experts, 14909 Forest Landing Cir, Rockville, MD 20850
- Vadim Makarov
  - Research Center of Biotechnology RAS, 119071 Moscow, Russia
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
6
Schmalstig AA, Zorn KM, Murci S, Robinson A, Savina S, Komarova E, Makarov V, Braunstein M, Ekins S. Mycobacterium abscessus drug discovery using machine learning. Tuberculosis (Edinb) 2022; 132:102168. [PMID: 35077930 PMCID: PMC8855326 DOI: 10.1016/j.tube.2022.102168] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/16/2021] [Revised: 10/30/2021] [Accepted: 01/14/2022] [Indexed: 01/22/2023]
Abstract
The prevalence of infections by nontuberculous mycobacteria is increasing, having surpassed tuberculosis in the United States and much of the developed world. Nontuberculous mycobacteria occur naturally in the environment and are a significant problem for patients with underlying lung diseases such as bronchiectasis, chronic obstructive pulmonary disease, and cystic fibrosis. Current treatment regimens are lengthy, complicated, toxic and they are often unsuccessful as seen by disease recurrence. Mycobacterium abscessus is one of the most commonly encountered organisms in nontuberculous mycobacteria disease and it is the most difficult to eradicate. There is currently no systematically proven regimen that is effective for treating M. abscessus infections. Our approach to drug discovery integrates machine learning, medicinal chemistry and in vitro testing and has been previously applied to Mycobacterium tuberculosis. We have now identified several novel 1-(phenylsulfonyl)-1H-benzimidazol-2-amines that have weak activity on M. abscessus in vitro but may represent a starting point for future further medicinal chemistry optimization. We also address limitations still to be overcome with the machine learning approach for M. abscessus.
Affiliation(s)
- Alan A. Schmalstig
  - Department of Microbiology and Immunology, School of Medicine, University of North Carolina at Chapel Hill, North Carolina, 27599, USA
- Kimberley M. Zorn
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive Lab 3510, Raleigh, North Carolina, 27606, USA
- Sebastian Murci
  - Department of Microbiology and Immunology, School of Medicine, University of North Carolina at Chapel Hill, North Carolina, 27599, USA
- Andrew Robinson
  - Department of Microbiology and Immunology, School of Medicine, University of North Carolina at Chapel Hill, North Carolina, 27599, USA
- Svetlana Savina
  - Research Center of Biotechnology RAS, Moscow, 119071, Russia
- Elena Komarova
  - Research Center of Biotechnology RAS, Moscow, 119071, Russia
- Vadim Makarov
  - Research Center of Biotechnology RAS, Moscow, 119071, Russia
- Miriam Braunstein
  - Department of Microbiology and Immunology, School of Medicine, University of North Carolina at Chapel Hill, North Carolina, 27599, USA
- Sean Ekins (corresponding author)
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive Lab 3510, Raleigh, North Carolina, 27606, USA
7
Haldar R, Narayanan SJ. A novel ensemble based recommendation approach using network based analysis for identification of effective drugs for Tuberculosis. Mathematical Biosciences and Engineering: MBE 2022; 19:873-891. [PMID: 34903017 DOI: 10.3934/mbe.2022040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
Tuberculosis (TB) is a fatal infectious disease that has affected millions of people worldwide for many decades, and with mutating drug-resistant strains it now poses even bigger challenges for treating patients. Computational techniques might play a crucial role in rapidly developing new or modified anti-tuberculosis drugs that can tackle these mutating strains of TB. This research work applied a computational approach to generate a unique recommendation list of possible TB drugs as alternatives to a popular drug, EMB, by first obtaining an initial list of drugs from a popular online database, PubChem, and then applying an ensemble of ranking mechanisms. As a novelty, both the pharmacokinetic properties and some network-based attributes of the chemical structure of the drugs are considered for generating separate recommendation lists. The work also provides customized modifications to a popular and traditional ensemble ranking technique to cater to the specific dataset and requirements. The final recommendation list provides established chemical structures along with their ranks, which could be used as alternatives to EMB. It is believed that incorporating both pharmacokinetic and network-based properties in the ensemble ranking process added to the effectiveness and relevance of the final recommendation.
Affiliation(s)
- Rishin Haldar
  - School of Computer Science and Engineering, Vellore Institute of Technology (VIT), Vellore - 632014, Tamil Nadu, India
- Swathi Jamjala Narayanan
  - School of Computer Science and Engineering, Vellore Institute of Technology (VIT), Vellore - 632014, Tamil Nadu, India
8
Computational Study on Potential Novel Anti-Ebola Virus Protein VP35 Natural Compounds. Biomedicines 2021; 9:biomedicines9121796. [PMID: 34944612 PMCID: PMC8698941 DOI: 10.3390/biomedicines9121796] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/27/2021] [Revised: 10/27/2021] [Accepted: 11/05/2021] [Indexed: 12/13/2022] Open
Abstract
Ebola virus (EBOV) is one of the most lethal pathogens that can infect humans. The Ebola viral protein VP35 (EBOV VP35) inhibits host IFN-α/β production by interfering with host immune responses to viral invasion and is thus considered a plausible drug target. The aim of this study was to identify potential novel lead compounds against EBOV VP35 using computational drug discovery techniques. The 3D structure of EBOV VP35 (PDB ID: 3FKE) was used for molecular docking studies. An integrated library of 7675 African natural products was pre-filtered using ADMET risk with a threshold of 7, yielding 1470 ligands for downstream molecular docking with AutoDock Vina after energy minimization of the protein via GROMACS. Five known inhibitors, namely amodiaquine, chloroquine, gossypetin, taxifolin and EGCG, were used as standard control compounds. The area under the receiver operating characteristic (ROC) curve (AUC) used to evaluate the docking protocol was 0.72, which was considered acceptable. The four identified potential lead compounds, NANPDB4048, NANPDB2412, ZINC000095486250 and NANPDB2476, had binding affinities of −8.2, −8.2, −8.1 and −8.0 kcal/mol, respectively, and were predicted to possess desirable antiviral activity, including the inhibition of RNA synthesis and membrane permeability, with the probable activity (Pa) greater than the probable inactivity (Pi). The predicted anti-EBOV inhibition efficiency values (IC50), obtained using a random forest classifier, ranged from 3.35 to 11.99 μM, while the Ki values ranged from 0.97 to 1.37 μM. The compounds NANPDB4048 and NANPDB2412 had the lowest binding energy of −8.2 kcal/mol, implying a binding affinity to EBOV VP35 greater than those of the known inhibitors. The compounds were predicted to carry a low toxicity risk and to have reasonably good pharmacological profiles.
Molecular dynamics (MD) simulations of the protein–ligand complexes, lasting 50 ns, and molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) calculations corroborated the binding affinities of the identified compounds and identified novel critical interacting residues. The antiviral potential of the molecules could be confirmed experimentally, while the scaffolds could be optimized for the design of future novel anti-EBOV chemotherapeutics.
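The reported Ki range appears consistent with converting the docking scores through the standard thermodynamic relation ΔG = RT ln Ki. The sketch below illustrates that conversion only; the temperature (298.15 K) and the helper name `ki_micromolar` are assumptions for illustration, since the abstract does not state how the Ki values were derived.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ki_micromolar(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Estimate Ki in micromolar from a binding free energy via dG = RT ln(Ki).

    temperature_k is an assumed value; the abstract does not specify it.
    """
    ki_molar = math.exp(delta_g_kcal / (R_KCAL * temperature_k))
    return ki_molar * 1e6  # mol/L -> micromol/L


# Docking scores quoted in the abstract (kcal/mol)
for dg in (-8.2, -8.1, -8.0):
    print(f"{dg:.1f} kcal/mol -> Ki of about {ki_micromolar(dg):.2f} uM")
```

Under these assumptions, scores of −8.2 and −8.0 kcal/mol map to Ki values of roughly 1.0 and 1.4 μM, close to the 0.97 to 1.37 μM range quoted above.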
9
Winkler DA. Use of Artificial Intelligence and Machine Learning for Discovery of Drugs for Neglected Tropical Diseases. Front Chem 2021; 9:614073. [PMID: 33791277 PMCID: PMC8005575 DOI: 10.3389/fchem.2021.614073] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 10/05/2020] [Accepted: 01/18/2021] [Indexed: 12/11/2022] Open
Abstract
Neglected tropical diseases continue to create high levels of morbidity and mortality in a sizeable fraction of the world’s population, despite ongoing research into new treatments. Some of the most important technological developments that have accelerated drug discovery for diseases of affluent countries have not flowed down to neglected tropical disease drug discovery. Pharmaceutical development business models, the cost of developing new drug treatments and subsequent costs to patients, and the accessibility of technologies to scientists in most of the affected countries are some of the reasons for this low uptake and slow development relative to that for common diseases in developed countries. Computational methods are starting to make significant inroads into the discovery of drugs for neglected tropical diseases due to the increasing availability of large databases that can be used to train machine learning (ML) models, the increasing accuracy of these methods, a lower entry barrier for researchers, and the widespread availability of public domain ML codes. Here, the application of artificial intelligence, largely the subset called machine learning, to the modelling and prediction of biological activities and the discovery of new drugs for neglected tropical diseases is summarized. The pathways for the development of ML methods in the short to medium term and the use of other artificial intelligence methods for drug discovery are discussed. The current roadblocks to, and likely impacts of, synergistic new technological developments on the future use of ML methods for neglected tropical disease drug discovery are also discussed.
Affiliation(s)
- David A Winkler
  - Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, VIC, Australia
  - Latrobe Institute for Molecular Science, La Trobe University, Bundoora, VIC, Australia
  - School of Pharmacy, University of Nottingham, Nottingham, United Kingdom
  - CSIRO Data61, Pullenvale, QLD, Australia
10
Pereira JC, Daher SS, Zorn KM, Sherwood M, Russo R, Perryman AL, Wang X, Freundlich MJ, Ekins S, Freundlich JS. Machine Learning Platform to Discover Novel Growth Inhibitors of Neisseria gonorrhoeae. Pharm Res 2020; 37:141. [PMID: 32661900 DOI: 10.1007/s11095-020-02876-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/03/2020] [Accepted: 07/06/2020] [Indexed: 12/17/2022]
Abstract
PURPOSE To advance fundamental biological and translational research with the bacterium Neisseria gonorrhoeae through the prediction of novel small molecule growth inhibitors via naïve Bayesian modeling methodology. METHODS Inspection and curation of small molecule growth inhibition data for Neisseria gonorrhoeae from the publicly available ChEMBL web site resulted in a training set for the construction of machine learning models. A naïve Bayesian model for bacterial growth inhibition was utilized in a workflow to predict novel antibacterial agents against this bacterium of global health relevance from a commercial library of >10^5 drug-like small molecules. Follow-up efforts involved empirical assessment of the predictions and validation of the hits. RESULTS Specifically, two small molecules were found that exhibited promising activity profiles and represent novel chemotypes for agents against N. gonorrhoeae. CONCLUSIONS This represents, to the best of our knowledge, the first machine learning approach to successfully predict novel growth inhibitors of this bacterium. To assist the chemical tool and drug discovery fields, we have made our curated training set available as part of the Supplementary Material and the Bayesian model is accessible via the web.
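To make the modeling step concrete, here is a minimal sketch of a naïve Bayesian classifier over binary structural features, written in plain Python. It is a generic Bernoulli naïve Bayes with Laplace smoothing, not the authors' model: the abstract does not specify the descriptors or the Bayesian variant used, and the toy bit vectors, labels, and function names below are hypothetical.

```python
import math


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on binary feature vectors.

    alpha is the Laplace smoothing constant (an illustrative default).
    Returns {label: (log_prior, per-bit probability of being set)}.
    """
    n_features = len(X[0])
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        prior = len(rows) / len(X)
        # Smoothed probability that each bit is set within this class
        p_on = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[label] = (math.log(prior), p_on)
    return model


def log_scores(model, x):
    """Return the unnormalized log-posterior of each class for one vector."""
    scores = {}
    for label, (log_prior, p_on) in model.items():
        s = log_prior
        for bit, p in zip(x, p_on):
            s += math.log(p if bit else 1.0 - p)
        scores[label] = s
    return scores


def predict(model, x):
    """Pick the class with the highest log-posterior."""
    scores = log_scores(model, x)
    return max(scores, key=scores.get)


# Hypothetical toy data: 4 substructure bits per molecule,
# label 1 = growth inhibitor, label 0 = inactive.
X = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
y = [1, 1, 1, 0, 0, 0]

model = train_bernoulli_nb(X, y)
print(predict(model, [1, 1, 0, 0]))
print(predict(model, [0, 0, 0, 1]))
```

In a screening workflow of the kind described, the log-scores would be used to rank a large library so that the top-scoring molecules can be prioritized for empirical testing.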
Affiliation(s)
- Janaina Cruz Pereira
  - Department of Pharmacology, Physiology, and Neuroscience, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
- Samer S Daher
  - Department of Pharmacology, Physiology, and Neuroscience, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
- Kimberley M Zorn
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Matthew Sherwood
  - Department of Pharmacology, Physiology, and Neuroscience, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
- Riccardo Russo
  - Division of Infectious Disease, Department of Medicine and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
- Alexander L Perryman
  - Department of Pharmacology, Physiology, and Neuroscience, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
  - Repare Therapeutics, 7210 Rue Frederick-Banting Suite 100, Montreal, QC, H4S 2A1, Canada
- Xin Wang
  - Department of Pharmacology, Physiology, and Neuroscience, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Madeleine J Freundlich
  - Stuart Country Day School of the Sacred Heart, 1200 Stuart Road, Princeton, NJ, 08540, USA
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
  - Collaborations in Chemistry, Inc. 5616 Hilltop Needmore Road, Fuquay-Varina, NC, 27526, USA
- Joel S Freundlich
  - Department of Pharmacology, Physiology, and Neuroscience, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
  - Division of Infectious Disease, Department of Medicine and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
11
Makarov V, Salina E, Reynolds RC, Kyaw Zin PP, Ekins S. Molecule Property Analyses of Active Compounds for Mycobacterium tuberculosis. J Med Chem 2020; 63:8917-8955. [PMID: 32259446 DOI: 10.1021/acs.jmedchem.9b02075] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/12/2022]
Abstract
Tuberculosis (TB) continues to claim the lives of around 1.7 million people per year. Most concerning are the reports of multidrug resistance. Paradoxically, this global health pandemic is demanding new therapies at a time when resources and interest are waning. However, continued tuberculosis drug discovery is critical to address the global health need and burgeoning multidrug resistance. Many diverse classes of antitubercular compounds have been identified with activity in vitro and in vivo. Our analyses of over 100 active leads, representative of thousands of active compounds generated over the past decade, suggest that they come from few chemical classes or natural product sources. We are therefore repeatedly identifying compounds that are similar to those that preceded them. Our molecule-centered cheminformatics analyses point to the need to dramatically increase the diversity of chemical libraries tested and to move outside the historic Mtb property space if we are to generate novel, improved antitubercular leads.
Affiliation(s)
- Vadim Makarov
- FRC Fundamentals of Biotechnology, Russian Academy of Science, Moscow 119071, Russia
| | - Elena Salina
- FRC Fundamentals of Biotechnology, Russian Academy of Science, Moscow 119071, Russia
| | - Robert C Reynolds
- Department of Medicine, Division of Hematology and Oncology, University of Alabama at Birmingham, NP 2540 J, 1720 Second Avenue South, Birmingham, Alabama 35294-3300, United States
| | - Phyo Phyo Kyaw Zin
- Department of Chemistry, North Carolina State University, Raleigh, North Carolina 27695, United States.,Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695, United States
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510 Raleigh, North Carolina 27606, United States
| |
|
12
|
Wang X, Perryman AL, Li SG, Paget SD, Stratton TP, Lemenze A, Olson AJ, Ekins S, Kumar P, Freundlich JS. Intrabacterial Metabolism Obscures the Successful Prediction of an InhA Inhibitor of Mycobacterium tuberculosis. ACS Infect Dis 2019; 5:2148-2163. [PMID: 31625383 DOI: 10.1021/acsinfecdis.9b00295] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 11/29/2022]
Abstract
Tuberculosis, caused by Mycobacterium tuberculosis (M. tuberculosis), kills 1.6 million people annually. To bridge the gap between structure- and cell-based drug discovery strategies, we are pioneering a computer-aided discovery paradigm that merges structure-based virtual screening with ligand-based, machine learning methods trained with cell-based data. This approach successfully identified N-(3-methoxyphenyl)-7-nitrobenzo[c][1,2,5]oxadiazol-4-amine (JSF-2164) as an inhibitor of purified InhA with whole-cell efficacy versus in vitro cultured M. tuberculosis. When the intrabacterial drug metabolism (IBDM) platform was leveraged, mechanistic studies demonstrated that JSF-2164 underwent a rapid F420H2-dependent biotransformation within M. tuberculosis to afford intrabacterial nitric oxide and two amines, identified as JSF-3616 and JSF-3617. Thus, metabolism of JSF-2164 obscured the InhA inhibition phenotype within cultured M. tuberculosis. This study demonstrates a new docking/Bayesian computational strategy to combine cell- and target-based drug screening and the need to probe intrabacterial metabolism when clarifying the antitubercular mechanism of action.
Affiliation(s)
- Xin Wang
- Department of Pharmacology, Physiology, and Neuroscience, Rutgers University−New Jersey Medical School, Medical Sciences Building, 185 South Orange Avenue, Newark, New Jersey 07103, United States
| | - Alexander L. Perryman
- Department of Pharmacology, Physiology, and Neuroscience, Rutgers University−New Jersey Medical School, Medical Sciences Building, 185 South Orange Avenue, Newark, New Jersey 07103, United States
| | - Shao-Gang Li
- Department of Pharmacology, Physiology, and Neuroscience, Rutgers University−New Jersey Medical School, Medical Sciences Building, 185 South Orange Avenue, Newark, New Jersey 07103, United States
| | - Steve D. Paget
- Department of Pharmacology, Physiology, and Neuroscience, Rutgers University−New Jersey Medical School, Medical Sciences Building, 185 South Orange Avenue, Newark, New Jersey 07103, United States
| | - Thomas P. Stratton
- Department of Pharmacology, Physiology, and Neuroscience, Rutgers University−New Jersey Medical School, Medical Sciences Building, 185 South Orange Avenue, Newark, New Jersey 07103, United States
| | - Alex Lemenze
- Division of Infectious Disease, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Reemerging Pathogens, Rutgers University−New Jersey Medical School, Medical Sciences Building, 185 South Orange Avenue, Newark, New Jersey 07103, United States
| | - Arthur J. Olson
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, Room MB112/Mail Drop MB5, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
| | - Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
| | - Pradeep Kumar
- Division of Infectious Disease, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Reemerging Pathogens, Rutgers University−New Jersey Medical School, Medical Sciences Building, 185 South Orange Avenue, Newark, New Jersey 07103, United States
| | - Joel S. Freundlich
- Department of Pharmacology, Physiology, and Neuroscience, Rutgers University−New Jersey Medical School, Medical Sciences Building, 185 South Orange Avenue, Newark, New Jersey 07103, United States
- Division of Infectious Disease, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Reemerging Pathogens, Rutgers University−New Jersey Medical School, Medical Sciences Building, 185 South Orange Avenue, Newark, New Jersey 07103, United States
| |
|
13
|
Kwofie SK, Broni E, Teye J, Quansah E, Issah I, Wilson MD, Miller WA, Tiburu EK, Bonney JHK. Pharmacoinformatics-based identification of potential bioactive compounds against Ebola virus protein VP24. Comput Biol Med 2019; 113:103414. [PMID: 31536833 DOI: 10.1016/j.compbiomed.2019.103414] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 03/18/2019] [Revised: 08/23/2019] [Accepted: 08/24/2019] [Indexed: 02/06/2023]
Abstract
BACKGROUND The impact of Ebola virus disease (EVD) is devastating with concomitant high fatalities. Currently, various drugs and vaccines are at different stages of development, corroborating the need to identify new therapeutic molecules. The VP24 protein of the Ebola virus (EBOV) plays a key role in the pathology and replication of the EVD. The VP24 protein interferes with the host immune response to viral infections and promotes nucleocapsid formation, thus making it a viable drug target. This study sought to identify putative lead compounds from the African flora with potential to inhibit the activity of the EBOV VP24 protein using pharmacoinformatics and molecular docking. METHODS An integrated library of 7675 natural products originating from Africa obtained from the AfroDB and NANPDB databases, as well as known inhibitors, were screened against VP24 (PDB ID: 4M0Q) utilising AutoDock Vina after energy minimization using GROMACS. The top 19 compounds were physicochemically and pharmacologically profiled using ADMET Predictor™, SwissADME and DataWarrior. The mechanisms of binding between the molecules and EBOV VP24 were characterised using LigPlot+. The performance of the molecular docking was evaluated by generating a receiver operating characteristic (ROC) curve by screening known inhibitors and decoys against EBOV VP24. The prediction of activity spectra for substances (PASS) and machine learning-based Open Bayesian models were used to predict the anti-viral and anti-Ebola activity of the molecules, respectively. RESULTS Four natural products, namely, ZINC000095486070, ZINC000003594643, ZINC000095486008 and sarcophine were found to be potential EBOV VP24-inhibitory molecules. The molecular docking results showed that ZINC000095486070 had high binding affinity of -9.7 kcal/mol with EBOV VP24, which was greater than those of the known VP24-inhibitors used as standards in the study including Ouabain, Nilotinib, Clomiphene, Toremifene, Miglustat and BCX4430. The area under the curve of the generated ROC for evaluating the performance of the molecular docking was 0.77, which was considered acceptable. The predicted promising molecules were also validated using induced-fit docking with the receptor using Schrödinger and molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) calculations. The molecules had better binding mechanisms and were pharmacologically profiled to have plausible efficacies and negligible toxicity, and to be suitable for designing anti-Ebola scaffolds. ZINC000095486008 and sarcophine (NANPDB135) were predicted to possess anti-viral activity, while ZINC000095486070 and ZINC000003594643 were predicted to be anti-Ebola compounds. CONCLUSION The identified compounds are potential inhibitors worthy of further development as EBOV biotherapeutic agents. The scaffolds of the compounds could also serve as building blocks for designing novel Ebola inhibitors.
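The ROC evaluation described in this abstract, which checks whether docking ranks known inhibitors ahead of decoys, can be sketched roughly as follows. This is a minimal illustration and not the authors' workflow: the function name `roc_auc` and all docking scores are hypothetical, and the AUC is computed with the pairwise (Mann-Whitney) formulation, assuming that a more negative docking score means stronger predicted binding, as in AutoDock Vina.

```python
def roc_auc(active_scores, decoy_scores):
    """Estimate ROC AUC as the fraction of (active, decoy) pairs in which
    the active has the better (more negative) docking score.
    Ties count as half a win."""
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


# Hypothetical docking scores in kcal/mol (illustrative values only).
actives = [-9.7, -8.9, -8.1, -7.0]
decoys = [-8.5, -7.2, -6.8, -6.1, -5.9]

auc = roc_auc(actives, decoys)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the abstract's reported value of 0.77 should be read.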
Affiliation(s)
- Samuel K Kwofie
- Department of Biomedical Engineering, School of Engineering Sciences, College of Basic & Applied Sciences, University of Ghana, PMB LG 77, Legon, Accra, Ghana; West African Center for Cell Biology of Infectious Pathogens, Department of Biochemistry, Cell and Molecular Biology, College of Basic and Applied Sciences, University of Ghana, Accra, Ghana; Department of Medicine, Loyola University Medical Center, Maywood, IL, 60153, USA.
| | - Emmanuel Broni
- Department of Biomedical Engineering, School of Engineering Sciences, College of Basic & Applied Sciences, University of Ghana, PMB LG 77, Legon, Accra, Ghana
| | - Joshua Teye
- Department of Biomedical Engineering, School of Engineering Sciences, College of Basic & Applied Sciences, University of Ghana, PMB LG 77, Legon, Accra, Ghana
| | - Erasmus Quansah
- Department of Parasitology, Noguchi Memorial Institute for Medical Research (NMIMR), College of Health Sciences (CHS), University of Ghana, P.O. Box LG 581, Legon, Accra, Ghana
| | - Ibrahim Issah
- Department of Biomedical Engineering, School of Engineering Sciences, College of Basic & Applied Sciences, University of Ghana, PMB LG 77, Legon, Accra, Ghana
| | - Michael D Wilson
- Department of Medicine, Loyola University Medical Center, Maywood, IL, 60153, USA; Department of Parasitology, Noguchi Memorial Institute for Medical Research (NMIMR), College of Health Sciences (CHS), University of Ghana, P.O. Box LG 581, Legon, Accra, Ghana
| | - Whelton A Miller
- Department of Medicine, Loyola University Medical Center, Maywood, IL, 60153, USA; Department of Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
| | - Elvis K Tiburu
- Department of Biomedical Engineering, School of Engineering Sciences, College of Basic & Applied Sciences, University of Ghana, PMB LG 77, Legon, Accra, Ghana; West African Center for Cell Biology of Infectious Pathogens, Department of Biochemistry, Cell and Molecular Biology, College of Basic and Applied Sciences, University of Ghana, Accra, Ghana
| | - Joseph H K Bonney
- Department of Virology, Noguchi Memorial Institute for Medical Research (NMIMR), College of Health Sciences (CHS), University of Ghana, P.O. Box LG 581, Legon, Accra, Ghana
| |
|
14
|
Lane T, Russo DP, Zorn KM, Clark AM, Korotcov A, Tkachenko V, Reynolds RC, Perryman AL, Freundlich JS, Ekins S. Comparing and Validating Machine Learning Models for Mycobacterium tuberculosis Drug Discovery. Mol Pharm 2018; 15:4346-4360. [PMID: 29672063 PMCID: PMC6167198 DOI: 10.1021/acs.molpharmaceut.8b00083] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Indexed: 12/31/2022]
Abstract
Tuberculosis is a global health dilemma. In 2016, the WHO reported 10.4 million incident cases and 1.7 million deaths. The need to develop new treatments for those infected with Mycobacterium tuberculosis (Mtb) has led to many large-scale phenotypic screens and many thousands of new active compounds identified in vitro. However, with limited funding, efforts to discover new active molecules against Mtb need to be more efficient. Several computational machine learning approaches have been shown to have good enrichment and hit rates. We have curated small molecule Mtb data and developed new models with a total of 18,886 molecules with activity cutoffs of 10 μM, 1 μM, and 100 nM. These data sets were used to evaluate different machine learning methods (including deep learning) and metrics and to generate predictions for additional molecules published in 2017. One Mtb model, a combined in vitro and in vivo data Bayesian model at a 100 nM activity cutoff, yielded the following metrics for 5-fold cross validation: accuracy = 0.88, precision = 0.22, recall = 0.91, specificity = 0.88, kappa = 0.31, and MCC = 0.41. We have also curated an evaluation set (n = 153 compounds) published in 2017, and when used to test our model, it showed comparable statistics (accuracy = 0.83, precision = 0.27, recall = 1.00, specificity = 0.81, kappa = 0.36, and MCC = 0.47). We have also compared these models with additional machine learning algorithms, showing that Bayesian machine learning models constructed with literature Mtb data generated by different laboratories generally were equivalent to or outperformed deep neural networks with external test sets. Finally, we have also compared our training and test sets to show they were suitably diverse and different in order to represent useful evaluation sets. Such Mtb machine learning models could help prioritize compounds for testing in vitro and in vivo.
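The six statistics quoted in this abstract can all be derived from a single binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `classification_metrics` and the counts are hypothetical and are not taken from the paper.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification statistics from confusion
    matrix counts. Assumes every denominator is non-zero."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance)
    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "kappa": kappa,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=20, tn=130, fn=10)
for name, value in metrics.items():
    print(f"{name} = {value:.2f}")
```

When actives are rare, as is likely at a stringent cutoff such as 100 nM, precision can be low even when recall and accuracy are high, which is one reason chance-corrected measures such as kappa and MCC are reported alongside them.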
Affiliation(s)
- Thomas Lane
- Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Daniel P. Russo
- Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA
| | - Kimberley M. Zorn
- Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
| | - Alex M. Clark
- Molecular Materials Informatics, Inc., 1900 St. Jacques #302, Montreal H3J 2S1, Quebec, Canada
| | - Alexandru Korotcov
- Science Data Software, LLC, 14914 Bradwill Court, Rockville, MD 20850, USA
| | - Valery Tkachenko
- Science Data Software, LLC, 14914 Bradwill Court, Rockville, MD 20850, USA
| | - Robert C. Reynolds
- Department of Medicine, Division of Hematology and Oncology, University of Alabama at Birmingham, NP 2540 J, 1720 2nd Avenue South, Birmingham, AL 35294-3300, USA
| | - Alexander L. Perryman
- Department of Pharmacology, Physiology and Neuroscience, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, USA
| | - Joel S. Freundlich
- Department of Pharmacology, Physiology and Neuroscience, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, USA
- Division of Infectious Diseases, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University–New Jersey Medical School, Newark, New Jersey 07103, USA
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
| |
|
15
|
Ekins S, Clark AM, Dole K, Gregory K, Mcnutt AM, Spektor AC, Weatherall C, Litterman NK, Bunin BA. Data Mining and Computational Modeling of High-Throughput Screening Datasets. Methods Mol Biol 2018; 1755:197-221. [PMID: 29671272 DOI: 10.1007/978-1-4939-7724-6_14] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/23/2022]
Abstract
We are now seeing the benefit of investments made over the last decade in high-throughput screening (HTS), which are resulting in large structure-activity datasets entering public and open databases such as ChEMBL and PubChem. The growth of academic HTS screening centers and the increasing move to academia for early stage drug discovery suggest a great need for the informatics tools and methods to mine such data and learn from it. Collaborative Drug Discovery, Inc. (CDD) has developed a number of tools for storing, mining, securely and selectively sharing, as well as learning from such HTS data. We present a new web-based data mining and visualization module directly within the CDD Vault platform for high-throughput drug discovery data that makes use of a novel technology stack following modern reactive design principles. We also describe CDD Models within the CDD Vault platform, which enables researchers to share models, share predictions from models, and create models from distributed, heterogeneous data. Our system is built on top of the Collaborative Drug Discovery Vault Activity and Registration data repository ecosystem, which allows users to manipulate and visualize thousands of molecules in real time. This can be performed in any browser on any platform. In this chapter we present examples of its use with public datasets in CDD Vault. Such approaches can complement other cheminformatics tools, whether open source or commercial, in providing approaches for data mining and modeling of HTS data.
Affiliation(s)
- Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA.
| | - Alex M Clark
- Collaborative Drug Discovery, Inc., Burlingame, CA, USA
- Molecular Materials Informatics, Inc., Montreal, QC, Canada
| | - Krishna Dole
- Collaborative Drug Discovery, Inc., Burlingame, CA, USA
| | | | | | | | | | | | - Barry A Bunin
- Collaborative Drug Discovery, Inc., Burlingame, CA, USA
| |
|
16
|
Collaborative drug discovery for More Medicines for Tuberculosis (MM4TB). Drug Discov Today 2016; 22:555-565. [PMID: 27884746 DOI: 10.1016/j.drudis.2016.10.009] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 05/13/2016] [Revised: 10/11/2016] [Accepted: 10/21/2016] [Indexed: 01/30/2023]
Abstract
Neglected disease drug discovery is generally poorly funded compared with major diseases and hence there is an increasing focus on collaboration and precompetitive efforts such as public-private partnerships (PPPs). The More Medicines for Tuberculosis (MM4TB) project is one such collaboration funded by the EU with the goal of discovering new drugs for tuberculosis. Collaborative Drug Discovery has provided a commercial web-based platform called CDD Vault which is a hosted collaborative solution for securely sharing diverse chemistry and biology data. Using CDD Vault alongside other commercial and free cheminformatics tools has enabled support of this and other large collaborative projects, aiding drug discovery efforts and fostering collaboration. We will describe CDD's efforts in assisting with the MM4TB project.
|
17
|
Ekins S, Perryman AL, Clark AM, Reynolds RC, Freundlich JS. Machine Learning Model Analysis and Data Visualization with Small Molecules Tested in a Mouse Model of Mycobacterium tuberculosis Infection (2014-2015). J Chem Inf Model 2016; 56:1332-43. [PMID: 27335215 PMCID: PMC4962118 DOI: 10.1021/acs.jcim.6b00004] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 01/01/2023]
Abstract
The renewed urgency to develop new treatments for Mycobacterium tuberculosis (Mtb) infection has resulted in large-scale phenotypic screening and thousands of new active compounds in vitro. The next challenge is to identify candidates to pursue in a mouse in vivo efficacy model as a step to predicting clinical efficacy. We previously analyzed over 70 years of this mouse in vivo efficacy data, which we used to generate and validate machine learning models. Curation of 60 additional small molecules with in vivo data published in 2014 and 2015 was undertaken to further test these models. This represents a much larger test set than for the previous models. Several computational approaches have now been applied to analyze these molecules and compare their molecular properties beyond those attempted previously. Our previous machine learning models have been updated, and a novel aspect has been added in the form of mouse liver microsomal half-life (MLM t1/2) and in vitro-based Mtb models incorporating cytotoxicity data that were used to predict in vivo activity for comparison. Our best Mtb in vivo models possess fivefold ROC values > 0.7, sensitivity > 80%, and concordance > 60%, while the best specificity value is > 40%. Use of an MLM t1/2 Bayesian model affords comparable results for scoring the 60 compounds tested. Combining MLM stability and in vitro Mtb models in a novel consensus workflow in the best cases has a positive predicted value (hit rate) > 77%. Our results indicate that Bayesian models constructed with literature in vivo Mtb data generated by different laboratories in various mouse models can have predictive value and may be used alongside MLM t1/2 and in vitro-based Mtb models to assist in selecting antitubercular compounds with desirable in vivo efficacy. We demonstrate for the first time that consensus models of any kind can be used to predict in vivo activity for Mtb. In addition, we describe a new clustering method for data visualization and apply this to the in vivo training and test data, ultimately making the method accessible in a mobile app.
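One plausible reading of the consensus workflow mentioned in this abstract is that a compound is selected only when both the in vitro Mtb model and the MLM stability model score it favourably, and the hit rate is then the fraction of selected compounds that are truly active in vivo. The sketch below illustrates that reading only. The "both models agree" rule, the 0.5 cutoffs, the field names, and all data are assumptions, not details from the paper.

```python
def consensus_hits(compounds, mtb_cutoff=0.5, mlm_cutoff=0.5):
    """Return IDs of compounds that both (hypothetical) models score at or
    above their cutoffs. The AND rule is an assumed consensus criterion."""
    return [
        c["id"]
        for c in compounds
        if c["mtb_score"] >= mtb_cutoff and c["mlm_score"] >= mlm_cutoff
    ]


def positive_predictive_value(predicted_ids, active_ids):
    """Fraction of predicted hits that are truly active (the hit rate)."""
    predicted = set(predicted_ids)
    if not predicted:
        return float("nan")
    return len(predicted & set(active_ids)) / len(predicted)


# Hypothetical model scores and in vivo outcomes.
compounds = [
    {"id": "c1", "mtb_score": 0.9, "mlm_score": 0.8},
    {"id": "c2", "mtb_score": 0.7, "mlm_score": 0.6},
    {"id": "c3", "mtb_score": 0.8, "mlm_score": 0.2},
    {"id": "c4", "mtb_score": 0.6, "mlm_score": 0.9},
    {"id": "c5", "mtb_score": 0.3, "mlm_score": 0.7},
]
in_vivo_actives = {"c1", "c2", "c5"}

hits = consensus_hits(compounds)
ppv = positive_predictive_value(hits, in_vivo_actives)
print(hits, f"PPV = {ppv:.2f}")
```

Requiring agreement between models typically trades recall for precision: fewer compounds are selected, but a larger share of them may be true actives.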
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States.,Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
| | - Alexander L Perryman
- Department of Pharmacology, Physiology and Neuroscience, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, United States
| | - Alex M Clark
- Molecular Materials Informatics, Inc., 1900 St. Jacques #302, Montreal, Quebec H3J 2S1, Canada
| | - Robert C Reynolds
- Division of Hematology and Oncology, Department of Medicine, and Department of Chemistry, College of Arts and Sciences, University of Alabama at Birmingham, 1530 Third Avenue South, Birmingham, Alabama 35294-1240, United States
| | - Joel S Freundlich
- Department of Pharmacology, Physiology and Neuroscience, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, United States.,Division of Infectious Diseases, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, United States
| |
|
18
|
Perryman AL, Stratton TP, Ekins S, Freundlich JS. Predicting Mouse Liver Microsomal Stability with "Pruned" Machine Learning Models and Public Data. Pharm Res 2016; 33:433-49. [PMID: 26415647 PMCID: PMC4712113 DOI: 10.1007/s11095-015-1800-5] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 06/30/2015] [Accepted: 09/22/2015] [Indexed: 02/07/2023]
Abstract
PURPOSE Mouse efficacy studies are a critical hurdle to advance translational research of potential therapeutic compounds for many diseases. Although mouse liver microsomal (MLM) stability studies are not a perfect surrogate for in vivo studies of metabolic clearance, they are the initial model system used to assess metabolic stability. Consequently, we explored the development of machine learning models that can enhance the probability of identifying compounds possessing MLM stability. METHODS Published assays on MLM half-life values were identified in PubChem, reformatted, and curated to create a training set with 894 unique small molecules. These data were used to construct machine learning models assessed with internal cross-validation, external tests with a published set of antitubercular compounds, and independent validation with an additional diverse set of 571 compounds (PubChem data on percent metabolism). RESULTS "Pruning" out the moderately unstable / moderately stable compounds from the training set produced models with superior predictive power. Bayesian models displayed the best predictive power for identifying compounds with a half-life ≥1 h. CONCLUSIONS Our results suggest the pruning strategy may be of general benefit to improve test set enrichment and provide machine learning models with enhanced predictive value for the MLM stability of small organic molecules. This study represents the most exhaustive study to date of using machine learning approaches with MLM data from public sources.
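The "pruning" idea in this abstract, dropping borderline compounds so that the classifier trains only on clearly stable and clearly unstable examples, can be sketched as below. The 60-minute stable threshold follows the half-life ≥1 h criterion quoted above; the 30-minute unstable threshold, the function name, and the compound data are assumptions made purely for illustration.

```python
def prune_training_set(records, unstable_max_min=30.0, stable_min_min=60.0):
    """Label compounds by MLM half-life (minutes) and drop the ambiguous
    middle band. Returns (name, label) pairs with 1 = stable, 0 = unstable.
    The unstable cutoff of 30 min is an assumed value."""
    labeled = []
    for name, half_life in records:
        if half_life >= stable_min_min:
            labeled.append((name, 1))
        elif half_life <= unstable_max_min:
            labeled.append((name, 0))
        # Compounds between the two cutoffs are pruned (not added).
    return labeled


# Hypothetical half-life measurements in minutes.
raw = [("cmpd_a", 5.0), ("cmpd_b", 45.0), ("cmpd_c", 120.0), ("cmpd_d", 30.0)]

pruned = prune_training_set(raw)
print(pruned)
```

The trade-off is a smaller training set in exchange for cleaner class labels, since compounds near the threshold are the ones most affected by assay noise.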
Affiliation(s)
- Alexander L Perryman
- Division of Infectious Disease, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University-New Jersey Medical School, Newark, New Jersey, 07103, USA
| | - Thomas P Stratton
- Department of Pharmacology & Physiology, Rutgers University-New Jersey Medical School, Medical Sciences Building, I-503, 185 South Orange Ave., Newark, New Jersey, 07103, USA
| | - Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC, 27526, USA
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA, 94010, USA
| | - Joel S Freundlich
- Division of Infectious Disease, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University-New Jersey Medical School, Newark, New Jersey, 07103, USA.
- Department of Pharmacology & Physiology, Rutgers University-New Jersey Medical School, Medical Sciences Building, I-503, 185 South Orange Ave., Newark, New Jersey, 07103, USA.
| |
|
19
|
Clark AM, Dole K, Ekins S. Open Source Bayesian Models. 3. Composite Models for Prediction of Binned Responses. J Chem Inf Model 2016; 56:275-85. [PMID: 26750305 PMCID: PMC4764945 DOI: 10.1021/acs.jcim.5b00555] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 02/07/2023]
Abstract
Bayesian models constructed from structure-derived fingerprints have been a popular and useful method for drug discovery research when applied to bioactivity measurements that can be effectively classified as active or inactive. The results can be used to rank candidate structures according to their probability of activity, and this ranking benefits from the high degree of interpretability when structure-based fingerprints are used, making the results chemically intuitive. Besides selecting an activity threshold, building a Bayesian model is fast and requires few or no parameters or user intervention. The method also does not suffer from such acute overtraining problems as quantitative structure–activity relationships or quantitative structure–property relationships (QSAR/QSPR). This makes it an approach highly suitable for automated workflows that are independent of user expertise or prior knowledge of the training data. We now describe a new method for creating a composite group of Bayesian models to extend the method to work with multiple states, rather than just binary. Incoming activities are divided into bins, each covering a mutually exclusive range of activities. For each of these bins, a Bayesian model is created to model whether or not the compound belongs in the bin. Analyzing putative molecules using the composite model involves making a prediction for each bin and examining the relative likelihood for each assignment, for example, highest value wins. The method has been evaluated on a collection of hundreds of data sets extracted from ChEMBL v20 and validated data sets for ADME/Tox and bioactivity.
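The composite scheme described in this abstract (one in-bin versus out-of-bin model per activity bin, with the highest-scoring bin winning) can be sketched as follows. This is a simplified stand-in and not the published estimator: it uses add-one-smoothed log-odds weights over string "features" in place of real structural fingerprints, and the bin edges, feature names, and activities are all hypothetical.

```python
import math
from collections import Counter


def assign_bin(value, edges):
    """Return the index of the bin containing value. edges are ascending
    upper bounds; values at or above the last edge fall in the final bin."""
    for i, upper in enumerate(edges):
        if value < upper:
            return i
    return len(edges)


def train_bin_model(samples, target_bin):
    """One-vs-rest model: a smoothed log-odds weight for each feature,
    comparing compounds inside target_bin with all other compounds."""
    in_counts, out_counts = Counter(), Counter()
    n_in = n_out = 0
    for features, bin_index in samples:
        if bin_index == target_bin:
            n_in += 1
            in_counts.update(features)
        else:
            n_out += 1
            out_counts.update(features)
    vocab = set(in_counts) | set(out_counts)
    return {
        f: math.log(
            ((in_counts[f] + 1) / (n_in + 2)) / ((out_counts[f] + 1) / (n_out + 2))
        )
        for f in vocab
    }


def predict_bin(models, features):
    """Score the features with every bin model; highest score wins."""
    scores = [sum(m.get(f, 0.0) for f in features) for m in models]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical training data: (feature set, activity in uM).
edges = [1.0, 10.0]  # three bins: <1, 1 to <10, >=10
training = [
    ({"nitro", "amide"}, 0.2),
    ({"nitro", "ring"}, 0.5),
    ({"amide", "ester"}, 4.0),
    ({"ester", "ring"}, 6.0),
    ({"alkyl"}, 50.0),
    ({"alkyl", "ester"}, 80.0),
]
samples = [(feats, assign_bin(act, edges)) for feats, act in training]
models = [train_bin_model(samples, b) for b in range(len(edges) + 1)]

print(predict_bin(models, {"nitro"}), predict_bin(models, {"alkyl"}))
```

Because each bin has its own independent model, the raw scores are not calibrated probabilities, so comparing them directly ("highest value wins") is a heuristic rather than a strict likelihood comparison.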
Affiliation(s)
- Alex M Clark
- Molecular Materials Informatics, Inc., 1900 St. Jacques #302, Montreal H3J 2S1, Quebec, Canada
| | - Krishna Dole
- Collaborative Drug Discovery, Inc., 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
| | - Sean Ekins
- Collaborative Drug Discovery, Inc., 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States.,Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
| |
|
20
|
Ekins S, Madrid PB, Sarker M, Li SG, Mittal N, Kumar P, Wang X, Stratton TP, Zimmerman M, Talcott C, Bourbon P, Travers M, Yadav M, Freundlich JS. Combining Metabolite-Based Pharmacophores with Bayesian Machine Learning Models for Mycobacterium tuberculosis Drug Discovery. PLoS One 2015; 10:e0141076. [PMID: 26517557 PMCID: PMC4627656 DOI: 10.1371/journal.pone.0141076] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 07/30/2015] [Accepted: 10/05/2015] [Indexed: 12/15/2022] Open
Abstract
Integrated computational approaches for Mycobacterium tuberculosis (Mtb) are useful to identify new molecules that could lead to future tuberculosis (TB) drugs. Our approach uses information derived from the TBCyc pathway and genome database and the Collaborative Drug Discovery TB database, combined with 3D pharmacophores and dual event Bayesian models of whole-cell activity and lack of cytotoxicity. We have prioritized a large number of molecules that may act as mimics of substrates and metabolites in the TB metabolome. We computationally searched over 200,000 commercial molecules using 66 pharmacophores based on substrates and metabolites from Mtb, with further filtering by Bayesian models. We ultimately tested 110 compounds in vitro, which resulted in two compounds of interest, BAS 04912643 and BAS 00623753 (MIC of 2.5 and 5 μg/mL, respectively). These molecules were used as a starting point for hit-to-lead optimization. The most promising class proved to be the quinoxaline di-N-oxides, evidenced by transcriptional profiling to induce mRNA level perturbations most closely resembling known protonophores. One of these, SRI58, exhibited an MIC = 1.25 μg/mL versus Mtb and a CC50 in Vero cells of >40 μg/mL, while featuring fair Caco-2 A-B permeability (2.3 × 10⁻⁶ cm/s), kinetic solubility (125 μM at pH 7.4 in PBS) and mouse metabolic stability (63.6% remaining after 1 h incubation with mouse liver microsomes). Despite demonstration of how a combined bioinformatics/cheminformatics approach afforded a small molecule with promising in vitro profiles, we found that SRI58 did not exhibit quantifiable blood levels in mice.
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery Inc., 1633 Bayshore Highway, Suite 342, Burlingame, CA, 94010, United States of America
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC, 27526, United States of America
- * E-mail: (SE); (PBM); (JSF)
| | - Peter B. Madrid
- SRI International, 333 Ravenswood Avenue, Menlo Park, CA, 94025, United States of America
- * E-mail: (SE); (PBM); (JSF)
| | - Malabika Sarker
- SRI International, 333 Ravenswood Avenue, Menlo Park, CA, 94025, United States of America
| | - Shao-Gang Li
- Departments of Pharmacology & Physiology and Medicine, Center for Emerging and Reemerging Pathogens, Rutgers University–New Jersey Medical School, 185 South Orange Avenue, Newark, NJ, 07103, United States of America
| | - Nisha Mittal
- Departments of Pharmacology & Physiology and Medicine, Center for Emerging and Reemerging Pathogens, Rutgers University–New Jersey Medical School, 185 South Orange Avenue, Newark, NJ, 07103, United States of America
| | - Pradeep Kumar
- Department of Medicine, Center for Emerging and Reemerging Pathogens, Rutgers University–New Jersey Medical School, 185 South Orange Avenue, Newark, NJ, 07103, United States of America
| | - Xin Wang
- Departments of Pharmacology & Physiology and Medicine, Center for Emerging and Reemerging Pathogens, Rutgers University–New Jersey Medical School, 185 South Orange Avenue, Newark, NJ, 07103, United States of America
| | - Thomas P. Stratton
- Departments of Pharmacology & Physiology and Medicine, Center for Emerging and Reemerging Pathogens, Rutgers University–New Jersey Medical School, 185 South Orange Avenue, Newark, NJ, 07103, United States of America
| | - Matthew Zimmerman
- Public Health Research Institute, Rutgers University–New Jersey Medical School, Newark, NJ, 07103, United States of America
| | - Carolyn Talcott
- SRI International, 333 Ravenswood Avenue, Menlo Park, CA, 94025, United States of America
| | - Pauline Bourbon
- SRI International, 333 Ravenswood Avenue, Menlo Park, CA, 94025, United States of America
| | - Mike Travers
- Collaborative Drug Discovery Inc., 1633 Bayshore Highway, Suite 342, Burlingame, CA, 94010, United States of America
| | - Maneesh Yadav
- SRI International, 333 Ravenswood Avenue, Menlo Park, CA, 94025, United States of America
| | - Joel S. Freundlich
- Departments of Pharmacology & Physiology and Medicine, Center for Emerging and Reemerging Pathogens, Rutgers University–New Jersey Medical School, 185 South Orange Avenue, Newark, NJ, 07103, United States of America
- * E-mail: (SE); (PBM); (JSF)
| |
|
21
|
Ekins S, Lage de Siqueira-Neto J, McCall LI, Sarker M, Yadav M, Ponder EL, Kallel EA, Kellar D, Chen S, Arkin M, Bunin BA, McKerrow JH, Talcott C. Machine Learning Models and Pathway Genome Data Base for Trypanosoma cruzi Drug Discovery. PLoS Negl Trop Dis 2015; 9:e0003878. [PMID: 26114876 PMCID: PMC4482694 DOI: 10.1371/journal.pntd.0003878] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Received: 03/27/2015] [Accepted: 06/05/2015] [Indexed: 12/21/2022] Open
Abstract
Background: Chagas disease is a neglected tropical disease (NTD) caused by the eukaryotic parasite Trypanosoma cruzi. The current clinical and preclinical pipeline for T. cruzi is extremely sparse and lacks drug target diversity. Methodology/Principal Findings: In the present study we developed a computational approach that utilized data from several public whole-cell, phenotypic high throughput screens that have been completed for T. cruzi by the Broad Institute, including a single screen of over 300,000 molecules in the search for chemical probes as part of the NIH Molecular Libraries program. We have also compiled and curated relevant biological and chemical compound screening data including (i) compounds and biological activity data from the literature, (ii) high throughput screening datasets, and (iii) predicted metabolites of T. cruzi metabolic pathways. This information was used to help us identify compounds and their potential targets. We have constructed a Pathway Genome Data Base for T. cruzi. In addition, we have developed Bayesian machine learning models that were used to virtually screen libraries of compounds. Ninety-seven compounds were selected for in vitro testing, and 11 of these were found to have EC50 < 10 μM. We progressed five compounds to an in vivo mouse efficacy model of Chagas disease and validated that the machine learning model could identify in vitro active compounds not in the training set, as well as known positive controls. The antimalarial pyronaridine possessed 85.2% efficacy in the acute Chagas mouse model. We have also proposed potential targets (for future verification) for this compound based on structural similarity to known compounds with targets in T. cruzi. Conclusions/Significance: We have demonstrated how combining chemoinformatics and bioinformatics for T. cruzi drug discovery can bring interesting in vivo active molecules to light that may have been overlooked. The approach we have taken is broadly applicable to other NTDs.
Chagas disease is a neglected tropical disease (NTD) caused by the eukaryotic parasite Trypanosoma cruzi. The disease is endemic to Latin America but is increasingly found in North America and Europe, primarily through immigration, and the spread of this disease is bringing new attention to the need for novel, safe, and effective therapeutics to treat T. cruzi infection. We have used data from a phenotypic screen to build Bayesian models to predict anti-parasitic activity against T. cruzi in vitro. These models were used to score various small libraries of molecules. We selected fewer than 100 compounds for testing and found in vitro actives, some of which were tested in an in vivo efficacy model. We identified the antimalarial pyronaridine as having in vivo efficacy, which provides us with a new starting point for further investigation and optimization.
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, Burlingame, California, United States of America
- Collaborations in Chemistry, Fuquay-Varina, North Carolina, United States of America
| | - Jair Lage de Siqueira-Neto
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, San Diego, California, United States of America
| | - Laura-Isobel McCall
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, San Diego, California, United States of America
| | - Malabika Sarker
- SRI International, Menlo Park, California, United States of America
| | - Maneesh Yadav
- SRI International, Menlo Park, California, United States of America
| | - Elizabeth L. Ponder
- Chemistry, Engineering & Medicine for Human Health (ChEM-H), Stanford, California, United States of America
| | - E. Adam Kallel
- Collaborative Drug Discovery, Burlingame, California, United States of America
| | - Danielle Kellar
- Department of Pathology, University of California, San Francisco, San Francisco, California, United States of America
| | - Steven Chen
- Small Molecule Discovery Center and Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California, United States of America
| | - Michelle Arkin
- Small Molecule Discovery Center and Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California, United States of America
| | - Barry A. Bunin
- Collaborative Drug Discovery, Burlingame, California, United States of America
| | - James H. McKerrow
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, San Diego, California, United States of America
| | - Carolyn Talcott
- SRI International, Menlo Park, California, United States of America
| |
|
22
|
Mugumbate G, Abrahams KA, Cox JAG, Papadatos G, van Westen G, Lelièvre J, Calus ST, Loman NJ, Ballell L, Barros D, Overington JP, Besra GS. Mycobacterial dihydrofolate reductase inhibitors identified using chemogenomic methods and in vitro validation. PLoS One 2015; 10:e0121492. [PMID: 25799414 PMCID: PMC4370846 DOI: 10.1371/journal.pone.0121492] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 12/04/2014] [Accepted: 02/01/2015] [Indexed: 01/21/2023] Open
Abstract
The lack of success in target-based screening approaches to the discovery of antibacterial agents has led to the reemergence of phenotypic screening as a successful approach for identifying bioactive, antibacterial compounds. A challenge with this route, though, is then to identify the molecular target(s) and mechanism of action of the hits. This target identification, or deorphanization step, is often essential in further optimization and validation studies. Direct experimental identification of the molecular target of a screening hit is often complex, precisely because the properties and specificity of the hit are not yet optimized against that target, and so many false positives are often obtained. An alternative is to use computational, predictive approaches to hypothesize a mechanism of action, which can then be validated in a more directed and efficient manner. Specifically, here we present experimental validation of an in silico prediction from a large-scale screen performed against Mycobacterium tuberculosis (Mtb), the causative agent of tuberculosis. The two potent anti-tubercular compounds studied in this case, belonging to the tetrahydro-1,3,5-triazin-2-amine (THT) family, were predicted and confirmed to be inhibitors of dihydrofolate reductase (DHFR), a known essential Mtb gene, and already clinically validated as a drug target. Given the large number of similar screening data sets shared amongst the community, this in vitro validation of these target predictions gives weight to computational approaches to establish the mechanism of action (MoA) of novel screening hits.
Affiliation(s)
- Grace Mugumbate
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Katherine A. Abrahams
- Institute of Microbiology and Infection (IMI), School of Biosciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom
| | - Jonathan A. G. Cox
- Institute of Microbiology and Infection (IMI), School of Biosciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom
| | - George Papadatos
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Gerard van Westen
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Joël Lelièvre
- Diseases of the Developing World, GlaxoSmithKline, Severo Ochoa 2, 28760 Tres Cantos, Madrid, Spain
| | - Szymon T. Calus
- Institute of Microbiology and Infection (IMI), School of Biosciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom
| | - Nicholas J. Loman
- Institute of Microbiology and Infection (IMI), School of Biosciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom
| | - Lluis Ballell
- Diseases of the Developing World, GlaxoSmithKline, Severo Ochoa 2, 28760 Tres Cantos, Madrid, Spain
| | - David Barros
- Diseases of the Developing World, GlaxoSmithKline, Severo Ochoa 2, 28760 Tres Cantos, Madrid, Spain
| | - John P. Overington
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- * E-mail: (JPO); (GSB)
| | - Gurdyal S. Besra
- Institute of Microbiology and Infection (IMI), School of Biosciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom
- * E-mail: (JPO); (GSB)
| |
|
23
|
Clark AM, Sarker M, Ekins S. New target prediction and visualization tools incorporating open source molecular fingerprints for TB Mobile 2.0. J Cheminform 2014; 6:38. [PMID: 25302078 PMCID: PMC4190048 DOI: 10.1186/s13321-014-0038-2] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 03/06/2014] [Accepted: 06/30/2014] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: We recently developed a freely available mobile app (TB Mobile) for both iOS and Android platforms that displays Mycobacterium tuberculosis (Mtb) active molecule structures and their targets with links to associated data. The app was developed to make target information available to as large an audience as possible. RESULTS: We now report a major update of the iOS version of the app. This includes enhancements that use an implementation of ECFP_6 fingerprints that we have made open source. Using these fingerprints, the user can propose compounds with possible anti-TB activity, and view the compounds within a cluster landscape. Proposed compounds can also be compared to existing target data, using a naïve Bayesian scoring system to rank probable targets. We have curated an additional 60 new compounds and their targets for Mtb and added these to the original set of 745 compounds. We have also curated 20 further compounds (many without targets in TB Mobile) to evaluate this version of the app with 805 compounds and associated targets. CONCLUSIONS: TB Mobile can now manage a small collection of compounds that can be imported from external sources, or exported by various means such as email or app-to-app inter-process communication. This means that TB Mobile can be used as a node within a growing ecosystem of mobile apps for cheminformatics. It can also cluster compounds and use internal algorithms to help identify potential targets based on molecular similarity. TB Mobile represents a valuable dataset, data-visualization aid and target prediction tool.
Affiliation(s)
- Alex M Clark
- Molecular Materials Informatics, 1900 St. Jacques #302, Montreal H3J 2S1, Quebec, Canada
| | - Malabika Sarker
- SRI International, 333 Ravenswood Avenue, Menlo Park 94025, CA, USA
| | - Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame 94010, CA, USA
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina 27526, NC, USA
| |
|
24
|
Ekins S, Freundlich JS, Reynolds RC. Are bigger data sets better for machine learning? Fusing single-point and dual-event dose response data for Mycobacterium tuberculosis. J Chem Inf Model 2014; 54:2157-65. [PMID: 24968215 DOI: 10.1021/ci500264r] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 02/07/2023]
Abstract
Tuberculosis is a major, neglected disease for which the quest to find new treatments continues. There is an abundance of data from large phenotypic screens in the public domain against Mycobacterium tuberculosis (Mtb). Since machine learning methods can learn from past data, we were interested in addressing whether more data builds better models. We now describe using Bayesian machine learning to assess whether we can improve our models by combining the large quantities of single-point data with the much smaller (higher quality) dual-event data sets, which use dose-response data for both whole-cell antitubercular activity and Vero cell cytotoxicity. We have evaluated 12 models ranging from different single-point, dual-event dose-response, single-point and dual-event dose-response as well as combined data sets for three distinct data sets from the same laboratory. We used a fourth data set of active and inactive compounds from the same group as well as a smaller set of 177 active compounds from GlaxoSmithKline as test sets. Our data suggest combining single-point with dual-event dose-response data does not diminish the internal or external predictive ability of the models based on the receiver operator curve (ROC) for these models (internal ROC range 0.83-0.91, external ROC range 0.62-0.83) compared to the orders of magnitude smaller dual-event models (internal ROC range 0.6-0.83 and external ROC 0.54-0.83). In conclusion, models developed with 1200-5000 compounds appear to be as predictive as those generated with 25 000-350 000 molecules. Our results have implications for justifying further high-throughput screening versus focused testing based on model predictions.
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry , 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
| |
|
25
|
Ekins S, Pottorf R, Reynolds R, Williams AJ, Clark AM, Freundlich JS. Looking back to the future: predicting in vivo efficacy of small molecules versus Mycobacterium tuberculosis. J Chem Inf Model 2014; 54:1070-82. [PMID: 24665947 PMCID: PMC4004261 DOI: 10.1021/ci500077v] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 02/08/2014] [Indexed: 02/07/2023]
Abstract
Selecting and translating in vitro leads for a disease into molecules with in vivo activity in an animal model of the disease is a challenge that takes considerable time and money. As an example, recent years have seen whole-cell phenotypic screens of millions of compounds yielding over 1500 inhibitors of Mycobacterium tuberculosis (Mtb). These must be prioritized for testing in the mouse in vivo assay for Mtb infection, a validated model utilized to select compounds for further testing. We demonstrate learning from in vivo active and inactive compounds using machine learning classification models (Bayesian, support vector machines, and recursive partitioning) consisting of 773 compounds. The Bayesian model predicted 8 out of 11 additional in vivo actives not included in the model as an external test set. Curation of 70 years of Mtb data can therefore provide statistically robust computational models to focus resources on in vivo active small molecule antituberculars. This highlights a cost-effective predictor for in vivo testing elsewhere in other diseases.
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
| | - Richard Pottorf
- Department of Pharmacology & Physiology, Rutgers University–New Jersey Medical School, 185 South Orange Avenue, Newark, New Jersey 07103, United States
| | - Robert C. Reynolds
- Department of Chemistry, University of Alabama at Birmingham, 1530 Third Avenue South, Birmingham, Alabama 35294-1240, United States
| | - Antony J. Williams
- Royal Society of Chemistry, 904 Tamaras Circle, Wake Forest, North Carolina 27587, United States
| | - Alex M. Clark
- Molecular Materials Informatics, 1900 St. Jacques #302, Montreal, Quebec, Canada H3J 2S1
| | - Joel S. Freundlich
- Department of Pharmacology & Physiology, Rutgers University–New Jersey Medical School, 185 South Orange Avenue, Newark, New Jersey 07103, United States
- Department of Medicine, Center for Emerging and Reemerging Pathogens, Rutgers University–New Jersey Medical School, 185 South Orange Avenue, Newark, New Jersey 07103, United States
| |
|