1
Zin PPK. Paths to cheminformatics: Q&A with Phyo Phyo Kyaw Zin. J Cheminform 2023; 15:21. [PMID: 36782266 PMCID: PMC9923633 DOI: 10.1186/s13321-022-00668-7]
2
Gawriljuk VO, Zin PPK, Puhl AC, Zorn KM, Foil DH, Lane TR, Hurst B, Tavella TA, Costa FTM, Lakshmanane P, Bernatchez J, Godoy AS, Oliva G, Siqueira-Neto JL, Madrid PB, Ekins S. Machine Learning Models Identify Inhibitors of SARS-CoV-2. J Chem Inf Model 2021; 61:4224-4235. [PMID: 34387990 DOI: 10.1021/acs.jcim.1c00683]
Abstract
With SARS-CoV-2 variants of concern evolving rapidly, there is an urgent need to discover further treatments for coronavirus disease 2019 (COVID-19). Drug repurposing is one of the fastest strategies for addressing this need. Several groups have already selected numerous compounds for in vitro testing, producing a growing database of molecules with in vitro activity against the virus. Machine learning models can assist drug discovery by predicting the most promising compounds from previously published data. Herein, we implemented several machine learning methods to build predictive models from recent SARS-CoV-2 in vitro inhibition data. We then used these models to prioritize additional FDA-approved compounds from our in-house compound library for in vitro testing. From the compounds predicted by a Bayesian machine learning model, the antimalarial lumefantrine was selected for testing. It showed limited antiviral activity in cell-based assays but bound the spike protein with a Kd of 259 nM, as measured by microscale thermophoresis. Several other compounds we prioritized have since been tested by others and were also found to be active in vitro. This combined machine learning and in vitro testing approach can be extended to virtually screen available molecules for predicted activity against the SARS-CoV-2 reference WIV04 strain and circulating variants of concern. In the course of this work, we created multiple iterations of machine learning models that can serve as a prioritization tool for SARS-CoV-2 antiviral drug discovery programs. The latest SARS-CoV-2 model, built on over 500 compounds, is freely available at www.assaycentral.org.
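The abstract names a Bayesian machine learning model but gives no details. As a hedged illustration only, the sketch below implements one widely used Bayesian approach for fingerprint data in plain Python: a Laplacian-corrected naive Bayes classifier. The fingerprints (sets of hashed feature IDs), the toy training data and the function names are hypothetical. This is not claimed to be the model behind www.assaycentral.org.

```python
import math
from collections import Counter


def train_laplacian_bayes(fingerprints, labels):
    """Learn per-feature weights for a Laplacian-corrected naive Bayes model.

    fingerprints: list of sets of hashed substructure IDs (hypothetical input).
    labels: list of booleans, True meaning "active".
    """
    n_active = sum(labels)
    if n_active == 0:
        raise ValueError("training data must contain at least one active")
    p_base = n_active / len(labels)  # baseline fraction of actives
    k = 1.0 / p_base                 # Laplacian correction constant
    total = Counter()                # T_f: molecules containing feature f
    active = Counter()               # A_f: actives containing feature f
    for fp, is_active in zip(fingerprints, labels):
        total.update(fp)
        if is_active:
            active.update(fp)
    weights = {}
    for f, t_f in total.items():
        # The corrected estimate is pulled toward p_base for rarely seen features.
        p_corr = (active[f] + p_base * k) / (t_f + k)
        weights[f] = math.log(p_corr / p_base)
    return weights


def bayes_score(fingerprint, weights):
    """Sum feature weights; features unseen in training contribute 0."""
    return sum(weights.get(f, 0.0) for f in fingerprint)


# Toy usage with made-up fingerprints: two actives, two inactives.
train_fps = [{1, 2, 3}, {1, 2}, {4, 5}, {5, 6}]
train_labels = [True, True, False, False]
w = train_laplacian_bayes(train_fps, train_labels)
print(bayes_score({1, 2}, w) > bayes_score({5, 6}, w))  # True
```

Higher scores suggest a higher priority for testing. In a real workflow, the feature sets would come from a cheminformatics toolkit rather than being typed by hand.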
Affiliation(s)
- Victor O Gawriljuk
  - São Carlos Institute of Physics, University of São Paulo, Av. João Dagnone, 1100-Santa Angelina, São Carlos, São Paulo 13563-120, Brazil
- Phyo Phyo Kyaw Zin
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
- Ana C Puhl
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
- Kimberley M Zorn
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
- Daniel H Foil
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
- Thomas R Lane
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
- Brett Hurst
  - Institute for Antiviral Research, Utah State University, Logan, Utah 84322-5600, United States
  - Department of Animal, Dairy and Veterinary Sciences, Utah State University, Logan, Utah 84322-4815, United States
- Tatyana Almeida Tavella
  - Laboratory of Tropical Diseases-Prof. Dr. Luiz Jacinto da Silva, Department of Genetics, Evolution, Microbiology and Immunology, University of Campinas-UNICAMP, Campinas, São Paulo, Brazil
- Fabio Trindade Maranhão Costa
  - Laboratory of Tropical Diseases-Prof. Dr. Luiz Jacinto da Silva, Department of Genetics, Evolution, Microbiology and Immunology, University of Campinas-UNICAMP, Campinas, São Paulo, Brazil
- Premkumar Lakshmanane
  - Department of Microbiology and Immunology, University of North Carolina School of Medicine, Chapel Hill, North Carolina 27599, United States
- Jean Bernatchez
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, San Diego, California 92093, United States
- Andre S Godoy
  - São Carlos Institute of Physics, University of São Paulo, Av. João Dagnone, 1100-Santa Angelina, São Carlos, São Paulo 13563-120, Brazil
- Glaucius Oliva
  - São Carlos Institute of Physics, University of São Paulo, Av. João Dagnone, 1100-Santa Angelina, São Carlos, São Paulo 13563-120, Brazil
- Jair L Siqueira-Neto
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, San Diego, California 92093, United States
- Peter B Madrid
  - SRI International, 333 Ravenswood Avenue, Menlo Park, California 94025, United States
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
3
Abstract
Imatinib, a 2-phenylaminopyrimidine-based BCR-ABL tyrosine kinase inhibitor, is a highly effective drug for treating chronic myeloid leukemia (CML). However, drug resistance keeps emerging through various mutations in the ABL kinase domain, so it is crucial to identify novel bioactive analogues. Reliable QSAR models and molecular docking protocols have been shown to facilitate the discovery of new compounds from chemical libraries prior to experimental testing. The vast majority of QSAR models rely strictly on 2D descriptors. The rise of 3D descriptors computed directly from molecular dynamics (MD) simulations therefore offers new opportunities to augment the reliability of QSAR models. Herein, we applied molecular docking and molecular dynamics to a large series of imatinib derivatives. We then developed an ensemble of QSAR models based on deep neural networks (DNN) and hybrid sets of 2D/3D/MD descriptors to predict the binding affinities and inhibition potencies of these compounds. Rigorous validation used external test sets and 10-fold native and nested cross-validation. Under these protocols, our DNN regression models achieved excellent external prediction performance for the pKi data set (n = 555, R² ≥ 0.71 and MAE ≤ 0.85) and the pIC50 data set (n = 306, R² ≥ 0.54 and MAE ≤ 0.71). Interestingly, the best DNN and random forest models performed similarly across all descriptor sets. For this particular series of compounds, our external test results suggest that adding 3D protein-ligand binding site fingerprints, descriptors, or even MD time-series descriptors did not significantly improve the overall R². It did, however, lower the MAE of the DNN QSAR models. These augmented models could still help identify and understand the key dynamic protein-ligand interactions to optimize in further molecular design.
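The validation metrics quoted above can be made concrete with a minimal, dependency-free sketch. It computes R² as 1 − SS_res/SS_tot together with MAE, and it builds shuffled k-fold splits. It then runs nested cross-validation, where inner folds choose a hyperparameter and outer folds estimate error. A toy one-dimensional k-nearest-neighbour regressor stands in for the DNN and random forest models of the study. All function names and data are illustrative assumptions.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for a shuffled k-fold split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def knn_fit(xs, ys, k):
    """Toy 1-D k-nearest-neighbour regressor standing in for a DNN or RF."""
    def predict(x):
        nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
        return sum(ys[i] for i in nearest) / len(nearest)
    return predict


def nested_cv_mae(xs, ys, candidate_ks, outer_k=10, inner_k=5):
    """Outer folds estimate error; inner folds pick k using outer-train data only."""
    outer_errors = []
    for train, test in k_fold_indices(len(xs), outer_k, seed=1):
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]

        def inner_error(k):
            errs = []
            for itr, ite in k_fold_indices(len(tx), inner_k, seed=2):
                model = knn_fit([tx[i] for i in itr], [ty[i] for i in itr], k)
                errs.append(mae([ty[i] for i in ite], [model(tx[i]) for i in ite]))
            return sum(errs) / len(errs)

        best_k = min(candidate_ks, key=inner_error)
        model = knn_fit(tx, ty, best_k)
        outer_errors.append(mae([ys[i] for i in test], [model(xs[i]) for i in test]))
    return sum(outer_errors) / len(outer_errors)


xs = [i / 10 for i in range(60)]
ys = [2 * x for x in xs]
print(round(nested_cv_mae(xs, ys, [1, 3, 5]), 3))
```

The key design point of the nested scheme is that hyperparameters are never tuned on the outer test fold. As a result, the outer MAE is not optimistically biased by model selection.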
Affiliation(s)
- Phyo Phyo Kyaw Zin
  - Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695, United States
- Alexandre Borrel
  - Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695, United States
- Denis Fourches
  - Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695, United States
4
Abstract
Tuberculosis (TB) continues to claim around 1.7 million lives per year. Most concerning are the reports of multidrug resistance. Paradoxically, this global health pandemic demands new therapies at a time when resources and interest are waning. Yet continued tuberculosis drug discovery is critical to address the global health need and burgeoning multidrug resistance. Many diverse classes of antitubercular compounds have been identified with activity in vitro and in vivo. We analyzed over 100 active leads, representative of thousands of active compounds generated over the past decade. Our analysis suggests that they come from few chemical classes or natural product sources. We are therefore repeatedly identifying compounds that are similar to those that preceded them. Our molecule-centered cheminformatics analyses point to the need to dramatically increase the diversity of the chemical libraries tested. We must also move outside the historic Mtb property space if we are to generate novel, improved antitubercular leads.
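The claim that new leads resemble earlier ones is commonly quantified with fingerprint similarity. The sketch below is a hedged illustration, not the authors' protocol. It computes the Tanimoto coefficient between fingerprint bit sets, then the fraction of new compounds whose nearest earlier neighbour falls below a similarity cutoff. The fingerprints are hypothetical sets of bit indices. The 0.7 cutoff is an illustrative assumption, not a value from the study.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def nearest_neighbour_similarity(query, reference):
    """Highest Tanimoto similarity of a query to any reference compound."""
    return max(tanimoto(query, ref) for ref in reference)


def fraction_novel(new_fps, earlier_fps, threshold=0.7):
    """Share of new compounds whose nearest earlier neighbour is below threshold."""
    novel = sum(1 for fp in new_fps
                if nearest_neighbour_similarity(fp, earlier_fps) < threshold)
    return novel / len(new_fps)


# Hypothetical fingerprints: the first new compound is a close analogue.
earlier = [{1, 2, 3, 4}, {10, 11, 12}]
new = [{1, 2, 3, 4, 5}, {20, 21}]
print(fraction_novel(new, earlier))  # 0.5
```

A low novel fraction across many newly reported leads would reflect the pattern described above: repeatedly rediscovering analogues of known chemotypes.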
Affiliation(s)
- Vadim Makarov
  - FRC Fundamentals of Biotechnology, Russian Academy of Science, Moscow 119071, Russia
- Elena Salina
  - FRC Fundamentals of Biotechnology, Russian Academy of Science, Moscow 119071, Russia
- Robert C Reynolds
  - Department of Medicine, Division of Hematology and Oncology, University of Alabama at Birmingham, NP 2540 J, 1720 Second Avenue South, Birmingham, Alabama 35294-3300, United States
- Phyo Phyo Kyaw Zin
  - Department of Chemistry, North Carolina State University, Raleigh, North Carolina 27695, United States
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695, United States
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
5
Abstract
Macrolactones are macrocyclic lactones with at least twelve atoms in the core ring. They include diverse natural products, such as macrolides, with potent bioactivities (e.g., antibiotics) and useful drug-like characteristics. We have developed MacrolactoneDB, which integrates nearly 14,000 existing macrolactones and their bioactivity information from different public databases. It also provides new molecular descriptors that better characterize macrolide structures. We analyzed the chemical distribution of MacrolactoneDB in terms of important molecular properties. We then used three targets of interest (Plasmodium falciparum, hepatitis C virus and T-cells) to demonstrate the value of compiling these data. Regression machine learning models were generated to predict biological endpoints using seven molecular descriptor sets and eight machine learning algorithms. Our results show that merging descriptor sets yields the best predictive power with random forest models, often boosted by consensus or hybrid modeling approaches. Our study provides cheminformatics insights into this privileged, underexplored structural class of compounds with high therapeutic potential.
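Two ideas in the abstract, merging descriptor sets and consensus modeling, can be sketched in a few lines. This is a hypothetical illustration: the descriptor set names and values are made up, and model training (e.g., random forest) is omitted. Consensus is assumed here to mean a simple average of per-model predictions. The study's actual hybrid and consensus schemes may differ.

```python
def merge_descriptor_sets(*named_sets):
    """Concatenate (name, {descriptor: value}) pairs into one feature dict.

    Names are prefixed with their set label so that identically named
    descriptors from different sets do not overwrite each other.
    """
    merged = {}
    for set_name, descriptors in named_sets:
        for name, value in descriptors.items():
            merged[f"{set_name}:{name}"] = value
    return merged


def consensus_predict(per_model_predictions):
    """Average each compound's predicted endpoint across models.

    per_model_predictions: one list of predictions per model, all aligned
    on the same compound order.
    """
    n_models = len(per_model_predictions)
    return [sum(vals) / n_models for vals in zip(*per_model_predictions)]


# Hypothetical descriptor sets for a single compound.
features = merge_descriptor_sets(
    ("set_a", {"mol_weight": 512.6, "n_rings": 3}),
    ("set_b", {"n_rings": 3, "bit_17": 1}),
)
print(sorted(features))
print(consensus_predict([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

Prefixing keeps both `n_rings` entries, so a downstream model sees every descriptor from every set rather than silently losing duplicates.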
Affiliation(s)
- Phyo Phyo Kyaw Zin
  - Department of Chemistry, North Carolina State University, Raleigh, NC, USA
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Gavin J Williams
  - Department of Chemistry, North Carolina State University, Raleigh, NC, USA
  - Comparative Medicine Institute, North Carolina State University, Raleigh, NC, USA
- Sean Ekins
  - Comparative Medicine Institute, North Carolina State University, Raleigh, NC, USA
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
6
Zin PPK, Williams G, Fourches D. SIME: synthetic insight-based macrolide enumerator to generate the V1B library of 1 billion macrolides. J Cheminform 2020; 12:23. [PMID: 33431002 PMCID: PMC7146965 DOI: 10.1186/s13321-020-00427-6] [Received: 01/30/2020] [Accepted: 03/27/2020]
Abstract
We report on SIME (synthetic insight-based macrolide enumerator), a new and improved cheminformatics enumeration software technology. SIME can enumerate fully assembled, synthetically feasible macrolides by using constitutional and structural knowledge extracted from the biosynthesis of macrolides. The software takes into account key information such as the positions in the macrolide structure at which chemical components can be inserted. It also considers the types of structural motifs and sugars of interest that can be synthesized and incorporated at those positions. Additionally, we report a chemical distribution analysis of the newly SIME-generated V1B (virtual 1 billion) library of macrolides. These compounds were built from the core of the erythromycin structure, 13 structural motifs and a library of sugars derived from eighteen bioactive macrolides. This new enumeration technology can be coupled with cheminformatics approaches such as QSAR modeling and molecular docking. Together, they can aid drug discovery through the rational design of next-generation macrolide therapeutics with desirable pharmacokinetic properties.
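The combinatorial idea behind position-aware enumeration can be sketched under a simple assumption: each insertion position on a fixed core has a list of allowed building blocks, and every combination yields one assembly. Position names and block labels are hypothetical placeholders, not SIME's actual rules or interface. The sketch also does not build or validate chemical structures.

```python
from itertools import islice, product
from math import prod


def library_size(position_options):
    """Number of full assemblies: the product of per-position choice counts."""
    return prod(len(options) for options in position_options.values())


def enumerate_assemblies(position_options, limit=None):
    """Lazily yield one {position: building block} dict per combination."""
    positions = list(position_options)
    combos = product(*(position_options[p] for p in positions))
    for combo in islice(combos, limit):  # limit=None means no cap
        yield dict(zip(positions, combo))


# Hypothetical insertion positions on a fixed core and the blocks allowed at each.
options = {
    "pos_1": ["motif_a", "motif_b", "motif_c"],
    "pos_2": ["motif_a", "motif_d"],
    "sugar_site": ["sugar_1", "sugar_2", "none"],
}
print(library_size(options))  # 18
for assembly in enumerate_assemblies(options, limit=2):
    print(assembly)
```

Library size is the product of the per-position option counts, so adding positions or blocks grows the library multiplicatively. This is why enumerated macrolide libraries can become very large. The lazy generator avoids holding every assembly in memory at once.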
Affiliation(s)
- Phyo Phyo Kyaw Zin
  - Department of Chemistry, North Carolina State University, Raleigh, NC, USA
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Gavin Williams
  - Department of Chemistry, North Carolina State University, Raleigh, NC, USA
  - Comparative Medicine Institute, North Carolina State University, Raleigh, NC, USA
- Denis Fourches
  - Department of Chemistry, North Carolina State University, Raleigh, NC, USA
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
  - Comparative Medicine Institute, North Carolina State University, Raleigh, NC, USA
7
Zin PPK, Williams G, Fourches D. Cheminformatics-based enumeration and analysis of large libraries of macrolide scaffolds. J Cheminform 2018; 10:53. [PMID: 30421084 PMCID: PMC6755550 DOI: 10.1186/s13321-018-0307-6] [Received: 08/13/2018] [Accepted: 11/01/2018]
Abstract
We report on the development of a cheminformatics enumeration technology and the analysis of a resulting large dataset of virtual macrolide scaffolds. Although macrolides have been shown to have valuable biological properties, there is no ready-to-screen virtual library of diverse macrolides in the public domain. Molecular modeling (especially virtual screening) of these complex molecules is highly relevant. Their organic synthesis, when feasible, typically requires many synthetic steps, which dramatically slows the discovery of new bioactive macrolides. Herein, we introduce a cheminformatics approach and associated software for designing and generating libraries of virtual macrocycle/macrolide scaffolds with user-defined constitutional and structural constraints (e.g., types and numbers of structural motifs to include in the macrocycle, ring size, maximum number of compounds generated). To study the chemical diversity of such generated molecules, we enumerated the V1M (Virtual 1 million Macrolide scaffolds) library, each containing twelve common structural motifs. For each macrolide scaffold, we calculated several key properties, such as molecular weight, hydrogen bond donors/acceptors and topological polar surface area. In this study, we discuss (1) the initial concept and current features of our PKS (polyketide) Enumerator software, (2) the chemical diversity and distribution of structural motifs in the V1M library, and (3) the unique opportunities for future virtual screening of such enumerated ensembles of macrolides. Importantly, V1M is provided in the Supplementary Material of this paper, allowing other researchers to conduct any type of molecular modeling and virtual screening study.
Therefore, this technology for enumerating extremely large libraries of macrolide scaffolds could hold unique potential in computational chemistry and drug discovery for the rational design of new antibiotics and anti-cancer agents.
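The user-defined constraints listed above (number and types of motifs, ring size, maximum number of compounds) can be illustrated with a small constrained enumerator. This is a hedged sketch, not the PKS Enumerator itself. The motif names, their ring-atom counts, and the assumption that the lactone ester contributes two ring atoms are all hypothetical.

```python
from itertools import product

# Hypothetical motifs and the number of ring atoms each contributes.
MOTIF_RING_ATOMS = {"A": 2, "B": 2, "C": 3}
LACTONE_RING_ATOMS = 2  # assumed: ester carbonyl C and ester O lie in the ring


def enumerate_scaffolds(motifs, n_motifs, min_ring, max_ring, max_compounds):
    """Return motif sequences that satisfy the ring-size and count constraints.

    Sequences are ordered; rotations of the same ring are not
    deduplicated in this sketch.
    """
    scaffolds = []
    for combo in product(sorted(motifs), repeat=n_motifs):
        ring_size = LACTONE_RING_ATOMS + sum(motifs[m] for m in combo)
        if min_ring <= ring_size <= max_ring:
            scaffolds.append(combo)
            if len(scaffolds) >= max_compounds:
                break  # honour the user's cap on library size
    return scaffolds


hits = enumerate_scaffolds(MOTIF_RING_ATOMS, n_motifs=5,
                           min_ring=12, max_ring=14, max_compounds=1000)
print(len(hits), hits[0])  # 192 ('A', 'A', 'A', 'A', 'A')
```

With these toy inputs, every sequence of five motifs containing at most two `C` motifs gives a 12- to 14-membered ring, which yields 192 scaffolds. Lowering `max_compounds` truncates the enumeration early.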
Affiliation(s)
- Phyo Phyo Kyaw Zin
  - Department of Chemistry, North Carolina State University, Raleigh, NC, USA
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Gavin Williams
  - Department of Chemistry, North Carolina State University, Raleigh, NC, USA
  - Comparative Medicine Institute, North Carolina State University, Raleigh, NC, USA
- Denis Fourches
  - Department of Chemistry, North Carolina State University, Raleigh, NC, USA
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
  - Comparative Medicine Institute, North Carolina State University, Raleigh, NC, USA