1. Zhang Y, Zhang Z, Ke D, Pan X, Wang X, Xiao X, Ji C. FragGrow: A Web Server for Structure-Based Drug Design by Fragment Growing within Constraints. J Chem Inf Model 2024; 64:3970-3976. [PMID: 38725251 DOI: 10.1021/acs.jcim.4c00154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2024]
Abstract
Fragment growing is an important ligand design strategy in drug discovery. In this study, we present FragGrow, a web server that facilitates structure-based drug design by fragment growing. FragGrow offers two working modes: one for growing molecules through the direct replacement of hydrogen atoms or substructures and the other for growing via virtual synthesis. FragGrow works by searching for suitable fragments that meet a set of constraints from an indexed 3D fragment database and using them to create new compounds in 3D space. The users can set a range of constraints when searching for their desired fragment, including the fragment's ability to interact with specific protein sites; its size, topology, and physicochemical properties; and the presence of particular heteroatoms and functional groups within the fragment. We hope that FragGrow will serve as a useful tool for medicinal chemists in ligand design. The FragGrow server is freely available to researchers and can be accessed at https://fraggrow.xundrug.cn.
Affiliation(s)
- Yueqing Zhang
  - Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China
- Zhihan Zhang
  - Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China
- Dongliang Ke
  - Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China
- Xiaolin Pan
  - Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China
- Xingyu Wang
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China
- Xudong Xiao
  - Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China
- Changge Ji
  - Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China
2. Gadiya Y, Shetty S, Hofmann-Apitius M, Gribbon P, Zaliani A. Exploring SureChEMBL from a drug discovery perspective. Sci Data 2024; 11:507. [PMID: 38755219 PMCID: PMC11099139 DOI: 10.1038/s41597-024-03371-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Accepted: 05/13/2024] [Indexed: 05/18/2024] [Open Access]
Abstract
In the pharmaceutical industry, the patent protection of drugs and medicines is accorded importance because of the high costs involved in the development of novel drugs. Over the years, researchers have analyzed patent documents to identify freedom-to-operate spaces for novel drug candidates. To assist this, several well-established public patent document data repositories have enabled automated methodologies for extracting information on therapeutic agents. In this study, we delve into one such publicly available patent database, SureChEMBL, which catalogues patent documents related to life sciences. Our exploration begins by identifying patent compounds across public chemical data resources, followed by pinpointing sections in patent documents where the chemical annotations were found. Next, we exhibit the potential of compounds to serve as drug candidates by evaluating their conformity to drug-likeness criteria. Lastly, we examine the drug development stage reported for these compounds to understand their clinical success. In summary, our investigation aims at providing a comprehensive overview of the patent compounds catalogued in SureChEMBL, assessing their relevance to pharmaceutical drug discovery.
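As an illustration of what a drug-likeness check can look like, the sketch below applies Lipinski's rule of five to precomputed descriptors. The abstract does not say which criteria the authors used, so the choice of rule, the `Descriptors` type, the function names, and the example values are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container for this sketch)."""
    mol_weight: float        # molecular weight in daltons
    logp: float              # octanol-water partition coefficient
    h_bond_donors: int       # number of hydrogen-bond donors
    h_bond_acceptors: int    # number of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = (
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    )
    return sum(checks)


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Treat a compound as drug-like if it breaks at most `max_violations` thresholds."""
    return lipinski_violations(d) <= max_violations


if __name__ == "__main__":
    # Illustrative descriptor values, not taken from the paper.
    small = Descriptors(mol_weight=180.16, logp=1.2, h_bond_donors=1, h_bond_acceptors=4)
    large = Descriptors(mol_weight=720.0, logp=6.3, h_bond_donors=6, h_bond_acceptors=12)
    print(is_drug_like(small), is_drug_like(large))
```

In practice the descriptors would come from a cheminformatics toolkit; they are passed in directly here to keep the example self-contained.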
Affiliation(s)
- Yojana Gadiya
  - Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), Schnackenburgallee 114, 22525, Hamburg, Germany
  - Fraunhofer Cluster of Excellence for Immune-Mediated Diseases (CIMD), Theodor Stern Kai 7, 60590, Frankfurt, Germany
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53113, Bonn, Germany
- Simran Shetty
  - Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), Schnackenburgallee 114, 22525, Hamburg, Germany
  - Fraunhofer Cluster of Excellence for Immune-Mediated Diseases (CIMD), Theodor Stern Kai 7, 60590, Frankfurt, Germany
  - Hamburg University of Applied Sciences (HAW), 20099, Hamburg, Germany
- Martin Hofmann-Apitius
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53113, Bonn, Germany
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53757, Sankt Augustin, Germany
- Philip Gribbon
  - Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), Schnackenburgallee 114, 22525, Hamburg, Germany
  - Fraunhofer Cluster of Excellence for Immune-Mediated Diseases (CIMD), Theodor Stern Kai 7, 60590, Frankfurt, Germany
- Andrea Zaliani
  - Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), Schnackenburgallee 114, 22525, Hamburg, Germany
  - Fraunhofer Cluster of Excellence for Immune-Mediated Diseases (CIMD), Theodor Stern Kai 7, 60590, Frankfurt, Germany
3. Oprea TI, Bologa C, Holmes J, Mathias S, Metzger VT, Waller A, Yang JJ, Leach AR, Jensen LJ, Kelleher KJ, Sheils TK, Mathé E, Avram S, Edwards JS. Overview of the Knowledge Management Center for Illuminating the Druggable Genome. Drug Discov Today 2024; 29:103882. [PMID: 38218214 PMCID: PMC10939799 DOI: 10.1016/j.drudis.2024.103882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 12/22/2023] [Accepted: 01/09/2024] [Indexed: 01/15/2024]
Abstract
The Knowledge Management Center (KMC) for the Illuminating the Druggable Genome (IDG) project aims to aggregate, update, and articulate protein-centric data knowledge for the entire human proteome, with emphasis on the understudied proteins from the three IDG protein families. KMC collates and analyzes data from over 70 resources to compile the Target Central Resource Database (TCRD), which is the web-based informatics platform (Pharos). These data include experimental, computational, and text-mined information on protein structures, compound interactions, and disease and phenotype associations. Based on this knowledge, proteins are classified into different Target Development Levels (TDLs) for identification of understudied targets. Additional work by the KMC focuses on enriching target knowledge and producing DrugCentral and other data visualization tools for expanding investigation of understudied targets.
Affiliation(s)
- Tudor I Oprea
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Cristian Bologa
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Jayme Holmes
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Stephen Mathias
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Vincent T Metzger
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Anna Waller
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Jeremy J Yang
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Andrew R Leach
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Lars Juhl Jensen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Keith J Kelleher
  - National Center for Advancing Translational Sciences (NCATS), NIH, Bethesda, MD, USA
- Timothy K Sheils
  - National Center for Advancing Translational Sciences (NCATS), NIH, Bethesda, MD, USA
- Ewy Mathé
  - National Center for Advancing Translational Sciences (NCATS), NIH, Bethesda, MD, USA
- Sorin Avram
  - Coriolan Dragulescu Institute of Chemistry, Timisoara, Romania
- Jeremy S Edwards
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
  - Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, NM, USA
4. Gómez-Sacristán P, Simeon S, Tran-Nguyen VK, Patil S, Ballester PJ. Inactive-enriched machine-learning models exploiting patent data improve structure-based virtual screening for PDL1 dimerizers. J Adv Res 2024:S2090-1232(24)00037-7. [PMID: 38280715 DOI: 10.1016/j.jare.2024.01.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 12/01/2023] [Accepted: 01/21/2024] [Indexed: 01/29/2024] [Open Access]
Abstract
INTRODUCTION: Small-molecule Programmed Cell Death Protein 1/Programmed Death-Ligand 1 (PD1/PDL1) inhibition via PDL1 dimerization has the potential to lead to inexpensive drugs with better cancer patient outcomes and milder side effects. However, this therapeutic approach has proven challenging, with only one PDL1 dimerizer reaching early clinical trials so far. There is hence a need for fast and accurate methods to develop alternative PDL1 dimerizers.
OBJECTIVES: We aim to show that structure-based virtual screening (SBVS) based on PDL1-specific machine-learning (ML) scoring functions (SFs) is a powerful drug design tool for detecting PD1/PDL1 inhibitors via PDL1 dimerization.
METHODS: By incorporating the latest MLSF advances, we generated and evaluated PDL1-specific MLSFs (classifiers and inactive-enriched regressors) on two demanding test sets.
RESULTS: 60 PDL1-specific MLSFs (30 classifiers and 30 regressors) were generated. Our large-scale analysis provides highly predictive PDL1-specific MLSFs that benefitted from training with large volumes of docked inactives and from enabling inactive-enriched regression.
CONCLUSION: PDL1-specific MLSFs strongly outperformed generic SFs of various types on this target and are released here without restrictions.
Affiliation(s)
- Saw Simeon
  - Centre de Recherche en Cancérologie de Marseille, Marseille 13009, France
- Sachin Patil
  - NanoBio Laboratory, Widener University, Chester, PA 19013, USA
- Pedro J Ballester
  - Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
5. Zhu TF, Qian R, Wei X, Lu AP, Cao DS. PatentNetML: A Novel Framework for Predicting Key Compounds in Patents Using Network Science and Machine Learning. J Med Chem 2024; 67:1347-1359. [PMID: 38181431 DOI: 10.1021/acs.jmedchem.3c01893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2024]
Abstract
Patents play a crucial role in drug research and development, providing early access to unpublished data and offering unique insights. Identifying key compounds in patents is essential to finding novel lead compounds. This study collected a comprehensive data set comprising 1555 patents, encompassing 1000 key compounds, to explore innovative approaches for predicting these key compounds. Our novel PatentNetML framework integrated network science and machine learning algorithms, combining network measures, ADMET properties, and physicochemical properties, to construct robust classification models to identify key compounds. Through a model interpretation and an analysis of three compelling case studies, we showcase the potential of PatentNetML in unveiling hidden patterns and connections within diverse patents. While our framework is pioneering, we acknowledge its limitations when applied to patents that deviate from the assumed central pattern. This work serves as a promising foundation for future research endeavors aimed at efficiently identifying promising drug candidates and expediting drug discovery in the pharmaceutical industry.
Affiliation(s)
- Ting-Fei Zhu
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, Hunan, China
  - School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, China
- Rong Qian
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, Hunan, China
  - School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, China
- Xiao Wei
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, Hunan, China
- Ai-Ping Lu
  - School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, China
  - Guangdong-Hong Kong-Macau Joint Lab on Chinese Medicine and Immune Disease Research, Guangzhou 510000, China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, Hunan, China
  - School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, China
6. Zdrazil B, Felix E, Hunter F, Manners EJ, Blackshaw J, Corbett S, de Veij M, Ioannidis H, Lopez DM, Mosquera J, Magarinos M, Bosc N, Arcila R, Kizilören T, Gaulton A, Bento A, Adasme M, Monecke P, Landrum G, Leach A. The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res 2024; 52:D1180-D1192. [PMID: 37933841 PMCID: PMC10767899 DOI: 10.1093/nar/gkad1004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2023] [Revised: 10/09/2023] [Accepted: 10/23/2023] [Indexed: 11/08/2023] [Open Access]
Abstract
ChEMBL (https://www.ebi.ac.uk/chembl/) is a manually curated, high-quality, large-scale, open, FAIR and Global Core Biodata Resource of bioactive molecules with drug-like properties, previously described in the 2012, 2014, 2017 and 2019 Nucleic Acids Research Database Issues. Since its introduction in 2009, ChEMBL's content has changed dramatically in size and diversity of data types. Through incorporation of multiple new datasets from depositors since the 2019 update, ChEMBL now contains slightly more bioactivity data from deposited data vs data extracted from literature. In collaboration with the EUbOPEN consortium, chemical probe data is now regularly deposited into ChEMBL. Release 27 made curated data available for compounds screened for potential anti-SARS-CoV-2 activity from several large-scale drug repurposing screens. In addition, new patent bioactivity data have been added to the latest ChEMBL releases, and various new features have been incorporated, including a Natural Product likeness score, updated flags for Natural Products, a new flag for Chemical Probes, and the initial annotation of the action type for ∼270 000 bioactivity measurements.
Affiliation(s)
- Barbara Zdrazil
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Eloy Felix
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Fiona Hunter
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Emma J Manners
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- James Blackshaw
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Sybilla Corbett
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Marleen de Veij
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Harris Ioannidis
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- David Mendez Lopez
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Juan F Mosquera
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Maria Paula Magarinos
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Nicolas Bosc
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Ricardo Arcila
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Tevfik Kizilören
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Anna Gaulton
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- A Patrícia Bento
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Melissa F Adasme
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Peter Monecke
  - Sanofi, R&D, Preclinical Safety, Industriepark Höchst, 65926 Frankfurt am Main, Germany
- Gregory A Landrum
  - Department of Chemistry and Applied Biosciences, ETH Zürich, 8093 Zürich, Switzerland
- Andrew R Leach
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
7. Martinez-Sevillano M, Falaguera MJ, Mestres J. CIPSI: An open chemical intellectual property service for medicinal chemists. Mol Inform 2024; 43:e202300221. [PMID: 38010631 DOI: 10.1002/minf.202300221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 11/22/2023] [Accepted: 11/23/2023] [Indexed: 11/29/2023]
Abstract
The availability of patent chemical data offers public access to a chemical space that is not well covered by other sources collecting small molecules from scholarly literature. However, open applications to facilitate the search and analysis of biologically-relevant molecular structures present in patents are still largely missing. We have developed CIPSI, an open Chemical Intellectual Property Service @ IMIM to assist medicinal chemists in searching and analysing molecules in SureChEMBL patents. The current version contains 6,240,500 molecules from 236,689 pharmacological patents, of which 5,949,214 are confidently assigned to core chemical structures reminiscent of the Markush structure in the patent claim. The platform includes some graphical tools to facilitate comparative patent analyses between drugs, chemical substructures, and company assignees. CIPSI is available at https://cipsi.org.
Affiliation(s)
- Maria Martinez-Sevillano
  - Systems Pharmacology, Research Group on Biomedical Informatics (GRIB), IMIM Hospital del Mar Medical Research Institute, Doctor Aiguader 88, 08028, Barcelona, Spain
- Maria J Falaguera
  - European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
  - Open Targets, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Jordi Mestres
  - Systems Pharmacology, Research Group on Biomedical Informatics (GRIB), IMIM Hospital del Mar Medical Research Institute, Doctor Aiguader 88, 08028, Barcelona, Spain
  - Institut de Quimica Computacional i Catalisi, Facultat de Ciencies, Universitat de Girona, Maria Aurelia Capmany 69, 17003, Girona, Spain
8. Laha A, Sarkar A, Panja AS, Bandopadhyay R. Screening of Prospective Antiallergic Compound as FcεRI Inhibitors and Its Antiallergic Efficacy Through Immunoinformatics Approaches. Mol Biotechnol 2024; 66:26-33. [PMID: 36988875 DOI: 10.1007/s12033-023-00728-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Accepted: 03/21/2023] [Indexed: 03/30/2023]
Abstract
The occurrence of allergy, a type I hypersensitivity reaction, is rising exponentially all over the world. Allergy can be fatal for atopic patients when anaphylaxis occurs. This study aims to find an anti-allergic agent that can inhibit the binding of IgE to the human high-affinity IgE receptor (FcεRI), thereby preventing the degranulation of mast cells. A considerable number of potential anti-allergic compounds were assessed for their inhibitory strength through ADMET studies. AutoDock was used to estimate the binding energy between the anti-allergic compounds and FcεRI and to identify the interacting amino acids. The docked pose showing favorable binding energy was subjected to a molecular dynamics simulation study. Marrubiin, a diterpenoid lactone from Lamiaceae, and epicatechin-3-gallate appear to be effective in blocking FcεRI. This in silico study proposes the use of marrubiin and epicatechin-3-gallate in the downregulation of allergic responses. Given its better inhibition constant, the future direction of this study is to analyze the safety and efficacy of marrubiin in anti-allergic activities through in vivo clinical human trials.
Affiliation(s)
- Anubhab Laha
  - UGC Centre for Advanced Study, Department of Botany, The University of Burdwan, Golapbag, Burdwan, West Bengal, 713104, India
  - Department of Botany, Chandernagore College, Chandernagore, Hooghly, West Bengal, 712136, India
- Aniket Sarkar
  - Post-Graduate Department of Biotechnology, Oriental Institute of Science and Technology, Vidyasagar University, Midnapore, West Bengal, India
- Anindya Sundar Panja
  - Department of Biotechnology, Molecular Informatics Laboratory, Oriental Institute of Science and Technology, Vidyasagar University, Midnapore, West Bengal, 721102, India
- Rajib Bandopadhyay
  - UGC Centre for Advanced Study, Department of Botany, The University of Burdwan, Golapbag, Burdwan, West Bengal, 713104, India
9. Kosonocky CW, Wilke CO, Marcotte EM, Ellington AD. Mining Patents with Large Language Models Elucidates the Chemical Function Landscape. arXiv 2023:arXiv:2309.08765v2. [PMID: 38196747 PMCID: PMC10775343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2024]
Abstract
The fundamental goal of small molecule discovery is to generate chemicals with target functionality. While this often proceeds through structure-based methods, we set out to investigate the practicality of orthogonal methods that leverage the extensive corpus of chemical literature. We hypothesize that a sufficiently large text-derived chemical function dataset would mirror the actual landscape of chemical functionality. Such a landscape would implicitly capture complex physical and biological interactions given that chemical function arises from both a molecule's structure and its interacting partners. To evaluate this hypothesis, we built a Chemical Function (CheF) dataset of patent-derived functional labels. This dataset, comprising 631K molecule-function pairs, was created using an LLM- and embedding-based method to obtain functional labels for approximately 100K molecules from their corresponding 188K unique patents. We carry out a series of analyses demonstrating that the CheF dataset contains a semantically coherent textual representation of the functional landscape congruent with chemical structural relationships, thus approximating the actual chemical function landscape. We then demonstrate that this text-based functional landscape can be leveraged to identify drugs with target functionality using a model able to predict functional profiles from structure alone. We believe that functional label-guided molecular discovery may serve as an orthogonal approach to traditional structure-based methods in the pursuit of designing novel functional molecules.
10. Insana G, Ignatchenko A, Martin M, Bateman A. MBDBMetrics: an online metrics tool to measure the impact of biological data resources. Bioinformatics Advances 2023; 3:vbad180. [PMID: 38130879 PMCID: PMC10733715 DOI: 10.1093/bioadv/vbad180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 11/13/2023] [Indexed: 12/23/2023]
Abstract
Motivation: There now exist thousands of molecular biology databases covering every aspect of biological data. This database infrastructure takes significant effort and funding to develop and maintain. The creators of these databases need to make strong justifications to funders to prove their impact or importance. There are many publication metrics and tools available, such as Google Scholar to measure citation impact or AltMetrics, which covers multiple measures including social media coverage.
Results: In this article, we describe a series of novel impact metrics that have been applied initially to the UniProt database and are now made available via a Google Colab notebook to enable any molecular biology resource to gain several additional metrics. These metrics, powered by freely available APIs from Europe PubMed Central and SureChEMBL, cover mentions of the resource in full-text articles, including which section of the paper the mention occurs in, grant acknowledgements, and mentions in patent applications. This tool, which we call MBDBMetrics, is a useful adjunct to existing tools.
Availability and implementation: The MBDBMetrics tool is available at the following locations: https://colab.research.google.com/drive/1aEmSQR9DGQIZmHAIuQV9mLv7Mw9Ppkin and https://github.com/g-insana/MBDBMetrics.
Affiliation(s)
- Giuseppe Insana
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Alex Ignatchenko
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Maria Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
11. Shimizu Y, Ohta M, Ishida S, Terayama K, Osawa M, Honma T, Ikeda K. AI-driven molecular generation of not-patented pharmaceutical compounds using world open patent data. J Cheminform 2023; 15:120. [PMID: 38093324 PMCID: PMC10716930 DOI: 10.1186/s13321-023-00791-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 12/02/2023] [Indexed: 12/17/2023] [Open Access]
Abstract
Developing compounds with novel structures is important for the production of new drugs. From an intellectual perspective, confirming the patent status of newly developed compounds is essential, particularly for pharmaceutical companies. The generation of a large number of compounds has been made possible because of the recent advances in artificial intelligence (AI). However, confirming the patent status of these generated molecules has been a challenge because there are no free and easy-to-use tools that can be used to determine the novelty of the generated compounds in terms of patents in a timely manner; additionally, there are no appropriate reference databases for pharmaceutical patents in the world. In this study, two public databases, SureChEMBL and Google Patents Public Datasets, were used to create a reference database of drug-related patented compounds using international patent classification. An exact structure search system was constructed using InChIKey and a relational database system to rapidly search for compounds in the reference database. Because drug-related patented compounds are a good source for generative AI to learn useful chemical structures, they were used as the training data. Furthermore, molecule generation was successfully directed by increasing and decreasing the number of generated patented compounds through incorporation of patent status (i.e., patented or not) into learning. The use of patent status enabled generation of novel molecules with high drug-likeness. The generation using generative AI with patent information would help efficiently propose novel compounds in terms of pharmaceutical patents.
Scientific contribution: In this study, a new molecule-generation method that takes into account the patent status of molecules, which has rarely been considered but is an important feature in drug discovery, was developed. The method enables the generation of novel molecules based on pharmaceutical patents with high drug-likeness and will help in the efficient development of effective drug compounds.
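The exact structure search described in the abstract can be pictured as a keyed lookup in a relational table. The sketch below is a minimal illustration using SQLite from the Python standard library, not the authors' system: the table name, column names, function names, and patent identifiers are hypothetical, and the InChIKeys are assumed to have been computed beforehand with a cheminformatics toolkit.

```python
import sqlite3


def build_reference_db(records):
    """Create an in-memory table of (inchikey, patent_id) pairs.

    The composite primary key starts with `inchikey`, so SQLite can use its
    index for exact-match lookups on that column.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE patented_compounds (
            inchikey  TEXT NOT NULL,
            patent_id TEXT NOT NULL,
            PRIMARY KEY (inchikey, patent_id)
        )
        """
    )
    # INSERT OR IGNORE silently skips duplicate (inchikey, patent_id) pairs.
    conn.executemany(
        "INSERT OR IGNORE INTO patented_compounds VALUES (?, ?)", records
    )
    conn.commit()
    return conn


def patents_for(conn, inchikey):
    """Return the sorted patent identifiers recorded for an exact InChIKey."""
    rows = conn.execute(
        "SELECT patent_id FROM patented_compounds "
        "WHERE inchikey = ? ORDER BY patent_id",
        (inchikey,),
    ).fetchall()
    return [row[0] for row in rows]


def is_patented(conn, inchikey):
    """True if the InChIKey appears in the reference table at least once."""
    return bool(patents_for(conn, inchikey))


if __name__ == "__main__":
    # Patent identifiers are placeholders, not real documents.
    conn = build_reference_db([
        ("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "PLACEHOLDER-PATENT-1"),  # aspirin
        ("RYYVLZVUVIJVGH-UHFFFAOYSA-N", "PLACEHOLDER-PATENT-2"),  # caffeine
    ])
    print(is_patented(conn, "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
    print(is_patented(conn, "HEFNNWSXXWATRW-UHFFFAOYSA-N"))  # ibuprofen, not in table
```

Because an InChIKey is a fixed-length hash of the structure, the lookup is a plain string equality test, which is what makes a relational index sufficient for exact matching.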
Collapse
Affiliation(s)
- Yugo Shimizu
- HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
- Division of Physics for Life Functions, Keio University Faculty of Pharmacy, 1-5-30 Shibakoen, Minato-ku, Tokyo, 105-8512, Japan
| | - Masateru Ohta
- HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
| | - Shoichi Ishida
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
| | - Kei Terayama
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
| | - Masanori Osawa
- Division of Physics for Life Functions, Keio University Faculty of Pharmacy, 1-5-30 Shibakoen, Minato-ku, Tokyo, 105-8512, Japan
| | - Teruki Honma
- RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
| | - Kazuyoshi Ikeda
- HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan.
- Division of Physics for Life Functions, Keio University Faculty of Pharmacy, 1-5-30 Shibakoen, Minato-ku, Tokyo, 105-8512, Japan.
|
12
|
Kosonocky CW, Feller AL, Wilke CO, Ellington AD. Using alternative SMILES representations to identify novel functional analogues in chemical similarity vector searches. PATTERNS (NEW YORK, N.Y.) 2023; 4:100865. [PMID: 38106612 PMCID: PMC10724362 DOI: 10.1016/j.patter.2023.100865] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/17/2023] [Revised: 08/09/2023] [Accepted: 10/06/2023] [Indexed: 12/19/2023]
Abstract
Chemical similarity searches are a widely used family of in silico methods for identifying pharmaceutical leads. These methods historically relied on structure-based comparisons to compute similarity. Here, we use a chemical language model to create a vector-based chemical search. We extend previous implementations by creating a prompt engineering strategy that utilizes two different chemical string representation algorithms: one for the query and the other for the database. We explore this method by reviewing search results from nine queries with diverse targets. We find that the method identifies molecules with similar patent-derived functionality to the query, as determined by our validated LLM-assisted patent summarization pipeline. Further, many of these functionally similar molecules have different structures and scaffolds from the query, making them unlikely to be found with traditional chemical similarity searches. This method may serve as a new tool for the discovery of novel molecular structural classes that achieve target functionality.
Affiliation(s)
- Clayton W. Kosonocky
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78705, USA
| | - Aaron L. Feller
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78705, USA
| | - Claus O. Wilke
- Department of Integrative Biology, University of Texas at Austin, Austin, TX 78705, USA
| | - Andrew D. Ellington
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78705, USA
- Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, TX 78705, USA
|
13
|
Tran-Nguyen VK, Junaid M, Simeon S, Ballester PJ. A practical guide to machine-learning scoring for structure-based virtual screening. Nat Protoc 2023; 18:3460-3511. [PMID: 37845361 DOI: 10.1038/s41596-023-00885-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 02/08/2022] [Accepted: 07/03/2023] [Indexed: 10/18/2023]
Abstract
Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more concretely, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol , can be run by using a single computer within 1 week and make use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.
Affiliation(s)
| | - Muhammad Junaid
- Centre de Recherche en Cancérologie de Marseille, Marseille, France
| | - Saw Simeon
- Centre de Recherche en Cancérologie de Marseille, Marseille, France
|
14
|
John L, Nagamani S, Mahanta HJ, Vaikundamani S, Kumar N, Kumar A, Jamir E, Priyadarsinee L, Sastry GN. Molecular Property Diagnostic Suite Compound Library (MPDS-CL): a structure-based classification of the chemical space. Mol Divers 2023:10.1007/s11030-023-10752-1. [PMID: 37902900 DOI: 10.1007/s11030-023-10752-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2023] [Accepted: 10/17/2023] [Indexed: 11/01/2023]
Abstract
Molecular Property Diagnostic Suite Compound Library (MPDS-CL) is an open-source, Galaxy-based cheminformatics web portal that presents a structure-based classification of molecules. Nearly 150 million unique compounds, obtained from 42 publicly available databases and curated to remove redundancy, were first divided into 97 hierarchically well-defined portions based on atom composition. These portions were then subjected to a 56-bit fingerprint-based classification algorithm, which led to the formation of 56 structurally well-defined classes. The classes thus obtained were further divided into clusters based on molecular weight. Thus, the entire set of molecules was put into 56 different classes and 625 clusters. This led to the assignment of a unique ID, named MPDS-AadharID, to each of these 149,169,443 molecules. MPDS-AadharID is akin to the unique number given to citizens in India (similar to the SSN in the US and the NINO in the UK). The unique features of MPDS-CL are (a) several search options, such as exact structure search, substructure search, property-based search, and fingerprint-based search, using SMILES, InChIKey and key-in; (b) automatic generation of information for processing by MPDS and other Galaxy tools; (c) reporting the class and cluster of a molecule, which makes searching for similar molecules easier and faster; and (d) information on the presence of the molecules in multiple databases. The MPDS-CL can be accessed at https://mpds.neist.res.in:8086/ .
Affiliation(s)
- Lijo John
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
| | - Selvaraman Nagamani
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
| | - Hridoy Jyoti Mahanta
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
| | - S Vaikundamani
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
| | - Nandan Kumar
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
| | - Asheesh Kumar
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
| | - Esther Jamir
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
| | - Lipsa Priyadarsinee
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
| | - G Narahari Sastry
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India.
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India.
|
15
|
Medina J, White AD. Bloom filters for molecules. J Cheminform 2023; 15:95. [PMID: 37828615 PMCID: PMC10571468 DOI: 10.1186/s13321-023-00765-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Accepted: 09/25/2023] [Indexed: 10/14/2023] Open
Abstract
Ultra-large chemical libraries are reaching tens to hundreds of billions of molecules. A challenge for these libraries is to efficiently check whether a proposed molecule is present. Here we propose and study Bloom filters for testing whether a molecule is present in a set, using either string or fingerprint representations. Bloom filters are small enough to hold billions of molecules in just a few GB of memory and can check membership in under a millisecond. We found that string representations can have a false positive rate below 1% and require significantly less storage than fingerprints. Canonical SMILES with Bloom filters using the simple FNV (Fowler-Noll-Vo) hashing function provide fast and accurate membership tests with small memory requirements. We provide a general implementation and specific filters for detecting if a molecule is purchasable, patented, or a natural product according to existing databases at https://github.com/whitead/molbloom .
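To make the idea in this abstract concrete, here is a minimal sketch of a Bloom filter over SMILES strings that uses 64-bit FNV-1a hashing. It is not the molbloom implementation: the class name, the seeded second hash, and the double-hashing scheme for deriving several bit positions are assumptions made for illustration, and the strings are stored exactly as given (a real filter would canonicalize the SMILES first).

```python
FNV_OFFSET = 0xCBF29CE484222325  # 64-bit FNV offset basis
FNV_PRIME = 0x100000001B3        # 64-bit FNV prime
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes, seed: int = 0) -> int:
    """64-bit FNV-1a hash; the optional seed (an assumption here) varies the start state."""
    h = (FNV_OFFSET ^ seed) & MASK64
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h


class BloomFilter:
    """Set-membership sketch: no false negatives, tunable false positive rate."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive num_hashes bit positions from two base hashes.
        data = item.encode("utf-8")
        h1 = fnv1a_64(data)
        h2 = fnv1a_64(data, seed=0x9E3779B97F4A7C15) | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


library = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]  # toy SMILES set
bloom = BloomFilter(num_bits=1 << 16, num_hashes=4)
for smiles in library:
    bloom.add(smiles)

print(all(smiles in bloom for smiles in library))  # added molecules are always found
```

A query for a molecule that was never added usually returns False but can occasionally return True; that false positive rate is the quantity traded against memory when choosing the number of bits and hashes.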
Affiliation(s)
- Jorge Medina
- Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
| | - Andrew D White
- Department of Chemical Engineering, University of Rochester, Rochester, NY, USA.
|
16
|
Takács G, Havasi D, Sándor M, Dohánics Z, Balogh GT, Kiss R. DIY Virtual Chemical Libraries - Novel Starting Points for Drug Discovery. ACS Med Chem Lett 2023; 14:1188-1197. [PMID: 37736187 PMCID: PMC10510501 DOI: 10.1021/acsmedchemlett.3c00146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 08/28/2023] [Indexed: 09/23/2023] Open
Abstract
The advancement of in silico technologies such as library enumeration and synthetic feasibility prediction has made drug discovery pipelines rely more and more on virtual libraries, which provide a significantly larger pool of compounds than in-stock supplier catalogs. Virtual libraries from external sources, however, may be associated with long delivery time and high cost. In this study, we present a Do-It-Yourself (DIY) combinatorial chemistry library containing over 14 million almost completely novel products built from 1000 low-cost building blocks based on robust reactions frequently applied at medicinal chemistry laboratories. The applicability of the DIY library for various drug discovery approaches is demonstrated by extensive physicochemical property, structural diversity profiling, and the generation of focused libraries. We found that internally built DIY chemical libraries present a viable alternative of external virtual catalogs by providing access to a large number of low-cost and quickly accessible potential chemical starting points for drug discovery.
Affiliation(s)
- Gergely Takács
- Department of Chemical and Environmental Process Engineering, Faculty of Chemical Technology and Biotechnology, Budapest University of Technology and Economics, Műegyetem rakpart 3, Budapest 1111, Hungary
- Mcule.com Kft, Bartók Béla út 105-113, Budapest 1115, Hungary
| | - Dávid Havasi
- Department of Chemical and Environmental Process Engineering, Faculty of Chemical Technology and Biotechnology, Budapest University of Technology and Economics, Műegyetem rakpart 3, Budapest 1111, Hungary
- Mcule.com Kft, Bartók Béla út 105-113, Budapest 1115, Hungary
| | - Márk Sándor
- Mcule.com Kft, Bartók Béla út 105-113, Budapest 1115, Hungary
| | - Zsolt Dohánics
- Mcule.com Kft, Bartók Béla út 105-113, Budapest 1115, Hungary
| | - György T. Balogh
- Department of Chemical and Environmental Process Engineering, Faculty of Chemical Technology and Biotechnology, Budapest University of Technology and Economics, Műegyetem rakpart 3, Budapest 1111, Hungary
- Department of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Semmelweis University, Hőgyes Endre utca 7-9, Budapest 1092, Hungary
| | - Róbert Kiss
- Mcule.com Kft, Bartók Béla út 105-113, Budapest 1115, Hungary
|
17
|
Clyde A, Liu X, Brettin T, Yoo H, Partin A, Babuji Y, Blaiszik B, Mohd-Yusof J, Merzky A, Turilli M, Jha S, Ramanathan A, Stevens R. AI-accelerated protein-ligand docking for SARS-CoV-2 is 100-fold faster with no significant change in detection. Sci Rep 2023; 13:2105. [PMID: 36747041 PMCID: PMC9901402 DOI: 10.1038/s41598-023-28785-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/29/2022] [Accepted: 01/24/2023] [Indexed: 02/08/2023] Open
Abstract
Protein-ligand docking is a computational method for identifying drug leads. The method is capable of narrowing a vast library of compounds down to a tractable size for downstream simulation or experimental testing and is widely used in drug discovery. While there has been progress in accelerating the scoring of compounds with artificial intelligence, few works have bridged these successes back to the virtual screening community in terms of utility and forward-looking development. We demonstrate the power of high-speed ML models by scoring 1 billion molecules in under a day (50 k predictions per GPU second). We showcase a docking workflow that uses surrogate AI-based models as a pre-filter to a standard docking workflow. Our workflow is ten times faster at screening a library of compounds than the standard technique, with an error rate of less than 0.01% in detecting the underlying best-scoring 0.1% of compounds. Our analysis of the speedup shows that another order of magnitude of speedup must come from model accuracy rather than computing speed. In order to drive another order of magnitude of acceleration, we share a benchmark dataset consisting of 200 million 3D complex structures and 2D structure scores across a consistent set of 13 million "in-stock" molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome. We believe this is strong evidence for the community to begin focusing on improving the accuracy of surrogate models to improve the ability to screen massive compound libraries 100 × or even 1000 × faster than current techniques and reduce missing top hits. The technique outlined aims to be a fast drop-in replacement for docking for screening billion-scale molecular libraries.
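Two points in this abstract lend themselves to a short sketch: the throughput arithmetic (1 billion molecules at 50 k predictions per GPU second) and the pre-filter pattern, in which a fast surrogate ranks the library and only the top fraction is passed to docking. The function names, the toy scores, and the convention that a lower score is better are assumptions for illustration, not details of the authors' workflow.

```python
def gpu_hours(num_molecules: float, predictions_per_gpu_second: float) -> float:
    """GPU-hours needed to score a library at a given surrogate throughput."""
    return num_molecules / predictions_per_gpu_second / 3600


def prefilter_then_dock(molecules, surrogate_score, dock, keep_fraction):
    """Rank by surrogate score (lower assumed better) and dock only the top fraction."""
    ranked = sorted(molecules, key=surrogate_score)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return {mol: dock(mol) for mol in ranked[:n_keep]}


# Throughput arithmetic from the abstract: 1e9 molecules at 50,000 predictions per GPU second.
hours = gpu_hours(1_000_000_000, 50_000)
print(f"{hours:.2f} GPU-hours")  # 20,000 GPU-seconds, i.e. well under a day on one GPU

# Toy pre-filter run with made-up surrogate scores and a stand-in docking function.
toy_scores = {"mol_a": -7.5, "mol_b": -3.0, "mol_c": -9.1, "mol_d": -5.2}
docked = prefilter_then_dock(
    molecules=list(toy_scores),
    surrogate_score=toy_scores.get,
    dock=lambda mol: toy_scores[mol] - 0.5,  # placeholder for an expensive docking call
    keep_fraction=0.5,
)
print(sorted(docked))  # only the two best-ranked molecules reach the docking step
```

The saving comes from calling the expensive `dock` function on a small fraction of the library; the risk is that a true top hit is ranked poorly by the surrogate and never docked, which is the error rate the abstract reports.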
Affiliation(s)
- Austin Clyde
- Argonne National Laboratory, Data Science and Learning Division, Chicago, Lemont, 60439, USA.
- Department of Computer Science, University of Chicago, Chicago, 60637, USA.
| | - Xuefeng Liu
- Department of Computer Science, University of Chicago, Chicago, 60637, USA
| | - Thomas Brettin
- Department of Computer Science, University of Chicago, Chicago, 60637, USA
- Argonne National Laboratory, Computing, Environment, and Life Sciences Directorate, Lemont, 60439, USA
| | - Hyunseung Yoo
- Argonne National Laboratory, Data Science and Learning Division, Chicago, Lemont, 60439, USA
| | - Alexander Partin
- Argonne National Laboratory, Data Science and Learning Division, Chicago, Lemont, 60439, USA
| | - Yadu Babuji
- Department of Computer Science, University of Chicago, Chicago, 60637, USA
| | - Ben Blaiszik
- Argonne National Laboratory, Data Science and Learning Division, Chicago, Lemont, 60439, USA
- University of Chicago, Globus, Chicago, 60637, USA
| | - Jamaludin Mohd-Yusof
- Los Alamos National Laboratory, Computer, Computational, and Statistical Sciences, Los Alamos, 87545, USA
| | - Andre Merzky
- Department of Electrical and Computer Engineering, Rutgers University, Piscataway, 08854, USA
- Brookhaven National Laboratory, Computational Sciences Initiative, Upton, 11973, USA
| | - Matteo Turilli
- Department of Electrical and Computer Engineering, Rutgers University, Piscataway, 08854, USA
- Brookhaven National Laboratory, Computational Sciences Initiative, Upton, 11973, USA
| | - Shantenu Jha
- Department of Electrical and Computer Engineering, Rutgers University, Piscataway, 08854, USA
- Brookhaven National Laboratory, Computational Sciences Initiative, Upton, 11973, USA
| | - Arvind Ramanathan
- Argonne National Laboratory, Data Science and Learning Division, Chicago, Lemont, 60439, USA
| | - Rick Stevens
- Department of Computer Science, University of Chicago, Chicago, 60637, USA
- Argonne National Laboratory, Computing, Environment, and Life Sciences Directorate, Lemont, 60439, USA
|
18
|
Gadiya Y, Zaliani A, Gribbon P, Hofmann-Apitius M. PEMT: a patent enrichment tool for drug discovery. Bioinformatics 2023; 39:btac716. [PMID: 36322820 PMCID: PMC9805556 DOI: 10.1093/bioinformatics/btac716] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/26/2022] [Revised: 10/10/2022] [Accepted: 11/01/2022] [Indexed: 11/11/2022] Open
Abstract
MOTIVATION Drug discovery practitioners in industry and academia use semantic tools to extract information from online scientific literature to generate new insights into targets, therapeutics and diseases. However, due to complexities in access and analysis, patent-based literature is often overlooked as a source of information. As drug discovery is a highly competitive field, naturally, tools that tap into patent literature can provide any actor in the field an advantage in terms of better informed decision-making. Hence, we aim to facilitate access to patent literature through the creation of an automatic tool for extracting information from patents described in existing public resources. RESULTS Here, we present PEMT, a novel patent enrichment tool, that takes advantage of public databases like ChEMBL and SureChEMBL to extract relevant patent information linked to chemical structures and/or gene names described through FAIR principles and metadata annotations. PEMT aims at supporting drug discovery and research by establishing a patent landscape around genes of interest. The pharmaceutical focus of the tool is mainly due to the subselection of International Patent Classification codes, but in principle, it can be used for other patent fields, provided that a link between a concept and chemical structure is investigated. Finally, we demonstrate a use-case in rare diseases by generating a gene-patent list based on the epidemiological prevalence of these diseases and exploring their underlying patent landscapes. AVAILABILITY AND IMPLEMENTATION PEMT is an open-source Python tool and its source code and PyPi package are available at https://github.com/Fraunhofer-ITMP/PEMT and https://pypi.org/project/PEMT/, respectively. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yojana Gadiya
- Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), Hamburg 22525, Germany
- Fraunhofer Cluster of Excellence for Immune-Mediated Diseases (CIMD), Frankfurt 60590, Germany
| | - Andrea Zaliani
- Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), Hamburg 22525, Germany
- Fraunhofer Cluster of Excellence for Immune-Mediated Diseases (CIMD), Frankfurt 60590, Germany
| | - Philip Gribbon
- Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), Hamburg 22525, Germany
- Fraunhofer Cluster of Excellence for Immune-Mediated Diseases (CIMD), Frankfurt 60590, Germany
| | - Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin 53754, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn 53113, Germany
|
19
|
Magariños MP, Gaulton A, Félix E, Kiziloren T, Arcila R, Oprea TI, Leach AR. Illuminating the druggable genome through patent bioactivity data. PeerJ 2023; 11:e15153. [PMID: 37151295 PMCID: PMC10162037 DOI: 10.7717/peerj.15153] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/25/2022] [Accepted: 03/10/2023] [Indexed: 05/09/2023] Open
Abstract
The patent literature is a potentially valuable source of bioactivity data. In this article we describe a process to prioritise 3.7 million life science relevant patents obtained from the SureChEMBL database (https://www.surechembl.org/), according to how likely they were to contain bioactivity data for potent small molecules on less-studied targets, based on the classification developed by the Illuminating the Druggable Genome (IDG) project. The overall goal was to select a smaller number of patents that could be manually curated and incorporated into the ChEMBL database. Using relatively simple annotation and filtering pipelines, we have been able to identify a substantial number of patents containing quantitative bioactivity data for understudied targets that had not previously been reported in the peer-reviewed medicinal chemistry literature. We quantify the added value of such methods in terms of the numbers of targets that are so identified, and provide some specific illustrative examples. Our work underlines the potential value in searching the patent corpus in addition to the more traditional peer-reviewed literature. The small molecules found in these patents, together with their measured activity against the targets, are now accessible via the ChEMBL database.
Affiliation(s)
| | - Anna Gaulton
- EMBL-EBI, Hinxton, United Kingdom
- Exscientia, Oxford, United Kingdom
| | | | | | | | - Tudor I. Oprea
- Translational informatics Division, Department of Internal Medicine, School of Medicine, University of New Mexico, Albuquerque, United States
|
20
|
Ahmad I, Kuznetsov AE, Pirzada AS, Alsharif KF, Daglia M, Khan H. Computational pharmacology and computational chemistry of 4-hydroxyisoleucine: Physicochemical, pharmacokinetic, and DFT-based approaches. Front Chem 2023; 11:1145974. [PMID: 37123881 PMCID: PMC10133580 DOI: 10.3389/fchem.2023.1145974] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/20/2023] [Accepted: 03/21/2023] [Indexed: 05/02/2023] Open
Abstract
Computational pharmacology and chemistry of drug-like properties, along with pharmacokinetic studies, have made it easier to identify or predict a potential drug candidate. 4-Hydroxyisoleucine is a pharmacologically active natural product with prominent antidiabetic properties. In this study, ADMETLab 2.0 was used to determine its important drug-related properties. 4-Hydroxyisoleucine is compliant with important drug-like physicochemical properties and with pharmaceutical companies' druggability rules such as Lipinski's, Pfizer, and GlaxoSmithKline (GSK) rules. Pharmacokinetically, it has been predicted to have satisfactory cell permeability. Blood-brain barrier permeation may add central nervous system (CNS) effects, while a very slight probability of being a CYP2C9 substrate exists. None of the well-known toxicities were predicted in silico, which is congruent with wet lab results, except for a "very slight risk" of respiratory toxicity. The molecule is non-ecotoxic as analyzed with common indicators such as bioconcentration and LC50 for fathead minnow and Daphnia magna. The toxicity parameters identified 4-hydroxyisoleucine as non-toxic to androgen receptors, PPAR-γ, mitochondrial membrane receptor, heat shock element, and p53. Across seven parameters, not a single toxicophore was found. The density functional theory (DFT) study provided support for the findings obtained from drug-like property predictions. Hence, it is logical to proceed with detailed pharmacokinetic studies and a drug development process for 4-hydroxyisoleucine.
Affiliation(s)
- Imad Ahmad
- Department of Pharmacy, Abdul Wali Khan University Mardan, Mardan, Pakistan
| | - Aleksey E. Kuznetsov
- Department of Chemistry, Universidad Tecnica Federico Santa Maria, Santiago, Chile
| | | | - Khalaf F. Alsharif
- Department of Clinical Laboratory, College of Applied Medical Science, Taif University, Taif, Saudi Arabia
| | - Maria Daglia
- Department of Pharmacy, University of Naples Federico II, Naples, Italy
- International Research Centre for Food Nutrition and Safety, Jiangsu University, Zhenjiang, China
| | - Haroon Khan
- Department of Pharmacy, Abdul Wali Khan University Mardan, Mardan, Pakistan
- *Correspondence: Haroon Khan,
|
21
|
Jama M, Ahmed M, Jutla A, Wiethan C, Kumar J, Moon TC, West F, Overduin M, Barakat KH. Discovery of allosteric SHP2 inhibitors through ensemble-based consensus molecular docking, endpoint and absolute binding free energy calculations. Comput Biol Med 2023; 152:106442. [PMID: 36566625 DOI: 10.1016/j.compbiomed.2022.106442] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/12/2022] [Revised: 12/05/2022] [Accepted: 12/15/2022] [Indexed: 12/24/2022]
Abstract
SHP2 (Src homology-2 domain-containing protein tyrosine phosphatase-2) is a cytoplasmic protein-tyrosine phosphatase encoded by the gene PTPN11. It plays a crucial role in regulating cell growth and differentiation. Specifically, SHP2 is an oncoprotein associated with developmental pathologies and several different cancer types, including gastric cancer, leukemia, and breast cancer, and is of great therapeutic interest. Given these roles, current research efforts have focused on developing SHP2 inhibitors. Allosteric SHP2 inhibitors have been shown to be more selective and pharmacologically appealing than competitive catalytic inhibitors targeting SHP2. Nevertheless, there remains a need for novel allosteric inhibitor scaffolds targeting SHP2 to develop compounds with improved selectivity, cell permeability, and bioavailability. Towards this goal, this study applied various computational tools to screen over 6 million compounds against the allosteric site within SHP2. The top-ranked hits from our in silico screening were validated using protein thermal shift and biolayer interferometry assays, revealing three potent compounds. Kinetic binding assays were employed to measure the binding affinities of the top-ranked compounds and demonstrated that they all bind to SHP2 with nanomolar affinity. Hence, the compounds and the computational workflow described herein provide an effective approach for identifying and designing a generation of improved allosteric inhibitors of SHP2.
Affiliation(s)
- Maryam Jama
- Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Canada
| | - Marawan Ahmed
- Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Canada
| | - Anna Jutla
- Department of Biochemistry, Faculty of Medicine and Dentistry, University of Alberta, Canada
| | | | - Jitendra Kumar
- Department of Biochemistry, Faculty of Medicine and Dentistry, University of Alberta, Canada
| | - Tae Chul Moon
- Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Canada
| | - Frederick West
- Department of Chemistry, University of Alberta, Canada; Department of Oncology and Cancer Research Institute of Northern Alberta, University of Alberta, Canada
| | - Michael Overduin
- Department of Biochemistry, Faculty of Medicine and Dentistry, University of Alberta, Canada
| | - Khaled H Barakat
- Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Canada; Li Ka Shing Institute of Virology, University of Alberta, Canada.
|
22
|
Ciray F, Doğan T. Machine learning-based prediction of drug approvals using molecular, physicochemical, clinical trial, and patent-related features. Expert Opin Drug Discov 2022; 17:1425-1441. [PMID: 36444655 DOI: 10.1080/17460441.2023.2153830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
BACKGROUND Drug development productivity has been declining lately due to elevated costs and reduced discovery rates. Therefore, pharmaceutical companies have been seeking alternative ways to determine and evaluate drug candidates. RESEARCH DESIGN AND METHODS In this work, we proposed a new computational approach to directly predict the regulatory approval of drug candidates, and implemented it as a method called 'DrugApp.' To accomplish this task, we employed multiple types of features including molecular and physicochemical properties of drug candidates, together with clinical trial and patent-related features, which are then processed by random forest classifiers to train our disease group-specific approval prediction models. RESULTS Our evaluations indicated DrugApp has a high and robust prediction performance. Within a use-case study, we showed our method can predict phase IV trial drugs that are later withdrawn from the market due to severe side effects. Finally, we used DrugApp models to forecast the approval of drug candidates that are currently in phases I/II/III of clinical trials. CONCLUSIONS We hope that our study will aid the research community in terms of evaluating and improving the process of drug development. The datasets, source code, results, and pre-trained models of DrugApp are freely available at https://github.com/HUBioDataLab/DrugApp.
Affiliation(s)
- Fulya Ciray
  - Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey; Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey
- Tunca Doğan
  - Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey; Department of Health Informatics, Institute of Informatics, Hacettepe University, Ankara, Turkey; Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
23
Lim S, Lee S, Piao Y, Choi M, Bang D, Gu J, Kim S. On modeling and utilizing chemical compound information with deep learning technologies: A task-oriented approach. Comput Struct Biotechnol J 2022; 20:4288-4304. [PMID: 36051875 PMCID: PMC9399946 DOI: 10.1016/j.csbj.2022.07.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2022] [Revised: 07/29/2022] [Accepted: 07/29/2022] [Indexed: 11/22/2022] Open
Abstract
A large number of chemical compounds are available in databases such as PubChem and ZINC. However, the currently known compounds, though numerous, represent only a fraction of all possible compounds, the set known as chemical space. Many of the compounds in these databases are annotated with properties and assay data that can be used for drug discovery efforts. For this goal, a number of machine learning algorithms have been developed, and recent deep learning technologies can be effectively used to navigate chemical space, especially for unknown chemical compounds, in terms of drug-related tasks. In this article, we survey how deep learning technologies can model and utilize chemical compound information in a task-oriented way by exploiting annotated properties and assay data in chemical compound databases. We first compile the kinds of tasks that machine learning methods try to accomplish. Then, we survey deep learning technologies to show their modeling power and current applications for accomplishing drug-related tasks. Next, we survey deep learning techniques that address the insufficiency of annotated data for more effective navigation of chemical space. Chemical compound information alone may not be powerful enough for drug-related tasks; thus we survey what kinds of information, such as assay and gene expression data, can be used to improve the prediction power of deep learning models. Finally, we conclude this survey with four important newly developed technologies that are yet to be fully incorporated into computational analysis of chemical information.
Affiliation(s)
- Sangsoo Lim
  - Bioinformatics Institute, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Sangseon Lee
  - Institute of Computer Technology, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Yinhua Piao
  - Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- MinGyu Choi
  - Department of Chemistry, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
  - AIGENDRUG Co., Ltd., Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Dongmin Bang
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Jeonghyeon Gu
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
- Sun Kim
  - Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
  - MOGAM Institute for Biomedical Research, Yong-in 16924, South Korea
  - AIGENDRUG Co., Ltd., Gwanak-ro 1, Gwanak-gu, Seoul 08826, South Korea
24
Artificial intelligence and machine-learning approaches in structure and ligand-based discovery of drugs affecting central nervous system. Mol Divers 2022; 27:959-985. [PMID: 35819579 DOI: 10.1007/s11030-022-10489-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/27/2022] [Accepted: 06/21/2022] [Indexed: 12/11/2022]
Abstract
CNS disorders are indications with very high unmet medical need, a relatively small number of available drugs, and a subpar satisfaction level among patients and caregivers. Discovery of CNS drugs is an extremely expensive affair with its own unique challenges, leading to extremely high attrition rates and low efficiency. With the explosion of data in the information age, there is hardly any aspect of life that has not been touched by data-driven technologies such as artificial intelligence (AI) and machine learning (ML). Drug discovery is no exception: the emergence of big data via genomic, proteomic, biological, and chemical technologies has driven pharmaceutical giants to collaborate with AI-oriented companies to revolutionise drug discovery, with the goal of increasing the efficiency of the process. In recent years, many examples of innovative applications of AI and ML techniques in CNS drug discovery have been reported. Research on therapeutics for diseases such as schizophrenia, Alzheimer's and Parkinsonism has been given a new direction and thrust by these developments. AI and ML have been applied to both ligand-based and structure-based drug discovery and design of CNS therapeutics. In this review, we summarise the general aspects of AI and ML from the perspective of drug discovery, followed by comprehensive coverage of recent developments in the applications of AI/ML techniques in CNS drug discovery.
25
Shearer J, Castro JL, Lawson ADG, MacCoss M, Taylor RD. Rings in Clinical Trials and Drugs: Present and Future. J Med Chem 2022; 65:8699-8712. [PMID: 35730680 PMCID: PMC9289879 DOI: 10.1021/acs.jmedchem.2c00473] [Citation(s) in RCA: 79] [Impact Index Per Article: 39.5] [Indexed: 11/28/2022]
Abstract
We present a comprehensive analysis of all ring systems (both heterocyclic and nonheterocyclic) in clinical trial compounds and FDA-approved drugs. We show 67% of small molecules in clinical trials comprise only ring systems found in marketed drugs, which mirrors previously published findings for newly approved drugs. We also show there are approximately 450 000 unique ring systems derived from 2.24 billion molecules currently available in synthesized chemical space, and molecules in clinical trials utilize only 0.1% of this available pool. Moreover, there are fewer ring systems in drugs compared with those in clinical trials, but this is balanced by the drug ring systems being reused more often. Furthermore, systematic changes of up to two atoms on existing drug and clinical trial ring systems give a set of 3902 future clinical trial ring systems, which are predicted to cover approximately 50% of the novel ring systems entering clinical trials.
Affiliation(s)
- Malcolm MacCoss
  - Bohicket Pharma Consulting Limited Liability Company, 2556 Seabrook Island Road, Seabrook Island, South Carolina 29455, United States
26
Kojima E, Iimuro A, Nakajima M, Kinuta H, Asada N, Sako Y, Nakata Z, Uemura K, Arita S, Miki S, Wakasa-Morimoto C, Tachibana Y. Pocket-to-Lead: Structure-Based De Novo Design of Novel Non-peptidic HIV-1 Protease Inhibitors Using the Ligand Binding Pocket as a Template. J Med Chem 2022; 65:6157-6170. [PMID: 35416651 DOI: 10.1021/acs.jmedchem.1c02217] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/28/2022]
Abstract
A novel strategy for lead identification that we have dubbed the "Pocket-to-Lead" strategy is demonstrated using HIV-1 protease as a model target. Sometimes, it is difficult to obtain hit compounds because of the difficulties in satisfying complex pharmacophoric features. In this study, a virtual fragment hit that does not match all of the pharmacophore features but has key interactions and vectors that could grow into the remaining pharmacophore features was optimized in silico. The designed compound 9 demonstrated weak but evident inhibitory activity (IC50 = 54 μM), and the design concept was proven by the co-crystal structure. Then, structure-based drug design promptly gave compound 14 (IC50 = 0.0071 μM, EC50 = 0.86 μM), an almost 10,000-fold improvement in activity from 9. The structures of the designed molecules proved to be novel with high synthetic feasibility, indicating the usefulness of this strategy for tackling tough targets with complex pharmacophores.
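As a quick arithmetic check on the potency figures quoted in this abstract, the ratio of the two IC50 values can be computed directly. The sketch below is illustrative only: the function name `fold_improvement` and the variable names are assumptions introduced here, not anything taken from the paper.

```python
import math

# IC50 values quoted in the abstract, in micromolar (uM).
ic50_compound_9_um = 54.0
ic50_compound_14_um = 0.0071


def fold_improvement(before_um: float, after_um: float) -> float:
    """Return the ratio of two IC50 values (hypothetical helper).

    A ratio above 1 means the second compound is more potent.
    """
    if before_um <= 0 or after_um <= 0:
        raise ValueError("IC50 values must be positive")
    return before_um / after_um


fold = fold_improvement(ic50_compound_9_um, ic50_compound_14_um)
orders_of_magnitude = math.log10(fold)

print(f"{fold:,.0f}-fold improvement")
print(f"{orders_of_magnitude:.2f} orders of magnitude")
```

With the quoted values the ratio comes to roughly 7,600, just under four orders of magnitude, which is the sense in which the abstract describes an "almost 10,000-fold" improvement.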
Affiliation(s)
- Eiichi Kojima
  - Shionogi Pharmaceutical Research Center, 3-1-1 Futaba-cho, Toyonaka, Osaka 561-0825, Japan
- Atsuhiro Iimuro
  - Shionogi Pharmaceutical Research Center, 3-1-1 Futaba-cho, Toyonaka, Osaka 561-0825, Japan
- Mado Nakajima
  - Shionogi Pharmaceutical Research Center, 3-1-1 Futaba-cho, Toyonaka, Osaka 561-0825, Japan
- Hirotaka Kinuta
  - Shionogi Pharmaceutical Research Center, 3-1-1 Futaba-cho, Toyonaka, Osaka 561-0825, Japan
- Naoya Asada
  - Shionogi Pharmaceutical Research Center, 3-1-1 Futaba-cho, Toyonaka, Osaka 561-0825, Japan
- Yusuke Sako
  - Shionogi Pharmaceutical Research Center, 3-1-1 Futaba-cho, Toyonaka, Osaka 561-0825, Japan
- Zenzaburo Nakata
  - Shionogi Pharmaceutical Research Center, 3-1-1 Futaba-cho, Toyonaka, Osaka 561-0825, Japan
- Kentaro Uemura
  - Shionogi Pharmaceutical Research Center, 3-1-1 Futaba-cho, Toyonaka, Osaka 561-0825, Japan
- Shuhei Arita
  - Shionogi Pharmaceutical Research Center, 3-1-1 Futaba-cho, Toyonaka, Osaka 561-0825, Japan
- Shinobu Miki
  - Shionogi Pharmaceutical Research Center, 3-1-1 Futaba-cho, Toyonaka, Osaka 561-0825, Japan
- Chiaki Wakasa-Morimoto
  - Shionogi Pharmaceutical Research Center, 3-1-1 Futaba-cho, Toyonaka, Osaka 561-0825, Japan
- Yuki Tachibana
  - Shionogi Pharmaceutical Research Center, 3-1-1 Futaba-cho, Toyonaka, Osaka 561-0825, Japan
27
Wang PH, Chen JH, Tseng YJ. Intelligent pharmaceutical patent search on a near-term gate-based quantum computer. Sci Rep 2022; 12:175. [PMID: 34997034 PMCID: PMC8742058 DOI: 10.1038/s41598-021-04031-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2021] [Accepted: 12/14/2021] [Indexed: 12/03/2022] Open
Abstract
Pharmaceutical patent analysis is the key to product protection for pharmaceutical companies. In patent claims, a Markush structure is a standard chemical structure drawing with variable substituents. Overlaps between apparently dissimilar Markush structures are nearly unrecognizable when the structures span a broad chemical space. We propose a quantum search-based method which performs an exact comparison between two non-enumerated Markush structures with a constraint satisfaction oracle. The quantum circuit is verified with a quantum simulator and the real effect of noise is estimated using a five-qubit superconductivity-based IBM quantum computer. The possibilities of measuring the correct states can be increased by improving the connectivity of the most computation-intensive qubits. Depolarizing error is the most influential error. The quantum method to exactly compare two patents is hard to simulate classically and thus creates a quantum advantage in patent analysis.
Affiliation(s)
- Pei-Hua Wang
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No. 1 Sec. 4, Roosevelt Road, Taipei, 106, Taiwan
- Jen-Hao Chen
  - Department of Computer Science and Information Engineering, National Taiwan University, No. 1 Sec. 4, Roosevelt Road, Taipei, 106, Taiwan; Chunghwa Telecom Co., Ltd, Taipei, 106, Taiwan
- Yufeng Jane Tseng
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No. 1 Sec. 4, Roosevelt Road, Taipei, 106, Taiwan; Department of Computer Science and Information Engineering, National Taiwan University, No. 1 Sec. 4, Roosevelt Road, Taipei, 106, Taiwan
28
Choi J, Lee J. V-Dock: Fast Generation of Novel Drug-like Molecules Using Machine-Learning-Based Docking Score and Molecular Optimization. Int J Mol Sci 2021; 22:11635. [PMID: 34769065 PMCID: PMC8584000 DOI: 10.3390/ijms222111635] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/16/2021] [Revised: 10/13/2021] [Accepted: 10/24/2021] [Indexed: 02/06/2023] Open
Abstract
We propose a computational workflow to design novel drug-like molecules by combining the global optimization of molecular properties and protein-ligand docking with machine learning. However, most existing methods depend heavily on experimental data, and many targets do not have sufficient data to train reliable activity prediction models. To overcome this limitation, protein-ligand docking calculations must be performed using the limited data available. Such docking calculations during molecular generation require considerable computational time, preventing extensive exploration of the chemical space. To address this problem, we trained a machine-learning-based model that predicted the docking energy using SMILES to accelerate the molecular generation process. Docking scores could be accurately predicted using only a SMILES string. We combined this docking score prediction model with the global molecular property optimization approach, MolFinder, to find novel molecules exhibiting the desired properties with high values of predicted docking scores. We named this design approach V-dock. Using V-dock, we efficiently generated many novel molecules with high docking scores for a target protein, a similarity to the reference molecule, and desirable drug-like and bespoke properties, such as QED. The predicted docking scores of the generated molecules were verified by correlating them with the actual docking scores.
Affiliation(s)
- Jieun Choi
  - Department of Chemistry, Division of Chemistry and Biochemistry, Kangwon National University, Chuncheon 24341, Korea
- Juyong Lee
  - Department of Chemistry, Division of Chemistry and Biochemistry, Kangwon National University, Chuncheon 24341, Korea
  - Arontier Co., Seoul 06735, Korea
29
Ohms J. Current methodologies for chemical compound searching in patents: A case study. World Patent Information 2021. [DOI: 10.1016/j.wpi.2021.102055] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/20/2022]
30
Congenericity of Claimed Compounds in Patent Applications. Molecules 2021; 26:5253. [PMID: 34500686 PMCID: PMC8433967 DOI: 10.3390/molecules26175253] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/23/2021] [Revised: 08/17/2021] [Accepted: 08/18/2021] [Indexed: 12/04/2022] Open
Abstract
A method is presented to analyze quantitatively the degree of congenericity of claimed compounds in patent applications. The approach successfully differentiates patents exemplified with highly congeneric compounds of a structurally compact and well defined chemical series from patents containing a more diverse set of compounds around a more vaguely described patent claim. An application to 750 common patents available in SureChEMBL, SureChEMBLccs and ChEMBL is presented and the congenericity of patent compounds in those different sources discussed.
31
Grisoni F, Huisman BJH, Button AL, Moret M, Atz K, Merk D, Schneider G. Combining generative artificial intelligence and on-chip synthesis for de novo drug design. Science Advances 2021; 7:eabg3338. [PMID: 34117066 PMCID: PMC8195470 DOI: 10.1126/sciadv.abg3338] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 12/27/2020] [Accepted: 04/23/2021] [Indexed: 05/24/2023]
Abstract
Automating the molecular design-make-test-analyze cycle accelerates hit and lead finding for drug discovery. Using deep learning for molecular design and a microfluidics platform for on-chip chemical synthesis, liver X receptor (LXR) agonists were generated from scratch. The computational pipeline was tuned to explore the chemical space of known LXRα agonists and generate novel molecular candidates. To ensure compatibility with automated on-chip synthesis, the chemical space was confined to the virtual products obtainable from 17 one-step reactions. Twenty-five de novo designs were successfully synthesized in flow. In vitro screening of the crude reaction products revealed 17 (68%) hits, with up to 60-fold LXR activation. The batch resynthesis, purification, and retesting of 14 of these compounds confirmed that 12 of them were potent LXR agonists. These results support the suitability of the proposed design-make-test-analyze framework as a blueprint for automated drug design with artificial intelligence and miniaturized bench-top synthesis.
Affiliation(s)
- Francesca Grisoni
  - ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland
  - Eindhoven University of Technology, Department of Biomedical Engineering, Eindhoven, Netherlands
- Berend J H Huisman
  - ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland
- Alexander L Button
  - ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland
  - University of Lausanne, Department of Computational Biology, Lausanne, Switzerland
- Michael Moret
  - ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland
- Kenneth Atz
  - ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland
- Daniel Merk
  - ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland
  - Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, Frankfurt, Germany
- Gisbert Schneider
  - ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland
  - ETH Singapore SEC Ltd, Singapore, Singapore
32
Active Learning and the Potential of Neural Networks Accelerate Molecular Screening for the Design of a New Molecule Effective against SARS-CoV-2. BioMed Research International 2021; 2021:6696012. [PMID: 34124259 PMCID: PMC8172298 DOI: 10.1155/2021/6696012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/27/2020] [Revised: 05/07/2021] [Accepted: 05/15/2021] [Indexed: 12/04/2022]
Abstract
A global pandemic has emerged following the appearance of a new severe acute respiratory virus, officially named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), strongly affecting the health sector as well as the world economy. Despite the existence of a few approved and effective vaccines at the time of writing, a sense of urgency has emerged worldwide to discover new technical tools and new drugs as soon as possible. In this context, many studies are currently underway to develop new tools and therapies against SARS-CoV-2 and other viruses, using different approaches. The 3-chymotrypsin-like (3CL) protease, which is directly involved in the cotranslational and posttranslational modifications of viral polyproteins essential for the existence and replication of the virus in the host, is one of the coronavirus target proteins that has been the subject of these extensive studies. Currently, the majority of these studies aim at repurposing already known and clinically approved drugs against this new virus, but this approach has not been very successful. Recently, different studies have demonstrated the effectiveness of artificial intelligence-based techniques for understanding existing chemical spaces and generating new small molecules that are both effective and efficient. In this framework, we combined a generative recurrent neural network model with transfer learning methods and active learning-based algorithms to design novel small molecules capable of effectively inhibiting the 3CL protease in human cells. We then analyze these small molecules to find the correct binding site that matches the structure of the 3CL protease of our target virus, alongside the other analyses performed in this study. Based on these screening results, some molecules achieved a good binding score close to -18 kcal/mol, and we consider them good potential candidates for further synthesis and testing against SARS-CoV-2.
33
Falaguera MJ, Mestres J. Identification of the Core Chemical Structure in SureChEMBL Patents. J Chem Inf Model 2021; 61:2241-2247. [PMID: 33929850 DOI: 10.1021/acs.jcim.1c00151] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/28/2022]
Abstract
The SureChEMBL database provides open access to 17 million chemical entities mentioned in 14 million patents published since 1970. However, alongside molecules covered by patent claims, the database is full of starting materials and intermediate products of little pharmacological relevance. Herein, we introduce a new filtering protocol to automatically select the core chemical structures best representing a congeneric series of pharmacologically relevant molecules in patents. The protocol is first validated against a selection of 890 SureChEMBL patents for which a total of 51,738 manually curated molecules are deposited in ChEMBL. Our protocol was able to select 92.5% of the molecules in ChEMBL from all 270,968 molecules in SureChEMBL for those patents. Subsequently, the protocol was applied to all 240,988 US pharmacological patents for which 9,111,706 molecules are available in SureChEMBL. The unsupervised filtering process selected 5,949,214 molecules (65.3% of the total number of molecules) that form highly congeneric chemical series in 188,795 of those patents (78.3% of the total number of patents). A SureChEMBL version enriched with molecules of pharmacological relevance is available for download at https://ftp.ebi.ac.uk/pub/databases/chembl/SureChEMBLccs.
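The retention figures reported in this abstract can be reproduced from the raw counts it gives. The following is a minimal sketch of that arithmetic; the helper `percent` and the variable names are assumptions introduced for illustration and do not come from the paper or from any SureChEMBL tooling.

```python
# Counts quoted in the abstract for the US pharmacological patent set.
total_molecules = 9_111_706
selected_molecules = 5_949_214
total_patents = 240_988
patents_with_series = 188_795


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage (hypothetical helper)."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


molecule_retention = percent(selected_molecules, total_molecules)
patent_coverage = percent(patents_with_series, total_patents)

print(f"molecules retained: {molecule_retention:.1f}%")
print(f"patents with a congeneric series: {patent_coverage:.1f}%")
```

Rounded to one decimal place, both ratios agree with the 65.3% and 78.3% stated in the abstract.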
Affiliation(s)
- Maria J Falaguera
  - Research Group on Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Medical Research Institute and University Pompeu Fabra, Parc de Recerca Biomèdica (PRBB), Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain
- Jordi Mestres
  - Research Group on Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Medical Research Institute and University Pompeu Fabra, Parc de Recerca Biomèdica (PRBB), Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain
34
Yang ZY, Yang ZJ, Zhao Y, Yin MZ, Lu AP, Chen X, Liu S, Hou TJ, Cao DS. PySmash: Python package and individual executable program for representative substructure generation and application. Brief Bioinform 2021; 22:6168498. [PMID: 33709154 DOI: 10.1093/bib/bbab017] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/07/2020] [Revised: 01/06/2021] [Accepted: 01/12/2021] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND: Substructure screening is widely applied to evaluate the molecular potency and ADMET properties of compounds in drug discovery pipelines, and it can also be used to interpret QSAR models for the design of new compounds with desirable physicochemical and biological properties. With the continuous accumulation of more experimental data, data-driven computational systems which can derive representative substructures from large chemical libraries attract more attention. Therefore, the development of an integrated and convenient tool to generate and implement representative substructures is urgently needed. RESULTS: In this study, PySmash, a user-friendly and powerful tool to generate different types of representative substructures, was developed. The current version of PySmash provides both a Python package and an individual executable program, which achieves ease of operation and pipeline integration. Three types of substructure generation algorithms, including circular, path-based and functional group-based algorithms, are provided. Users can conveniently customize their own requirements for substructure size, accuracy and coverage, statistical significance and parallel computation during execution. Besides, PySmash provides the function for external data screening. CONCLUSION: PySmash, a user-friendly and integrated tool for the automatic generation and implementation of representative substructures, is presented. Three screening examples, including toxicophore derivation, privileged motif detection and the integration of substructures with machine learning (ML) models, are provided to illustrate the utility of PySmash in safety profile evaluation, therapeutic activity exploration and molecular optimization, respectively. Its executable program and Python package are available at https://github.com/kotori-y/pySmash.
Affiliation(s)
- Zi-Yi Yang
  - Department of Pharmacy, Xiangya Hospital, Central South University and the Xiangya School of Pharmaceutical Sciences, Central South University, Sichuan, China
- Zhi-Jiang Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Hunan, China
- Yue Zhao
  - Xiangya School of Pharmaceutical Sciences, Central South University (Changsha), Sichuan, China
- Ming-Zhu Yin
  - Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Hunan
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong
- Xiang Chen
  - Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Hunan
- Shao Liu
  - Department of Pharmacy, Xiangya Hospital, Central South University, Hunan
- Ting-Jun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, China
35
Bian Y, Xie XQ. Generative chemistry: drug discovery with deep learning generative models. J Mol Model 2021; 27:71. [PMID: 33543405 PMCID: PMC10984615 DOI: 10.1007/s00894-021-04674-8] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 11/13/2020] [Accepted: 01/13/2021] [Indexed: 12/15/2022]
Abstract
The de novo design of molecular structures using deep learning generative models introduces an encouraging solution to drug discovery in the face of the continuously increasing cost of new drug development. From the generation of original texts, images, and videos to the design of novel molecular structures from scratch, the creativity of deep learning generative models exhibits the height machine intelligence can achieve. The purpose of this paper is to review the latest advances in generative chemistry, which relies on generative modeling to expedite the drug discovery process. This review starts with a brief history of artificial intelligence in drug discovery to outline this emerging paradigm. Commonly used chemical databases, molecular representations, and tools in cheminformatics and machine learning are covered as the infrastructure for generative chemistry. The discussion then focuses on cutting-edge generative architectures for compound generation, including recurrent neural networks, variational autoencoders, adversarial autoencoders, and generative adversarial networks. Challenges and future perspectives follow.
Affiliation(s)
- Yuemin Bian
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, 15261, USA
  - NIH National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, Pittsburgh, PA, 15261, USA
- Xiang-Qun Xie
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, 15261, USA
  - NIH National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, Pittsburgh, PA, 15261, USA
  - Drug Discovery Institute, University of Pittsburgh, 335 Sutherland Drive, 206 Salk Pavilion, Pittsburgh, PA, 15261, USA
  - Departments of Computational Biology and Structural Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15261, USA
36
Awale M, Hert J, Guasch L, Riniker S, Kramer C. The Playbooks of Medicinal Chemistry Design Moves. J Chem Inf Model 2021; 61:729-742. [PMID: 33522806 DOI: 10.1021/acs.jcim.0c01143] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/18/2022]
Abstract
Large databases of biologically relevant molecules, such as ChEMBL, SureChEMBL, or compound collections of pharmaceutical or agrochemical companies, are invaluable sources of medicinal chemistry information, albeit implicit. We developed a modified matched molecular pair approach to systematically and exhaustively extract the transformations in these databases and distill them into snippets of explicit design knowledge that are easily interpretable and directly applicable. The resulting "playbooks of medicinal chemistry design moves" capture the collective pharmaceutical and agrochemical research expertise across multiple chemists, companies, targets, and projects. They can be queried in an automated fashion for systematic prospective design and compound generation. The ChEMBL playbook and an application to exploit it are available at https://github.com/mahendra-awale/medchem_moves.
Affiliation(s)
- Mahendra Awale
  - Computer-Aided Drug Design/Therapeutic Modalities, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
- Jérôme Hert
  - Computer-Aided Drug Design/Therapeutic Modalities, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
- Laura Guasch
  - Computer-Aided Drug Design/Therapeutic Modalities, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
- Sereina Riniker
  - Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Christian Kramer
  - Computer-Aided Drug Design/Therapeutic Modalities, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
37
Machine learning, artificial intelligence, and data science breaking into drug design and neglected diseases. WIREs Computational Molecular Science 2021. [DOI: 10.1002/wcms.1513] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/17/2022]
|
38
|
Coley CW, Eyke NS, Jensen KF. Autonomous Discovery in the Chemical Sciences, Part II: Outlook (German-language edition). Angew Chem Int Ed Engl 2020. [DOI: 10.1002/ange.201909989] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022]
Affiliation(s)
- Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Natalie S. Eyke
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Klavs F. Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|
39
|
Langevin M, Minoux H, Levesque M, Bianciotto M. Scaffold-Constrained Molecular Generation. J Chem Inf Model 2020; 60:5637-5646. [PMID: 33301333 DOI: 10.1021/acs.jcim.0c01015] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 01/08/2023]
Abstract
One of the major applications of generative models for drug discovery targets the lead-optimization phase. During the optimization of a lead series, it is common to have scaffold constraints imposed on the structure of the molecules designed. Without enforcing such constraints, the probability of generating molecules with the required scaffold is extremely low and hinders the practicality of generative models for de novo drug design. To tackle this issue, we introduce a new algorithm, named SAMOA (Scaffold Constrained Molecular Generation), to perform scaffold-constrained in silico molecular design. We build on the well-known SMILES-based Recurrent Neural Network (RNN) generative model, with a modified sampling procedure to achieve scaffold-constrained generation. We directly benefit from the associated reinforcement learning methods, allowing to design molecules optimized for different properties while exploring only the relevant chemical space. We showcase the method's ability to perform scaffold-constrained generation on various tasks: designing novel molecules around scaffolds extracted from SureChEMBL chemical series, generating novel active molecules on the Dopamine Receptor D2 (DRD2) target, and finally, designing predicted actives on the MMP-12 series, an industrial lead-optimization project.
Affiliation(s)
- Maxime Langevin
- PASTEUR, Département de chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS, 75005 Paris, France
- Molecular Design Sciences - Integrated Drug Discovery, Sanofi R&D, 94400 Vitry-sur-Seine, France
- Hervé Minoux
- Molecular Design Sciences - Integrated Drug Discovery, Sanofi R&D, 94400 Vitry-sur-Seine, France
- Maximilien Levesque
- PASTEUR, Département de chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS, 75005 Paris, France
- Aqemia, 75001 Paris, France
- Marc Bianciotto
- Molecular Design Sciences - Integrated Drug Discovery, Sanofi R&D, 94400 Vitry-sur-Seine, France
|
40
|
Takeuchi K, Kunimoto R, Bajorath J. Global Assessment of Substituents on the Basis of Analogue Series. J Med Chem 2020; 63:15013-15020. [PMID: 33253557 DOI: 10.1021/acs.jmedchem.0c01607] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Abstract
While bioisosteric replacements have been extensively investigated, comprehensive analyses of R-/functional groups have thus far been rare in medicinal chemistry. We introduce a new analysis concept for the exploration of chemical substituent space that is based upon bioactive analogue series as a source. From ∼24,000 analogue series, more than 19,000 substituents were isolated that were differently distributed. A subset of ∼400 substituent fragments occurred most frequently in different structural contexts. These substituents contained well-known R-groups as well as novel structures. Substitution site-specific replacement and network analysis revealed that chemically similar substituents preferentially occurred at given sites and identified intuitive substitution pathways that can be explored for compound design. Taken together, the results of our analysis provide new insights into substituent space and identify preferred substituents on the basis of analogue series. As a part of our study, all the data reported are made freely available.
Affiliation(s)
- Kosuke Takeuchi
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Endenicher Allee 19c, Rheinische Friedrich-Wilhelms-Universität, D-53115 Bonn, Germany
- Ryo Kunimoto
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Endenicher Allee 19c, Rheinische Friedrich-Wilhelms-Universität, D-53115 Bonn, Germany
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Endenicher Allee 19c, Rheinische Friedrich-Wilhelms-Universität, D-53115 Bonn, Germany
|
41
|
Zivanovic S, Bayarri G, Colizzi F, Moreno D, Gelpí JL, Soliva R, Hospital A, Orozco M. Bioactive Conformational Ensemble Server and Database. A Public Framework to Speed Up In Silico Drug Discovery. J Chem Theory Comput 2020; 16:6586-6597. [PMID: 32786900 DOI: 10.1021/acs.jctc.0c00305] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 02/06/2023]
Abstract
Modern high-throughput structure-based drug discovery algorithms consider ligand flexibility, but typically with low accuracy, which results in a loss of performance in the derived models. Here we present the bioactive conformational ensemble (BCE) server and its associated database. The server creates conformational ensembles of drug-like ligands and stores them in the BCE database, where a variety of analyses are offered to the user. The workflow implemented in the BCE server combines enhanced sampling molecular dynamics with self-consistent reaction field quantum mechanics (SCRF/QM) calculations. The server automatizes all of the steps to transform one-dimensional (1D) or 2D representation of drugs into 3D molecules, which are then titrated, parametrized, hydrated, and optimized before being subjected to Hamiltonian replica-exchange (HREX) molecular dynamics simulations. Ensembles are collected and subjected to a clustering procedure to derive representative conformers, which are then analyzed at the SCRF/QM level of theory. All structural data are organized in a noSQL database accessible through a graphical interface and in a programmatic manner through a REST API. The server allows the user to define a private workspace and offers a deposition protocol as well as input files for "in house" calculations in those cases where confidentiality is a must. The database and the associated server are available at https://mmb.irbbarcelona.org/BCE.
Affiliation(s)
- Sanja Zivanovic
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona 08028, Spain
- Genís Bayarri
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona 08028, Spain
- Francesco Colizzi
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona 08028, Spain
- David Moreno
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona 08028, Spain
- Josep Lluís Gelpí
- Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Departament de Bioquímica i Biomedicina, Facultat de Biologia, Universitat de Barcelona, Barcelona E08028, Spain
- Robert Soliva
- Nostrum Biodiscovery, Nexus II Building, Barcelona 08034, Spain
- Adam Hospital
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona 08028, Spain
- Modesto Orozco
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona 08028, Spain
- Departament de Bioquímica i Biomedicina, Facultat de Biologia, Universitat de Barcelona, Barcelona E08028, Spain
|
42
|
Alshehri AS, Gani R, You F. Deep learning and knowledge-based methods for computer-aided molecular design—toward a unified approach: State-of-the-art and future directions. Comput Chem Eng 2020. [DOI: 10.1016/j.compchemeng.2020.107005] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Indexed: 12/20/2022]
|
43
|
Burggraaff L, Lenselink EB, Jespers W, van Engelen J, Bongers BJ, González MG, Liu R, Hoos HH, van Vlijmen HWT, IJzerman AP, van Westen GJP. Successive Statistical and Structure-Based Modeling to Identify Chemically Novel Kinase Inhibitors. J Chem Inf Model 2020; 60:4283-4295. [PMID: 32343143 PMCID: PMC7525794 DOI: 10.1021/acs.jcim.9b01204] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/20/2022]
Abstract
Kinases are frequently studied in the context of anticancer drugs. Their involvement in cell responses, such as proliferation, differentiation, and apoptosis, makes them interesting subjects in multitarget drug design. In this study, a workflow is presented that models the bioactivity spectra for two panels of kinases: (1) inhibition of RET, BRAF, SRC, and S6K, while avoiding inhibition of MKNK1, TTK, ERK8, PDK1, and PAK3, and (2) inhibition of AURKA, PAK1, FGFR1, and LKB1, while avoiding inhibition of PAK3, TAK1, and PIK3CA. Both statistical and structure-based models were included, which were thoroughly benchmarked and optimized. A virtual screening was performed to test the workflow for one of the main targets, RET kinase. This resulted in 5 novel and chemically dissimilar RET inhibitors with remaining RET activity of <60% (at a concentration of 10 μM) and similarities with known RET inhibitors from 0.18 to 0.29 (Tanimoto, ECFP6). The four more potent inhibitors were assessed in a concentration range and proved to be modestly active with a pIC50 value of 5.1 for the most active compound. The experimental validation of inhibitors for RET strongly indicates that the multitarget workflow is able to detect novel inhibitors for kinases, and hence, this workflow can potentially be applied in polypharmacology modeling. We conclude that this approach can identify new chemical matter for existing targets. Moreover, this workflow can easily be applied to other targets as well.
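The two figures quoted in this abstract, the Tanimoto similarity range and the pIC50 value, can be unpacked with a minimal sketch. It assumes fingerprints are represented as plain sets of feature identifiers; the sets below are hypothetical toy values, not real ECFP6 fingerprints, and the function names are illustrative rather than taken from the paper.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: shared features divided by all distinct features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pic50_to_ic50_molar(pic50: float) -> float:
    """pIC50 is -log10 of the IC50 in mol/L, so IC50 = 10 ** (-pIC50)."""
    return 10.0 ** (-pic50)


# Hypothetical feature sets standing in for hashed circular fingerprints.
fp_query = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
fp_known = {8, 9, 10, 11, 12, 13, 14, 15}

# 3 shared features out of 15 distinct features.
similarity = tanimoto(fp_query, fp_known)

# Convert the reported pIC50 of 5.1 into a micromolar IC50.
ic50_um = pic50_to_ic50_molar(5.1) * 1e6

print(f"Tanimoto similarity: {similarity:.2f}")
print(f"IC50 for pIC50 5.1: {ic50_um:.1f} uM")
```

On this reading, a similarity of about 0.2 means that only a small share of substructure features is common to the two molecules, and a pIC50 of 5.1 corresponds to an IC50 of roughly 8 μM, which is consistent with the abstract's description of the hits as chemically dissimilar and modestly active.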
Affiliation(s)
- Lindsey Burggraaff
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Eelke B Lenselink
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Willem Jespers
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Department of Cell and Molecular Biology, Uppsala University, Uppsala 75124, Sweden
- Jesper van Engelen
- Department of Computer Science, Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
- Brandon J Bongers
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Marina Gorostiola González
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Rongfang Liu
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Holger H Hoos
- Department of Computer Science, Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
- Herman W T van Vlijmen
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Janssen Research & Development, Turnhoutseweg 30, 2340 Beerse, Belgium
- Adriaan P IJzerman
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Gerard J P van Westen
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
|
44
|
Maragakis P, Nisonoff H, Cole B, Shaw DE. A Deep-Learning View of Chemical Space Designed to Facilitate Drug Discovery. J Chem Inf Model 2020; 60:4487-4496. [PMID: 32697578 DOI: 10.1021/acs.jcim.0c00321] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 11/28/2022]
Abstract
Drug discovery projects entail cycles of design, synthesis, and testing that yield a series of chemically related small molecules whose properties, such as binding affinity to a given target protein, are progressively tailored to a particular drug discovery goal. The use of deep-learning technologies could augment the typical practice of using human intuition in the design cycle, and thereby expedite drug discovery projects. Here, we present DESMILES, a deep neural network model that advances the state of the art in machine learning approaches to molecular design. We applied DESMILES to a previously published benchmark that assesses the ability of a method to modify input molecules to inhibit the dopamine receptor D2, and DESMILES yielded a 77% lower failure rate compared to state-of-the-art models. To explain the ability of DESMILES to hone molecular properties, we visualize a layer of the DESMILES network, and further demonstrate this ability by using DESMILES to tailor the same molecules used in the D2 benchmark test to dock more potently against seven different receptors.
Affiliation(s)
- Paul Maragakis
- D. E. Shaw Research, New York, New York 10036, United States
- Hunter Nisonoff
- D. E. Shaw Research, New York, New York 10036, United States
- Brian Cole
- D. E. Shaw Research, New York, New York 10036, United States
- David E Shaw
- D. E. Shaw Research, New York, New York 10036, United States
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, United States
|
45
|
Yang ZY, Yang ZJ, Lu AP, Hou TJ, Cao DS. Scopy: an integrated negative design python library for desirable HTS/VS database design. Brief Bioinform 2020; 22:5901981. [PMID: 32892221 DOI: 10.1093/bib/bbaa194] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 04/11/2020] [Revised: 07/27/2020] [Accepted: 07/28/2020] [Indexed: 12/12/2022]
Abstract
BACKGROUND: High-throughput screening (HTS) and virtual screening (VS) have been widely used to identify potential hits from large chemical libraries. However, the frequent occurrence of 'noisy compounds' in the screened libraries, such as compounds with poor drug-likeness, poor selectivity or potential toxicity, has greatly weakened the enrichment capability of HTS and VS campaigns. Therefore, the development of comprehensive and credible tools to detect noisy compounds from chemical libraries is urgently needed in early stages of drug discovery.
RESULTS: In this study, we developed a freely available integrated python library for negative design, called Scopy, which supports the functions of data preparation, calculation of descriptors, scaffolds and screening filters, and data visualization. The current version of Scopy can calculate 39 basic molecular properties, 3 comprehensive molecular evaluation scores, 2 types of molecular scaffolds, 6 types of substructure descriptors and 2 types of fingerprints. A number of important screening rules are also provided by Scopy, including 15 drug-likeness rules (13 drug-likeness rules and 2 building block rules), 8 frequent hitter rules (four assay interference substructure filters and four promiscuous compound substructure filters), and 11 toxicophore filters (five human-related toxicity substructure filters, three environment-related toxicity substructure filters and three comprehensive toxicity substructure filters). Moreover, this library supports four different visualization functions to help users to gain a better understanding of the screened data, including basic feature radar chart, feature-feature-related scatter diagram, functional group marker gram and cloud gram.
CONCLUSION: Scopy provides a comprehensive Python package to filter out compounds with undesirable properties or substructures, which will benefit the design of high-quality chemical libraries for drug design and discovery. It is freely available at https://github.com/kotori-y/Scopy.
Affiliation(s)
- Zi-Yi Yang
- Xiangya School of Pharmaceutical Sciences, Central South University (Changsha)
- Zhi-Jiang Yang
- Xiangya School of Pharmaceutical Sciences, Central South University
- Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong
- Ting-Jun Hou
- College of Pharmaceutical Sciences, Zhejiang University, China
- Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, China
|
46
|
Ertl P, Altmann E, McKenna JM. The Most Common Functional Groups in Bioactive Molecules and How Their Popularity Has Evolved over Time. J Med Chem 2020; 63:8408-8418. [DOI: 10.1021/acs.jmedchem.0c00754] [Citation(s) in RCA: 69] [Impact Index Per Article: 17.3] [Indexed: 12/12/2022]
Affiliation(s)
- Peter Ertl
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research, Basel CH-4056, Switzerland
- Eva Altmann
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research, Basel CH-4056, Switzerland
- Jeffrey M. McKenna
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
|
47
|
Amabilino S, Pogány P, Pickett SD, Green DVS. Guidelines for Recurrent Neural Network Transfer Learning-Based Molecular Generation of Focused Libraries. J Chem Inf Model 2020; 60:5699-5713. [DOI: 10.1021/acs.jcim.0c00343] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 12/20/2022]
Affiliation(s)
- Silvia Amabilino
- School of Chemistry, University of Bristol, Cantock’s Close, Bristol BS8 1TS, United Kingdom
- Peter Pogány
- Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Herts SG1 2NY, United Kingdom
- Stephen D. Pickett
- Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Herts SG1 2NY, United Kingdom
- Darren V. S. Green
- Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Herts SG1 2NY, United Kingdom
|
48
|
Murali V, Königs C, Deekshitula S, Nukala S, Santhi MD, Athri P. CompoundDB4j: Integrated Drug Resource of Heterogeneous Chemical Databases. Mol Inform 2020; 39:e2000013. [DOI: 10.1002/minf.202000013] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/20/2020] [Accepted: 05/09/2020] [Indexed: 01/13/2023]
Affiliation(s)
- Vidhya Murali
- Dept. of Computer Science & Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, India
- Cassandra Königs
- Bioinformatics and Medical Informatics, Bielefeld University, North Rhine-Westphalia, Germany
- Sarvani Deekshitula
- Dept. of Computer Science & Engineering, Amrita School of Engineering, Amritapuri, Amrita Vishwa Vidyapeetham, India
- Saranya Nukala
- Dept. of Computer Science & Engineering, Amrita School of Engineering, Amritapuri, Amrita Vishwa Vidyapeetham, India
- Maddala Divya Santhi
- Dept. of Computer Science & Engineering, Amrita School of Engineering, Amritapuri, Amrita Vishwa Vidyapeetham, India
- Prashanth Athri
- Dept. of Computer Science & Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, India
|
49
|
Coley CW, Eyke NS, Jensen KF. Autonomous Discovery in the Chemical Sciences Part II: Outlook. Angew Chem Int Ed Engl 2020; 59:23414-23436. [PMID: 31553509 DOI: 10.1002/anie.201909989] [Citation(s) in RCA: 94] [Impact Index Per Article: 23.5] [Received: 08/06/2019] [Indexed: 01/19/2023]
Abstract
This two-part Review examines how automation has contributed to different aspects of discovery in the chemical sciences. In this second part, we reflect on a selection of exemplary studies. It is increasingly important to articulate what the role of automation and computation has been in the scientific process and how that has or has not accelerated discovery. One can argue that even the best automated systems have yet to "discover" despite being incredibly useful as laboratory assistants. We must carefully consider how they have been and can be applied to future problems of chemical discovery in order to effectively design and interact with future autonomous platforms. The majority of this Review defines a large set of open research directions, including improving our ability to work with complex data, build empirical models, automate both physical and computational experiments for validation, select experiments, and evaluate whether we are making progress towards the ultimate goal of autonomous discovery. Addressing these practical and methodological challenges will greatly advance the extent to which autonomous systems can make meaningful discoveries.
Affiliation(s)
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Natalie S Eyke
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Klavs F Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
|
50
|
Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, Li Q, Shoemaker BA, Thiessen PA, Yu B, Zaslavsky L, Zhang J, Bolton EE. PubChem 2019 update: improved access to chemical data. Nucleic Acids Res 2019; 47:D1102-D1109. [PMID: 30371825 PMCID: PMC6324075 DOI: 10.1093/nar/gky1033] [Citation(s) in RCA: 1692] [Impact Index Per Article: 423.0] [Received: 09/13/2018] [Accepted: 10/26/2018] [Indexed: 11/14/2022]
Abstract
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a key chemical information resource for the biomedical research community. Substantial improvements were made in the past few years. New data content was added, including spectral information, scientific articles mentioning chemicals, and information for food and agricultural chemicals. PubChem released new web interfaces, such as PubChem Target View page, Sources page, Bioactivity dyad pages and Patent View page. PubChem also released a major update to PubChem Widgets and introduced a new programmatic access interface, called PUG-View. This paper describes these new developments in PubChem.
Affiliation(s)
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA
- Jie Chen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA
- Tiejun Cheng
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA
- Asta Gindulyte
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA
- Jia He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA
- Siqian He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA
- Qingliang Li
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA
- Benjamin A Shoemaker
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA
- Paul A Thiessen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA
- Bo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA
- Leonid Zaslavsky
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA
- Jian Zhang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA
- Evan E Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA
|