251
|
Réda C, Kaufmann E, Delahaye-Duriez A. Machine learning applications in drug development. Comput Struct Biotechnol J 2019; 18:241-252. [PMID: 33489002 PMCID: PMC7790737 DOI: 10.1016/j.csbj.2019.12.006] [Citation(s) in RCA: 85] [Impact Index Per Article: 14.2] [Received: 09/29/2019] [Revised: 12/10/2019] [Accepted: 12/10/2019] [Indexed: 02/07/2023]
Abstract
Due to the huge amount of biological and medical data available today, along with well-established machine learning algorithms, the design of largely automated drug development pipelines can now be envisioned. These pipelines may guide, or speed up, drug discovery; provide a better understanding of diseases and associated biological phenomena; and help plan preclinical wet-lab experiments and even future clinical trials. This automation of the drug development process might be key to addressing the low productivity rate that pharmaceutical companies currently face. In this survey, we focus on two classes of methods, sequential learning and recommender systems, which are active fields of biomedical research.
Affiliation(s)
- Clémence Réda
- NeuroDiderot, UMR 1141, Inserm, Université de Paris, Sorbonne Paris Cité, Hôpital Robert Debré, 48, boulevard Sérurier, Paris 75019, France
- Université Paris Diderot, Université de Paris, Sorbonne Paris Cité, 5, rue Thomas Mann, Paris 75013, France
| | - Emilie Kaufmann
- Univ. Lille, CNRS, Centrale Lille, Inria, UMR 9189 - CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, F-59000 Lille, France
| | - Andrée Delahaye-Duriez
- NeuroDiderot, UMR 1141, Inserm, Université de Paris, Sorbonne Paris Cité, Hôpital Robert Debré, 48, boulevard Sérurier, Paris 75019, France
- Université Paris 13, Sorbonne Paris Cité, UFR de santé, médecine et biologie humaine, Bobigny 93000, France
- Service histologie-embryologie-cytogénétique-biologie de la reproduction-CECOS, Hôpital Jean Verdier, AP-HP, Bondy 93140, France
| |
|
252
|
Fu L, Liu L, Yang ZJ, Li P, Ding JJ, Yun YH, Lu AP, Hou TJ, Cao DS. Systematic Modeling of log D7.4 Based on Ensemble Machine Learning, Group Contribution, and Matched Molecular Pair Analysis. J Chem Inf Model 2019; 60:63-76. [DOI: 10.1021/acs.jcim.9b00718] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 12/18/2022]
Affiliation(s)
- Li Fu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
| | - Lu Liu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
| | - Zhi-Jiang Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
| | - Pan Li
- Beijing Institute of Pharmaceutical Chemistry, Beijing 102205, P. R. China
| | - Jun-Jie Ding
- Beijing Institute of Pharmaceutical Chemistry, Beijing 102205, P. R. China
| | - Yong-Huan Yun
- College of Food Science and Engineering, Hainan University, Haikou 570228, P. R. China
| | - Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
| | - Ting-Jun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
| |
|
253
|
Santos FRS, Nunes DAF, Lima WG, Davyt D, Santos LL, Taranto AG, Ferreira JMS. Identification of Zika Virus NS2B-NS3 Protease Inhibitors by Structure-Based Virtual Screening and Drug Repurposing Approaches. J Chem Inf Model 2019; 60:731-737. [DOI: 10.1021/acs.jcim.9b00933] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Indexed: 01/08/2023]
Affiliation(s)
- Felipe R. S. Santos
- Laboratório de Microbiologia Médica, Campus Centro-Oeste Dona Lindu, Universidade Federal de São João Del-Rei (UFSJ), Divinópolis 35501-296, Minas Gerais, Brasil
- Laboratório de Química Farmacêutica Medicinal, Campus Centro-Oeste Dona Lindu, Universidade Federal de São João Del-Rei (UFSJ), Divinópolis 35501-296, Minas Gerais, Brasil
| | - Damiana A. F. Nunes
- Laboratório de Microbiologia Médica, Campus Centro-Oeste Dona Lindu, Universidade Federal de São João Del-Rei (UFSJ), Divinópolis 35501-296, Minas Gerais, Brasil
| | - William G. Lima
- Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Minas Gerais, Brasil
| | - Danilo Davyt
- Departamento de Química Orgánica, Facultad de Química, Universidad de la República, Montevideo 11800, Uruguay
| | - Luciana L. Santos
- Laboratório de Biologia Molecular, Campus Centro-Oeste Dona Lindu, Universidade Federal de São João Del-Rei (UFSJ), Divinópolis 35501-296, Minas Gerais, Brasil
| | - Alex G. Taranto
- Laboratório de Química Farmacêutica Medicinal, Campus Centro-Oeste Dona Lindu, Universidade Federal de São João Del-Rei (UFSJ), Divinópolis 35501-296, Minas Gerais, Brasil
| | - Jaqueline M. S. Ferreira
- Laboratório de Microbiologia Médica, Campus Centro-Oeste Dona Lindu, Universidade Federal de São João Del-Rei (UFSJ), Divinópolis 35501-296, Minas Gerais, Brasil
| |
|
254
|
Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, Li B, Madabhushi A, Shah P, Spitzer M, Zhao S. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov 2019; 18:463-477. [PMID: 30976107 DOI: 10.1038/s41573-019-0024-5] [Citation(s) in RCA: 1174] [Impact Index Per Article: 195.7] [Indexed: 02/07/2023]
Abstract
Drug discovery and development pipelines are long, complex and depend on numerous factors. Machine learning (ML) approaches provide a set of tools that can improve discovery and decision making for well-specified questions with abundant, high-quality data. Opportunities to apply ML occur in all stages of drug discovery. Examples include target validation, identification of prognostic biomarkers and analysis of digital pathology data in clinical trials. Applications have ranged in context and methodology, with some approaches yielding accurate predictions and insights. The challenges of applying ML lie primarily with the lack of interpretability and repeatability of ML-generated results, which may limit their application. In all areas, systematic and comprehensive high-dimensional data still need to be generated. With ongoing efforts to tackle these issues, as well as increasing awareness of the factors needed to validate ML approaches, the application of ML can promote data-driven decision making and has the potential to speed up the process and reduce failure rates in drug discovery and development.
Affiliation(s)
- Jessica Vamathevan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Dominic Clark
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | | | - Ian Dunham
- Open Targets and European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Edgardo Ferran
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - George Lee
- Bristol-Myers Squibb, Princeton, NJ, USA
| | - Bin Li
- Takeda Pharmaceuticals International Co., Cambridge, MA, USA
| | - Anant Madabhushi
- Case Western Reserve University, Cleveland, OH, USA; Louis Stokes Cleveland Veterans Affair Medical Center, Cleveland, OH, USA
| | | | - Michaela Spitzer
- Open Targets and European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Shanrong Zhao
- Pfizer Worldwide Research and Development, Cambridge, MA, USA
| |
|
255
|
Hu R, Pei G, Jia P, Zhao Z. Decoding regulatory structures and features from epigenomics profiles: A Roadmap-ENCODE Variational Auto-Encoder (RE-VAE) model. Methods 2019; 189:44-53. [PMID: 31672653 DOI: 10.1016/j.ymeth.2019.10.012] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/03/2019] [Accepted: 10/24/2019] [Indexed: 12/15/2022]
Abstract
The development of chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing (ChIP-seq) technologies has promoted the generation of large-scale epigenomics data, providing unprecedented opportunities to explore the landscape of epigenomic profiles across both histone marks and tissue types. In addition to the many tools built directly for data analysis, advanced computational approaches such as deep learning have recently shown promise for mining data structures in depth and identifying important regulators from complex functional genomics data. We implemented a neural network framework, a Variational Auto-Encoder (VAE) model, to explore the epigenomic data from the Roadmap Epigenomics Project and the Encyclopedia of DNA Elements (ENCODE) project. Our model is applied to 935 reference samples, covering 28 tissues and 12 histone marks. We used the enhancer and promoter regions as the annotation features and ChIP-seq signal values in these regions as the feature values. Through a parameter sweep, we identified suitable hyperparameter values and built a VAE model to represent the epigenomics data and to further explore the biological regulation. The resultant Roadmap-ENCODE VAE (RE-VAE) model provided data compression and feature representation. Using the compressed data in the latent space, we found that most histone marks clustered well, whereas tissues and cell types did not. Tissue or cell specificity was observed only in some histone marks (e.g., H3K4me3 and H3K27ac) and could be characterized when the number of tissue samples was large (e.g., blood and brain). In blood, the contributive regions and genes identified by the RE-VAE model were confirmed by tissue-specificity enrichment analysis with an independent tissue expression panel. Finally, we demonstrated that the RE-VAE model could detect cancer cell lines with similar epigenomics profiles.
In conclusion, we introduced and implemented a VAE model to represent large-scale epigenomics data. The model could be used to explore classifications of histone modifications and tissue/cell specificity and to classify new data with unknown sources.
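For orientation, the abstract does not state the training objective of RE-VAE. The standard VAE objective from the general literature, given here as an assumption about the kind of loss such a model maximizes and not as the authors' exact formulation, is the evidence lower bound, where $x$ is an input profile, $z$ the latent code, $q_\phi$ the encoder, $p_\theta$ the decoder, and $p(z)$ the prior:

```latex
\mathcal{L}(\theta,\phi;x)
  = \mathbb{E}_{q_\phi(z\mid x)}\big[\log p_\theta(x\mid z)\big]
  - D_{\mathrm{KL}}\big(q_\phi(z\mid x)\,\|\,p(z)\big)
```

The first term rewards faithful reconstruction of the input; the second keeps the compressed latent representation close to the prior, which is what makes clustering in the latent space meaningful.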
Affiliation(s)
- Ruifeng Hu
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Guangsheng Pei
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Peilin Jia
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
| |
|
256
|
Moumbock AF, Li J, Mishra P, Gao M, Günther S. Current computational methods for predicting protein interactions of natural products. Comput Struct Biotechnol J 2019; 17:1367-1376. [PMID: 31762960 PMCID: PMC6861622 DOI: 10.1016/j.csbj.2019.08.008] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 03/19/2019] [Revised: 08/09/2019] [Accepted: 08/23/2019] [Indexed: 01/08/2023]
Abstract
Natural products (NPs) are an indispensable source of drugs, and they cover the pharmacological space better than synthetic compounds, owing to their high structural diversity. The prediction of their interaction profiles with druggable protein targets remains a major challenge in modern drug discovery. Experimental (off-)target predictions of NPs are costly and time-consuming, whereas computational methods are much faster and cheaper. As a result, computational predictions are preferentially used in the first instance for NP profiling, prior to experimental validation. This review covers recent advances in computational approaches developed to aid the annotation of unknown drug-target interactions (DTIs), focusing on three broad classes: ligand-based, target-based, and target-ligand-based (hybrid) approaches. Computational DTI prediction methods have the potential to significantly advance the discovery and development of novel selective drugs exhibiting minimal side effects. We highlight some inherent caveats of these methods which must be overcome to enable them to realize their full potential, and a future outlook is given.
Affiliation(s)
| | | | | | | | - Stefan Günther
- Institute of Pharmaceutical Sciences, Research Group Pharmaceutical Bioinformatics, Albert-Ludwigs-Universität Freiburg, Germany
| |
|
257
|
Cockroft NT, Cheng X, Fuchs JR. STarFish: A Stacked Ensemble Target Fishing Approach and its Application to Natural Products. J Chem Inf Model 2019; 59:4906-4920. [PMID: 31589422 DOI: 10.1021/acs.jcim.9b00489] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 12/22/2022]
Abstract
Target fishing is the process of identifying the protein target of a bioactive small molecule. To do so experimentally requires a significant investment of time and resources, which can be expedited with a reliable computational target fishing model. The development of computational target fishing models using machine learning has become very popular over the last several years because of the increased availability of large amounts of public bioactivity data. Unfortunately, the applicability and performance of such models for natural products have not yet been comprehensively assessed. This is, in part, due to the relative lack of bioactivity data available for natural products compared to synthetic compounds. Moreover, the databases commonly used to train such models do not annotate which compounds are natural products, which makes the collection of a benchmarking set difficult. To address this knowledge gap, a data set composed of natural product structures and their associated protein targets was generated by cross-referencing 20 publicly available natural product databases with the bioactivity database ChEMBL. This data set contains 5589 compound-target pairs for 1943 unique compounds and 1023 unique targets. A synthetic data set comprising 107 190 compound-target pairs for 88 728 unique compounds and 1907 unique targets was used to train k-nearest neighbors, random forest, and multilayer perceptron models. The predictive performance of each model was assessed by stratified 10-fold cross-validation and benchmarking on the newly collected natural product data set. Strong performance was observed for each model during cross-validation, with area under the receiver operating characteristic (AUROC) scores ranging from 0.94 to 0.99 and Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC) scores from 0.89 to 0.94.
When tested on the natural product data set, performance dramatically decreased with AUROC scores ranging from 0.70 to 0.85 and BEDROC scores from 0.43 to 0.59. However, the implementation of a model stacking approach, which uses logistic regression as a meta-classifier to combine model predictions, dramatically improved the ability to correctly predict the protein targets of natural products and increased the AUROC score to 0.94 and BEDROC score to 0.73. This stacked model was deployed as a web application, called STarFish, and has been made available for use to aid in target identification for natural products.
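As a reading aid for the AUROC figures quoted in this abstract, the following minimal sketch computes AUROC as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count one half). The function name and the toy labels and scores are hypothetical and are not taken from the paper; BEDROC and the stacking step are not covered here.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 ground-truth values (1 = true target).
    scores: iterable of model scores, higher means more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two true targets among five candidates.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives a sense of scale for the drop from cross-validation to the natural product benchmark reported above. The quadratic pairwise loop is chosen for clarity; a rank-based formula would be preferable on large data sets.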
Affiliation(s)
- Nicholas T Cockroft
- Division of Medicinal Chemistry & Pharmacognosy, College of Pharmacy, The Ohio State University, Columbus, Ohio 43210, United States
| | - Xiaolin Cheng
- Division of Medicinal Chemistry & Pharmacognosy, College of Pharmacy, The Ohio State University, Columbus, Ohio 43210, United States
| | - James R Fuchs
- Division of Medicinal Chemistry & Pharmacognosy, College of Pharmacy, The Ohio State University, Columbus, Ohio 43210, United States
| |
|
258
|
Gonçalves LM, Trevisol ETV, de Azevedo Abrahim Vieira B, De Mesquita JF. Trehalose synthesis inhibitor: A molecular in silico drug design. J Cell Biochem 2019; 121:1114-1125. [PMID: 31478225 DOI: 10.1002/jcb.29347] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/01/2018] [Accepted: 08/13/2019] [Indexed: 11/11/2022]
Abstract
Infectious diseases are serious public health problems, affecting a large portion of the world's population. Trehalose is a molecule that plays a key role in pathogenic organisms, and its metabolism has recently attracted interest for drug development. Trehalose-6-phosphate synthase (TPS1) is the enzyme responsible for the biosynthesis of trehalose-6-phosphate (T6P) in the TPS1/TPS2 pathway, which results in the formation of trehalose. Studies carried out by our group demonstrated the inhibitory capacity of T6P on the TPS1 enzyme from Saccharomyces cerevisiae, preventing the synthesis of trehalose. Using in silico techniques, we compiled sequences and experimentally determined structures of TPS1. Sequence alignments and molecular modeling were performed. The generated structures were submitted to validation algorithms, structurally aligned, and analyzed evolutionarily. Molecular docking was applied to analyze the interaction between T6P and TPS1, and the ADMET properties of T6P were analyzed. The results demonstrated that the models created showed sequence and structural similarities with experimentally determined structures. Molecular docking identified a cavity on the protein surface in which T6P interacted with the residues TYR-40, ALA-41, MET-42, and PHE-372, indicating a possible uncompetitive inhibition mechanism for this ligand, which can be useful in directing the molecular design of inhibitors. In ADMET analyses, T6P had acceptable risk values compared with other compounds from the World Drug Index. Therefore, these results may offer a promising strategy for developing a broad-spectrum antibiotic against this specific target with selectivity, potency, and reduced side effects, leading to a new way to treat infectious diseases like tuberculosis and candidiasis.
Affiliation(s)
- Lucas Machado Gonçalves
- Bioinformatics and Computational Biology Group, Federal University of Rio de Janeiro - UNIRIO, RJ, Brazil
| | | | | | - Joelma Freire De Mesquita
- Bioinformatics and Computational Biology Group, Federal University of Rio de Janeiro - UNIRIO, RJ, Brazil
| |
|
259
|
Comparison of Target Features for Predicting Drug-Target Interactions by Deep Neural Network Based on Large-Scale Drug-Induced Transcriptome Data. Pharmaceutics 2019; 11:377. [PMID: 31382356 PMCID: PMC6723794 DOI: 10.3390/pharmaceutics11080377] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 06/12/2019] [Revised: 07/16/2019] [Accepted: 07/24/2019] [Indexed: 12/31/2022]
Abstract
Uncovering drug-target interactions (DTIs) is pivotal for understanding drug mode-of-action (MoA), avoiding adverse drug reactions (ADRs), and seeking opportunities for drug repositioning (DR). For decades, in silico predictions of DTIs have largely depended on structural information about both targets and compounds, e.g., docking or ligand-based virtual screening. Recently, the application of deep neural networks (DNNs) has opened a new path to uncovering novel DTIs for thousands of targets. One important question is which target features are most relevant to DTI prediction. As an early attempt to answer this question, we objectively compared three canonical target features extracted from: (i) the expression profiles by gene knockdown (GEPs); (ii) the protein–protein interaction network (PPI network); and (iii) the pathway membership (PM) of a target gene. For drug features, the large-scale drug-induced transcriptome dataset, the Library of Integrated Network-based Cellular Signatures (LINCS) L1000 dataset, was used. All these features are closely related to protein function or drug MoA, yet their utility has been only sparsely investigated. In particular, few studies have compared the three types of target features in DNN-based DTI prediction under the same evaluation scheme. Among the three target features, the PM and the PPI network show similar performance, superior to that of GEPs. DNN models based on both features consistently outperformed other machine learning methods such as naïve Bayes, random forest, or logistic regression.
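The abstract does not say how pathway membership (PM) is turned into a numeric target feature. One common choice, shown here purely as an assumption, is a multi-hot vector with one position per pathway. The function name, pathway names, and gene names below are hypothetical placeholders.

```python
def pathway_membership_vector(gene, pathways):
    """Encode a gene as a multi-hot vector over pathways.

    pathways: dict mapping pathway name -> set of member gene symbols.
    Positions follow the sorted pathway names so the layout is stable.
    """
    names = sorted(pathways)
    return [1 if gene in pathways[name] else 0 for name in names]


# Hypothetical toy annotation, not taken from any real pathway database.
toy_pathways = {
    "pathway_A": {"GENE1", "GENE2"},
    "pathway_B": {"GENE2", "GENE3"},
    "pathway_C": {"GENE3"},
}
gene2_features = pathway_membership_vector("GENE2", toy_pathways)
```

A vector of this kind could then be concatenated with a drug feature vector and passed to a classifier; how the cited study actually combined target and drug features is not described in the abstract.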
|
260
|
Cerisier N, Petitjean M, Regad L, Bayard Q, Réau M, Badel A, Camproux AC. High Impact: The Role of Promiscuous Binding Sites in Polypharmacology. Molecules 2019; 24:2529. [PMID: 31295958 PMCID: PMC6680532 DOI: 10.3390/molecules24142529] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/03/2019] [Revised: 06/27/2019] [Accepted: 06/27/2019] [Indexed: 02/06/2023]
Abstract
The literature focuses on drug promiscuity, which is a drug’s ability to bind to several targets, because it plays an essential role in polypharmacology. However, little work has been completed regarding binding site promiscuity, even though its properties are now recognized among the key factors that impact drug promiscuity. Here, we quantified and characterized the promiscuity of druggable binding sites from protein-ligand complexes in the high quality Mother Of All Databases while using statistical methods. Most of the sites (80%) exhibited promiscuity, irrespective of the protein class. Nearly half were highly promiscuous and able to interact with various types of ligands. The corresponding pockets were rather large and hydrophobic, with high sulfur atom and aliphatic residue frequencies, but few side chain atoms. Consequently, their interacting ligands can be large, rigid, and weakly hydrophilic. The selective sites that interacted with one ligand type presented less favorable pocket properties for establishing ligand contacts. Thus, their ligands were highly adaptable, small, and hydrophilic. In the dataset, the promiscuity of the site rather than the drug mainly explains the multiple interactions between the drug and target, as most ligand types are dedicated to one site. This underlines the essential contribution of binding site promiscuity to drug promiscuity between different protein classes.
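To make the notion of site promiscuity concrete, the sketch below counts distinct ligand types per binding site and reports the fraction of sites that bind more than one type. The threshold (more than one ligand type) is an assumption inferred from the abstract's description of selective sites as interacting with one ligand type; the function name and the toy pairs are hypothetical and do not reproduce the study's data or statistics.

```python
from collections import defaultdict


def promiscuous_fraction(site_ligand_pairs):
    """Fraction of binding sites observed with more than one ligand type.

    site_ligand_pairs: iterable of (site_id, ligand_type) tuples.
    """
    types_per_site = defaultdict(set)
    for site, ligand_type in site_ligand_pairs:
        types_per_site[site].add(ligand_type)
    if not types_per_site:
        raise ValueError("no binding sites provided")
    # Assumed definition: a site is promiscuous if it binds >1 ligand type.
    promiscuous = sum(1 for types in types_per_site.values() if len(types) > 1)
    return promiscuous / len(types_per_site)


# Hypothetical toy complexes: four sites, three of which bind several types.
toy_pairs = [
    ("site1", "typeA"), ("site1", "typeB"),
    ("site2", "typeA"), ("site2", "typeC"), ("site2", "typeD"),
    ("site3", "typeB"), ("site3", "typeB"),
    ("site4", "typeC"), ("site4", "typeA"),
]
toy_fraction = promiscuous_fraction(toy_pairs)
```

Using a set per site means repeated observations of the same ligand type (as for `site3`) do not inflate the count.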
Affiliation(s)
- Natacha Cerisier
- Université de Paris, Biologie Fonctionnelle et Adaptative, UMR 8251, CNRS, ERL U1133, INSERM, Computational Modeling of Protein Ligand Interactions, F-75013 Paris, France
| | - Michel Petitjean
- Université de Paris, Biologie Fonctionnelle et Adaptative, UMR 8251, CNRS, ERL U1133, INSERM, Computational Modeling of Protein Ligand Interactions, F-75013 Paris, France
| | - Leslie Regad
- Université de Paris, Biologie Fonctionnelle et Adaptative, UMR 8251, CNRS, ERL U1133, INSERM, Computational Modeling of Protein Ligand Interactions, F-75013 Paris, France
| | - Quentin Bayard
- Centre de Recherche des Cordeliers, Sorbonne Universités, INSERM, USPC, Université Paris Descartes, Université Paris Diderot, Université Paris 13, Functional Genomics of Solid Tumors Laboratory, F-75006 Paris, France
| | - Manon Réau
- Laboratoire Génomique Bioinformatique et Chimie Moléculaire, EA 7528, Conservatoire National des Arts et Métiers, F-75003 Paris, France
| | - Anne Badel
- Université de Paris, Biologie Fonctionnelle et Adaptative, UMR 8251, CNRS, ERL U1133, INSERM, Computational Modeling of Protein Ligand Interactions, F-75013 Paris, France
| | - Anne-Claude Camproux
- Université de Paris, Biologie Fonctionnelle et Adaptative, UMR 8251, CNRS, ERL U1133, INSERM, Computational Modeling of Protein Ligand Interactions, F-75013 Paris, France
| |
|
261
|
Lee M, Kim H, Joe H, Kim HG. Multi-channel PINN: investigating scalable and transferable neural networks for drug discovery. J Cheminform 2019; 11:46. [PMID: 31289963 PMCID: PMC6617572 DOI: 10.1186/s13321-019-0368-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 10/02/2018] [Accepted: 07/02/2019] [Indexed: 12/19/2022]
Abstract
Analysis of compound–protein interactions (CPIs) has become a crucial prerequisite for drug discovery and drug repositioning. In vitro experiments are commonly used to identify CPIs, but it is not feasible to explore the molecular and proteomic space through experimental approaches alone. Advances in machine learning for predicting CPIs have made significant contributions to drug discovery. Deep neural networks (DNNs), which have recently been applied to predict CPIs, performed better than shallow classifiers. However, such techniques commonly require a considerable volume of dense data for each training target. Although the amount of publicly available CPI data has grown rapidly, public data are still sparse and contain many measurement errors. In this paper, we propose a novel method, Multi-channel PINN, to fully utilize sparse data through representation learning. With representation learning, Multi-channel PINN can use DNNs in three ways: as a classifier, as a feature extractor, and as an end-to-end learner. Multi-channel PINN can be fed both low- and high-level representations and incorporates each of them by using all three approaches within a single model. To fully utilize sparse public data, we additionally explore the potential of transferring representations from training tasks to test tasks. As a proof of concept, Multi-channel PINN was evaluated on fifteen combinations of feature pairs to investigate how they affect highest performance, initial performance, and convergence speed. The experimental results indicate that multi-channel models using protein features performed better than single-channel models or multi-channel models using compound features. Therefore, Multi-channel PINN can be advantageous when used with appropriate representations.
Additionally, we pretrained models on a training task then finetuned them on a test task to figure out whether Multi-channel PINN can capture general representations for compounds and proteins. We found that there were significant differences in performance between pretrained models and non-pretrained models.
Affiliation(s)
- Munhwan Lee
- Biomedical Knowledge Engineering Laboratory, Seoul National University, 1 Gwanak-ro, Seoul, Republic of Korea
| | - Hyeyeon Kim
- Biomedical Knowledge Engineering Laboratory, Seoul National University, 1 Gwanak-ro, Seoul, Republic of Korea
| | - Hyunwhan Joe
- Biomedical Knowledge Engineering Laboratory, Seoul National University, 1 Gwanak-ro, Seoul, Republic of Korea
| | - Hong-Gee Kim
- Biomedical Knowledge Engineering Laboratory, Seoul National University, 1 Gwanak-ro, Seoul, Republic of Korea
| |
|
262
|
Shen C, Ding J, Wang Z, Cao D, Ding X, Hou T. From machine learning to deep learning: Advances in scoring functions for protein–ligand docking. Wiley Interdiscip Rev Comput Mol Sci 2019. [DOI: 10.1002/wcms.1429] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Indexed: 12/18/2022]
Affiliation(s)
- Chao Shen
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University Hangzhou P. R. China
| | - Junjie Ding
- Beijing Institute of Pharmaceutical Chemistry Beijing P. R. China
| | - Zhe Wang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University Hangzhou P. R. China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University Changsha P. R. China
| | - Xiaoqin Ding
- Beijing Institute of Pharmaceutical Chemistry Beijing P. R. China
| | - Tingjun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University Hangzhou P. R. China
| |
|
263
|
Masoudi-Sobhanzadeh Y, Omidi Y, Amanlou M, Masoudi-Nejad A. Trader as a new optimization algorithm predicts drug-target interactions efficiently. Sci Rep 2019; 9:9348. [PMID: 31249365 PMCID: PMC6597553 DOI: 10.1038/s41598-019-45814-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 01/18/2019] [Accepted: 06/17/2019] [Indexed: 12/29/2022]
Abstract
Several machine learning approaches have been proposed for predicting new benefits of existing drugs. Although these methods have introduced new uses for some medications, more efficient methods can lead to more accurate predictions. To this end, we proposed a novel machine learning method based on a new optimization algorithm, named Trader. To show the capabilities of the proposed algorithm, which can be applied across different scientific fields, it was compared with ten other state-of-the-art optimization algorithms on standard and advanced benchmark functions. Next, a multi-layer artificial neural network was designed and trained by Trader to predict drug-target interactions (DTIs). Finally, the functionality of the proposed method was investigated on several DTI datasets and compared with other methods. The data obtained by Trader showed that it eliminates the disadvantages of different optimization algorithms, resulting in a better outcome. Further, the proposed machine learning method was found to achieve a significant level of performance compared to other popular and efficient approaches in predicting unknown DTIs. All the implemented source code is freely available at https://github.com/LBBSoft/Trader.
Affiliation(s)
- Yosef Masoudi-Sobhanzadeh
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Yadollah Omidi
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran
- Massoud Amanlou
- Drug Design and Development Research Center, The Institute of Pharmaceutical Sciences (TIPS), Tehran University of Medical Sciences, Tehran, 14176-53955, Iran
- Ali Masoudi-Nejad
- Laboratory of systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
264
Ekins S, Puhl AC, Zorn KM, Lane TR, Russo DP, Klein JJ, Hickey AJ, Clark AM. Exploiting machine learning for end-to-end drug discovery and development. Nat Mater 2019; 18:435-441. [PMID: 31000803 PMCID: PMC6594828 DOI: 10.1038/s41563-019-0338-z] [Citation(s) in RCA: 269] [Impact Index Per Article: 44.8] [Received: 10/19/2018] [Accepted: 03/07/2019] [Indexed: 05/20/2023]
Abstract
A variety of machine learning methods such as naive Bayesian, support vector machines and more recently deep neural networks are demonstrating their utility for drug discovery and development. These leverage the generally bigger datasets created from high-throughput screening data and allow prediction of bioactivities for targets and molecular properties with increased levels of accuracy. We have only just begun to exploit the potential of these techniques but they may already be fundamentally changing the research process for identifying new molecules and/or repurposing old drugs. The integrated application of such machine learning models for end-to-end (E2E) application is broadly relevant and has considerable implications for developing future therapies and their targeting.
Affiliation(s)
- Sean Ekins
- Collaborations Pharmaceuticals, Inc., Raleigh, NC, USA
- Ana C Puhl
- Collaborations Pharmaceuticals, Inc., Raleigh, NC, USA
- Thomas R Lane
- Collaborations Pharmaceuticals, Inc., Raleigh, NC, USA
- Daniel P Russo
- Collaborations Pharmaceuticals, Inc., Raleigh, NC, USA
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, USA
- Anthony J Hickey
- RTI International, Research Triangle Park, NC, USA
- UNC Catalyst for Rare Diseases, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Alex M Clark
- Molecular Materials Informatics, Inc., Montreal, Quebec, Canada
265
Zolotovskaia MA, Sorokin MI, Emelianova AA, Borisov NM, Kuzmin DV, Borger P, Garazha AV, Buzdin AA. Pathway Based Analysis of Mutation Data Is Efficient for Scoring Target Cancer Drugs. Front Pharmacol 2019; 10:1. [PMID: 30728774 PMCID: PMC6351482 DOI: 10.3389/fphar.2019.00001] [Citation(s) in RCA: 100] [Impact Index Per Article: 16.7] [Received: 11/07/2018] [Accepted: 01/03/2019] [Indexed: 12/20/2022] Open
Abstract
Despite significant achievements in chemotherapy, cancer remains one of the leading causes of death. Target therapy revolutionized this field, but the efficiencies of target drugs vary dramatically among individual patients. Personalization of target therapies therefore remains a challenge in oncology. Here, we propose a molecular pathway-based algorithm for scoring target drugs using high-throughput mutation data to personalize their clinical efficacies. This algorithm was validated on 3,800 exome mutation profiles from The Cancer Genome Atlas (TCGA) project for 128 target drugs. The output values, termed Mutational Drug Scores (MDS), showed positive correlation with published drug efficiencies in clinical trials. We also used the MDS approach to simulate all known protein-coding genes as putative drug targets. The model used was built on the basis of 18,273 mutation profiles from the COSMIC database for eight cancer types. We found that the hits predicted by the MDS algorithm frequently coincide with those already used as targets of existing cancer drugs, but several novel candidates can be considered promising for further development. Our results indicate that MDS is applicable to ranking anticancer drugs and can be applied to the identification of novel molecular targets.
Affiliation(s)
- Marianna A Zolotovskaia
- Oncobox Ltd., Moscow, Russia
- Department of Oncology, Hematology and Radiotherapy of Pediatric Faculty, Pirogov Russian National Research Medical University, Moscow, Russia
- Maxim I Sorokin
- The Laboratory of Clinical Bioinformatics, IM Sechenov First Moscow State Medical University, Moscow, Russia
- Omicsway Corp., Walnut, CA, United States
- Science-Educational Center Department, M. M. Shemyakin and Yu. A. Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
- Anna A Emelianova
- Science-Educational Center Department, M. M. Shemyakin and Yu. A. Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
- Nikolay M Borisov
- The Laboratory of Clinical Bioinformatics, IM Sechenov First Moscow State Medical University, Moscow, Russia
- Omicsway Corp., Walnut, CA, United States
- Denis V Kuzmin
- Science-Educational Center Department, M. M. Shemyakin and Yu. A. Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
- Pieter Borger
- Laboratory of the Swiss Hepato-Pancreato-Biliary, Department of Surgery, Transplantation Center, University Hospital Zurich, Zurich, Switzerland
- Anton A Buzdin
- Oncobox Ltd., Moscow, Russia
- The Laboratory of Clinical Bioinformatics, IM Sechenov First Moscow State Medical University, Moscow, Russia
- Science-Educational Center Department, M. M. Shemyakin and Yu. A. Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
266
Zhavoronkov A, Mamoshina P, Vanhaelen Q, Scheibye-Knudsen M, Moskalev A, Aliper A. Artificial intelligence for aging and longevity research: Recent advances and perspectives. Ageing Res Rev 2019; 49:49-66. [PMID: 30472217 DOI: 10.1016/j.arr.2018.11.003] [Citation(s) in RCA: 102] [Impact Index Per Article: 17.0] [Received: 09/29/2018] [Revised: 11/07/2018] [Accepted: 11/21/2018] [Indexed: 12/14/2022]
Abstract
The applications of modern artificial intelligence (AI) algorithms within the field of aging research offer tremendous opportunities. Aging is an almost universal unifying feature possessed by all living organisms, tissues, and cells. Modern deep learning techniques used to develop age predictors offer new possibilities for formerly incompatible dynamic and static data types. AI biomarkers of aging enable a holistic view of biological processes and allow for novel methods for building causal models, extracting the most important features, and identifying biological targets and mechanisms. Recent developments in generative adversarial networks (GANs) and reinforcement learning (RL) permit the generation of diverse synthetic molecular and patient data, identification of novel biological targets, and generation of novel molecular compounds with desired properties and geroprotectors. These novel techniques can be combined into a unified, seamless end-to-end biomarker development, target identification, drug discovery and real world evidence pipeline that may help accelerate and improve pharmaceutical research and development practices. Modern AI is therefore expected to contribute to the credibility and prominence of longevity biotechnology in the healthcare and pharmaceutical industry, and to the convergence of countless areas of research.
267
Tanoli Z, Alam Z, Ianevski A, Wennerberg K, Vähä-Koskela M, Aittokallio T. Interactive visual analysis of drug–target interaction networks using Drug Target Profiler, with applications to precision medicine and drug repurposing. Brief Bioinform 2018; 21:211-220. [PMID: 30566623 DOI: 10.1093/bib/bby119] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 09/11/2018] [Revised: 11/01/2018] [Accepted: 11/19/2018] [Indexed: 12/13/2022] Open
Abstract
Knowledge of the full target space of drugs (or drug-like compounds) provides important insights into the potential therapeutic use of the agents to modulate or avoid their various on- and off-targets in drug discovery and precision medicine. However, there is a lack of consolidated databases and associated data exploration tools that allow for systematic profiling of drug target-binding potencies of both approved and investigational agents using a network-centric approach. We recently initiated a community-driven platform, Drug Target Commons (DTC), which is an open-data crowdsourcing platform designed to improve the management, reproducibility and extended use of compound-target bioactivity data for drug discovery and repurposing, as well as target identification applications. In this work, we demonstrate an integrated use of the rich bioactivity data from DTC and related drug databases using Drug Target Profiler (DTP), an open-source software and web tool for interactive exploration of drug-target interaction networks. DTP was designed for network-centric modeling of mode-of-action of multi-targeting anticancer compounds, especially for precision oncology applications. DTP enables users to construct an interaction network based on integrated bioactivity data across selected chemical compounds and their protein targets, further customizable using various visualization and filtering options, as well as cross-links to several drug and protein databases to provide comprehensive information of the network nodes and interactions. We demonstrate here the operation of the DTP tool and its unique features by several use cases related to both drug discovery and drug repurposing applications, using examples of anticancer drugs with shared target profiles. DTP is freely accessible at http://drugtargetprofiler.fimm.fi/.
Affiliation(s)
- Ziaurrehman Tanoli
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Zaid Alam
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Aleksandr Ianevski
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Helsinki Institute for Information Technology, Aalto University, Espoo, Finland
- Markus Vähä-Koskela
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Tero Aittokallio
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Helsinki Institute for Information Technology, Aalto University, Espoo, Finland
- Department of Mathematics and Statistics, University of Turku, Turku, Finland