1. Zhong Y, Zheng H, Chen X, Zhao Y, Gao T, Dong H, Luo H, Weng Z. DDI-GCN: Drug-drug interaction prediction via explainable graph convolutional networks. Artif Intell Med 2023; 144:102640. [PMID: 37783544 DOI: 10.1016/j.artmed.2023.102640] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/27/2022] [Revised: 03/21/2023] [Accepted: 08/20/2023] [Indexed: 10/04/2023]
Abstract
Drug-drug interactions (DDI) may lead to unexpected side effects, which is a growing concern in both academia and industry. Many DDIs have been reported, but the underlying mechanisms are not well understood. Predicting and understanding DDIs can help researchers to improve drug safety and protect patient health. Here, we introduce DDI-GCN, a method that utilizes graph convolutional networks (GCN) to predict DDIs based on chemical structures. We demonstrate that this method achieves state-of-the-art prediction performance on the independent hold-out set. It can also provide visualization of structural features associated with DDIs, which can help us to study the underlying mechanisms. To make it easy and accessible to use, we developed a web server for DDI-GCN, which is freely available at http://wengzq-lab.cn/ddi/.
Affiliation(s)
- Yi Zhong
  - The Center for Big Data Research in Burns and Trauma, College of Computer and Data Science/College of Software, Fuzhou University, Fujian Province, China
- Houbing Zheng
  - Department of Plastic Surgery, the First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Xiaoming Chen
  - The Center for Big Data Research in Burns and Trauma, College of Computer and Data Science/College of Software, Fuzhou University, Fujian Province, China
- Yu Zhao
  - The Center for Big Data Research in Burns and Trauma, College of Computer and Data Science/College of Software, Fuzhou University, Fujian Province, China
- Tingfang Gao
  - College of Biological Science and Engineering, Fuzhou University, Fujian Province, China
- Huiqun Dong
  - College of Biological Science and Engineering, Fuzhou University, Fujian Province, China
- Heng Luo
  - The Center for Big Data Research in Burns and Trauma, College of Computer and Data Science/College of Software, Fuzhou University, Fujian Province, China
  - MetaNovas Biotech Inc., Foster City, CA, USA
- Zuquan Weng
  - College of Biological Science and Engineering, Fuzhou University, Fujian Province, China
  - The Center for Big Data Research in Burns and Trauma, College of Computer and Data Science/College of Software, Fuzhou University, Fujian Province, China
  - Department of Plastic Surgery, the First Affiliated Hospital of Fujian Medical University, Fuzhou, China
2. Yang J, Cai Y, Zhao K, Xie H, Chen X. Concepts and applications of chemical fingerprint for hit and lead screening. Drug Discov Today 2022; 27:103356. [PMID: 36113834 DOI: 10.1016/j.drudis.2022.103356] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 12/07/2021] [Revised: 07/28/2022] [Accepted: 09/08/2022] [Indexed: 11/22/2022]
Abstract
Molecular fingerprints are used to represent chemical (structural, physicochemical, etc.) properties of large-scale chemical sets at low computational cost. They have a prominent role in transforming chemical data sets into consistent input formats (bit strings or numeric values) suitable for in silico approaches. In this review, we summarize and classify common and state-of-the-art fingerprints into eight different types (dictionary based, circular, topological, pharmacophore, protein-ligand interaction, shape based, reinforced, and multi). We also highlight applications of fingerprints in early drug research and development (R&D). Thus, this review provides a guide for the selection of appropriate fingerprints of compounds (or ligand-protein complexes) for use in drug R&D.
Affiliation(s)
- Jingbo Yang
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Yiyang Cai
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Kairui Zhao
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Hongbo Xie
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Xiujie Chen
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
3. Carracedo-Reboredo P, Liñares-Blanco J, Rodríguez-Fernández N, Cedrón F, Novoa FJ, Carballal A, Maojo V, Pazos A, Fernandez-Lozano C. A review on machine learning approaches and trends in drug discovery. Comput Struct Biotechnol J 2021; 19:4538-4558. [PMID: 34471498 PMCID: PMC8387781 DOI: 10.1016/j.csbj.2021.08.011] [Citation(s) in RCA: 158] [Impact Index Per Article: 39.5] [Received: 03/01/2021] [Revised: 08/06/2021] [Accepted: 08/06/2021] [Indexed: 12/30/2022] [Open Access]
Abstract
Drug discovery aims at finding new compounds with specific chemical properties for the treatment of diseases. In recent years, computer science has become an important component of this search, driven by the rapid growth and democratization of machine learning techniques. With the objectives set by the Precision Medicine initiative and the new challenges they generate, it is necessary to establish robust, standard and reproducible computational methodologies to achieve them. Currently, predictive models based on Machine Learning have gained great importance in the step prior to preclinical studies. This stage can drastically reduce costs and research times in the discovery of new drugs. This review article focuses on how these new methodologies have been used in recent years of research. Analyzing the state of the art in this field gives an idea of where cheminformatics will develop in the short term, the limitations it presents and the positive results it has achieved. This review focuses mainly on the methods used to model the molecular data, as well as the biological problems addressed and the Machine Learning algorithms used for drug discovery in recent years.
Key Words
- ADMET, Absorption, distribution, metabolism, elimination and toxicity
- ADR, Adverse Drug Reaction
- AI, Artificial Intelligence
- ANN, Artificial Neural Networks
- APFP, Atom Pairs 2d FingerPrint
- AUC, Area under the Curve
- BBB, Blood–Brain barrier
- CDK, Chemistry Development Kit
- CNN, Convolutional Neural Networks
- CNS, Central Nervous System
- CPI, Compound-protein interaction
- CV, Cross Validation
- Cheminformatics
- DL, Deep Learning
- DNA, Deoxyribonucleic acid
- Deep Learning
- Drug Discovery
- ECFP, Extended Connectivity Fingerprints
- FDA, Food and Drug Administration
- FNN, Fully Connected Neural Networks
- FP, Fingerprints
- FS, Feature Selection
- GCN, Graph Convolutional Networks
- GEO, Gene Expression Omnibus
- GNN, Graph Neural Networks
- GO, Gene Ontology
- KEGG, Kyoto Encyclopedia of Genes and Genomes
- MACCS, Molecular ACCess System
- MCC, Matthews correlation coefficient
- MD, Molecular Descriptors
- MKL, Multiple Kernel Learning
- ML, Machine Learning
- Machine Learning
- Molecular Descriptors
- NB, Naive Bayes
- OOB, Out of Bag
- PCA, Principal Component Analysis
- QSAR
- QSAR, Quantitative structure–activity relationship
- RF, Random Forest
- RNA, Ribonucleic Acid
- SMILES, simplified molecular-input line-entry system
- SVM, Support Vector Machines
- TCGA, The Cancer Genome Atlas
- WHO, World Health Organization
- t-SNE, t-Distributed Stochastic Neighbor Embedding
Affiliation(s)
- Paula Carracedo-Reboredo
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Jose Liñares-Blanco
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
- Nereida Rodríguez-Fernández
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Department of Computer Science and Information Technologies, Faculty of Communication Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Francisco Cedrón
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Francisco J. Novoa
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Adrian Carballal
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Department of Computer Science and Information Technologies, Faculty of Communication Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Victor Maojo
  - Biomedical Informatics Group, Artificial Intelligence Department, Polytechnic University of Madrid, Calle de los Ciruelos, Boadilla del Monte, Madrid 28660, Spain
- Alejandro Pazos
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR), Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
- Carlos Fernandez-Lozano
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR), Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
4. Liu Y, Wu Y, Shen X, Xie L. COVID-19 Multi-Targeted Drug Repurposing Using Few-Shot Learning. Front Bioinform 2021; 1:693177. [PMID: 36303751 PMCID: PMC9581066 DOI: 10.3389/fbinf.2021.693177] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/10/2021] [Accepted: 05/25/2021] [Indexed: 11/18/2022] [Open Access]
Abstract
The life-threatening disease COVID-19 has inspired significant efforts to discover novel therapeutic agents through repurposing of existing drugs. Although multi-targeted (polypharmacological) therapies are recognized as the most efficient approach to system diseases such as COVID-19, computational multi-targeted compound screening has been limited by the scarcity of high-quality experimental data and difficulties in extracting information from molecules. This study introduces MolGNN, a new deep learning model for molecular property prediction. MolGNN applies a graph neural network to computational learning of chemical molecule embedding. Compared with state-of-the-art approaches that rely heavily on labeled experimental data, our method achieves equivalent or superior prediction performance without manual labels in the pretraining stage, and excellent performance on data with only a few labels. Our results indicate that MolGNN is robust to scarce training data, and hence a powerful few-shot learning tool. MolGNN predicted several multi-targeted molecules against both human Janus kinases and the SARS-CoV-2 main protease, which are preferential targets for drugs aiming, respectively, at alleviating cytokine storm COVID-19 symptoms and suppressing viral replication. We also predicted molecules potentially inhibiting cell death induced by SARS-CoV-2. Several of MolGNN's top predictions are supported by existing experimental and clinical evidence, demonstrating the potential value of our method.
Affiliation(s)
- Yang Liu
  - Department of Computer Science, Hunter College, The City University of New York, New York, NY, United States
- You Wu
  - The Graduate Center, The City University of New York, New York, NY, United States
- Xiaoke Shen
  - Department of Computer Science, Hunter College, The City University of New York, New York, NY, United States
- Lei Xie
  - Department of Computer Science, Hunter College, The City University of New York, New York, NY, United States
  - The Graduate Center, The City University of New York, New York, NY, United States
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, NY, United States
5. Pham TH, Qiu Y, Zeng J, Xie L, Zhang P. A deep learning framework for high-throughput mechanism-driven phenotype compound screening and its application to COVID-19 drug repurposing. Nat Mach Intell 2021; 3:247-257. [PMID: 33796820 PMCID: PMC8009091 DOI: 10.1038/s42256-020-00285-9] [Citation(s) in RCA: 102] [Impact Index Per Article: 25.5] [Received: 05/20/2020] [Accepted: 12/15/2020] [Indexed: 12/15/2022]
Abstract
Phenotype-based compound screening has advantages over target-based drug discovery, but is unscalable and lacks understanding of mechanism. Chemical-induced gene expression profiles provide a mechanistic signature of phenotypic response. However, the use of such data is limited by their sparseness, unreliability, and relatively low throughput. Few methods can perform phenotype-based de novo chemical compound screening. Here, we propose a mechanism-driven neural network-based method, DeepCE, which utilizes a graph neural network and a multi-head attention mechanism to model chemical substructure-gene and gene-gene associations, for predicting the differential gene expression profile perturbed by de novo chemicals. Moreover, we propose a novel data augmentation method which extracts useful information from unreliable experiments in the L1000 dataset. The experimental results show that DeepCE achieves superior performance to state-of-the-art methods. The effectiveness of gene expression profiles generated from DeepCE is further supported by comparing them with observed data for downstream classification tasks. To demonstrate the value of DeepCE, we apply it to drug repurposing of COVID-19, and generate novel lead compounds consistent with clinical evidence. Thus, DeepCE provides a potentially powerful framework for robust predictive modeling by utilizing noisy omics data and screening novel chemicals for the modulation of a systemic response to disease.
Affiliation(s)
- Thai-Hoang Pham
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, 43210, USA
- Yue Qiu
  - Ph.D. Program in Biology, The Graduate Center, The City University of New York, New York, 10016, USA
- Jucheng Zeng
  - Department of Biomedical Informatics, The Ohio State University, Columbus, 43210, USA
- Lei Xie
  - Ph.D. Program in Biology, The Graduate Center, The City University of New York, New York, 10016, USA
  - Department of Computer Science, Hunter College, The City University of New York, New York, 10065, USA
  - Ph.D. Program in Computer Science and Biochemistry, The Graduate Center, The City University of New York, New York, 10016, USA
  - Helen and Robert Appel Alzheimer’s Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, 10021, USA
- Ping Zhang
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, 43210, USA
  - Department of Biomedical Informatics, The Ohio State University, Columbus, 43210, USA
  - Translational Data Analytics Institute, The Ohio State University, Columbus, 43210, USA
6. Liu Q, Xie L. TranSynergy: Mechanism-driven interpretable deep neural network for the synergistic prediction and pathway deconvolution of drug combinations. PLoS Comput Biol 2021; 17:e1008653. [PMID: 33577560 PMCID: PMC7906476 DOI: 10.1371/journal.pcbi.1008653] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 07/29/2020] [Revised: 02/25/2021] [Accepted: 12/21/2020] [Indexed: 02/08/2023] [Open Access]
Abstract
Drug combinations have demonstrated great potential in cancer treatments. They alleviate drug resistance and improve therapeutic efficacy. The fast-growing number of anti-cancer drugs has caused the experimental investigation of all drug combinations to become costly and time-consuming. Computational techniques can improve the efficiency of drug combination screening. Despite recent advances in applying machine learning to synergistic drug combination prediction, several challenges remain. First, the performance of existing methods is suboptimal. There is still much room for improvement. Second, biological knowledge has not been fully incorporated into the model. Finally, many models lack interpretability, limiting their clinical applications. To address these challenges, we have developed a knowledge-enabled and self-attention transformer boosted deep learning model, TranSynergy, which improves the performance and interpretability of synergistic drug combination prediction. TranSynergy is designed so that the cellular effect of drug actions can be explicitly modeled through cell-line gene dependency, gene-gene interaction, and genome-wide drug-target interaction. A novel Shapley Additive Gene Set Enrichment Analysis (SA-GSEA) method has been developed to deconvolute genes that contribute to the synergistic drug combination and improve model interpretability. Extensive benchmark studies demonstrate that TranSynergy outperforms the state-of-the-art method, suggesting the potential of mechanism-driven machine learning. Novel pathways that are associated with the synergistic combinations are revealed and supported by experimental evidence. They may provide new insights into identifying biomarkers for precision medicine and discovering new anti-cancer therapies. Several new synergistic drug combinations have been predicted with high confidence for ovarian cancer, which has few treatment options.
The code is available at https://github.com/qiaoliuhub/drug_combination.
Affiliation(s)
- Qiao Liu
  - Department of Computer Science, Hunter College, The City University of New York, New York, United States of America
- Lei Xie
  - Department of Computer Science, Hunter College, The City University of New York, New York, United States of America
  - Ph.D. Program in Computer Science, The City University of New York, New York, United States of America
  - Ph.D. Program in Biochemistry and Biology, The City University of New York, New York, United States of America
  - Helen and Robert Appel Alzheimer’s Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, United States of America
7. Pham TH, Qiu Y, Zeng J, Xie L, Zhang P. A deep learning framework for high-throughput mechanism-driven phenotype compound screening. bioRxiv 2020. [PMID: 32743586 PMCID: PMC7386506 DOI: 10.1101/2020.07.19.211235] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022]
Abstract
Target-based high-throughput compound screening dominates the conventional one-drug-one-gene drug discovery process. However, the readout from the chemical modulation of a single protein is poorly correlated with the phenotypic response of the organism, leading to a high failure rate in drug development. Chemical-induced gene expression profiles provide an attractive solution to phenotype-based screening. However, the use of such data is currently limited by their sparseness, unreliability, and relatively low throughput. Several methods have been proposed to impute missing values for gene expression datasets. However, few existing methods can perform de novo chemical compound screening. In this study, we propose a mechanism-driven neural network-based method named DeepCE (Deep Chemical Expression), which utilizes a graph convolutional neural network to learn chemical representations and a multi-head attention mechanism to model chemical substructure-gene and gene-gene feature associations. In addition, we propose a novel data augmentation method which extracts useful information from unreliable experiments in the L1000 dataset. The experimental results show that DeepCE achieves superior performance, not only in the de novo chemical setting but also in the traditional imputation setting, compared to state-of-the-art baselines for the prediction of chemical-induced gene expression. We further verify the effectiveness of gene expression profiles generated from DeepCE by comparing them with gene expression profiles in the L1000 dataset for downstream classification tasks including drug-target and disease predictions. To demonstrate the value of DeepCE, we apply it to patient-specific drug repurposing of COVID-19 for the first time, and generate novel lead compounds consistent with clinical evidence. Thus, DeepCE provides a potentially powerful framework for robust predictive modeling by utilizing noisy omics data as well as screening novel chemicals for the modulation of systemic response to disease.
Affiliation(s)
- Thai-Hoang Pham
  - The Ohio State University, Department of Computer Science and Engineering, Columbus, 43210, USA
- Yue Qiu
  - The City University of New York, Ph.D. Program in Biology, The Graduate Center, New York, 10016, USA
- Jucheng Zeng
  - The Ohio State University, Department of Biomedical Informatics, Columbus, 43210, USA
- Lei Xie
  - The City University of New York, Ph.D. Program in Biology, The Graduate Center, New York, 10016, USA
  - Hunter College, The City University of New York, Department of Computer Science, New York, 10065, USA
  - The City University of New York, Ph.D. Program in Computer Science and Biochemistry, The Graduate Center, New York, 10016, USA
  - Weill Cornell Medicine, Cornell University, Helen and Robert Appel Alzheimer's Disease Research Institute, Feil Family Brain Mind Research Institute, New York, 10021, USA
- Ping Zhang
  - The Ohio State University, Department of Computer Science and Engineering, Columbus, 43210, USA
  - The Ohio State University, Department of Biomedical Informatics, Columbus, 43210, USA
8. Zhao Z, Dai Y, Zhang C, Mathé E, Wei L, Wang K. The International Conference on Intelligent Biology and Medicine (ICIBM) 2019: bioinformatics methods and applications for human diseases. BMC Bioinformatics 2019; 20:676. [PMID: 31861973 PMCID: PMC6924135 DOI: 10.1186/s12859-019-3240-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/22/2022] [Open Access]
Abstract
From June 9 to 11, 2019, the International Conference on Intelligent Biology and Medicine (ICIBM 2019) was held in Columbus, Ohio, USA. The conference included 12 scientific sessions, five tutorials or workshops, one poster session, four keynote talks and four eminent scholar talks that covered a wide range of topics in bioinformatics, medical informatics, systems biology and intelligent computing. Here, we describe 13 high-quality research articles selected for publishing in BMC Bioinformatics.
Affiliation(s)
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
- Yulin Dai
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
- Chi Zhang
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202 USA
- Ewy Mathé
  - Department of Biomedical Informatics, The Ohio State University College of Medicine, Columbus, OH 43214 USA
- Lai Wei
  - Department of Biomedical Informatics, The Ohio State University College of Medicine, Columbus, OH 43214 USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104 USA