51
Liu M, Srivastava G, Ramanujam J, Brylinski M. Augmented drug combination dataset to improve the performance of machine learning models predicting synergistic anticancer effects. Sci Rep 2024; 14:1668. [PMID: 38238448 PMCID: PMC10796434 DOI: 10.1038/s41598-024-51940-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Accepted: 01/11/2024] [Indexed: 01/22/2024] Open
Abstract
Combination therapy has gained popularity in cancer treatment as it enhances the treatment efficacy and overcomes drug resistance. Although machine learning (ML) techniques have become an indispensable tool for discovering new drug combinations, the data on drug combination therapy currently available may be insufficient to build high-precision models. We developed a data augmentation protocol to unbiasedly scale up the existing anti-cancer drug synergy dataset. Using a new drug similarity metric, we augmented the synergy data by substituting a compound in a drug combination instance with another molecule that exhibits highly similar pharmacological effects. Using this protocol, we were able to upscale the AZ-DREAM Challenges dataset from 8798 to 6,016,697 drug combinations. Comprehensive performance evaluations show that ML models trained on the augmented data consistently achieve higher accuracy than those trained solely on the original dataset. Our data augmentation protocol provides a systematic and unbiased approach to generating more diverse and larger-scale drug combination datasets, enabling the development of more precise and effective ML models. The protocol presented in this study could serve as a foundation for future research aimed at discovering novel and effective drug combinations for cancer treatment.
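The substitution step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' protocol: the similarity table, the 0.9 threshold, and the `augment` helper are all assumptions, and the paper's own drug similarity metric is not reproduced here.

```python
from typing import Dict, List, Tuple

# (drug_a, drug_b, cell_line, synergy_score)
Combo = Tuple[str, str, str, float]


def augment(
    combos: List[Combo],
    similarity: Dict[str, Dict[str, float]],
    threshold: float = 0.9,  # hypothetical cutoff, not taken from the paper
) -> List[Combo]:
    """Return the original combos plus copies in which one drug is swapped
    for a sufficiently similar drug, inheriting the original synergy score."""
    # Track unordered drug pairs per cell line so no duplicate is emitted.
    seen = {(frozenset((a, b)), cell) for a, b, cell, _ in combos}
    out = list(combos)
    for a, b, cell, score in combos:
        # Try replacing each member of the pair in turn.
        for original, partner in ((a, b), (b, a)):
            for candidate, sim in similarity.get(original, {}).items():
                if sim < threshold or candidate == partner:
                    continue
                key = (frozenset((candidate, partner)), cell)
                if key in seen:
                    continue
                seen.add(key)
                out.append((candidate, partner, cell, score))
    return out


# Toy input: one measured combination and a made-up similarity table.
toy_combos = [("A", "B", "C1", 12.5)]
toy_similarity = {"A": {"A2": 0.95, "A3": 0.5}, "B": {"B2": 0.92}}
augmented = augment(toy_combos, toy_similarity)
```

With the toy input, only `A2` and `B2` clear the threshold, so the single measured pair grows to three instances. Repeated substitution of this kind is what lets a few thousand measured combinations expand into millions.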
Affiliation(s)
- Mengmeng Liu
- Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA, 70803, USA
- Gopal Srivastava
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, 70803, USA
- J Ramanujam
- Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA, 70803, USA
- Center for Computation and Technology, Louisiana State University, Baton Rouge, LA, 70803, USA
- Michal Brylinski
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, 70803, USA.
- Center for Computation and Technology, Louisiana State University, Baton Rouge, LA, 70803, USA.
52
Wang Y, Yu X, Gu Y, Li W, Zhu K, Chen L, Tang Y, Liu G. XGraphCDS: An explainable deep learning model for predicting drug sensitivity from gene pathways and chemical structures. Comput Biol Med 2024; 168:107746. [PMID: 38039896 DOI: 10.1016/j.compbiomed.2023.107746] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/12/2023] [Revised: 10/29/2023] [Accepted: 11/20/2023] [Indexed: 12/03/2023]
Abstract
Cancer is a highly complex disease characterized by genetic and phenotypic heterogeneity among individuals. In the era of precision medicine, understanding the genetic basis of these individual differences is crucial for developing new drugs and achieving personalized treatment. Despite the increasing abundance of cancer genomics data, predicting the relationship between cancer samples and drug sensitivity remains challenging. In this study, we developed an explainable graph neural network framework for predicting cancer drug sensitivity (XGraphCDS) based on comparative learning by integrating cancer gene expression information and drug chemical structure knowledge. Specifically, XGraphCDS consists of a unified heterogeneous network and multiple sub-networks, with molecular graphs representing drugs and gene enrichment scores representing cell lines. Experimental results showed that XGraphCDS consistently outperformed most state-of-the-art baselines (R2 = 0.863, AUC = 0.858). We also constructed a separate in vivo prediction model by using transfer learning strategies with in vitro experimental data and achieved good predictive power (AUC = 0.808). Simultaneously, our framework is interpretable, providing insights into resistance mechanisms alongside accurate predictions. The excellent performance of XGraphCDS highlights its immense potential in aiding the development of selective anti-tumor drugs and personalized dosing strategies in the field of precision medicine.
Affiliation(s)
- Yimeng Wang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Xinxin Yu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yaxin Gu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Keyun Zhu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Long Chen
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China.
- Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China.
53
Branson N, Cutillas PR, Bessant C. Comparison of multiple modalities for drug response prediction with learning curves using neural networks and XGBoost. BIOINFORMATICS ADVANCES 2023; 4:vbad190. [PMID: 38282976 PMCID: PMC10812874 DOI: 10.1093/bioadv/vbad190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 12/19/2023] [Accepted: 12/22/2023] [Indexed: 01/30/2024]
Abstract
Motivation: Anti-cancer drug response prediction is a central problem within stratified medicine. Transcriptomic profiles of cancer cell lines are typically used for drug response prediction, but we hypothesize that proteomics or phosphoproteomics might be more suitable as they give a more direct insight into cellular processes. However, there has not yet been a systematic comparison between all three of these datatypes using consistent evaluation criteria. Results: Due to the limited number of cell lines with phosphoproteomics profiles, we use learning curves (plots of predictive performance as a function of dataset size) to compare the current performance and predict the future performance of the three omics datasets with more data. We use neural networks and XGBoost and compare them against a simple rule-based benchmark. We show that phosphoproteomics slightly outperforms RNA-seq and proteomics using the 38 cell lines with profiles of all three omics data types. Furthermore, using the 877 cell lines with proteomics and RNA-seq profiles, we show that RNA-seq slightly outperforms proteomics. With the learning curves we predict that the mean squared error using the phosphoproteomics dataset would decrease by ~15% if a dataset of the same size as the proteomics/transcriptomics was collected. For the cell lines with proteomics and RNA-seq profiles, the learning curves reveal that for smaller dataset sizes neural networks outperform XGBoost and vice versa for larger datasets. Furthermore, the trajectory of the XGBoost curve suggests that it will improve faster than the neural networks as more data are collected. Availability and implementation: See https://github.com/Nik-BB/Learning-curves-for-DRP for the code used.
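One way to read the extrapolation step is as a curve fit: measure error at several training-set sizes, fit a decreasing function, and evaluate it at a larger size. The sketch below assumes an inverse power law, error ≈ a · n^(−b), fitted by least squares in log-log space. That functional form and the helper names are assumptions for illustration; the abstract does not say which form the authors fitted, and the repository linked above is the place to check.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], errors: Sequence[float]) -> Tuple[float, float]:
    """Fit error = a * n**(-b) by ordinary least squares on
    log(error) = log(a) - b * log(n). Returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_error(a: float, b: float, n: float) -> float:
    """Extrapolate the fitted curve to a dataset of size n."""
    return a * n ** (-b)


# Synthetic learning-curve points (not values from the paper).
toy_sizes = [10, 20, 40]
toy_errors = [2.0 * n ** -0.5 for n in toy_sizes]
a_hat, b_hat = fit_power_law(toy_sizes, toy_errors)
projected = predict_error(a_hat, b_hat, 160)
```

Because the synthetic points lie exactly on a power law, the fit recovers the generating parameters. Measured learning curves are noisy, so in practice the fit would be repeated over several resampled training subsets.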
Affiliation(s)
- Nikhil Branson
- School of Biological and Behavioural Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
- Digital Environment Research Institute, Queen Mary University of London, London E1 1HH, United Kingdom
- Pedro R Cutillas
- Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, United Kingdom
- Conrad Bessant
- School of Biological and Behavioural Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
- Digital Environment Research Institute, Queen Mary University of London, London E1 1HH, United Kingdom
54
Narykov O, Zhu Y, Brettin T, Evrard YA, Partin A, Shukla M, Xia F, Clyde A, Vasanthakumari P, Doroshow JH, Stevens RL. Integration of Computational Docking into Anti-Cancer Drug Response Prediction Models. Cancers (Basel) 2023; 16:50. [PMID: 38201477 PMCID: PMC10777918 DOI: 10.3390/cancers16010050] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/03/2023] [Revised: 12/01/2023] [Accepted: 12/07/2023] [Indexed: 01/12/2024] Open
Abstract
Cancer is a heterogeneous disease in that tumors of the same histology type can respond differently to a treatment. Anti-cancer drug response prediction is of paramount importance for both drug development and patient treatment design. Although various computational methods and data have been used to develop drug response prediction models, it remains a challenging problem due to the complexities of cancer mechanisms and cancer-drug interactions. To better characterize the interaction between cancer and drugs, we investigate the feasibility of integrating computationally derived features of molecular mechanisms of action into prediction models. Specifically, we add docking scores of drug molecules and target proteins in combination with cancer gene expressions and molecular drug descriptors for building response models. The results demonstrate a marginal improvement in drug response prediction performance when adding docking scores as additional features, through tests on large drug screening data. We discuss the limitations of the current approach and provide the research community with a baseline dataset of the large-scale computational docking for anti-cancer drugs.
Affiliation(s)
- Oleksandr Narykov
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA; (Y.Z.); (T.B.); (A.P.); (M.S.); (F.X.); (P.V.); (R.L.S.)
- Yitan Zhu
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA; (Y.Z.); (T.B.); (A.P.); (M.S.); (F.X.); (P.V.); (R.L.S.)
- Thomas Brettin
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA; (Y.Z.); (T.B.); (A.P.); (M.S.); (F.X.); (P.V.); (R.L.S.)
- Yvonne A. Evrard
- Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA;
- Alexander Partin
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA; (Y.Z.); (T.B.); (A.P.); (M.S.); (F.X.); (P.V.); (R.L.S.)
- Maulik Shukla
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA; (Y.Z.); (T.B.); (A.P.); (M.S.); (F.X.); (P.V.); (R.L.S.)
- Fangfang Xia
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA; (Y.Z.); (T.B.); (A.P.); (M.S.); (F.X.); (P.V.); (R.L.S.)
- Austin Clyde
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA; (Y.Z.); (T.B.); (A.P.); (M.S.); (F.X.); (P.V.); (R.L.S.)
- Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
- Priyanka Vasanthakumari
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA; (Y.Z.); (T.B.); (A.P.); (M.S.); (F.X.); (P.V.); (R.L.S.)
- James H. Doroshow
- Developmental Therapeutics Branch, National Cancer Institute, Bethesda, MD 20892, USA;
- Rick L. Stevens
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA; (Y.Z.); (T.B.); (A.P.); (M.S.); (F.X.); (P.V.); (R.L.S.)
- Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
55
Yang Y, Li P. GPDRP: a multimodal framework for drug response prediction with graph transformer. BMC Bioinformatics 2023; 24:484. [PMID: 38105227 PMCID: PMC10726525 DOI: 10.1186/s12859-023-05618-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 12/13/2023] [Indexed: 12/19/2023] Open
Abstract
BACKGROUND In the field of computational personalized medicine, drug response prediction (DRP) is a critical issue. However, existing studies often characterize drugs as strings, a representation that does not align with the natural description of molecules. Additionally, they ignore gene pathway-specific combinatorial implication. RESULTS In this study, we propose the drug Graph and gene Pathway based Drug response prediction method (GPDRP), a new multimodal deep learning model for predicting drug responses based on drug molecular graphs and gene pathway activity. In GPDRP, drugs are represented by molecular graphs, while cell lines are described by gene pathway activity scores. The model separately learns these two types of data using Graph Neural Networks (GNN) with Graph Transformers and deep neural networks. Predictions are subsequently made through fully connected layers. CONCLUSIONS Our results indicate that the Graph Transformer-based model delivers superior performance. We apply GPDRP to bulk RNA-sequencing data from hundreds of cancer cell lines, and it outperforms some recently published models. Furthermore, the generalizability and applicability of GPDRP are demonstrated through its predictions on unknown drug-cell line pairs and xenografts. This underscores the interpretability achieved by incorporating gene pathways.
Affiliation(s)
- Yingke Yang
- School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, 471000, China
- Peiluan Li
- School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, 471000, China.
- Longmen Laboratory, Luoyang, 471003, China.
56
Park S, Lee H. Molecular data representation based on gene embeddings for cancer drug response prediction. Sci Rep 2023; 13:21898. [PMID: 38081928 PMCID: PMC10713675 DOI: 10.1038/s41598-023-49003-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 12/02/2023] [Indexed: 12/18/2023] Open
Abstract
Cancer drug response prediction is a crucial task in precision medicine, but existing models have limitations in effectively representing molecular profiles of cancer cells. Specifically, when these models represent molecular omics data such as gene expression, they employ a one-hot encoding-based approach, where a fixed gene set is selected for all samples and omics data values are assigned to specific positions in a vector. However, this approach restricts the utilization of embedding-vector-based methods, such as attention-based models, and limits the flexibility of gene selection. To address these issues, our study proposes gene embedding-based fully connected neural networks (GEN) that utilize gene embedding vectors as input data for cancer drug response prediction. GEN allows for the use of embedding-vector-based architectures and different gene sets for each sample, providing enhanced flexibility. To validate the efficacy of GEN, we conducted experiments on three cancer drug response datasets. Our results demonstrate that GEN outperforms other recently developed methods in cancer drug prediction tasks and offers improved gene representation capabilities. All source code is available at https://github.com/DMCB-GIST/GEN/ .
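The contrast between the two input representations can be illustrated with a small sketch. It is not the GEN architecture: the embedding table, the expression-weighted mean pooling, and both function names are assumptions chosen only to show why an embedding lookup tolerates a different gene set per sample while a fixed-position vector does not.

```python
from typing import Dict, List


def fixed_position_vector(sample: Dict[str, float], gene_order: List[str]) -> List[float]:
    """Position-based input: every sample must use the same ordered gene set;
    genes outside that set are dropped and missing genes become 0.0."""
    return [sample.get(gene, 0.0) for gene in gene_order]


def embedding_representation(
    sample: Dict[str, float],
    embeddings: Dict[str, List[float]],
) -> List[float]:
    """Embedding-based input: look up a vector per gene and pool them,
    weighted by expression, so each sample may carry its own gene set.
    Mean pooling is an assumption made for this illustration."""
    dim = len(next(iter(embeddings.values())))
    pooled = [0.0] * dim
    used = 0
    for gene, value in sample.items():
        vector = embeddings.get(gene)
        if vector is None:
            continue  # gene without an embedding is skipped
        for i, component in enumerate(vector):
            pooled[i] += value * component
        used += 1
    return [p / used for p in pooled] if used else pooled


# Made-up two-dimensional embeddings and expression values.
toy_embeddings = {"TP53": [1.0, 0.0], "EGFR": [0.0, 1.0], "MYC": [1.0, 1.0]}
sample_a = {"TP53": 2.0, "EGFR": 4.0}
sample_b = {"MYC": 3.0}  # a different gene set from sample_a

rep_a = embedding_representation(sample_a, toy_embeddings)
rep_b = embedding_representation(sample_b, toy_embeddings)
fixed_b = fixed_position_vector(sample_b, ["TP53", "EGFR"])
```

Both samples map to vectors of the same length even though they share no genes, whereas the fixed-position encoding of `sample_b` loses its only measured gene.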
Affiliation(s)
- Sejin Park
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, 61005, Republic of Korea
- Hyunju Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, 61005, Republic of Korea.
- Artificial Intelligence Graduate School, Gwangju Institute of Science and Technology, Gwangju, 61005, Republic of Korea.
57
Marzi SJ, Schilder BM, Nott A, Frigerio CS, Willaime-Morawek S, Bucholc M, Hanger DP, James C, Lewis PA, Lourida I, Noble W, Rodriguez-Algarra F, Sharif JA, Tsalenchuk M, Winchester LM, Yaman Ü, Yao Z, Ranson JM, Llewellyn DJ. Artificial intelligence for neurodegenerative experimental models. Alzheimers Dement 2023; 19:5970-5987. [PMID: 37768001 DOI: 10.1002/alz.13479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 08/11/2023] [Accepted: 08/14/2023] [Indexed: 09/29/2023]
Abstract
INTRODUCTION Experimental models are essential tools in neurodegenerative disease research. However, the translation of insights and drugs discovered in model systems has proven immensely challenging, marred by high failure rates in human clinical trials. METHODS Here we review the application of artificial intelligence (AI) and machine learning (ML) in experimental medicine for dementia research. RESULTS Considering the specific challenges of reproducibility and translation between other species or model systems and human biology in preclinical dementia research, we highlight best practices and resources that can be leveraged to quantify and evaluate translatability. We then evaluate how AI and ML approaches could be applied to enhance both cross-model reproducibility and translation to human biology, while sustaining biological interpretability. DISCUSSION AI and ML approaches in experimental medicine remain in their infancy. However, they have great potential to strengthen preclinical research and translation if based upon adequate, robust, and reproducible experimental data. HIGHLIGHTS There are increasing applications of AI in experimental medicine. We identified issues in reproducibility, cross-species translation, and data curation in the field. Our review highlights data resources and AI approaches as solutions. Multi-omics analysis with AI offers exciting future possibilities in drug discovery.
Affiliation(s)
- Sarah J Marzi
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Brian M Schilder
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Alexi Nott
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Magda Bucholc
- School of Computing, Engineering & Intelligent Systems, Ulster University, Derry, UK
- Diane P Hanger
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Patrick A Lewis
- Royal Veterinary College, London, UK
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London, UK
- Wendy Noble
- Faculty of Health and Life Sciences, University of Exeter, Exeter, UK
- Jalil-Ahmad Sharif
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Maria Tsalenchuk
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Ümran Yaman
- UK Dementia Research Institute at UCL, London, UK
- David J Llewellyn
- University of Exeter Medical School, Exeter, UK
- Alan Turing Institute, London, UK
58
Piochi LF, Preto AJ, Moreira IS. DELFOS-drug efficacy leveraging forked and specialized networks-benchmarking scRNA-seq data in multi-omics-based prediction of cancer sensitivity. Bioinformatics 2023; 39:btad645. [PMID: 37862234 PMCID: PMC10627353 DOI: 10.1093/bioinformatics/btad645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 09/28/2023] [Accepted: 10/19/2023] [Indexed: 10/22/2023] Open
Abstract
MOTIVATION Cancer is currently one of the most notorious diseases, with over 1 million deaths in the European Union alone in 2022. As each tumor can be composed of diverse cell types with distinct genotypes, cancer cells can acquire resistance to different compounds. Moreover, anticancer drugs can display severe side effects, compromising patient well-being. Therefore, novel strategies for identifying the optimal set of compounds to treat each tumor have become an important research topic in recent decades. RESULTS To address this challenge, we developed a novel drug response prediction algorithm called Drug Efficacy Leveraging Forked and Specialized networks (DELFOS). Our model learns from multi-omics data from over 65 cancer cell lines, as well as structural data from over 200 compounds, for the prediction of drug sensitivity. We also evaluated the benefits of incorporating single-cell expression data to predict drug response. DELFOS was validated using datasets with unseen cell lines or drugs and compared with other state-of-the-art algorithms, achieving a high prediction performance on several correlation and error metrics. Overall, DELFOS can effectively leverage multi-omics data for the prediction of drug responses in thousands of drug-cell line pairs. AVAILABILITY AND IMPLEMENTATION The DELFOS pipeline and associated data are available at github.com/MoreiraLAB/delfos.
Affiliation(s)
- Luiz Felipe Piochi
- Department of Life Sciences, University of Coimbra, Coimbra 3000-456, Portugal
- CNC—Center for Neuroscience and Cell Biology, Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- CIBB—Center for Innovative Biomedicine and Biotechnology, Coimbra 3004-504, Portugal
- António J Preto
- CNC—Center for Neuroscience and Cell Biology, Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- CIBB—Center for Innovative Biomedicine and Biotechnology, Coimbra 3004-504, Portugal
- PhD Programme in Experimental Biology and Biomedicine, Institute for Interdisciplinary Research (IIIUC), University of Coimbra, Coimbra 3030-789, Portugal
- Irina S Moreira
- Department of Life Sciences, University of Coimbra, Coimbra 3000-456, Portugal
- CNC—Center for Neuroscience and Cell Biology, Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- CIBB—Center for Innovative Biomedicine and Biotechnology, Coimbra 3004-504, Portugal
59
Liu Y, Tong S, Chen Y. HMM-GDAN: Hybrid multi-view and multi-scale graph duplex-attention networks for drug response prediction in cancer. Neural Netw 2023; 167:213-222. [PMID: 37660670 DOI: 10.1016/j.neunet.2023.08.036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/24/2022] [Revised: 06/01/2023] [Accepted: 08/20/2023] [Indexed: 09/05/2023]
Abstract
Precision medicine is devoted to discovering personalized therapy for complex and difficult diseases like cancer. Many machine learning approaches have been developed for drug response prediction towards precision medicine. Nevertheless, multi-view graph learning schemes based on genetic profiles have not yet been explored for drug response prediction in previous works. Furthermore, multi-scale latent feature fusion is not considered sufficiently in the existing frameworks of graph neural networks (GNNs). Previous works on drug response prediction mainly depend on sequence data or single-view graph data. In this paper, we propose to construct a multi-view graph by means of multi-omics data and STRING protein-protein association data, and develop a new GNN architecture for drug response prediction in cancer. Specifically, we propose hybrid multi-view and multi-scale graph duplex-attention networks (HMM-GDAN), in which both a multi-view self-attention mechanism and a view-level attention mechanism are devised to capture the complementary information of views and collaboratively emphasize the importance of each view, and rich multi-scale features are constructed and integrated to further form high-level representations for better prediction. Experiments on the GDSC2 dataset verify the superiority of the proposed HMM-GDAN when compared with state-of-the-art baselines. The effectiveness of multi-view and multi-scale strategies is demonstrated by the ablation study.
Affiliation(s)
- Youfa Liu
- College of Informatics, Huazhong Agricultural University, PR China.
- Shufan Tong
- College of Informatics, Huazhong Agricultural University, PR China
- Yongyong Chen
- School of Computer Science, Harbin Institute of Technology, (Shenzhen), PR China
60
Zhao H, Zhang X, Zhao Q, Li Y, Wang J. MSDRP: a deep learning model based on multisource data for predicting drug response. Bioinformatics 2023; 39:btad514. [PMID: 37606993 PMCID: PMC10474952 DOI: 10.1093/bioinformatics/btad514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 07/30/2023] [Accepted: 08/21/2023] [Indexed: 08/23/2023] Open
Abstract
MOTIVATION Cancer heterogeneity drastically affects cancer therapeutic outcomes. Predicting drug response in vitro is expected to help formulate personalized therapy regimens. In recent years, several computational models based on machine learning and deep learning have been proposed to predict drug response in vitro. However, most of these methods capture drug features based on a single drug description (e.g. drug structure), without considering the relationships between drugs and biological entities (e.g. target, diseases, and side effects). Moreover, most of these methods collect features separately for drugs and cell lines but fail to consider the pairwise interactions between drugs and cell lines. RESULTS In this paper, we propose a deep learning framework, named MSDRP for drug response prediction. MSDRP uses an interaction module to capture interactions between drugs and cell lines, and integrates multiple associations/interactions between drugs and biological entities through similarity network fusion algorithms, outperforming some state-of-the-art models in all performance measures for all experiments. The experimental results of de novo test and independent test demonstrate the excellent performance of our model for new drugs. Furthermore, several case studies illustrate the rationality for using feature vectors derived from drug similarity matrices from multisource data to represent drugs and the interpretability of our model. AVAILABILITY AND IMPLEMENTATION The codes of MSDRP are available at https://github.com/xyzhang-10/MSDRP.
Affiliation(s)
- Haochen Zhao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Xiaoyu Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Qichang Zhao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529-0001, United States
- Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
61
Xiao S, Lin H, Wang C, Wang S, Rajapakse JC. Graph Neural Networks With Multiple Prior Knowledge for Multi-Omics Data Analysis. IEEE J Biomed Health Inform 2023; 27:4591-4600. [PMID: 37307177 DOI: 10.1109/jbhi.2023.3284794] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
With the development of biotechnology, large amounts of multi-omics data have been collected for precision medicine. Multiple sources of graph-based prior biological knowledge about omics data exist, such as gene-gene interaction networks. Recently, there has been an increasing interest in introducing graph neural networks (GNNs) into multi-omics learning. However, existing methods have not fully exploited these graphical priors since none have been able to integrate knowledge from multiple sources simultaneously. To solve this problem, we propose a multi-omics data analysis framework that incorporates multiple sources of prior knowledge into a graph neural network (MPK-GNN). To the best of our knowledge, this is the first attempt to introduce multiple prior graphs into multi-omics data analysis. Specifically, the proposed method contains four parts: (1) a feature-level learning module to aggregate information from prior graphs; (2) a projection module to maximize the agreement among prior networks by optimizing a contrastive loss; (3) a sample-level module to learn a global representation from input multi-omics features; (4) a task-specific module to flexibly extend MPK-GNN for various downstream multi-omics analysis tasks. Finally, we verify the effectiveness of the proposed multi-omics learning algorithm on the cancer molecular subtype classification task. Experimental results show that MPK-GNN outperforms other state-of-the-art algorithms, including multi-view learning methods and multi-omics integrative approaches.
62
Wang C, Zhang M, Zhao J, Li B, Xiao X, Zhang Y. The prediction of drug sensitivity by multi-omics fusion reveals the heterogeneity of drug response in pan-cancer. Comput Biol Med 2023; 163:107220. [PMID: 37406589 DOI: 10.1016/j.compbiomed.2023.107220] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/15/2023] [Revised: 06/14/2023] [Accepted: 06/30/2023] [Indexed: 07/07/2023]
Abstract
Cancer drug response prediction based on genomic information plays a crucial role in modern pharmacogenomics, enabling individualized therapy. Given the expense and complexity of biological experiments, computational methods serve as effective tools for predicting cancer drug sensitivity. In this study, we proposed a novel method called Multi-Omics Integrated Collective Variational Autoencoders (MOICVAE), which leverages integrated omics knowledge, including genomic and transcriptomic data, to fill in missing cancer-drug associations and enhance drug sensitivity prediction. Our method employs an encoder-decoder network to learn latent feature representations from cell lines. These learned feature vectors are then fed into a collective variational autoencoder network to train an association matrix. We evaluated MOICVAE on the GDSC and CCLE benchmark datasets using 10-fold cross-validation and achieved AUCs of 0.856 and 0.808, respectively, outperforming state-of-the-art methods. Furthermore, on the TCGA dataset, consisting of 25 drugs across 7 cancer types, MOICVAE exhibited an average AUC of 0.91 in predicting drug sensitivity. Additionally, significant differences were observed in survival, tumor inflammatory assessment, and tumor microenvironment between the predicted drug-sensitive and drug-resistant groups. These results are consistent with predictions made on the METABRIC dataset. Moreover, we discovered that fusing omics data based on mRNA and CNV (copy number variations) yielded superior results in drug sensitivity prediction. MOICVAE not only achieved higher accuracy in drug sensitivity prediction but also provided additional value for combining immunotherapy with chemotherapy, offering patients more precise treatment options. The code and dataset for MOICVAE are freely available at https://github.com/wanggnoc/MOICVAE.
Affiliation(s)
- Cong Wang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Mengyan Zhang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Jiyun Zhao
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Bin Li
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Xingjun Xiao
- Department of Neurology, The Second Affiliated Hospital of Harbin Medical University, 246 Xuefu Road, Harbin, 150086, People's Republic of China
- Yan Zhang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, China; College of Pathology, Qiqihar Medical University, Qiqihar, 161042, China.
63
Jiang X, Li Z, Mehmood A, Wang H, Wang Q, Chu Y, Mao X, Zhao J, Jiang M, Zhao B, Lin G, Wang E, Wei D. A Self-attention Graph Convolutional Network for Precision Multi-tumor Early Diagnostics with DNA Methylation Data. Interdiscip Sci 2023; 15:405-418. [PMID: 37247186 DOI: 10.1007/s12539-023-00563-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/24/2023] [Revised: 03/16/2023] [Accepted: 03/20/2023] [Indexed: 05/30/2023]
Abstract
DNA methylation-based precision tumor early diagnostics is emerging as a state-of-the-art technology that could capture early cancer signs 3-5 years in advance, even for clinically homogeneous groups. Presently, the sensitivity of early detection for many tumors is ~30%, which needs significant improvement. Nevertheless, based on genome-wide DNA methylation data, one could comprehensively characterize tumors' entire molecular genetic landscape and their subtle differences. Therefore, novel high-performance methods must be developed that draw unbiased information from the abundantly available DNA methylation data. To fill this gap, we have designed a computational model involving a self-attention graph convolutional network and a multi-class classification support vector machine to identify the 11 most common cancers using DNA methylation data. The self-attention graph convolutional network automatically learns key methylation sites in a data-driven way. Then, multi-tumor early diagnostics is realized by training a multi-class classification support vector machine based on the selected methylation sites. We evaluated our model's performance through experiments on several data sets, and our results demonstrate the effectiveness of the selected key methylation sites, which are highly relevant for blood diagnosis. [Graphical abstract: the pipeline of the self-attention graph convolutional network-based computational framework.]
Affiliation(s)
- Xue Jiang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Zhiqi Li
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Aamir Mehmood
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Heng Wang
- International School of Cosmetics, School of Perfume and Aroma Technology, Shanghai Institute of Technology, Shanghai, China
- Qiankun Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Yanyi Chu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Xueying Mao
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Jing Zhao
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Mingming Jiang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Bowen Zhao
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Guanning Lin
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Edwin Wang
- Department of Biochemistry and Molecular Biology, Medical Genetics, and Oncology, Cumming School of Medicine, University of Calgary, Calgary, Canada
- Dongqing Wei
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.
64
Connell W, Garcia K, Goodarzi H, Keiser MJ. Learning chemical sensitivity reveals mechanisms of cellular response. bioRxiv 2023:2023.08.26.554851. [PMID: 37693536 PMCID: PMC10491110 DOI: 10.1101/2023.08.26.554851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
Chemical probes interrogate disease mechanisms at the molecular level by linking genetic changes to observable traits. However, comprehensive chemical screens in diverse biological models are impractical. To address this challenge, we developed ChemProbe, a model that predicts cellular sensitivity to hundreds of molecular probes and drugs by learning to combine transcriptomes and chemical structures. Using ChemProbe, we inferred the chemical sensitivity of cancer cell lines and tumor samples and analyzed how the model makes predictions. We retrospectively evaluated drug response predictions for precision breast cancer treatment and prospectively validated chemical sensitivity predictions in new cellular models, including a genetically modified cell line. Our model interpretation analysis identified transcriptome features reflecting compound targets and protein network modules, identifying genes that drive ferroptosis. ChemProbe is an interpretable in silico screening tool that allows researchers to measure cellular response to diverse compounds, facilitating research into molecular mechanisms of chemical sensitivity.
Affiliation(s)
- William Connell
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Kristle Garcia
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Department of Urology, University of California, San Francisco, San Francisco, CA, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
- Hani Goodarzi
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Department of Urology, University of California, San Francisco, San Francisco, CA, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
- Michael J. Keiser
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
65
Liu X, Zhang W. A subcomponent-guided deep learning method for interpretable cancer drug response prediction. PLoS Comput Biol 2023; 19:e1011382. [PMID: 37603576 PMCID: PMC10470940 DOI: 10.1371/journal.pcbi.1011382] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 04/04/2023] [Revised: 08/31/2023] [Accepted: 07/24/2023] [Indexed: 08/23/2023] Open
Abstract
Accurate prediction of cancer drug response (CDR) is a longstanding challenge in modern oncology that underpins personalized treatment. Current computational methods implement CDR prediction by modeling responses between entire drugs and cell lines, without considering that response outcomes may be primarily attributable to a few finer-level 'subcomponents', such as privileged substructures of the drug or gene signatures of the cancer cell, thus producing predictions that are hard to explain. Herein, we present SubCDR, a subcomponent-guided deep learning method for interpretable CDR prediction, to recognize the most relevant subcomponents driving response outcomes. Technically, SubCDR is built upon a line of deep neural networks that enables a set of functional subcomponents to be extracted from each drug and cell line profile, and breaks the CDR prediction down to identifying pairwise interactions between subcomponents. Such a subcomponent interaction form can offer a traceable path that explicitly indicates which subcomponents contribute more to the response outcome. We verify the superiority of SubCDR over state-of-the-art CDR prediction methods through extensive computational experiments on the GDSC dataset. Crucially, we found many predicted cases that demonstrate the strength of SubCDR in finding the key subcomponents driving responses and exploiting these subcomponents to discover new therapeutic drugs. These results suggest that SubCDR will be highly useful for biomedical researchers, particularly in anti-cancer drug design.
Affiliation(s)
- Xuan Liu
- College of Informatics, Huazhong Agricultural University, Wuhan, China
- Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan, China
66
Zhan Y, Guo J, Philip Chen CL, Meng XB. iBT-Net: an incremental broad transformer network for cancer drug response prediction. Brief Bioinform 2023:bbad256. [PMID: 37429577 DOI: 10.1093/bib/bbad256] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/16/2023] [Revised: 05/30/2023] [Accepted: 06/15/2023] [Indexed: 07/12/2023] Open
Abstract
Predicting cancer drug response is an important research topic in modern precision medicine. However, because of incomplete chemical structures and complex gene features, designing efficient data-driven methods for predicting drug response remains ongoing work. Moreover, since clinical data cannot easily be obtained all at once, data-driven methods may require relearning when new data become available, increasing time and cost. To address these issues, an incremental broad Transformer network (iBT-Net) is proposed for cancer drug response prediction. In addition to the gene expression features learned from cancer cell lines, structural features are extracted from drugs by a Transformer. A broad learning system is then designed to integrate the learned gene features and the structural features of drugs to predict the response. With its capability for incremental learning, the proposed method can use new data to improve its prediction performance without complete retraining. Experiments and comparison studies demonstrate the effectiveness and superiority of iBT-Net under different experimental configurations and continuous data learning.
Affiliation(s)
- Yongkang Zhan
- School of Computer Science & Engineering, South China University of Technology, 510006, China
- Jifeng Guo
- School of Computer Science & Engineering, South China University of Technology, 510006, China
- C L Philip Chen
- School of Computer Science & Engineering, South China University of Technology, 510006, China
- Brain and Affective Cognitive Research Center, Pazhou Lab, 510335, China
- Xian-Bing Meng
- School of Electromechanical Engineering, Guangdong University of Technology, 510006, China
67
Wang Y, Gao YL, Wang J, Li F, Liu JX. MSGCA: Drug-Disease Associations Prediction Based on Multi-Similarities Graph Convolutional Autoencoder. IEEE J Biomed Health Inform 2023; 27:3686-3694. [PMID: 37163398 DOI: 10.1109/jbhi.2023.3272154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2023]
Abstract
Identifying drug-disease associations (DDAs) is critical to the development of drugs. Traditional methods to determine DDAs are expensive and inefficient. Therefore, it is imperative to develop more accurate and effective methods for DDA prediction. Most current DDA prediction methods use the original DDA matrix directly. However, the original DDA matrix is sparse, which greatly affects the prediction results. Hence, a prediction method based on a multi-similarities graph convolutional autoencoder (MSGCA) is proposed for DDA prediction. First, MSGCA integrates multiple drug similarities and disease similarities using the centered kernel alignment-based multiple kernel learning (CKA-MKL) algorithm to form a new drug similarity and a new disease similarity, respectively. Second, the new drug and disease similarities are improved by linear neighborhood, and the DDA matrix is reconstructed by weighted K nearest neighbor profiles. Next, the reconstructed DDAs and the improved drug and disease similarities are integrated into a heterogeneous network. Finally, a graph convolutional autoencoder with an attention mechanism is used to predict DDAs. Compared with existing methods, MSGCA shows superior results on three datasets. Furthermore, case studies demonstrate the reliability of MSGCA.
68
Huang Z, Zhang P, Deng L. DeepCoVDR: deep transfer learning with graph transformer and cross-attention for predicting COVID-19 drug response. Bioinformatics 2023; 39:i475-i483. [PMID: 37387168 DOI: 10.1093/bioinformatics/btad244] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION: The coronavirus disease 2019 (COVID-19) remains a global public health emergency. Although people, especially those with underlying health conditions, could benefit from several approved COVID-19 therapeutics, the development of effective antiviral COVID-19 drugs is still a very urgent problem. Accurate and robust drug response prediction to a new chemical compound is critical for discovering safe and effective COVID-19 therapeutics. RESULTS: In this study, we propose DeepCoVDR, a novel COVID-19 drug response prediction method based on deep transfer learning with graph transformer and cross-attention. First, we adopt a graph transformer and feed-forward neural network to mine the drug and cell line information. Then, we use a cross-attention module that calculates the interaction between the drug and cell line. After that, DeepCoVDR combines drug and cell line representation and their interaction features to predict drug response. To solve the problem of SARS-CoV-2 data scarcity, we apply transfer learning and use the SARS-CoV-2 dataset to fine-tune the model pretrained on the cancer dataset. The experiments of regression and classification show that DeepCoVDR outperforms baseline methods. We also evaluate DeepCoVDR on the cancer dataset, and the results indicate that our approach has high performance compared with other state-of-the-art methods. Moreover, we use DeepCoVDR to predict COVID-19 drugs from FDA-approved drugs and demonstrate the effectiveness of DeepCoVDR in identifying novel COVID-19 drugs. AVAILABILITY AND IMPLEMENTATION: https://github.com/Hhhzj-7/DeepCoVDR.
Affiliation(s)
- Zhijian Huang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Pan Zhang
- Hunan Provincial Key Laboratory of Clinical Epidemiology, Xiangya School of Public Health, Central South University, Changsha 410083, China
- Lei Deng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
69
Shahzad M, Tahir MA, Alhussein M, Mobin A, Shams Malick RA, Anwar MS. NeuPD-A Neural Network-Based Approach to Predict Antineoplastic Drug Response. Diagnostics (Basel) 2023; 13:2043. [PMID: 37370938 DOI: 10.3390/diagnostics13122043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 06/01/2023] [Accepted: 06/05/2023] [Indexed: 06/29/2023] Open
Abstract
With the advent of high-throughput screening, in silico drug response analysis has opened many research avenues in the field of personalized medicine. Over the past decade, many prediction techniques have been proposed for antineoplastic (anti-cancer) drug response, but drug sensitivity prediction still needs improvement. The intent of this research study is to propose a framework, namely NeuPD, to validate potential anti-cancer drugs against a panel of cancer cell lines in publicly available datasets. The datasets used in this work are Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE). As not all drugs are effective on cancer cell lines, we have worked on 10 essential drugs from the GDSC dataset that have achieved the best modeling results in previous studies. We also extracted 1610 essential oncogene expressions from 983 cell lines from the same dataset. From the CCLE dataset, 16,383 gene expressions from 1037 cell lines and 24 drugs have been used in our experiments. For dimensionality reduction, Pearson correlation is applied to best fit the model. We integrate the genomic features of cell lines and drugs' fingerprints to fit the neural network model. To evaluate the proposed NeuPD framework, we have used repeated K-fold cross-validation (K = 10, 5 repeats) to demonstrate the performance in terms of root mean square error (RMSE) and coefficient of determination (R2). The results obtained on the GDSC dataset, measured using these metrics, show that our proposed NeuPD framework has outperformed existing approaches with an RMSE of 0.490 and R2 of 0.929.
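The evaluation protocol named in this abstract (repeated K-fold cross-validation with K = 10 and 5 repeats, scored by RMSE and R2) can be sketched as follows. This is a minimal standard-library illustration, not the NeuPD authors' code: the function names, the shuffling scheme, and the `fit_predict` callback are assumptions introduced here.

```python
import math
import random


def repeated_kfold_indices(n_samples, k=10, repeats=5, seed=0):
    """Yield (train_idx, test_idx) for `repeats` independently shuffled K-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
        folds = [order[i::k] for i in range(k)]
        for i in range(k):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train_idx, folds[i]


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination; assumes y_true is not constant."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validate(y, fit_predict, k=10, repeats=5, seed=0):
    """Average RMSE and R2 over all repeats * k held-out folds.

    `fit_predict(train_idx, test_idx)` is a hypothetical callback that trains
    on the training indices and returns predictions for the test indices.
    """
    scores = []
    for train_idx, test_idx in repeated_kfold_indices(len(y), k, repeats, seed):
        y_test = [y[i] for i in test_idx]
        y_pred = fit_predict(train_idx, test_idx)
        scores.append((rmse(y_test, y_pred), r2_score(y_test, y_pred)))
    n = len(scores)
    return sum(s[0] for s in scores) / n, sum(s[1] for s in scores) / n
```

With K = 10 and 5 repeats, each model is fitted and scored 50 times, and every sample is held out exactly once per repeat.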
Affiliation(s)
- Muhammad Shahzad
- FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi 75030, Pakistan
- Muhammad Atif Tahir
- FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi 75030, Pakistan
- Musaed Alhussein
- Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 51178, Riyadh 11543, Saudi Arabia
- Ansharah Mobin
- FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi 75030, Pakistan
- Rauf Ahmed Shams Malick
- FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi 75030, Pakistan
- Muhammad Shahid Anwar
- Department of AI and Software, Gachon University, Seongnam-si 13120, Republic of Korea
70
Thafar MA, Albaradei S, Uludag M, Alshahrani M, Gojobori T, Essack M, Gao X. OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features. Front Genet 2023; 14:1139626. [PMID: 37091791 PMCID: PMC10117673 DOI: 10.3389/fgene.2023.1139626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Accepted: 03/24/2023] [Indexed: 04/08/2023] Open
Abstract
Late-stage drug development failures are usually a consequence of ineffective targets. Thus, proper target identification is needed, which may be possible using computational approaches. This is because effective targets have disease-relevant biological functions, and omics data reveal the proteins involved in these functions. Also, properties that favor binding between drug and target are deducible from the protein’s amino acid sequence. In this work, we developed OncoRTT, a deep learning (DL)-based method for predicting novel therapeutic targets. OncoRTT is designed to reduce suboptimal target selection by identifying novel targets based on features of known effective targets using DL approaches. First, we created the “OncologyTT” datasets, which include genes/proteins associated with ten prevalent cancer types. Then, we generated three sets of features for all genes: omics features, the proteins’ amino-acid sequence BERT embeddings, and the integrated features to train and test the DL classifiers separately. The models achieved high prediction performance in terms of area under the curve (AUC), i.e., AUC greater than 0.88 for all cancer types, with a maximum of 0.95 for leukemia. Also, OncoRTT outperformed the state-of-the-art method using their data in five out of seven cancer types commonly assessed by both methods. Furthermore, OncoRTT predicts novel therapeutic targets using new test data related to the seven cancer types. We further corroborated these results with other validation evidence using the Open Targets Platform and a case study focused on the top-10 predicted therapeutic targets for lung cancer.
Affiliation(s)
- Maha A. Thafar
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- College of Computers and Information Technology, Computer Science Department, Taif University, Taif, Saudi Arabia
- Somayah Albaradei
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Mahmut Uludag
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Mona Alshahrani
- National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
- Takashi Gojobori
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Magbubah Essack
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- *Correspondence: Xin Gao; Magbubah Essack
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- *Correspondence: Xin Gao; Magbubah Essack
71
Zhang P, Xia C, Shen HB. High-accuracy protein model quality assessment using attention graph neural networks. Brief Bioinform 2023; 24:7025462. [PMID: 36736352 DOI: 10.1093/bib/bbac614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 11/23/2022] [Accepted: 12/12/2022] [Indexed: 02/05/2023] Open
Abstract
Deep learning has brought great improvement to protein tertiary structure prediction. It is important but very challenging to accurately rank and score decoy structures predicted by different models. CASP14 results show that existing quality assessment (QA) approaches lag behind the development of protein structure prediction methods, where almost all existing QA models degrade in accuracy when the target is a decoy of high quality. Accurate assessment of high-accuracy decoys is particularly useful given the availability of accurate structure prediction methods. Here we propose a fast and effective single-model QA method, QATEN, which can evaluate decoys using only their topological characteristics and atomic types. Our model uses graph neural networks and attention mechanisms to evaluate global and amino acid level scores, and uses specific loss functions to constrain the network to focus more on high-precision decoys and protein domains. On the CASP14 evaluation decoys, QATEN performs better than other QA models under all correlation coefficients when targeting average LDDT. QATEN shows promising performance when considering only high-accuracy decoys. Compared to the embedded evaluation modules of predicted $C_{\alpha}$-RMSD (pRMSD) in RosettaFold and predicted LDDT (pLDDT) in AlphaFold2, QATEN is complementary and capable of achieving better evaluation on some decoy structures generated by AlphaFold2 and RosettaFold. These results suggest that the new QATEN approach can be used as a reliable independent assessment algorithm for high-accuracy protein structure decoys.
Affiliation(s)
- Peidong Zhang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
- Chunqiu Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
72
Partin A, Brettin T, Zhu Y, Dolezal JM, Kochanny S, Pearson AT, Shukla M, Evrard YA, Doroshow JH, Stevens RL. Data augmentation and multimodal learning for predicting drug response in patient-derived xenografts from gene expressions and histology images. Front Med (Lausanne) 2023; 10:1058919. [PMID: 36960342 PMCID: PMC10027779 DOI: 10.3389/fmed.2023.1058919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 02/10/2023] [Indexed: 03/09/2023] Open
Abstract
Patient-derived xenografts (PDXs) are an appealing platform for preclinical drug studies. A primary challenge in modeling drug response prediction (DRP) with PDXs and neural networks (NNs) is the limited number of drug response samples. We investigate a multimodal neural network (MM-Net) and data augmentation for DRP in PDXs. The MM-Net learns to predict response using drug descriptors, gene expressions (GE), and histology whole-slide images (WSIs). We explore whether combining WSIs with GE improves predictions as compared with models that use GE alone. We propose two data augmentation methods which allow us to train multimodal and unimodal NNs on a single larger dataset without changing architectures: 1) combine single-drug and drug-pair treatments by homogenizing drug representations, and 2) augment drug pairs, which doubles the sample size of all drug-pair samples. Unimodal NNs which use GE are compared to assess the contribution of data augmentation. The NN that uses the original and the augmented drug-pair treatments as well as single-drug treatments outperforms NNs that ignore either the augmented drug pairs or the single-drug treatments. In assessing the multimodal learning based on the MCC metric, MM-Net outperforms all the baselines. Our results show that data augmentation and integration of histology images with GE can improve prediction performance of drug response in PDXs.
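The two augmentation ideas in this abstract can be illustrated with a short sketch. The abstract does not say how drug representations are homogenized or how pairs are augmented, so two assumptions are made here and should be read as such: a single-drug treatment is encoded as a pair by repeating its descriptor, and a drug pair is augmented by swapping the order of its two drugs. The `Sample` type and function names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    """One treatment sample (hypothetical layout for illustration)."""
    gene_expression: Tuple[float, ...]
    drug1: Tuple[float, ...]
    drug2: Tuple[float, ...]
    response: int


def homogenize_single_drug(gene_expression, drug, response) -> Sample:
    # Assumption: a single-drug treatment is written in the pair format by
    # repeating its descriptor, so one architecture can consume both kinds.
    return Sample(gene_expression, drug, drug, response)


def augment_drug_pairs(samples: List[Sample]) -> List[Sample]:
    # Assumption: augmentation adds an order-swapped copy of each true pair,
    # which doubles the number of drug-pair samples and leaves single-drug
    # samples (drug1 == drug2) untouched.
    swapped = [
        Sample(s.gene_expression, s.drug2, s.drug1, s.response)
        for s in samples
        if s.drug1 != s.drug2
    ]
    return samples + swapped
```

Swapping is label-preserving under the assumption that the response to a combination does not depend on the order in which its drugs are listed.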
Affiliation(s)
- Alexander Partin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Thomas Brettin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Yitan Zhu
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- James M. Dolezal
- Section of Hematology/Oncology, Department of Medicine, University of Chicago Medical Center, Chicago, IL, United States
- Sara Kochanny
- Section of Hematology/Oncology, Department of Medicine, University of Chicago Medical Center, Chicago, IL, United States
- Alexander T. Pearson
- Section of Hematology/Oncology, Department of Medicine, University of Chicago Medical Center, Chicago, IL, United States
- Maulik Shukla
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Yvonne A. Evrard
- Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD, United States
- James H. Doroshow
- Division of Cancer Therapeutics and Diagnosis, National Cancer Institute, Bethesda, MD, United States
- Rick L. Stevens
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Department of Computer Science, The University of Chicago, Chicago, IL, United States
73
Peng W, Chen T, Liu H, Dai W, Yu N, Lan W. Improving drug response prediction based on two-space graph convolution. Comput Biol Med 2023; 158:106859. [PMID: 37023539 DOI: 10.1016/j.compbiomed.2023.106859] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/25/2022] [Revised: 02/22/2023] [Accepted: 03/30/2023] [Indexed: 04/03/2023]
Abstract
Patients with the same cancer types may present different genomic features and therefore have different drug sensitivities. Accordingly, correctly predicting patients' responses to the drugs can guide treatment decisions and improve the outcome of cancer patients. Existing computational methods leverage the graph convolution network model to aggregate features of different types of nodes in the heterogeneous network. Most of them fail to consider the similarity between homogeneous nodes. To this end, we propose an algorithm based on two-space graph convolutional neural networks, TSGCNN, to predict the response of anticancer drugs. TSGCNN first constructs the cell line feature space and the drug feature space and separately performs the graph convolution operation on the feature spaces to diffuse similarity information among homogeneous nodes. After that, we generate a heterogeneous network based on the known cell line and drug relationship and perform graph convolution operations on the heterogeneous network to collect the features of different types of nodes. Subsequently, the algorithm produces the final feature representations for cell lines and drugs by adding their self features, the feature space representations, and the heterogeneous space representations. Finally, we leverage the linear correlation coefficient decoder to reconstruct the cell line-drug correlation matrix for drug response prediction based on the final representations. We tested our model on the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) databases. The results indicate that TSGCNN shows excellent performance in drug response prediction compared with eight other state-of-the-art methods.
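The final decoding step described above can be sketched as follows. This is a minimal illustration that assumes the "linear correlation coefficient" is the Pearson correlation between a cell line's final representation and a drug's final representation. The function names and the toy embeddings are hypothetical and do not come from the TSGCNN implementation.

```python
import math
from typing import List


def pearson(u: List[float], v: List[float]) -> float:
    """Pearson (linear) correlation coefficient between two equal-length vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    du = [x - mean_u for x in u]
    dv = [x - mean_v for x in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    return num / den if den else 0.0


def decode(cell_emb: List[List[float]], drug_emb: List[List[float]]) -> List[List[float]]:
    """Reconstruct a cell line x drug score matrix from the final embeddings."""
    return [[pearson(c, d) for d in drug_emb] for c in cell_emb]


# Toy embeddings (hypothetical values): two cell lines, one drug.
cell_emb = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
drug_emb = [[2.0, 4.0, 6.0]]
scores = decode(cell_emb, drug_emb)
```

In this toy case the first cell line is perfectly correlated with the drug embedding and the second is perfectly anti-correlated, so the scores lie at the two ends of the [-1, 1] range.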
Affiliation(s)
- Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050, China
- Tielin Chen
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050, China
- Hancheng Liu
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050, China
- Wei Dai
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050, China
- Ning Yu
- State University of New York, The College at Brockport, Department of Computing Sciences, 350 New Campus Drive, Brockport, NY 14422, United States of America
- Wei Lan
- School of Computer Electronic and Information, Guangxi University, Nanning, Guangxi 530004, China
|
74
|
Badwan BA, Liaropoulos G, Kyrodimos E, Skaltsas D, Tsirigos A, Gorgoulis VG. Machine learning approaches to predict drug efficacy and toxicity in oncology. Cell Reports Methods 2023; 3:100413. [PMID: 36936080 PMCID: PMC10014302 DOI: 10.1016/j.crmeth.2023.100413] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 02/25/2023]
Abstract
In recent years, there has been a surge of interest in using machine learning algorithms (MLAs) in oncology, particularly for biomedical applications such as drug discovery, drug repurposing, diagnostics, clinical trial design, and pharmaceutical production. MLAs have the potential to provide valuable insights and predictions in these areas by representing both the disease state and the therapeutic agents used to treat it. To fully utilize the capabilities of MLAs in oncology, it is important to understand the fundamental concepts underlying these algorithms and how they can be applied to assess the efficacy and toxicity of therapeutics. In this perspective, we lay out approaches to represent both the disease state and the therapeutic agents used by MLAs to derive novel insights and make relevant predictions.
Affiliation(s)
- Efthymios Kyrodimos
- First ENT Department, Hippocration Hospital, National Kapodistrian University of Athens, Athens, GR 11527, Greece
- Aristotelis Tsirigos
- Department of Medicine, New York University School of Medicine, New York, NY 10016, USA
- Department of Pathology, New York University School of Medicine, New York, NY 10016, USA
- Vassilis G. Gorgoulis
- Intelligencia Inc, New York, NY 10014, USA
- Department of Histology and Embryology, Faculty of Medicine, School of Health Sciences, National Kapodistrian University of Athens, Athens 11527, Greece
- Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
- Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
- Molecular and Clinical Cancer Sciences, Manchester Cancer Research Centre, Manchester Academic Health Sciences Centre, University of Manchester, Manchester M20 4GJ, UK
|
75
|
Partin A, Brettin TS, Zhu Y, Narykov O, Clyde A, Overbeek J, Stevens RL. Deep learning methods for drug response prediction in cancer: Predominant and emerging trends. Front Med (Lausanne) 2023; 10:1086097. [PMID: 36873878 PMCID: PMC9975164 DOI: 10.3389/fmed.2023.1086097] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 11/01/2022] [Accepted: 01/23/2023] [Indexed: 02/17/2023] Open
Abstract
Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients. A wave of recent papers demonstrates promising results in predicting cancer response to drug treatments while utilizing deep learning methods. These papers investigate diverse data representations, neural network architectures, learning methodologies, and evaluation schemes. However, deciphering promising predominant and emerging trends is difficult due to the variety of explored methods and the lack of a standardized framework for comparing drug response prediction models. To obtain a comprehensive landscape of deep learning methods, we conducted an extensive search and analysis of deep learning models that predict the response to single drug treatments. A total of 61 deep learning-based models have been curated, and summary plots were generated. Based on the analysis, observable patterns and prevalence of methods have been revealed. This review helps readers better understand the current state of the field and identify major challenges and promising solution paths.
Affiliation(s)
- Alexander Partin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Thomas S. Brettin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Yitan Zhu
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Oleksandr Narykov
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Austin Clyde
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Jamie Overbeek
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Rick L. Stevens
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Department of Computer Science, The University of Chicago, Chicago, IL, United States
|
76
|
Wang H, Dai C, Wen Y, Wang X, Liu W, He S, Bo X, Peng S. GADRP: graph convolutional networks and autoencoders for cancer drug response prediction. Brief Bioinform 2023; 24:6865039. [PMID: 36460622 DOI: 10.1093/bib/bbac501] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/28/2022] [Revised: 10/19/2022] [Accepted: 10/22/2022] [Indexed: 12/04/2022] Open
Abstract
Drug response prediction in cancer cell lines is of great significance in personalized medicine. In this study, we propose GADRP, a cancer drug response prediction model based on graph convolutional networks (GCNs) and autoencoders (AEs). We first use a stacked deep AE to extract low-dimensional representations from cell line features, and then construct a sparse drug cell line pair (DCP) network incorporating drug, cell line, and DCP similarity information. Next, an initial residual and layer attention-based GCN (ILGCN), which can alleviate the over-smoothing problem, is utilized to learn DCP features. Finally, a fully connected network is employed to make predictions. Benchmarking results demonstrate that GADRP can significantly improve prediction performance on all metrics compared with baselines on five datasets. In particular, experiments on predicting unknown DCP responses, drug-cancer tissue associations, and drug-pathway associations illustrate the predictive power of GADRP. All results highlight the effectiveness of GADRP in predicting drug responses, and its potential value in guiding anti-cancer drug selection.
Affiliation(s)
- Hong Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Chong Dai
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China; Department of Bioinformatics, Beijing Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Yuqi Wen
- Department of Bioinformatics, Beijing Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaoqi Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Wenjuan Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Song He
- Department of Bioinformatics, Beijing Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaochen Bo
- Department of Bioinformatics, Beijing Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Shaoliang Peng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; The State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University, Changsha 410082, China
|
77
|
Shen B, Feng F, Li K, Lin P, Ma L, Li H. A systematic assessment of deep learning methods for drug response prediction: from in vitro to clinical applications. Brief Bioinform 2023; 24:6961794. [PMID: 36575826 DOI: 10.1093/bib/bbac605] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/08/2022] [Revised: 10/30/2022] [Accepted: 12/09/2022] [Indexed: 12/29/2022] Open
Abstract
Drug response prediction is an important problem in personalized cancer therapy. Among various newly developed models, significant improvement in prediction performance has been reported using deep learning methods. However, systematic comparisons of deep learning methods, especially of the transferability from preclinical models to clinical cohorts, are currently lacking. To provide a more rigorous assessment, we benchmarked six representative deep learning methods for drug response prediction in multiple application scenarios using nine evaluation metrics, including the overall prediction accuracy, predictability of each drug, potential associated factors and transferability to clinical cohorts. Most methods show promising prediction within cell line datasets, and TGSA, with its lower time cost and better performance, is recommended. Although the performance metrics decrease when applying models trained on cell lines to patients, a certain amount of power to distinguish clinical response on some drugs can be maintained using CRDNN and TGSA. With these assessments, we provide guidance for researchers to choose appropriate methods, as well as insights into future directions for the development of more effective methods in clinical scenarios.
Affiliation(s)
- Bihan Shen
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Fangyoumin Feng
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Kunshi Li
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Ping Lin
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Liangxiao Ma
- Bio-Med Big Data Center at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Hong Li
- Cancer Systems Biology group at Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
|
78
|
Lee K, Cho D, Jang J, Choi K, Jeong HO, Seo J, Jeong WK, Lee S. RAMP: response-aware multi-task learning with contrastive regularization for cancer drug response prediction. Brief Bioinform 2023; 24:6865135. [PMID: 36460623 DOI: 10.1093/bib/bbac504] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/24/2022] [Revised: 10/13/2022] [Accepted: 10/24/2022] [Indexed: 12/05/2022] Open
Abstract
The accurate prediction of cancer drug sensitivity according to the multiomics profiles of individual patients is crucial for precision cancer medicine. However, the development of prediction models has been challenged by the complex crosstalk of input features and the resistance-dominant drug response information contained in public databases. In this study, we propose a novel multidrug response prediction framework, response-aware multitask prediction (RAMP), via a Bayesian neural network and restrict it by soft-supervised contrastive regularization. To utilize network embedding vectors as representation learning features for heterogeneous networks, we harness response-aware negative sampling, which applies cell line-drug response information to the training of network embeddings. RAMP overcomes the prediction accuracy limitation induced by the imbalance of trained response data based on the comprehensive selection and utilization of drug response features. When trained on the Genomics of Drug Sensitivity in Cancer dataset, RAMP achieved an area under the receiver operating characteristic curve > 89%, an area under the precision-recall curve > 59% and an $\textrm{F}_1$ score > 52% and outperformed previously developed methods on both balanced and imbalanced datasets. Furthermore, RAMP predicted many missing drug responses that were not included in the public databases. Our results showed that RAMP will be suitable for the high-throughput prediction of cancer drug sensitivity and will be useful for guiding cancer drug selection processes. The Python implementation for RAMP is available at https://github.com/hvcl/RAMP.
Affiliation(s)
- Kanggeun Lee
- Department of Computer Science and Engineering at Korea University
- Dongbin Cho
- Department of Computer Science at Hanyang University
- Jinho Jang
- Department of Biomedical Engineering at UNIST
- Kang Choi
- Department of Computer Science at Hanyang University
- Jiwon Seo
- Department of Computer Science at Hanyang University
- Won-Ki Jeong
- Department of Computer Science and Engineering at Korea University
- Semin Lee
- Department of Biomedical Engineering at UNIST
|
79
|
Liu Q, Zeng W, Zhang W, Wang S, Chen H, Jiang R, Zhou M, Zhang S. Deep generative modeling and clustering of single cell Hi-C data. Brief Bioinform 2023; 24:6858951. [PMID: 36458445 DOI: 10.1093/bib/bbac494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 09/28/2022] [Accepted: 10/18/2022] [Indexed: 12/05/2022] Open
Abstract
Deciphering 3D genome conformation is important for understanding gene regulation and cellular function at a spatial level. Recent advances in single cell Hi-C technologies have enabled the profiling of the 3D architecture of DNA within individual cells, which allows us to study the cell-to-cell variability of 3D chromatin organization. Computational approaches are urgently needed to comprehensively analyze the sparse and heterogeneous single cell Hi-C data. Here, we propose scDEC-Hi-C, a new framework for single cell Hi-C analysis with deep generative neural networks. scDEC-Hi-C outperforms existing methods in terms of single cell Hi-C data clustering and imputation. Moreover, the generative power of scDEC-Hi-C could help unveil the differences in chromatin architecture across cell types. We expect that scDEC-Hi-C could deepen our understanding of the complex mechanism underlying the formation of chromatin contacts.
Affiliation(s)
- Qiao Liu
- Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Wanwen Zeng
- College of Software, Nankai University, Tianjin 300071, China
- Wei Zhang
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Sicheng Wang
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Hongyang Chen
- The Research Center for Intelligent Network, Zhejiang Lab, Hangzhou 311121, China
- Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Mu Zhou
- SenseBrain Research, San Jose, CA 95131, USA
- Shaoting Zhang
- Shanghai Artificial Intelligence Laboratory, Shanghai 200240, China
|
80
|
Yang X, Yang G, Chu J. The Neural Metric Factorization for Computational Drug Repositioning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:731-741. [PMID: 35061591 DOI: 10.1109/tcbb.2022.3144429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Computational drug repositioning aims to discover new therapeutic indications for marketed drugs and has the advantages of low cost, short development cycle, and high controllability compared to traditional drug development. The matrix factorization model has become the cornerstone technique for computational drug repositioning due to its ease of implementation and excellent scalability. However, the matrix factorization model uses the inner product operation to represent the association between drugs and diseases, which lacks expressive ability. Moreover, the degree of similarity between drugs or between diseases is not reflected in their respective latent factor vectors, which does not agree with the common sense of drug discovery. Therefore, a neural metric factorization model for computational drug repositioning (NMFDR) is proposed in this work. We treat the latent factor vectors of drugs and diseases as points in a high-dimensional coordinate system and propose a generalized Euclidean distance to represent the association between drugs and diseases, compensating for the shortcomings of the inner product operation. Furthermore, by embedding multiple drug (disease) metrics information into the encoding space of the latent factor vector, the information about the similarity between drugs (diseases) can be reflected in the distance between latent factor vectors. Finally, we conduct extensive analysis experiments on three real datasets to demonstrate the effectiveness of the above improvement points and the superiority of the NMFDR model.
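The contrast between inner-product scoring and distance-based scoring can be made concrete with a small sketch. The abstract does not define its generalized Euclidean distance, so the plain Euclidean distance stands in for it here as an assumption. The vectors and function names are hypothetical.

```python
import math
from typing import List


def inner_product(u: List[float], v: List[float]) -> float:
    """Association score used by standard matrix factorization."""
    return sum(a * b for a, b in zip(u, v))


def euclidean(u: List[float], v: List[float]) -> float:
    """Plain Euclidean distance; a smaller value means a stronger association."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical latent factor vectors.
drug = [1.0, 0.0]
disease_near = [1.0, 0.1]  # almost the same point as the drug
disease_far = [3.0, 0.0]   # same direction, but far away

ip_near = inner_product(drug, disease_near)
ip_far = inner_product(drug, disease_far)
dist_near = euclidean(drug, disease_near)
dist_far = euclidean(drug, disease_far)
```

With these toy values the inner product ranks the distant disease above the nearby one, because it rewards vector magnitude, while the distance ranks the nearby disease first. This is one way to read the abstract's claim that similarity is not reflected in inner-product latent vectors.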
|
81
|
Qi R, Zou Q. Trends and Potential of Machine Learning and Deep Learning in Drug Study at Single-Cell Level. Research (Washington, D.C.) 2023; 6:0050. [PMID: 36930772 PMCID: PMC10013796 DOI: 10.34133/research.0050] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 09/22/2022] [Accepted: 12/27/2022] [Indexed: 01/12/2023]
Abstract
Cancer treatments always face challenging problems, particularly drug resistance due to tumor cell heterogeneity. The existing datasets include the relationship between gene expression and drug sensitivities; however, the majority are based on tissue-level studies. Studying drugs at the single-cell level is a promising way to overcome minimal residual disease caused by subclonal resistant cancer cells retained after initial curative therapy. Fortunately, machine learning techniques can help us understand how different types of cells respond to different cancer drugs from the perspective of single-cell gene expression. Good modeling using single-cell data and drug response information will not only improve machine learning for cell-drug outcome prediction but also facilitate the discovery of drugs for specific cancer subgroups and specific cancer treatments. In this paper, we review machine learning and deep learning approaches in drug research. By analyzing the application of these methods on cancer cell lines and single-cell data and comparing the technical gap between single-cell sequencing data analysis and single-cell drug sensitivity analysis, we hope to explore the trends and potential of drug research at the single-cell data level and provide more inspiration for drug research at the single-cell level. We anticipate that this review will stimulate the innovative use of machine learning methods to address new challenges in precision medicine more broadly.
Affiliation(s)
- Ren Qi
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China; School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
|
82
|
Lee M, Kim PJ, Joe H, Kim HG. Gene-centric multi-omics integration with convolutional encoders for cancer drug response prediction. Comput Biol Med 2022; 151:106192. [PMID: 36327883 DOI: 10.1016/j.compbiomed.2022.106192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 08/26/2022] [Accepted: 10/08/2022] [Indexed: 12/27/2022]
Abstract
MOTIVATION: Tumor heterogeneity, including genetic and transcriptomic characteristics, can reduce the efficacy of anticancer pharmacological therapy, resulting in clinical variability in patient response to therapeutic medications. Multi-omics integration can allow in silico models to provide an additional perspective on a biological system. METHODS: In this study, we propose a gene-centric multi-channel (GCMC) architecture to integrate multi-omics for predicting cancer drug response. GCMC transforms multi-omics profiles into a three-dimensional tensor with an additional dimension for omics types. GCMC's convolutional encoders capture multi-omics profiles for each gene and yield gene-centric features to predict drug responses. RESULTS: We evaluated GCMC on various datasets, including The Cancer Genome Atlas (TCGA) patients, patient-derived xenograft (PDX) mouse models, and the Genomics of Drug Sensitivity in Cancer (GDSC) cell line datasets. GCMC achieved better performance than baseline models, including single-omics models, in more than 75% of 265 drugs from GDSC cell line datasets. Furthermore, as for the clinical applicability of GCMC, it achieved the best performance on TCGA and PDX datasets in terms of both AUPR and AUC. We also analyzed the models' capability of integrating multi-omics profiles by measuring the contribution ratio of omics types. GCMC can incorporate multi-omics profiles in various manners to enhance performance for each drug type. These results suggest that GCMC can improve performance and feature extraction capability by integrating multi-omics profiles in a gene-centric manner.
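The tensor layout described in the METHODS part can be sketched in a few lines. This is a rough illustration under stated assumptions: each omics type is a samples-by-genes matrix over the same genes, the omics types are stacked as channels, and a single weight vector shared across genes (similar in effect to a 1x1 convolution) stands in for the convolutional encoder. All names and values are hypothetical and are not taken from the GCMC code.

```python
from typing import Dict, List

Matrix = List[List[float]]  # indexed as [sample][gene]


def stack_omics(omics: Dict[str, Matrix]) -> List[List[List[float]]]:
    """Build a tensor indexed as [sample][gene][omics_type]."""
    names = list(omics)
    n_samples = len(omics[names[0]])
    n_genes = len(omics[names[0]][0])
    return [
        [[omics[name][s][g] for name in names] for g in range(n_genes)]
        for s in range(n_samples)
    ]


def gene_centric_features(tensor: List[List[List[float]]],
                          weights: List[float]) -> Matrix:
    """Mix the omics channels of each gene with one shared weight vector."""
    return [
        [sum(w * x for w, x in zip(weights, gene)) for gene in sample]
        for sample in tensor
    ]


# Toy data (hypothetical): one sample, two genes, three omics types.
omics = {
    "expression": [[1.0, 2.0]],
    "mutation": [[0.0, 1.0]],
    "copy_number": [[0.5, -0.5]],
}
tensor = stack_omics(omics)
features = gene_centric_features(tensor, weights=[1.0, 2.0, 0.0])
```

Each gene ends up with one feature that combines all of its omics measurements, which is the sense of "gene-centric" assumed here.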
Affiliation(s)
- Munhwan Lee
- Biomedical Knowledge Engineering Lab., Seoul National University, 1 Gwanak-ro, Seoul, 08826, Republic of Korea
- Pil-Jong Kim
- Biomedical Knowledge Engineering Lab., Seoul National University, 1 Gwanak-ro, Seoul, 08826, Republic of Korea
- Hyunwhan Joe
- Biomedical Knowledge Engineering Lab., Seoul National University, 1 Gwanak-ro, Seoul, 08826, Republic of Korea
- Hong-Gee Kim
- Biomedical Knowledge Engineering Lab., Seoul National University, 1 Gwanak-ro, Seoul, 08826, Republic of Korea
|
83
|
A survey of graph neural networks in various learning paradigms: methods, applications, and challenges. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10321-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
84
|
Shin J, Piao Y, Bang D, Kim S, Jo K. DRPreter: Interpretable Anticancer Drug Response Prediction Using Knowledge-Guided Graph Neural Networks and Transformer. Int J Mol Sci 2022; 23:13919. [PMID: 36430395 PMCID: PMC9699175 DOI: 10.3390/ijms232213919] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/30/2022] [Revised: 10/27/2022] [Accepted: 11/08/2022] [Indexed: 11/16/2022] Open
Abstract
Some of the recent studies on drug sensitivity prediction have applied graph neural networks to leverage prior knowledge on the drug structure or gene network, and other studies have focused on the interpretability of the model to delineate the mechanism governing the drug response. However, it is crucial to make a prediction model that is both knowledge-guided and interpretable, so that the prediction accuracy is improved and practical use of the model can be enhanced. We propose an interpretable model called DRPreter (drug response predictor and interpreter) that predicts the anticancer drug response. DRPreter learns cell line and drug information with graph neural networks; the cell-line graph is further divided into multiple subgraphs with domain knowledge on biological pathways. A type-aware transformer in DRPreter helps detect relationships between pathways and a drug, highlighting important pathways that are involved in the drug response. Extensive experiments on the GDSC (Genomics of Drug Sensitivity in Cancer) dataset demonstrate that the proposed method outperforms state-of-the-art graph-based models for drug response prediction. In addition, DRPreter detected putative key genes and pathways for specific drug-cell-line pairs with supporting evidence in the literature, implying that our model can help interpret the mechanism of action of the drug.
Affiliation(s)
- Jihye Shin
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Yinhua Piao
- Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul 08826, Korea
- Dongmin Bang
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- AIGENDRUG Co., Ltd., Seoul 08826, Korea
- Sun Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul 08826, Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul 08826, Korea
- MOGAM Institute for Biomedical Research, Yongin-si 16924, Korea
- Kyuri Jo
- Department of Computer Engineering, Chungbuk National University, Cheongju 28644, Korea
|
85
|
Gu Y, Zheng S, Yin Q, Jiang R, Li J. REDDA: Integrating multiple biological relations to heterogeneous graph neural network for drug-disease association prediction. Comput Biol Med 2022; 150:106127. [PMID: 36182762 DOI: 10.1016/j.compbiomed.2022.106127] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 06/18/2022] [Revised: 07/27/2022] [Accepted: 09/18/2022] [Indexed: 11/03/2022]
Abstract
Computational drug repositioning is an effective way to find new indications for existing drugs, and thus can accelerate drug development and reduce experimental costs. Recently, various deep learning-based repurposing methods have been established to identify potential drug-disease associations (DDA). However, effectively using the relations among biological entities to capture biological interactions and enhance drug-disease association prediction is still challenging. To resolve this problem, we propose a heterogeneous graph neural network called REDDA (Relations-Enhanced Drug-Disease Association prediction). Assembled with three attention mechanisms, REDDA can sequentially learn drug/disease representations by a general heterogeneous graph convolutional network-based node embedding block, a topological subnet embedding block, a graph attention block, and a layer attention block. Performance comparisons on our proposed benchmark dataset show that REDDA outperforms 8 advanced drug-disease association prediction methods, achieving relative improvements of 0.76% on the area under the receiver operating characteristic curve (AUC) score and 13.92% on the area under the precision-recall curve (AUPR) score compared to the suboptimal method. On the other benchmark dataset, REDDA also obtains relative improvements of 2.48% on the AUC score and 4.93% on the AUPR score. Case studies also indicate that REDDA can give valid predictions for the discovery of new indications for drugs and new therapies for diseases. The overall results show the potential of REDDA for in silico drug development. The proposed benchmark dataset and source code are available at https://github.com/gu-yaowen/REDDA.
Affiliation(s)
- Yaowen Gu
- Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, 100020, China
| | - Si Zheng
- Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, 100020, China; Institute for Artificial Intelligence, Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing, 100084, China
| | - Qijin Yin
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Jiao Li
- Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, 100020, China.
| |
|
86
|
Yingtaweesittikul H, Wu J, Mongia A, Peres R, Ko K, Nagarajan N, Suphavilai C. CREAMMIST: an integrative probabilistic database for cancer drug response prediction. Nucleic Acids Res 2022; 51:D1242-D1248. [PMID: 36259664 PMCID: PMC9825458 DOI: 10.1093/nar/gkac911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Revised: 09/18/2022] [Accepted: 10/11/2022] [Indexed: 01/30/2023] Open
Abstract
Extensive in vitro cancer drug screening datasets have enabled scientists to identify biomarkers and develop machine learning models for predicting drug sensitivity. While most advancements have focused on omics profiles, cancer drug sensitivity scores precalculated by the original sources are often used as-is, without consideration for variabilities between studies. It is well-known that significant inconsistencies exist between the drug sensitivity scores across datasets due to differences in experimental setups and preprocessing methods used to obtain the sensitivity scores. As a result, many studies opt to focus only on a single dataset, leading to underutilization of available data and a limited interpretation of cancer pharmacogenomics analysis. To overcome these caveats, we have developed CREAMMIST (https://creammist.mtms.dev), an integrative database that enables users to obtain an integrative dose-response curve, to capture uncertainty (or high certainty when multiple datasets align well) across five widely used cancer cell-line drug-response datasets. We utilized a Bayesian framework to systematically integrate all available dose-response values across datasets (>14 million dose-response data points). CREAMMIST provides easy-to-use statistics derived from the integrative dose-response curves for various downstream analyses such as identifying biomarkers, selecting drug concentrations for experiments, and training robust machine learning models.
Affiliation(s)
| | - Jiaxi Wu
- Genome Institute of Singapore, A*STAR, Singapore, Singapore
| | - Aanchal Mongia
- Genome Institute of Singapore, A*STAR, Singapore, Singapore
| | - Rafael Peres
- Genome Institute of Singapore, A*STAR, Singapore, Singapore
| | - Karrie Ko
- Genome Institute of Singapore, A*STAR, Singapore, Singapore
| | | | - Chayaporn Suphavilai
- To whom correspondence should be addressed. Tel: +65 86213683; Fax: +65 68088292;
| |
|
87
|
Cheng X, Dai C, Wen Y, Wang X, Bo X, He S, Peng S. NeRD: a multichannel neural network to predict cellular response of drugs by integrating multidimensional data. BMC Med 2022; 20:368. [PMID: 36244991 PMCID: PMC9575288 DOI: 10.1186/s12916-022-02549-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/02/2022] [Accepted: 09/01/2022] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Considering the heterogeneity of tumors, it is a key issue in precision medicine to predict the drug response of each individual. The accumulation of various types of drug informatics and multi-omics data facilitates the development of efficient models for drug response prediction. However, the selection of high-quality data sources and the design of suitable methods remain a challenge. METHODS In this paper, we design NeRD, a multidimensional data integration model based on the PRISM drug response database, to predict the cellular response of drugs. Four feature extractors, including drug structure extractor (DSE), molecular fingerprint extractor (MFE), miRNA expression extractor (mEE), and copy number extractor (CNE), are designed for different types and dimensions of data. A fully connected network is used to fuse all features and make predictions. RESULTS Experimental results demonstrate the effective integration of the global and local structural features of drugs, as well as the features of cell lines from different omics data. For all metrics tested on the PRISM database, NeRD surpassed previous approaches. We also verified that NeRD has strong reliability in the prediction results of new samples. Moreover, unlike other algorithms, when the amount of training data was reduced, NeRD maintained stable performance. CONCLUSIONS NeRD's feature fusion provides a new idea for drug response prediction, which is of great significance for precise cancer treatment.
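The fusion step described in the abstract (per-channel feature vectors concatenated and passed through a fully connected network) can be illustrated with a minimal sketch. It is not the NeRD implementation: the channel dimensions, the random weights, the single hidden layer, and the names `dense`, `relu`, and `fuse_and_predict` are all assumptions made for illustration, and the extractors themselves are replaced by random vectors.

```python
import random


def dense(x, weights, bias):
    """One fully connected layer: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def fuse_and_predict(features, hidden_w, hidden_b, out_w, out_b):
    """Concatenate per-channel feature vectors, then apply a small MLP."""
    fused = [v for channel in features for v in channel]
    hidden = relu(dense(fused, hidden_w, hidden_b))
    return dense(hidden, out_w, out_b)[0]


# Hypothetical output sizes for the four extractors named in the abstract.
rng = random.Random(0)
channels = {"DSE": 4, "MFE": 3, "mEE": 5, "CNE": 2}
features = [[rng.uniform(-1, 1) for _ in range(n)] for n in channels.values()]

fused_dim = sum(channels.values())
hidden_dim = 8
hidden_w = [[rng.uniform(-0.5, 0.5) for _ in range(fused_dim)] for _ in range(hidden_dim)]
hidden_b = [0.0] * hidden_dim
out_w = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]]
out_b = [0.0]

# Untrained weights, so the value only demonstrates the data flow.
prediction = fuse_and_predict(features, hidden_w, hidden_b, out_w, out_b)
print(prediction)
```

Concatenation keeps each extractor independent, so a channel can be swapped or dropped without changing the others; only the input width of the first fused layer has to follow.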
Affiliation(s)
- Xiaoxiao Cheng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Chong Dai
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China.,Department of Biotechnology, Beijing Institute of Health Service and Transfusion Medicine, Beijing, China
| | - Yuqi Wen
- Department of Biotechnology, Beijing Institute of Health Service and Transfusion Medicine, Beijing, China
| | - Xiaoqi Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Xiaochen Bo
- Department of Biotechnology, Beijing Institute of Health Service and Transfusion Medicine, Beijing, China.
| | - Song He
- Department of Biotechnology, Beijing Institute of Health Service and Transfusion Medicine, Beijing, China.
| | - Shaoliang Peng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China. .,The State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University, Changsha, China.
| |
|
88
|
Peng W, Liu H, Dai W, Yu N, Wang J. Predicting cancer drug response using parallel heterogeneous graph convolutional networks with neighborhood interactions. Bioinformatics 2022; 38:4546-4553. [PMID: 35997568 DOI: 10.1093/bioinformatics/btac574] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/03/2022] [Revised: 07/26/2022] [Accepted: 08/22/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Due to cancer heterogeneity, the therapeutic effect may not be the same when a cohort of patients of the same cancer type receive the same treatment. The anticancer drug response prediction may help develop personalized therapy regimens to increase survival and reduce patients' expenses. Recently, graph neural network-based methods have aroused widespread interest and achieved impressive results on the drug response prediction task. However, most of them apply graph convolution to process cell line-drug bipartite graphs while ignoring the intrinsic differences between cell lines and drug nodes. Moreover, most of these methods aggregate node-wise neighbor features but fail to consider the element-wise interaction between cell lines and drugs. RESULTS This work proposes a neighborhood interaction (NI)-based heterogeneous graph convolution network method, namely NIHGCN, for anticancer drug response prediction in an end-to-end way. Firstly, it constructs a heterogeneous network consisting of drugs, cell lines and the known drug response information. Cell line gene expression and drug molecular fingerprints are linearly transformed and input as node attributes into an interaction model. The interaction module consists of a parallel graph convolution network layer and a NI layer, which aggregates node-level features from their neighbors through graph convolution operation and considers the element-level of interactions with their neighbors in the NI layer. Finally, the drug response predictions are made by calculating the linear correlation coefficients of feature representations of cell lines and drugs. We have conducted extensive experiments to assess the effectiveness of our model on the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets. It has achieved the best performance compared with the state-of-the-art algorithms, especially in predicting drug responses for new cell lines, new drugs and targeted drugs. Furthermore, our model that was well trained on the GDSC dataset can be successfully applied to predict samples of PDX and TCGA, which verified the transferability of our model from cell line in vitro to the datasets in vivo. AVAILABILITY AND IMPLEMENTATION The source code can be obtained from https://github.com/weiba/NIHGCN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
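The final prediction step mentioned in the abstract, scoring each cell line-drug pair by the linear correlation coefficient of their feature representations, can be sketched as follows. This is a minimal illustration under stated assumptions, not the NIHGCN code: the embeddings are toy three-dimensional vectors, and the function names `pearson` and `response_scores` are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson (linear) correlation coefficient between two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    # A constant vector has no defined correlation; return 0.0 by convention here.
    return num / den if den else 0.0


def response_scores(cell_embeddings, drug_embeddings):
    """Score matrix with one row per cell line and one column per drug."""
    return [[pearson(c, d) for d in drug_embeddings] for c in cell_embeddings]


# Toy embeddings standing in for learned cell-line and drug representations.
cells = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0]]
drugs = [[2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
scores = response_scores(cells, drugs)
for row in scores:
    print([round(s, 3) for s in row])
```

Because the score is a correlation, it is bounded in [-1, 1] and insensitive to the scale of either embedding; mapping it to a sensitive/resistant label would need a threshold or a further transformation, which this sketch leaves out.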
Affiliation(s)
- Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, P.R. China
| | - Hancheng Liu
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, P.R. China
| | - Wei Dai
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, P.R. China
| | - Ning Yu
- Department of Computing Sciences, The College at Brockport, State University of New York, Brockport, NY 14422, USA
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China.,Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, P. R. China
| |
|
89
|
Hu J, Gao J, Fang X, Liu Z, Wang F, Huang W, Wu H, Zhao G. DTSyn: a dual-transformer-based neural network to predict synergistic drug combinations. Brief Bioinform 2022; 23:6652782. [PMID: 35915050 DOI: 10.1093/bib/bbac302] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/04/2022] [Revised: 06/23/2022] [Accepted: 07/04/2022] [Indexed: 11/14/2022] Open
Abstract
Drug combination therapies are superior to monotherapy for cancer treatment in many ways. Identifying novel drug combinations by wet-lab screening is challenging because the enormous search space of possible drug pairs makes the process time-consuming. Thus, computational methods have been developed to predict drug pairs with potential synergistic functions. Notwithstanding the success of current models, the mechanism of drug synergy from a chemical-gene-tissue interaction perspective remains understudied, which keeps current algorithms from being used for drug mechanism studies. Here, we proposed a deep neural network model termed DTSyn (Dual Transformer encoder model for drug pair Synergy prediction) based on a multi-head attention mechanism to identify novel drug combinations. We designed a fine-granularity transformer encoder to capture chemical substructure-gene and gene-gene associations and a coarse-granularity transformer encoder to extract chemical-chemical and chemical-cell line interactions. DTSyn achieved the highest receiver operating characteristic area under the curve of 0.73, 0.78, 0.82 and 0.81 on four different cross-validation tasks, outperforming all competing methods. Further, DTSyn achieved the best True Positive Rate (TPR) over five independent data sets. The ablation study showed that both transformer encoder blocks contributed to the performance of DTSyn. In addition, DTSyn can extract interactions among chemicals and cell lines, representing the potential mechanisms of drug action. By leveraging the attention mechanism and pretrained gene embeddings, DTSyn shows improved interpretability. Thus, we envision our model as a valuable tool to prioritize synergistic drug pairs using chemical and cell line gene expression profiles.
Affiliation(s)
- Jing Hu
- Baidu, Inc., 701, Na Xian Road, 201210, Shanghai, China
| | - Jie Gao
- Baidu, Inc., 701, Na Xian Road, 201210, Shanghai, China
| | - Xiaomin Fang
- Baidu, Inc., Xue Fu Road, 518000, Shenzhen, China
| | - Zijing Liu
- Baidu, Inc., Xue Fu Road, 518000, Shenzhen, China
| | - Fan Wang
- Baidu, Inc., Xue Fu Road, 518000, Shenzhen, China
| | - Weili Huang
- HWL Consulting LLC, 3328 Antigua Dr, 97408, Oregon, US
| | - Hua Wu
- Baidu, Inc., No. 10 Shangdi 10th Street, 100085, Beijing, China
| | - Guodong Zhao
- Baidu, Inc., 701, Na Xian Road, 201210, Shanghai, China
| |
|
90
|
Lagisetty Y, Bourquard T, Al-Ramahi I, Mangleburg CG, Mota S, Soleimani S, Shulman JM, Botas J, Lee K, Lichtarge O. Identification of risk genes for Alzheimer's disease by gene embedding. CELL GENOMICS 2022; 2:100162. [PMID: 36268052 PMCID: PMC9581494 DOI: 10.1016/j.xgen.2022.100162] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/30/2022]
Abstract
Most disease-gene association methods do not account for gene-gene interactions, even though these play a crucial role in complex, polygenic diseases like Alzheimer's disease (AD). To discover new genes whose interactions may contribute to pathology, we introduce GeneEMBED. This approach compares the functional perturbations induced in gene interaction network neighborhoods by coding variants from disease versus healthy subjects. In two independent AD cohorts of 5,169 exomes and 969 genomes, GeneEMBED identified novel candidates. These genes were differentially expressed in post mortem AD brains and modulated neurological phenotypes in mice. Four that were differentially overexpressed and modified neurodegeneration in vivo are PLEC, UTRN, TP53, and POLD1. Notably, TP53 and POLD1 are involved in DNA break repair and inhibited by approved drugs. While these data show proof of concept in AD, GeneEMBED is a general approach that should be broadly applicable to identify genes relevant to risk mechanisms and therapy of other complex diseases.
Affiliation(s)
- Yashwanth Lagisetty
- Department of Biology and Pharmacology, UTHealth McGovern Medical School, Houston, TX 77030, USA,Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Thomas Bourquard
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ismael Al-Ramahi
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA,Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX 77030, USA,Center for Alzheimer’s and Neurodegenerative Diseases, Baylor College of Medicine, Houston, TX 77030, USA
| | - Carl Grant Mangleburg
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Samantha Mota
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Shirin Soleimani
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Joshua M. Shulman
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA,Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX 77030, USA,Center for Alzheimer’s and Neurodegenerative Diseases, Baylor College of Medicine, Houston, TX 77030, USA,Department of Neurology, Baylor College of Medicine, Houston, TX 77030, USA,Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA
| | - Juan Botas
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA,Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX 77030, USA,Center for Alzheimer’s and Neurodegenerative Diseases, Baylor College of Medicine, Houston, TX 77030, USA
| | - Kwanghyuk Lee
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA,Center for Alzheimer’s and Neurodegenerative Diseases, Baylor College of Medicine, Houston, TX 77030, USA,Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX 77030, USA,Corresponding author
| |
|
91
|
Rintala TJ, Ghosh A, Fortino V. Network approaches for modeling the effect of drugs and diseases. Brief Bioinform 2022; 23:6608969. [PMID: 35704883 PMCID: PMC9294412 DOI: 10.1093/bib/bbac229] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/22/2022] [Revised: 04/29/2022] [Accepted: 05/17/2021] [Indexed: 12/12/2022] Open
Abstract
The network approach is quickly becoming a fundamental building block of computational methods aiming at elucidating the mechanism of action (MoA) and therapeutic effect of drugs. By modeling the effect of drugs and diseases on different biological networks, it is possible to better explain the interplay between disease perturbations and drug targets as well as how drug compounds induce favorable biological responses and/or adverse effects. Omics technologies have been extensively used to generate the data needed to study the mechanisms of action of drugs and diseases. These data are often exploited to define condition-specific networks and to study whether drugs can reverse disease perturbations. In this review, we describe network data mining algorithms that are commonly used to study drugs’ MoA and to improve our understanding of the basis of chronic diseases. These methods can support fundamental stages of the drug development process, including the identification of putative drug targets and the in silico screening of drug compounds and drug combinations for the treatment of diseases. We also discuss recent studies using biological and omics-driven networks to search for possible repurposed FDA-approved drug treatments for SARS-CoV-2 infections (COVID-19).
Affiliation(s)
- T J Rintala
- Institute of Biomedicine, University of Eastern Finland, 70210 Kuopio, Finland
| | - Arindam Ghosh
- Institute of Biomedicine, University of Eastern Finland, 70210 Kuopio, Finland
| | - V Fortino
- Institute of Biomedicine, University of Eastern Finland, 70210 Kuopio, Finland
| |
|
92
|
Liu Q, Hua K, Zhang X, Wong WH, Jiang R. DeepCAGE: Incorporating Transcription Factors in Genome-wide Prediction of Chromatin Accessibility. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022; 20:496-507. [PMID: 35293310 PMCID: PMC9801045 DOI: 10.1016/j.gpb.2021.08.015] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/31/2021] [Revised: 05/31/2021] [Accepted: 09/27/2021] [Indexed: 01/26/2023]
Abstract
Although computational approaches have been complementing high-throughput biological experiments for the identification of functional regions in the human genome, it remains a great challenge to systematically decipher interactions between transcription factors (TFs) and regulatory elements to achieve interpretable annotations of chromatin accessibility across diverse cellular contexts. To solve this problem, we propose DeepCAGE, a deep learning framework that integrates sequence information and binding statuses of TFs, for the accurate prediction of chromatin accessible regions at a genome-wide scale in a variety of cell types. DeepCAGE takes advantage of a densely connected deep convolutional neural network architecture to automatically learn sequence signatures of known chromatin accessible regions and then incorporates such features with expression levels and binding activities of human core TFs to predict novel chromatin accessible regions. In a series of systematic comparisons with existing methods, DeepCAGE exhibits superior performance in not only the classification but also the regression of chromatin accessibility signals. In a detailed analysis of TF activities, DeepCAGE successfully extracts novel binding motifs and measures the contribution of a TF to the regulation with respect to a specific locus in a certain cell type. When applied to whole-genome sequencing data analysis, our method successfully prioritizes putative deleterious variants underlying a human complex trait and thus provides insights into the understanding of disease-associated genetic variants. DeepCAGE can be downloaded from https://github.com/kimmo1019/DeepCAGE.
Affiliation(s)
- Qiao Liu
- Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China,Department of Statistics, Stanford University, Stanford, CA 94305, USA
| | - Kui Hua
- Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xuegong Zhang
- Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Wing Hung Wong
- Department of Statistics, Stanford University, Stanford, CA 94305, USA,Corresponding authors.
| | - Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China,Corresponding authors.
| |
|
93
|
Zhu EY, Dupuy AJ. Machine learning approach informs biology of cancer drug response. BMC Bioinformatics 2022; 23:184. [PMID: 35581546 PMCID: PMC9112473 DOI: 10.1186/s12859-022-04720-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/15/2021] [Accepted: 05/03/2022] [Indexed: 12/12/2022] Open
Abstract
Background The mechanism of action for most cancer drugs is not clear. Large-scale pharmacogenomic cancer cell line datasets offer a rich resource to obtain this knowledge. Here, we present an analysis strategy for revealing biological pathways that contribute to drug response using publicly available pharmacogenomic cancer cell line datasets. Methods We present a custom machine-learning based approach for identifying biological pathways involved in cancer drug response. We test the utility of our approach with a pan-cancer analysis of ML210, an inhibitor of GPX4, and a melanoma-focused analysis of inhibitors of BRAF V600. We apply our approach to reveal determinants of drug resistance to microtubule inhibitors. Results Our method implicated lipid metabolism and Rac1/cytoskeleton signaling in the context of ML210 and BRAF inhibitor response, respectively. These findings are consistent with current knowledge of how these drugs work. For microtubule inhibitors, our approach implicated Notch and Akt signaling as pathways associated with response. Conclusions Our results demonstrate the utility of combining informed feature selection and machine learning algorithms in understanding cancer drug response. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04720-z.
Affiliation(s)
- Eliot Y Zhu
- Department of Anatomy and Cell Biology, The University of Iowa, Iowa City, IA, USA.,Holden Comprehensive Cancer Center, The University of Iowa, Iowa City, IA, USA.,Cancer Biology Graduate Program, The University of Iowa, Iowa City, IA, USA.,The Medical Scientist Training Program, The University of Iowa, Iowa City, IA, USA
| | - Adam J Dupuy
- Department of Anatomy and Cell Biology, The University of Iowa, Iowa City, IA, USA. .,Holden Comprehensive Cancer Center, The University of Iowa, Iowa City, IA, USA.
| |
|
94
|
Wang H, Wang Z, Chen J, Liu W. Graph Attention Network Model with Defined Applicability Domains for Screening PBT Chemicals. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2022; 56:6774-6785. [PMID: 35475611 DOI: 10.1021/acs.est.2c00765] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Indexed: 06/14/2023]
Abstract
In silico models for screening environmentally persistent, bio-accumulative, and toxic (PBT) substances are necessary for sound management of chemicals. Due to the complex structure-activity landscapes (SALs) on the PBT attributes, previous models for screening PBT chemicals lack either applicability domain (AD) characterizations or interpretability, restricting their applications. Herein, graph attention networks (GATs), a novel neural network architecture, were introduced to construct models for screening PBT chemicals. Results show that the GAT model not only outperformed those in previous studies but also exhibited interpretability since it optimizes attention weight parameters (PAW) that indicate contributions of each atom to the PBT attributes. An AD characterization termed ADFP-AC, which considers both molecular fingerprint (FP) similarities and compounds at activity cliffs (ACs) of SALs, was proposed to describe the ADs, which further assured the performance of the GAT model. Eight previously unidentified classes of compounds were identified as PBT chemicals from the Inventory of Existing Chemical Substances in China. The GAT model together with the ADFP-AC characterization may serve as efficient tools for screening PBT chemicals, and the modeling methodology can be applied to other physicochemical, environmental, behavioral, and toxicological parameters of chemicals that are necessary for their risk assessment and management.
Affiliation(s)
- Haobo Wang
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Zhongyu Wang
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Jingwen Chen
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Wenjia Liu
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
| |
|
95
|
Leveraging Deep Learning Techniques and Integrated Omics Data for Tailored Treatment of Breast Cancer. J Pers Med 2022; 12:jpm12050674. [PMID: 35629097 PMCID: PMC9147748 DOI: 10.3390/jpm12050674] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/02/2022] [Revised: 03/06/2022] [Accepted: 04/14/2022] [Indexed: 12/12/2022] Open
Abstract
Multiomics data of cancer patients and cell lines, in synergy with deep learning techniques, have aided in unravelling predictive problems related to cancer research and treatment. However, there is still room for improvement in the performance of the existing models based on the aforementioned combination. In this work, we propose two models that complement the treatment of breast cancer patients. First, we discuss our deep learning-based model for breast cancer subtype classification. Second, we propose DCNN-DR, a deep convolution neural network-drug response method for predicting the effectiveness of drugs on in vitro and in vivo breast cancer datasets. Finally, we applied DCNN-DR for predicting effective drugs for the basal-like breast cancer subtype and validated the results with the information available in the literature. The proposed models use late integration methods and have moderately better predictive performance than the existing methods. We use the Pearson correlation coefficient and accuracy as the performance measures for the regression and classification models, respectively.
|
96
|
Ma T, Liu Q, Li H, Zhou M, Jiang R, Zhang X. DualGCN: a dual graph convolutional network model to predict cancer drug response. BMC Bioinformatics 2022; 23:129. [PMID: 35428192 PMCID: PMC9011932 DOI: 10.1186/s12859-022-04664-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/24/2022] [Accepted: 04/04/2022] [Indexed: 11/11/2022] Open
Abstract
Background: Drug resistance is a critical obstacle in cancer therapy. Discovering cancer drug response is important to improve anti-cancer drug treatment and guide anti-cancer drug design. Abundant genomic and drug response resources of cancer cell lines provide unprecedented opportunities for such study. However, cancer cell lines cannot fully reflect heterogeneous tumor microenvironments. Transferring knowledge studied from in vitro cell lines to single-cell and clinical data will be a promising direction to better understand drug resistance. Most current studies include single nucleotide variants (SNV) as features and focus on improving predictive ability of cancer drug response on cell lines. However, obtaining accurate SNVs from clinical tumor samples and single-cell data is not reliable. This makes it difficult to generalize such SNV-based models to clinical tumor data or single-cell level studies in the future.
Results: We present a new method, DualGCN, a unified Dual Graph Convolutional Network model to predict cancer drug response. DualGCN encodes both chemical structures of drugs and omics data of biological samples using graph convolutional networks. Then the two embeddings are fed into a multilayer perceptron to predict drug response. DualGCN incorporates prior knowledge on cancer-related genes and protein–protein interactions, and outperforms most state-of-the-art methods while avoiding using large-scale SNV data.
Conclusions: The proposed method outperforms most state-of-the-art methods in predicting cancer drug response without the use of large-scale SNV data. These favorable results indicate its potential to be extended to clinical and single-cell tumor samples and advancements in precision medicine.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04664-4.
|
97
|
Gu Y, Zheng S, Xu Z, Yin Q, Li L, Li J. An efficient curriculum learning-based strategy for molecular graph learning. Brief Bioinform 2022; 23:6562682. [DOI: 10.1093/bib/bbac099] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/06/2021] [Revised: 01/18/2022] [Accepted: 02/27/2022] [Indexed: 12/14/2022]
Abstract
Computational methods have been widely applied to resolve various core issues in drug discovery, such as molecular property prediction. In recent years, deep learning, a data-driven computational method, has achieved a number of impressive successes in various domains. In drug discovery, graph neural networks (GNNs) take molecular graph data as input and learn graph-level representations in non-Euclidean space. A large number of well-performing GNNs have been proposed for molecular graph learning. However, the efficient use of molecular data during training has not received enough attention. Curriculum learning (CL) has been proposed as a training strategy that rearranges the training queue based on calculated sample difficulties, yet its effectiveness has not been determined in molecular graph learning. In this study, inspired by chemical domain knowledge and task prior information, we propose a novel CL-based training strategy, called CurrMG, to improve the training efficiency of molecular graph learning. Consisting of a difficulty measurer and a training scheduler, CurrMG is designed as a plug-and-play module that is model-independent and easy to use on molecular data. Extensive experiments demonstrated that molecular graph learning models benefit from CurrMG, with noticeable improvement across five GNN models and eight molecular property prediction tasks (overall improvement of 4.08%). We further observed CurrMG’s encouraging potential in resource-constrained molecular property prediction. These results indicate that CurrMG can be used as a reliable and efficient training strategy for molecular graph learning.
Availability: The source code is available at https://github.com/gu-yaowen/CurrMG.
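The split into a difficulty measurer and a training scheduler described in this abstract can be illustrated with a minimal Python sketch. This is not the CurrMG implementation: the SMILES-length difficulty proxy, the linear pacing function, and every function name below are assumptions chosen only for illustration.

```python
import math
import random
from typing import Callable, List, Sequence


def smiles_length_difficulty(smiles: str) -> float:
    """Hypothetical difficulty measurer: a longer SMILES string counts as harder."""
    return float(len(smiles))


def competence(epoch: int, warmup_epochs: int, start: float = 0.2) -> float:
    """Assumed linear pacing: fraction of the ranked training set usable at `epoch`."""
    if warmup_epochs <= 0:
        return 1.0
    return min(1.0, start + (1.0 - start) * epoch / warmup_epochs)


def curriculum_subset(
    samples: Sequence[str],
    difficulty: Callable[[str], float],
    epoch: int,
    warmup_epochs: int,
    start: float = 0.2,
) -> List[str]:
    """Training scheduler: return the easiest samples allowed at this epoch."""
    ranked = sorted(samples, key=difficulty)
    fraction = competence(epoch, warmup_epochs, start)
    # round() guards against float noise such as 3.0000000000000004 before ceil().
    n = max(1, math.ceil(round(fraction * len(ranked), 9)))
    return ranked[:n]


def epoch_batches(
    samples: Sequence[str],
    difficulty: Callable[[str], float],
    epoch: int,
    warmup_epochs: int,
    batch_size: int,
    seed: int = 0,
) -> List[List[str]]:
    """Shuffle the currently allowed subset and cut it into mini-batches."""
    pool = curriculum_subset(samples, difficulty, epoch, warmup_epochs)
    random.Random(seed + epoch).shuffle(pool)
    return [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]
```

Under these assumptions, a training loop would call `epoch_batches` once per epoch, so early epochs see only the easiest molecules and the full dataset becomes available once `epoch` reaches `warmup_epochs`. Because the scheduler only selects and orders samples, it can wrap any model's training loop, which mirrors the plug-and-play framing in the abstract.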
Affiliation(s)
- Yaowen Gu
- Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
- Si Zheng
- Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
- Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Zidu Xu
- Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
- Qijin Yin
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Liang Li
- Key Laboratory of Antibiotic Bioengineering of National Health and Family Planning Commission (NHFPC), Institute of Medicinal Biotechnology (IMB), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
- Jiao Li
- Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China
|
98
|
Jiang L, Jiang C, Yu X, Fu R, Jin S, Liu X. DeepTTA: a transformer-based model for predicting cancer drug response. Brief Bioinform 2022; 23:6554594. [PMID: 35348595 DOI: 10.1093/bib/bbac100] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 11/23/2021] [Revised: 02/08/2022] [Accepted: 02/27/2022] [Indexed: 12/27/2022]
Abstract
Identifying new lead molecules to treat cancer requires more than a decade of dedicated effort. Before selected drug candidates are used in the clinic, their anti-cancer activity is generally validated by in vitro cellular experiments. Therefore, accurate prediction of cancer drug response is a critical and challenging task for anti-cancer drug design and precision medicine. With the development of pharmacogenomics, the combination of efficient drug feature extraction methods and omics data has made it possible to use computational models to assist in drug response prediction. In this study, we propose DeepTTA, a novel end-to-end deep learning model that uses a transformer for drug representation learning and a multilayer neural network on transcriptomic data to predict anti-cancer drug responses. Specifically, DeepTTA uses transcriptomic gene expression data and chemical substructures of drugs for drug response prediction. Compared to existing methods, DeepTTA achieved higher performance in terms of root mean square error, Pearson correlation coefficient and Spearman's rank correlation coefficient on multiple test sets. Moreover, we discovered that the anti-cancer drugs bortezomib and dactinomycin provide a potential therapeutic option with multiple clinical indications. With its excellent performance, DeepTTA is expected to be an effective method in cancer drug design.
Affiliation(s)
- Likun Jiang
- Department of Computer Science, Xiamen University, Xiamen 361005, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
- Changzhi Jiang
- Department of Computer Science, Xiamen University, Xiamen 361005, China
- Xinyu Yu
- Department of Computer Science, Xiamen University, Xiamen 361005, China
- Rao Fu
- Department of Computer Science, Xiamen University, Xiamen 361005, China
- Shuting Jin
- Department of Computer Science, Xiamen University, Xiamen 361005, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
- Xiangrong Liu
- Department of Computer Science, Xiamen University, Xiamen 361005, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
|
99
|
Wang Z, Wang Z, Huang Y, Lu L, Fu Y. A multi-view multi-omics model for cancer drug response prediction. Appl Intell 2022. [DOI: 10.1007/s10489-022-03294-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/02/2022]
|
100
|
Li X, Ma J, Leng L, Han M, Li M, He F, Zhu Y. MoGCN: A Multi-Omics Integration Method Based on Graph Convolutional Network for Cancer Subtype Analysis. Front Genet 2022; 13:806842. [PMID: 35186034 PMCID: PMC8847688 DOI: 10.3389/fgene.2022.806842] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 11/01/2021] [Accepted: 01/14/2022] [Indexed: 12/17/2022]
Abstract
In light of the rapid accumulation of large-scale omics datasets, numerous studies have attempted to characterize the molecular and clinical features of cancers from a multi-omics perspective. However, there are great challenges in integrating multi-omics data using machine learning methods for cancer subtype classification. In this study, MoGCN, a multi-omics integration model based on a graph convolutional network (GCN), was developed for cancer subtype classification and analysis. Genomics, transcriptomics and proteomics datasets for 511 breast invasive carcinoma (BRCA) samples were downloaded from The Cancer Genome Atlas (TCGA). The autoencoder (AE) and the similarity network fusion (SNF) methods were used to reduce dimensionality and construct the patient similarity network (PSN), respectively. Then the vector features and the PSN were input into the GCN for training and testing. Feature extraction and network visualization were used for further biological knowledge discovery and subtype classification. In the analysis of multi-dimensional omics data of the BRCA samples in TCGA, MoGCN achieved the highest accuracy in cancer subtype classification compared with several popular algorithms. Moreover, MoGCN can extract the most significant features of each omics layer and provide candidate functional molecules for further analysis of their biological effects. Network visualization also showed that MoGCN could make clinically intuitive diagnoses. The generality of MoGCN was demonstrated on the TCGA pan-kidney cancer datasets. MoGCN and the datasets are publicly available at https://github.com/Lifoof/MoGCN. Our study shows that MoGCN performs well for heterogeneous data integration and the interpretability of classification results, which confers great potential for applications in biomarker identification and clinical diagnosis.
Affiliation(s)
- Xiao Li
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
- Jie Ma
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
- Ling Leng
- Stem Cell and Regenerative Medicine Lab, Department of Medical Science Research Center, State Key Laboratory of Complex Severe and Rare Diseases, Translational Medicine Center, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China
- Mingfei Han
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
- Mansheng Li
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
- Fuchu He
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
- Yunping Zhu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
|