1
Dai W, Chen G, Peng W, Chen C, Fu X, Liu L, Liu L, Yu N. Domain alignment method based on masked variational autoencoder for predicting patient anticancer drug response. Methods 2025; 238:61-73. [PMID: 40090506 DOI: 10.1016/j.ymeth.2025.03.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2024] [Revised: 02/03/2025] [Accepted: 03/14/2025] [Indexed: 03/18/2025] Open
Abstract
Predicting a patient's response to anticancer drugs is essential to personalized treatment planning. However, due to significant distribution differences between cell line data and patient data, models trained well on cell line data may perform poorly on patient anticancer drug response predictions. Some existing methods use transfer learning strategies to implement domain feature alignment between cell lines and patient data and leverage knowledge from cell lines to predict patient anticancer drug responses. This study proposes a domain alignment method based on masked variational autoencoders, MVAEDA, to predict patient anticancer drug responses. The model constructs multiple variational autoencoders (VAEs) and mask predictors to extract domain-specific and domain-invariant features of cell lines and patients. Then, it masks and reconstructs the gene expression matrix, using generative adversarial training to learn domain-invariant features from the cell line and patient domains. These domain-invariant features are then used to train a classifier. Finally, the trained model predicts the anticancer drug response in the target domain. Our model is evaluated experimentally on clinical and preclinical datasets. The results show that our method performs better than other state-of-the-art methods.
Affiliation(s)
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Gong Chen
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Chuyue Chen
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
- Xiaodong Fu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Li Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Lijun Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Ning Yu
  - State University of New York, The College at Brockport, Department of Computing Sciences, 350 New Campus Drive, Brockport, NY 14422, United States
2
Wu J, Lai J, Zhao X, Wang Z, Zhang Y, Wang L, Su Y, He Y, Li S, Jiang Y, Han J. DeepCCDS: Interpretable Deep Learning Framework for Predicting Cancer Cell Drug Sensitivity through Characterizing Cancer Driver Signals. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2025:e2416958. [PMID: 40397390 DOI: 10.1002/advs.202416958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2024] [Revised: 04/18/2025] [Indexed: 05/22/2025]
Abstract
Accurate characterization of cellular states is the foundation for precise prediction of drug sensitivity in cancer cell lines, which in turn is fundamental to realizing precision oncology. However, current deep learning approaches have limitations in characterizing cellular states. They rely solely on isolated genetic markers, overlooking the complex regulatory networks and cellular mechanisms that underlie drug responses. To address this limitation, this work proposes DeepCCDS, a Deep learning framework for Cancer Cell Drug Sensitivity prediction through Characterizing Cancer Driver Signals. DeepCCDS incorporates a prior knowledge network to characterize cancer driver signals, building upon a self-supervised neural network framework. These signals can reflect key mechanisms influencing cancer cell development and drug response, enhancing the model's predictive performance and interpretability. DeepCCDS has demonstrated superior performance in predicting drug sensitivity compared to previous state-of-the-art approaches across multiple datasets. Benefiting from the integration of prior knowledge, DeepCCDS exhibits powerful feature representation capabilities and interpretability. Based on these feature representations, we have identified embedding features that could potentially be used for drug screening in new indications. Further, this work demonstrates the applicability of DeepCCDS to solid tumor samples from The Cancer Genome Atlas. We believe that integrating DeepCCDS into clinical decision-making processes can potentially improve the selection of personalized treatment strategies for cancer patients.
Affiliation(s)
- Jiashuo Wu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Jiyin Lai
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Xilong Zhao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Ziyi Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yongbao Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Liqiang Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yinchun Su
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yalan He
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Siyuan Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Ying Jiang
  - College of Basic Medical Science, Heilongjiang University of Chinese Medicine, Harbin, 150040, China
- Junwei Han
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
3
He Y, Li S, Lan H, Long W, Zhai S, Li M, Wen Z. A Transfer Learning Framework for Predicting and Interpreting Drug Responses via Single-Cell RNA-Seq Data. Int J Mol Sci 2025; 26:4365. [PMID: 40362602 PMCID: PMC12072357 DOI: 10.3390/ijms26094365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2025] [Revised: 04/29/2025] [Accepted: 05/02/2025] [Indexed: 05/15/2025] Open
Abstract
Chemotherapy is a fundamental therapy in cancer treatment, yet its effectiveness is often undermined by drug resistance. Understanding the molecular mechanisms underlying drug response remains a major challenge due to tumor heterogeneity, complex cellular interactions, and limited access to clinical samples, which also hinder the performance and interpretability of existing predictive models. Meanwhile, single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for uncovering resistance mechanisms, but the systematic collection and utilization of scRNA-seq drug response data remain limited. In this study, we collected scRNA-seq drug response datasets from publicly available web sources and proposed a transfer learning-based framework to align bulk and single cell sequencing data. A shared encoder was designed to project both bulk and single-cell sequencing data into a unified latent space for drug response prediction, while a sparse decoder guided by prior biological knowledge enhanced interpretability by mapping latent features to predefined pathways. The proposed model achieved superior performance across five curated scRNA-seq datasets and yielded biologically meaningful insights through integrated gradient analysis. This work demonstrates the potential of deep learning to advance drug response prediction and underscores the value of scRNA-seq data in supporting related research.
Affiliation(s)
- Yujie He
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Shenghao Li
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Hao Lan
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Wulin Long
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Shengqiu Zhai
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Menglong Li
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Zhining Wen
  - College of Chemistry, Sichuan University, Chengdu 610064, China
  - Medical Big Data Center, Sichuan University, Chengdu 610064, China
4
Shi H, Xu T, Li X, Gao Q, Xiong Z, Xia J, Yue Z. DRExplainer: Quantifiable interpretability in drug response prediction with directed graph convolutional network. Artif Intell Med 2025; 163:103101. [PMID: 40056540 DOI: 10.1016/j.artmed.2025.103101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2024] [Revised: 01/08/2025] [Accepted: 02/23/2025] [Indexed: 03/10/2025]
Abstract
Predicting the response of a cancer cell line to a therapeutic drug is pivotal for personalized medicine. Although numerous deep learning methods have been developed for drug response prediction, integrating diverse information about biological entities and predicting the directional response remain major challenges. Here, we propose a novel interpretable predictive model, DRExplainer, which leverages a directed graph convolutional network to enhance prediction in a directed bipartite network framework. DRExplainer constructs a directed bipartite network integrating multi-omics profiles of cell lines, the chemical structure of drugs, and known drug responses to achieve directed prediction. Then, DRExplainer identifies the subgraph most relevant to each prediction in this directed bipartite network by learning a mask, facilitating critical medical decision-making. Additionally, we introduce a quantifiable method for model interpretability that leverages a ground truth benchmark dataset curated from biological features. In computational experiments, DRExplainer outperforms state-of-the-art predictive methods and another graph-based explanation method under the same experimental setting. Finally, case studies further validate the interpretability and effectiveness of DRExplainer in predicting novel drug responses. Our code is available at: https://github.com/vshy-dream/DRExplainer.
Affiliation(s)
- Haoyuan Shi
  - University of Science and Technology of China, Hefei, 230026, Anhui, China; School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Tao Xu
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Xiaodi Li
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Qian Gao
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Zhiwei Xiong
  - University of Science and Technology of China, Hefei, 230026, Anhui, China
- Junfeng Xia
  - Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230036, Anhui, China
- Zhenyu Yue
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, Anhui, China
5
Salimi A, Lee JY. Hybrid intelligence for environmental pollution: biodegradability assessment of organic compounds through multimodal integration of graph attention networks and QSAR models. Environmental Science. Processes & Impacts 2025; 27:981-991. [PMID: 40052292 DOI: 10.1039/d4em00594e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2025]
Abstract
Computational methods are crucial for assessing chemical biodegradability, given their significant impact on both environmental and human health. Organic compounds that are not biodegradable can persist in the environment, contributing to pollution. Our novel approach leverages graph attention networks (GATs) and incorporates node and edge attributes for biodegradability prediction. Quantitative Structure-Activity Relationship (QSAR) models using two-dimensional descriptors alongside weighted average and stacking approaches were employed to generate ensemble models. The GAT models demonstrated a stable function and generally higher specificity on the validation set compared to a graph convolutional network, although definitive superiority is challenging to establish owing to overlapping standard deviations. However, the sensitivities tended to decrease with potential performance overlap owing to the interval intersection. Ensemble learning enhanced several performance metrics compared with individual models and base models, with the combination of extreme Gradient Boosting and GAT achieving the highest precision and specificity. Combining GAT with random forest and Gradient Boosting may be preferable for accurately predicting biodegradable molecules, whereas the stacking approach may be suitable for prioritizing the correct classification of nonbiodegradable substances. Important descriptors, such as SpMax1_Bh(m) and SAscore, were identified in at least two QSAR models. Despite inherent complexities, the ease of implementation depends on factors such as data availability, and domain knowledge. Assessing the biodegradability of organic compounds is essential for reducing their environmental impact, assessing risks, ensuring regulatory compliance, promoting sustainable development, and supporting effective pollution remediation. It assists in making informed decisions about chemical use, waste management, and environmental protection.
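The weighted-average ensembling mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the weights, the threshold, or the probability format, so the function name `weighted_average_ensemble`, the weights, the 0.5 cutoff, and the toy probabilities below are all hypothetical, and label 1 is assumed to mean "biodegradable".

```python
def weighted_average_ensemble(prob_lists, weights, threshold=0.5):
    """Combine per-model class-1 probabilities by a weighted average.

    prob_lists: one list of probabilities per model, all of equal length.
    weights:    one non-negative weight per model (hypothetical values).
    Returns the averaged probabilities and the thresholded 0/1 labels.
    """
    if len(prob_lists) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_samples = len(prob_lists[0])
    averaged = [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]
    labels = [int(p >= threshold) for p in averaged]
    return averaged, labels


# Toy probabilities standing in for a GAT model and a gradient-boosting
# QSAR model on three molecules (made-up numbers, not results from the paper).
gat_probs = [0.9, 0.3, 0.2]
qsar_probs = [0.7, 0.6, 0.1]
avg, labels = weighted_average_ensemble([gat_probs, qsar_probs], weights=[2, 1])
```

A stacking ensemble would instead train a second-level model on the base models' outputs; that variant is not shown here.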
Affiliation(s)
- Abbas Salimi
  - Department of Chemistry, Sungkyunkwan University, Suwon 16419, Korea
- Jin Yong Lee
  - Department of Chemistry, Sungkyunkwan University, Suwon 16419, Korea
6
Codicè F, Pancotti C, Rollo C, Moreau Y, Fariselli P, Raimondi D. The specification game: rethinking the evaluation of drug response prediction for precision oncology. J Cheminform 2025; 17:33. [PMID: 40087708 PMCID: PMC11907791 DOI: 10.1186/s13321-025-00972-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2024] [Accepted: 02/13/2025] [Indexed: 03/17/2025] Open
Abstract
Precision oncology plays a pivotal role in contemporary healthcare, aiming to optimize treatments for each patient based on their unique characteristics. This objective has spurred the emergence of various cancer cell line drug response datasets, driven by the need to facilitate pre-clinical studies by exploring the impact of multi-omics data on drug response. Despite the proliferation of machine learning models for Drug Response Prediction (DRP), their validation remains critical to reliably assess their usefulness for drug discovery and precision oncology, and their actual ability to generalize over the immense space of cancer cells and chemical compounds. Scientific contribution: In this paper, we show that the commonly used evaluation strategies for DRP methods can be easily fooled by commonly occurring dataset biases, and they are therefore not able to truly measure the ability of DRP methods to generalize over drugs and cell lines ("specification gaming"). This problem hinders the development of reliable DRP methods and their application to experimental pipelines. Here we propose a new validation protocol composed of three Aggregation Strategies (Global, Fixed-Drug, and Fixed-Cell Line), integrated with three of the most commonly used train-test evaluation settings, to ensure a truly realistic assessment of the prediction performance. We also scrutinize the challenges associated with using IC50 as a prediction label, showing how its close correlation with the drug concentration ranges worsens the risk of misleading performance assessment, and we indicate an additional reason to replace it with the Area Under the Dose-Response Curve.
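A minimal sketch of how the three aggregation strategies named in the abstract above could differ in practice. It assumes that Global means a single score over all drug-cell line pairs, and that Fixed-Drug and Fixed-Cell Line mean averaging scores computed separately per drug or per cell line. That reading, the choice of Pearson correlation, the function names, and the toy data are assumptions for illustration, not the authors' protocol.

```python
from collections import defaultdict
from math import isnan, sqrt


def pearson(xs, ys):
    """Pearson correlation; returns NaN when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def aggregated_pearson(records, strategy):
    """records: iterable of (drug, cell_line, y_true, y_pred) tuples."""
    groups = defaultdict(lambda: ([], []))
    for drug, cell_line, y_true, y_pred in records:
        if strategy == "global":
            key = "all"
        elif strategy == "fixed_drug":
            key = drug
        elif strategy == "fixed_cell_line":
            key = cell_line
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        groups[key][0].append(y_true)
        groups[key][1].append(y_pred)
    # Score each group separately, skip undefined scores, then average.
    scores = [pearson(t, p) for t, p in groups.values() if len(t) >= 2]
    scores = [s for s in scores if not isnan(s)]
    return sum(scores) / len(scores) if scores else float("nan")


# Toy data: predictions track each drug's overall potency level but rank
# the cell lines in the wrong order within every drug.
records = [
    ("A", "c1", 1, 3), ("A", "c2", 2, 2), ("A", "c3", 3, 1),
    ("B", "c1", 11, 13), ("B", "c2", 12, 12), ("B", "c3", 13, 11),
]
global_r = aggregated_pearson(records, "global")
fixed_drug_r = aggregated_pearson(records, "fixed_drug")
fixed_cell_r = aggregated_pearson(records, "fixed_cell_line")
```

On this toy data the global score is high (about 0.95) even though the per-drug score is -1, which is the kind of gap between aggregation levels that a dataset bias such as differing drug concentration ranges can produce.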
Affiliation(s)
- Francesco Codicè
  - Department of Medical Sciences, University of Torino, 10123, Torino, Italy
- Corrado Pancotti
  - Department of Medical Sciences, University of Torino, 10123, Torino, Italy
- Cesare Rollo
  - Department of Medical Sciences, University of Torino, 10123, Torino, Italy
- Yves Moreau
  - ESAT-STADIUS, KU Leuven, Leuven, 3001, Belgium
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, 10123, Torino, Italy
- Daniele Raimondi
  - Institut de Génétique Moléculaire de Montpellier, Université de Montpellier, 34293, Montpellier, France
7
Narykov O, Zhu Y, Brettin T, Evrard YA, Partin A, Xia F, Shukla M, Vasanthakumari P, Doroshow JH, Stevens RL. Data imbalance in drug response prediction: multi-objective optimization approach in deep learning setting. Brief Bioinform 2025; 26:bbaf134. [PMID: 40178282 PMCID: PMC11966611 DOI: 10.1093/bib/bbaf134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2024] [Revised: 02/07/2025] [Accepted: 02/18/2025] [Indexed: 04/05/2025] Open
Abstract
Drug response prediction (DRP) methods tackle the complex task of associating the effectiveness of small molecules with the specific genetic makeup of the patient. Anti-cancer DRP is a particularly challenging task requiring costly experiments, as the underlying pathogenic mechanisms are broad and associated with multiple genomic pathways. The scientific community has exerted significant efforts to generate public drug screening datasets, giving a path to various machine learning models that attempt to reason over the complex data space of small compounds and biological characteristics of tumors. However, the data depth is still lacking compared to application domains like computer vision or natural language processing, limiting current learning capabilities. To combat this issue and improve the generalizability of DRP models, we are exploring strategies that explicitly address the imbalance in DRP datasets. We reframe the problem as a multi-objective optimization across multiple drugs to maximize deep learning model performance. We implement this approach by constructing the Multi-Objective Optimization Regularized by Loss Entropy loss function and plugging it into a deep learning model. We demonstrate the utility of the proposed drug discovery methods and make suggestions for further potential application of the work to achieve desirable outcomes in the healthcare field.
Affiliation(s)
- Oleksandr Narykov
  - Computing, Environment and Life Sciences, Argonne National Laboratory, 9700 S Cass Ave, Lemont, IL 60439, United States
- Yitan Zhu
  - Computing, Environment and Life Sciences, Argonne National Laboratory, 9700 S Cass Ave, Lemont, IL 60439, United States
- Thomas Brettin
  - Computing, Environment and Life Sciences, Argonne National Laboratory, 9700 S Cass Ave, Lemont, IL 60439, United States
- Yvonne A Evrard
  - Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, 8560 Progress Drive, Frederick, MD 21702, United States
- Alexander Partin
  - Computing, Environment and Life Sciences, Argonne National Laboratory, 9700 S Cass Ave, Lemont, IL 60439, United States
- Fangfang Xia
  - Computing, Environment and Life Sciences, Argonne National Laboratory, 9700 S Cass Ave, Lemont, IL 60439, United States
- Maulik Shukla
  - Computing, Environment and Life Sciences, Argonne National Laboratory, 9700 S Cass Ave, Lemont, IL 60439, United States
- Priyanka Vasanthakumari
  - Computing, Environment and Life Sciences, Argonne National Laboratory, 9700 S Cass Ave, Lemont, IL 60439, United States
- James H Doroshow
  - Developmental Therapeutics Branch, National Cancer Institute, 31 Center Dr, Bethesda, MD 20892, United States
- Rick L Stevens
  - Computing, Environment and Life Sciences, Argonne National Laboratory, 9700 S Cass Ave, Lemont, IL 60439, United States
  - Department of Computer Science, The University of Chicago, 5730 S Ellis Ave, Chicago, IL 60637, United States
8
Yin J, Zhang H, Sun X, You N, Mou M, Lu M, Pan Z, Li F, Li H, Zeng S, Zhu F. Decoding Drug Response With Structurized Gridding Map-Based Cell Representation. IEEE J Biomed Health Inform 2025; 29:1702-1713. [PMID: 38090819 DOI: 10.1109/jbhi.2023.3342280] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 03/08/2025]
Abstract
A thorough understanding of cell-line drug response mechanisms is crucial for drug development, repurposing, and resistance reversal. While targeted anticancer therapies have shown promise, not all cancers have well-established biomarkers to stratify drug response. Single-gene associations only explain a small fraction of the observed drug sensitivity, so a more comprehensive method is needed. However, while deep learning models have shown promise in predicting drug response in cell lines, they still face significant challenges when applied in clinical settings. Therefore, this study proposed a new strategy called DD-Response for cell-line drug response prediction. First, the limitation of narrow modeling horizons was overcome by integrating multiple datasets through source-specific label binarization, which expanded the model training domain. Second, a modified representation based on a two-dimensional structurized gridding map (SGM) was developed for cell lines and drugs, avoiding the neglect of feature correlations and potential information loss. Third, a dual-branch, multi-channel convolutional neural network-based model for pairwise response prediction was constructed, enabling accurate outcomes and improved exploration of underlying mechanisms. As a result, DD-Response demonstrated superior performance, captured cell-line characteristic variations, and provided insights into key factors impacting cell-line drug response. In addition, DD-Response exhibited scalability in predicting clinical patient responses to drug therapy. Overall, because of its excellent ability to predict drug response and capture the key molecules behind it, DD-Response is expected to greatly facilitate drug discovery, repurposing, resistance reversal, and therapeutic optimization.
9
Peng W, Chen C, Dai W, Yu N, Wang J. Predicting Clinical Anticancer Drug Response of Patients by Using Domain Alignment and Prototypical Learning. IEEE J Biomed Health Inform 2025; 29:1534-1545. [PMID: 39292588 DOI: 10.1109/jbhi.2024.3462811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/20/2024]
Abstract
Anticancer drug response prediction is crucial in developing personalized treatment plans for cancer patients. However, because high-quality patient anticancer drug response data are scarce and cell line data and patient data have different distributions, models trained solely on cell line data perform poorly. Some existing methods predict anticancer drug response by transferring knowledge from the cell line domain to the patient domain using transfer learning. However, the robustness of these classifiers is affected by anomalies in the cell line data, and they do not utilize the knowledge in the unlabeled target domain data. To this end, we proposed a model called DAPL to predict patient responses to anticancer drugs. The model extracts domain-invariant features from cell lines and patients by constructing multiple VAEs and extracts drug features using GNNs. These features are then combined for prototypical learning to train a classifier, resulting in better predictions of patient anticancer drug response. In our experiments, we used the cell line datasets CCLE and GDSC as source domains and the patient datasets TCGA and PDTC as target domains. The results indicate that DAPL shows excellent performance in predicting patient anticancer drug response compared to other state-of-the-art methods.
10
Prasanna S, Kumar A, Rao D, Simoes EJ, Rao P. A scalable tool for analyzing genomic variants of humans using knowledge graphs and graph machine learning. Front Big Data 2025; 7:1466391. [PMID: 39906190 PMCID: PMC11790625 DOI: 10.3389/fdata.2024.1466391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Accepted: 12/12/2024] [Indexed: 02/06/2025] Open
Abstract
Advances in high-throughput genome sequencing have enabled large-scale genome sequencing in clinical practice and research studies. By analyzing genomic variants of humans, scientists can gain better understanding of the risk factors of complex diseases such as cancer and COVID-19. To model and analyze the rich genomic data, knowledge graphs (KGs) and graph machine learning (GML) can be regarded as enabling technologies. In this article, we present a scalable tool called VariantKG for analyzing genomic variants of humans modeled using KGs and GML. Specifically, we used publicly available genome sequencing data from patients with COVID-19. VariantKG extracts variant-level genetic information output by a variant calling pipeline, annotates the variant data with additional metadata, and converts the annotated variant information into a KG represented using the Resource Description Framework (RDF). The resulting KG is further enhanced with patient metadata and stored in a scalable graph database that enables efficient RDF indexing and query processing. VariantKG employs the Deep Graph Library (DGL) to perform GML tasks such as node classification. A user can extract a subset of the KG and perform inference tasks using DGL. The user can monitor the training and testing performance and hardware utilization. We tested VariantKG for KG construction by using 1,508 genome sequences, leading to 4 billion RDF statements. We evaluated GML tasks using VariantKG by selecting a subset of 500 sequences from the KG and performing node classification using well-known GML techniques such as GraphSAGE, Graph Convolutional Network (GCN) and Graph Transformer. VariantKG has intuitive user interfaces and features enabling a low barrier to entry for KG construction, model inference, and model interpretation on genomic variants of humans.
Affiliation(s)
- Shivika Prasanna
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, United States
- Ajay Kumar
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, United States
- Deepthi Rao
  - Department of Pathology and Anatomical Sciences, University of Missouri, Columbia, MO, United States
- Eduardo J. Simoes
  - Department of Biomedical Informatics, Biostatistics and Medical Epidemiology, University of Missouri, Columbia, MO, United States
- Praveen Rao
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, United States
11
Wang C, Kumar GA, Rajapakse JC. Drug discovery and mechanism prediction with explainable graph neural networks. Sci Rep 2025; 15:179. [PMID: 39747341 PMCID: PMC11696803 DOI: 10.1038/s41598-024-83090-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/16/2024] [Accepted: 12/11/2024] [Indexed: 01/04/2025] Open
Abstract
Understanding the mechanism of drug action is paramount for drug response prediction and precision medicine. The unprecedented development of machine learning and deep learning algorithms has expedited drug response prediction research. However, existing methods mainly focus on forward encoding of drugs, that is, obtaining an accurate prediction of response levels, but neglect to decipher the reaction mechanism between drug molecules and genes. We propose the eXplainable Graph-based Drug response Prediction (XGDP) approach, which achieves precise drug response prediction and reveals the comprehensive mechanism of action between drugs and their targets. XGDP represents drugs with molecular graphs, which naturally preserve the structural information of molecules, and a Graph Neural Network module is applied to learn the latent features of molecules. Gene expression data from cancer cell lines are incorporated and processed by a Convolutional Neural Network module. A couple of deep learning attribution algorithms are leveraged to interpret interactions between drug molecular features and genes. We demonstrate that XGDP not only enhances the prediction accuracy compared to pioneering works but is also capable of capturing the salient functional groups of drugs and interactions with significant genes of cancer cells.
Affiliation(s)
- Conghao Wang
  - College of Computing and Data Science, Nanyang Technological University, Singapore, 639798, Singapore
- Gaurav Asok Kumar
  - College of Computing and Data Science, Nanyang Technological University, Singapore, 639798, Singapore
- Jagath C Rajapakse
  - College of Computing and Data Science, Nanyang Technological University, Singapore, 639798, Singapore
12
Xiao M, Zheng Q, Popa P, Mi X, Hu J, Zou F, Zou B. Drug molecular representations for drug response predictions: a comprehensive investigation via machine learning methods. Sci Rep 2025; 15:20. [PMID: 39748003 PMCID: PMC11696021 DOI: 10.1038/s41598-024-84711-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2024] [Accepted: 12/26/2024] [Indexed: 01/04/2025] Open
Abstract
The integration of drug molecular representations into predictive models for Drug Response Prediction (DRP) is a standard procedure in pharmaceutical research and development. However, the comparative effectiveness of combining these representations with genetic profiles for DRP remains unclear. This study conducts a comprehensive evaluation of the efficacy of various drug molecular representations, employing cutting-edge machine learning models under various experimental settings. Our findings reveal that the inclusion of molecular representations from either PubChem fingerprints or SMILES can significantly enhance DRP performance when used in conjunction with deep learning models. However, the optimal choice of drug molecular representation can vary depending on the predictive model and the specific DRP task. The insights derived from our study offer useful guidance on selecting the most suitable drug molecular representations for constructing efficient predictive models for DRP, aiding drug repurposing, personalized medicine, and new drug discovery.
Affiliation(s)
- Meisheng Xiao
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Qianhui Zheng
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Xinlei Mi
- Gilead Sciences, Inc, Foster City, USA
- Jianhua Hu
- Department of Biostatistics, Columbia University, New York, USA
- Fei Zou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Baiming Zou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA.
- School of Nursing, University of North Carolina at Chapel Hill, Chapel Hill, USA.
13
Gu Y, Zheng S, Zhang B, Kang H, Jiang R, Li J. Deep multiple instance learning on heterogeneous graph for drug-disease association prediction. Comput Biol Med 2025; 184:109403. [PMID: 39577348 DOI: 10.1016/j.compbiomed.2024.109403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 11/05/2024] [Accepted: 11/08/2024] [Indexed: 11/24/2024]
Abstract
Drug repositioning offers promising prospects for accelerating drug discovery by identifying potential drug-disease associations (DDAs) for existing drugs and diseases. Previous methods have generated meta-path-augmented node or graph embeddings for DDA prediction in drug-disease heterogeneous networks. However, these approaches rarely develop end-to-end frameworks for path instance-level representation learning as well as the further feature selection and aggregation. By leveraging the abundant topological information in path instances, more fine-grained and interpretable predictions can be achieved. To this end, we introduce deep multiple instance learning into drug repositioning by proposing a novel method called MilGNet. MilGNet employs a heterogeneous graph neural network (HGNN)-based encoder to learn drug and disease node embeddings. Treating each drug-disease pair as a bag, we designed a special quadruplet meta-path form and implemented a pseudo meta-path generator in MilGNet to obtain multiple meta-path instances based on network topology. Additionally, a bidirectional instance encoder enhances the representation of meta-path instances. Finally, MilGNet utilizes a multi-scale interpretable predictor to aggregate bag embeddings with an attention mechanism, providing predictions at both the bag and instance levels for accurate and explainable predictions. Comprehensive experiments on five benchmarks demonstrate that MilGNet significantly outperforms ten advanced methods. Notably, three case studies on one drug (Methotrexate) and two diseases (Renal Failure and Mismatch Repair Cancer Syndrome) highlight MilGNet's potential for discovering new indications, therapies, and generating rational meta-path instances to investigate possible treatment mechanisms. The source code is available at https://github.com/gu-yaowen/MilGNet.
Affiliation(s)
- Yaowen Gu
- Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS&PUMC), Beijing, 100020, China; Department of Chemistry, New York University, NY, 10027, USA.
- Si Zheng
- Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS&PUMC), Beijing, 100020, China; Institute for Artificial Intelligence, Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing, 100084, China
- Bowen Zhang
- Beijing StoneWise Technology Co Ltd., Beijing, 100080, China
- Hongyu Kang
- Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS&PUMC), Beijing, 100020, China
- Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
- Jiao Li
- Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS&PUMC), Beijing, 100020, China.
14
Dong Y, Zhang Y, Qian Y, Zhao Y, Yang Z, Feng X. ASGCL: Adaptive Sparse Mapping-based graph contrastive learning network for cancer drug response prediction. PLoS Comput Biol 2025; 21:e1012748. [PMID: 39883719 PMCID: PMC11781687 DOI: 10.1371/journal.pcbi.1012748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Accepted: 12/23/2024] [Indexed: 02/01/2025] Open
Abstract
Personalized cancer drug treatment is emerging as a frontier issue in modern medical research. Considering the genomic differences among cancer patients, determining the most effective drug treatment plan is a complex and crucial task. In response to these challenges, this study introduces the Adaptive Sparse Graph Contrastive Learning Network (ASGCL), an innovative approach to unraveling latent interactions in the complex context of cancer cell lines and drugs. The core of ASGCL is the GraphMorpher module, an innovative component that enhances the input graph structure via strategic node attribute masking and topological pruning. By contrasting the augmented graph with the original input, the model delineates distinct positive and negative sample sets at both node and graph levels. This dual-level contrastive approach significantly amplifies the model's discriminatory prowess in identifying nuanced drug responses. Leveraging a synergistic combination of supervised and contrastive loss, ASGCL accomplishes end-to-end learning of feature representations, substantially outperforming existing methodologies. Comprehensive ablation studies underscore the efficacy of each component, corroborating the model's robustness. Experimental evaluations further illuminate ASGCL's proficiency in predicting drug responses, offering a potent tool for guiding clinical decision-making in cancer therapy.
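The augmentation step described in this abstract (node attribute masking plus topological pruning, followed by contrasting the augmented graph with the original) can be illustrated with a minimal sketch. The abstract does not say how GraphMorpher chooses what to mask or prune, so the uniform random strategy, the function name `augment_graph`, and the `mask_rate` and `drop_rate` parameters below are all assumptions made for illustration only, not the authors' method.

```python
import random


def augment_graph(features, edges, mask_rate=0.2, drop_rate=0.2, seed=0):
    """Return a perturbed copy of a graph for contrastive learning.

    features: list of per-node attribute lists
    edges:    list of (u, v) node-index pairs
    The uniform random masking/dropping used here is an assumed stand-in
    for the "strategic" selection mentioned in the abstract.
    """
    rng = random.Random(seed)
    # Node attribute masking: zero each attribute with probability mask_rate.
    masked = [
        [0.0 if rng.random() < mask_rate else value for value in row]
        for row in features
    ]
    # Topological pruning: drop each edge with probability drop_rate.
    kept = [edge for edge in edges if rng.random() >= drop_rate]
    return masked, kept


# Toy graph with three nodes and three edges (hypothetical data).
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
edges = [(0, 1), (1, 2), (0, 2)]
aug_features, aug_edges = augment_graph(features, edges)
```

In a contrastive setup, the embedding of a node in the original graph and the embedding of the same node in the augmented graph would typically form a positive pair, with other nodes serving as negatives; that pairing rule is a common convention and is likewise an assumption here.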
Affiliation(s)
- Yunyun Dong
- School of Software, Taiyuan University of Technology, Taiyuan, China
- Institute of Big Data Science and Industry, Shanxi University, Taiyuan, China
- Yuanrong Zhang
- School of Software, Taiyuan University of Technology, Taiyuan, China
- Yuhua Qian
- Institute of Big Data Science and Industry, Shanxi University, Taiyuan, China
- School of Computer and Information Technology, Shanxi University, Taiyuan, China
- Yiming Zhao
- School of Software, Taiyuan University of Technology, Taiyuan, China
- Ziting Yang
- School of Software, Taiyuan University of Technology, Taiyuan, China
- Xiufang Feng
- School of Software, Taiyuan University of Technology, Taiyuan, China
15
Sederman C, Yang CH, Cortes-Sanchez E, Di Sera T, Huang X, Scherer SD, Zhao L, Chu Z, White ER, Atkinson A, Wagstaff J, Varley KE, Lewis MT, Qiao Y, Welm BE, Welm AL, Marth GT. A precision oncology-focused deep learning framework for personalized selection of cancer therapy. bioRxiv 2024:2024.12.12.628190. [PMID: 39763776 PMCID: PMC11702554 DOI: 10.1101/2024.12.12.628190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2025]
Abstract
Precision oncology matches tumors to targeted therapies based on the presence of actionable molecular alterations. However, most tumors lack actionable alterations, restricting treatment options to cytotoxic chemotherapies for which few data-driven prioritization strategies currently exist. Here, we report an integrated computational/experimental treatment selection approach applicable for both chemotherapies and targeted agents irrespective of actionable alterations. We generated functional drug response data on a large collection of patient-derived tumor models and used it to train ScreenDL, a novel deep learning-based cancer drug response prediction model. ScreenDL leverages the combination of tumor omic and functional drug screening data to predict the most efficacious treatments. We show that ScreenDL accurately predicts response to drugs with diverse mechanisms, outperforming existing methods and approved biomarkers. In our preclinical study, this approach achieved superior clinical benefit and objective response rates in breast cancer patient-derived xenografts, suggesting that testing ScreenDL in clinical trials may be warranted.
Affiliation(s)
- Casey Sederman
- Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah, USA
- Chieh-Hsiang Yang
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Department of Oncological Sciences, University of Utah, Salt Lake City, UT, USA
- Emilio Cortes-Sanchez
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Department of Oncological Sciences, University of Utah, Salt Lake City, UT, USA
- Tony Di Sera
- Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah, USA
- Xiaomeng Huang
- Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah, USA
- Sandra D Scherer
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Department of Oncological Sciences, University of Utah, Salt Lake City, UT, USA
- Ling Zhao
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Department of Oncological Sciences, University of Utah, Salt Lake City, UT, USA
- Zhengtao Chu
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Department of Oncological Sciences, University of Utah, Salt Lake City, UT, USA
- Eliza R White
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Department of Oncological Sciences, University of Utah, Salt Lake City, UT, USA
- Aaron Atkinson
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Jadon Wagstaff
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Department of Oncological Sciences, University of Utah, Salt Lake City, UT, USA
- Katherine E Varley
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Department of Oncological Sciences, University of Utah, Salt Lake City, UT, USA
- Michael T Lewis
- Departments of Molecular and Cellular Biology and Radiology, Lester and Sue Smith Breast Center, Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas, USA
- Yi Qiao
- Department of Biomedical Informatics, School of Medicine, University of Utah, Salt Lake City, UT, USA
- Bryan E Welm
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Department of Surgery, University of Utah, Salt Lake City, UT, USA
- Alana L Welm
- Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Department of Oncological Sciences, University of Utah, Salt Lake City, UT, USA
- Gabor T Marth
- Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah, USA
16
Jiang Z, Li P. DeepDR: a deep learning library for drug response prediction. Bioinformatics 2024; 40:btae688. [PMID: 39558584 PMCID: PMC11629690 DOI: 10.1093/bioinformatics/btae688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2024] [Revised: 10/29/2024] [Accepted: 11/13/2024] [Indexed: 11/20/2024] Open
Abstract
SUMMARY: Accurate drug response prediction is critical to advancing precision medicine and drug discovery. Recent advances in deep learning (DL) have shown promise in predicting drug response; however, the lack of convenient tools to support such modeling limits their widespread application. To address this, we introduce DeepDR, the first DL library specifically developed for drug response prediction. DeepDR simplifies the process by automating drug and cell featurization, model construction, training, and inference, all achievable with brief programming. The library incorporates three types of drug features along with nine drug encoders, four types of cell features along with nine cell encoders, and two fusion modules, enabling the implementation of up to 135 DL models for drug response prediction. We also explored benchmarking performance with DeepDR, and the optimal models are available on a user-friendly visual interface. AVAILABILITY AND IMPLEMENTATION: DeepDR can be installed from PyPI (https://pypi.org/project/deepdr). The source code and experimental data are available on GitHub (https://github.com/user15632/DeepDR).
Affiliation(s)
- Zhengxiang Jiang
- School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China
- School of Electronic Engineering, Xidian University, Xi’an, Shaanxi 710126, China
- Pengyong Li
- School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China
17
Yang K, Cheng J, Cao S, Pan X, Shen HB, Yuan Y. Predicting transcriptional changes induced by molecules with MiTCP. Brief Bioinform 2024; 26:bbaf006. [PMID: 39847444 PMCID: PMC11756340 DOI: 10.1093/bib/bbaf006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2024] [Revised: 12/05/2024] [Accepted: 01/21/2025] [Indexed: 01/24/2025] Open
Abstract
Studying the changes in cellular transcriptional profiles induced by small molecules can significantly advance our understanding of cellular state alterations and response mechanisms under chemical perturbations, which plays a crucial role in drug discovery and screening processes. Considering that experimental measurements need substantial time and cost, we developed a deep learning-based method called Molecule-induced Transcriptional Change Predictor (MiTCP) to predict changes in transcriptional profiles (CTPs) of 978 landmark genes induced by molecules. MiTCP utilizes graph neural network-based approaches to simultaneously model molecular structure representation and gene co-expression relationships, and integrates them for CTP prediction. After training on the L1000 dataset, MiTCP achieves an average Pearson correlation coefficient (PCC) of 0.482 on the test set and an average PCC of 0.801 for predicting the top 50 differentially expressed genes, which outperforms other existing methods. Furthermore, we used MiTCP to predict CTPs of three cancer drugs, palbociclib, irinotecan and goserelin, and performed gene enrichment analysis on the top differentially expressed genes and found that the enriched pathways and Gene Ontology terms are highly relevant to the corresponding diseases, which reveals the potential of MiTCP in drug development.
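The evaluation metric quoted in this abstract, an average Pearson correlation coefficient (PCC) over predicted profiles, optionally restricted to the top differentially expressed genes, can be sketched as follows. The abstract does not state how the top genes are selected, so ranking by the absolute value of the measured change is an assumption, and the helper names `pearson` and `mean_pcc` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when either input is constant
    return cov / math.sqrt(var_x * var_y)


def mean_pcc(true_profiles, pred_profiles, top_k=None):
    """Average per-profile PCC; with top_k set, score only the top_k genes.

    Assumption: "top" genes are those with the largest absolute measured change.
    """
    scores = []
    for true, pred in zip(true_profiles, pred_profiles):
        idx = list(range(len(true)))
        if top_k is not None:
            idx = sorted(idx, key=lambda i: abs(true[i]), reverse=True)[:top_k]
        scores.append(pearson([true[i] for i in idx], [pred[i] for i in idx]))
    return sum(scores) / len(scores)


# Toy example with one profile of four genes (hypothetical values).
true_profiles = [[3.0, -2.0, 0.1, 0.0]]
pred_profiles = [[2.5, -1.0, 5.0, -4.0]]
all_genes_score = mean_pcc(true_profiles, pred_profiles)
top2_score = mean_pcc(true_profiles, pred_profiles, top_k=2)
```

Restricting the score to strongly changed genes tends to raise the correlation, because genes with near-zero change contribute mostly noise; this is consistent with the gap between the two averages reported in the abstract, although the exact protocol used there may differ.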
Affiliation(s)
- Kaiyuan Yang
- Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Jiabei Cheng
- Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Shenghao Cao
- Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Xiaoyong Pan
- Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Hong-Bin Shen
- Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Ye Yuan
- Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- State Key Laboratory of Biopharmaceutical Preparation and Delivery, Institute of Process Engineering, Chinese Academy of Sciences, 1 North 2nd Street, Zhongguancun, Haidian District, Beijing 100190, China
18
Karampuri A, Jakkula BK, Perugu S. ResisenseNet hybrid neural network model for predicting drug sensitivity and repurposing in breast cancer. Sci Rep 2024; 14:23949. [PMID: 39397003 PMCID: PMC11471817 DOI: 10.1038/s41598-024-71076-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Accepted: 08/23/2024] [Indexed: 10/15/2024] Open
Abstract
Breast cancer remains a leading cause of mortality among women worldwide, with drug resistance driven by transcription factors and mutations posing significant challenges. To address this, we present ResisenseNet, a predictive model for drug sensitivity and resistance. ResisenseNet integrates transcription factor expression, genomic markers, drugs, and molecular descriptors, employing a hybrid architecture of 1D-CNN + LSTM and DNN to effectively learn long-range and temporal patterns from amino acid sequences and transcription factor data. The model demonstrated exceptional predictive accuracy, achieving a validation accuracy of 0.9794 and a loss value of 0.042. Comprehensive validation included comparisons with state-of-the-art models and ablation studies, confirming the robustness of the developed architecture. ResisenseNet has been applied to repurpose existing anticancer drugs across 14 different cancers, with a focus on breast cancer. Among the malignancies studied, drugs targeting Low-grade Glioma (LGG) and Lung Adenocarcinoma (LUAD) showed increased sensitivity to breast cancer as per ResisenseNet's assessment. Further evaluation of the predicted sensitive drugs revealed that 14 had no prior history of anticancer activity against breast cancer. These drugs target key signaling pathways involved in breast cancer, presenting novel therapeutic opportunities. ResisenseNet addresses drug resistance by filtering ineffective compounds and enhancing chemotherapy for breast cancer. In vitro studies on sensitive drugs provide valuable insights into breast cancer prognosis, contributing to improved treatment strategies.
Affiliation(s)
- Anush Karampuri
- Department of Biotechnology, National Institute of Technology, Warangal, 500604, India
- Bharath Kumar Jakkula
- Department of Biotechnology, National Institute of Technology, Warangal, 500604, India
- Shyam Perugu
- Department of Biotechnology, National Institute of Technology, Warangal, 500604, India.
19
Xia X, Zhu C, Zhong F, Liu L. TransCDR: a deep learning model for enhancing the generalizability of drug activity prediction through transfer learning and multimodal data fusion. BMC Biol 2024; 22:227. [PMID: 39385185 PMCID: PMC11462810 DOI: 10.1186/s12915-024-02023-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Accepted: 09/30/2024] [Indexed: 10/11/2024] Open
Abstract
BACKGROUND: Accurate and robust drug response prediction is of utmost importance in precision medicine. Although many models have been developed to utilize the representations of drugs and cancer cell lines for predicting cancer drug responses (CDR), their performances can be improved by addressing issues such as insufficient data modality, suboptimal fusion algorithms, and poor generalizability for novel drugs or cell lines. RESULTS: We introduce TransCDR, which uses transfer learning to learn drug representations and fuses multi-modality features of drugs and cell lines by a self-attention mechanism, to predict the IC50 values or sensitive states of drugs on cell lines. We are the first to systematically evaluate the generalization of the CDR prediction model to novel (i.e., never-before-seen) compound scaffolds and cell line clusters. TransCDR shows better generalizability than 8 state-of-the-art models. TransCDR outperforms its 5 variants that train drug encoders (i.e., RNN and AttentiveFP) from scratch under various scenarios. The most critical contributors among multiple drug notations and omics profiles are Extended Connectivity Fingerprint and genetic mutation. Additionally, the attention-based fusion module further enhances the predictive performance of TransCDR. TransCDR, trained on the GDSC dataset, demonstrates strong predictive performance on the external testing set CCLE. It is also utilized to predict missing CDRs on GDSC. Moreover, we investigate the biological mechanisms underlying drug response by classifying 7675 patients from TCGA into drug-sensitive or drug-resistant groups, followed by a Gene Set Enrichment Analysis. CONCLUSIONS: TransCDR emerges as a potent tool with significant potential in drug response prediction.
Affiliation(s)
- Xiaoqiong Xia
- Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, China
- Chaoyu Zhu
- Intelligent Medicine Institute, Fudan University, Shanghai, 200032, China
- Fan Zhong
- Intelligent Medicine Institute, Fudan University, Shanghai, 200032, China.
- Lei Liu
- Intelligent Medicine Institute, Fudan University, Shanghai, 200032, China.
- Shanghai Institute of Stem Cell Research and Clinical Translation, Shanghai, 200120, China.
20
Hu X, Zhang P, Zhang J, Deng L. DeepFusionCDR: Employing Multi-Omics Integration and Molecule-Specific Transformers for Enhanced Prediction of Cancer Drug Responses. IEEE J Biomed Health Inform 2024; 28:6248-6258. [PMID: 38935469 DOI: 10.1109/jbhi.2024.3417014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
Deep learning approaches have demonstrated remarkable potential in predicting cancer drug responses (CDRs), using cell line and drug features. However, existing methods predominantly rely on single-omics data of cell lines, potentially overlooking the complex biological mechanisms governing cell line responses. This paper introduces DeepFusionCDR, a novel approach employing unsupervised contrastive learning to amalgamate multi-omics features, including mutation, transcriptome, methylome, and copy number variation data, from cell lines. Furthermore, we incorporate molecular SMILES-specific transformers to derive drug features from their chemical structures. The unified multi-omics and drug signatures are combined, and a multi-layer perceptron (MLP) is applied to predict IC50 values for cell line-drug pairs. Moreover, this MLP can discern whether a cell line is resistant or sensitive to a particular drug. We assessed DeepFusionCDR's performance on the GDSC dataset and juxtaposed it against cutting-edge methods, demonstrating its superior performance in regression and classification tasks. We also conducted ablation studies and case analyses to exhibit the effectiveness and versatility of our proposed approach. Our results underscore the potential of DeepFusionCDR to enhance CDR predictions by harnessing the power of multi-omics fusion and molecular-specific transformers. The prediction of DeepFusionCDR on TCGA patient data and case study highlight the practical application scenarios of DeepFusionCDR in real-world environments.
21
Saranya KR, Vimina ER. DRN-CDR: A cancer drug response prediction model using multi-omics and drug features. Comput Biol Chem 2024; 112:108175. [PMID: 39191166 DOI: 10.1016/j.compbiolchem.2024.108175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 08/09/2024] [Accepted: 08/14/2024] [Indexed: 08/29/2024]
Abstract
Cancer drug response (CDR) prediction is an important area of research that aims to personalize cancer therapy, optimizing treatment plans for maximum effectiveness while minimizing potential negative effects. Despite the advancements in deep learning techniques, the effective integration of multi-omics data for drug response prediction remains challenging. In this paper, a regression method using Deep ResNet for CDR (DRN-CDR) prediction is proposed. We aim to explore the potential of considering only cancer genes in drug response prediction. Here, multi-omics data such as gene expression, mutation, and methylation data, along with the molecular structural information of drugs, were integrated to predict the IC50 values of drugs. Drug features are extracted by employing a Uniform Graph Convolution Network, while cell line features are extracted using a combination of a Convolutional Neural Network and Fully Connected Networks. These features are then concatenated and fed into a deep ResNet to predict IC50 values for drug-cell line pairs. The proposed method yielded a higher Pearson's correlation coefficient (rp) of 0.7938 and the lowest Root Mean Squared Error (RMSE) of 0.92 when compared with the similar methods tCNNS, MOLI, DeepCDR, TGSA, NIHGCN, DeepTTA, GraTransDRP and TSGCNN. Further, when the model is extended to a classification problem to categorize drugs as sensitive or resistant, we achieved AUC and AUPR measures of 0.7623 and 0.7691, respectively. Drugs such as Tivozanib, SNX-2112, CGP-60474, PHA-665752, and Foretinib exhibited low median IC50 values and were found to be effective anti-cancer drugs. The case studies with different TCGA cancer types also revealed the effectiveness of SNX-2112, CGP-60474, Foretinib, Cisplatin, Vinblastine, etc. This consistent pattern strongly suggests the effectiveness of the model in predicting CDR.
Affiliation(s)
- K R Saranya
- Department of Computer Science and IT, School of Computing, Amrita Vishwa Vidyapeetham, Kochi Campus, India
- E R Vimina
- Department of Computer Science and IT, School of Computing, Amrita Vishwa Vidyapeetham, Kochi Campus, India.
22
Gu Y, Xu Z, Yang C. Empowering Graph Neural Network-Based Computational Drug Repositioning with Large Language Model-Inferred Knowledge Representation. Interdiscip Sci 2024:10.1007/s12539-024-00654-7. [PMID: 39325266 DOI: 10.1007/s12539-024-00654-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 08/15/2024] [Accepted: 08/19/2024] [Indexed: 09/27/2024]
Abstract
Computational drug repositioning, through predicting drug-disease associations (DDA), offers significant potential for discovering new drug indications. Current methods incorporate graph neural networks (GNN) on drug-disease heterogeneous networks to predict DDAs, achieving notable performances compared to traditional machine learning and matrix factorization approaches. However, these methods depend heavily on network topology, are hampered by incomplete and noisy network data, and overlook the wealth of biomedical knowledge available. Correspondingly, large language models (LLMs) excel in graph search and relational reasoning, which can possibly enhance the integration of comprehensive biomedical knowledge into drug and disease profiles. In this study, we first investigate the contribution of LLM-inferred knowledge representation in drug repositioning and DDA prediction. A zero-shot prompting template was designed for the LLM to extract high-quality knowledge descriptions for drug and disease entities, followed by embedding generation from language models to transform the discrete text into continuous numerical representations. Then, we proposed LLM-DDA with three different model architectures (LLM-DDA(Node Feat), LLM-DDA(Dual GNN), LLM-DDA(GNN-AE)) to investigate the best fusion mode for LLM-based embeddings. Extensive experiments on four DDA benchmarks show that LLM-DDA(GNN-AE) achieved the optimal performance compared to 11 baselines, with overall relative improvements in AUPR of 23.22%, F1-Score of 17.20%, and precision of 25.35%. Meanwhile, selected case studies involving Prednisone and Allergic Rhinitis highlighted the model's capability to identify reliable DDAs and knowledge descriptions, supported by existing literature. This study showcases the utility of LLMs in drug repositioning, with generality and applicability to other biomedical relation prediction tasks.
Affiliation(s)
- Yaowen Gu
- Department of Chemistry, New York University, New York, NY, 10003, USA
- Zidu Xu
- School of Nursing, Columbia University, 560 W 168th Street, New York, NY, 10032, USA.
- Carl Yang
- Department of Computer Science, Emory College of Arts and Sciences, Emory University, Atlanta, GA, 30322, USA
23
Gutierrez JJG, Lau E, Dharmapalan S, Parker M, Chen Y, Álvarez MA, Wang D. Multi-output prediction of dose-response curves enables drug repositioning and biomarker discovery. NPJ Precis Oncol 2024; 8:209. [PMID: 39304771 DOI: 10.1038/s41698-024-00691-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 08/28/2024] [Indexed: 09/22/2024] Open
Abstract
Drug response prediction is hampered by uncertainty in the measures of response and selection of doses. In this study, we propose a probabilistic multi-output model to simultaneously predict all dose-responses and uncover their biomarkers. By describing the relationship between genomic features and chemical properties to every response at every dose, our multi-output Gaussian Process (MOGP) models enable assessment of drug efficacy using any dose-response metric. This approach was tested across two drug screening studies and ten cancer types. Kullback-Leibler divergence measured the importance of each feature and identified the EZH2 gene as a novel biomarker of BRAF inhibitor response. We demonstrate the effectiveness of our MOGP models in accurately predicting dose-responses in different cancer types and when there is a limited number of drug screening experiments for training. Our findings highlight the potential of MOGP models in enhancing drug development pipelines by reducing data requirements and improving precision in dose-response predictions.
Affiliation(s)
- Juan-José Giraldo Gutierrez
  - National Heart and Lung Institute, Imperial College London, London, UK.
  - Department of Computer Science, The University of Sheffield, Sheffield, UK.
- Evelyn Lau
  - Institute for Human Development and Potential, Agency for Science Technology and Research (A*STAR), Singapore, Republic of Singapore
- Subhashini Dharmapalan
  - Department of Computer Science, The University of Sheffield, Sheffield, UK
  - Institute for Human Development and Potential, Agency for Science Technology and Research (A*STAR), Singapore, Republic of Singapore
- Melody Parker
  - Nuffield Department of Clinical Medicine, University of Oxford, John Radcliffe Hospital, Oxford, UK
  - Big Data Institute at the Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Yurui Chen
  - Institute for Human Development and Potential, Agency for Science Technology and Research (A*STAR), Singapore, Republic of Singapore
  - Department of Mathematics, National University of Singapore, Singapore, Republic of Singapore
- Mauricio A Álvarez
  - Department of Computer Science, The University of Manchester, Manchester, UK
- Dennis Wang
  - National Heart and Lung Institute, Imperial College London, London, UK.
  - Department of Computer Science, The University of Sheffield, Sheffield, UK.
  - Institute for Human Development and Potential, Agency for Science Technology and Research (A*STAR), Singapore, Republic of Singapore.
  - Bioinformatics Institute (BII), Agency for Science Technology and Research (A*STAR), Singapore, Republic of Singapore.
|
24
|
Connell W, Garcia K, Goodarzi H, Keiser MJ. Learning chemical sensitivity reveals mechanisms of cellular response. Commun Biol 2024; 7:1149. [PMID: 39278951 PMCID: PMC11402971 DOI: 10.1038/s42003-024-06865-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Accepted: 09/06/2024] [Indexed: 09/18/2024] Open
Abstract
Chemical probes interrogate disease mechanisms at the molecular level by linking genetic changes to observable traits. However, comprehensive chemical screens in diverse biological models are impractical. To address this challenge, we develop ChemProbe, a model that predicts cellular sensitivity to hundreds of molecular probes and drugs by learning to combine transcriptomes and chemical structures. Using ChemProbe, we infer the chemical sensitivity of cancer cell lines and tumor samples and analyze how the model makes predictions. We retrospectively evaluate drug response predictions for precision breast cancer treatment and prospectively validate chemical sensitivity predictions in new cellular models, including a genetically modified cell line. Our model interpretation analysis identifies transcriptome features reflecting compound targets and protein network modules, identifying genes that drive ferroptosis. ChemProbe is an interpretable in silico screening tool that allows researchers to measure cellular response to diverse compounds, facilitating research into molecular mechanisms of chemical sensitivity.
Affiliation(s)
- William Connell
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA
  - Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, USA
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Kristle Garcia
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
  - Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
  - Department of Urology, University of California, San Francisco, San Francisco, CA, USA
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
- Hani Goodarzi
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
  - Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
  - Department of Urology, University of California, San Francisco, San Francisco, CA, USA
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
- Michael J Keiser
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA.
  - Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, USA.
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA.
|
25
|
Xu M, Zhu Z, Zhao Y, He K, Huang Q, Zhao Y. RedCDR: Dual Relation Distillation for Cancer Drug Response Prediction. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:1468-1479. [PMID: 38776197 DOI: 10.1109/tcbb.2024.3404262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2024]
Abstract
Based on multi-omics data and drug information, predicting the response of cancer cell lines to drugs is a crucial area of research in modern oncology, as it can promote the development of personalized treatments. Despite the promising performance achieved by existing models, most of them overlook the variations among different omics and lack effective integration of multi-omics data. Moreover, the explicit modeling of cell line/drug attribute and cell line-drug association has not been thoroughly investigated in existing approaches. To address these issues, we propose RedCDR, a dual relation distillation model for cancer drug response (CDR) prediction. Specifically, a parallel dual-branch architecture is designed to enable both independent learning and interactive fusion of cell line/drug attribute and cell line-drug association information. To facilitate the adaptive interacting integration of multi-omics data, the proposed multi-omics encoder introduces the multiple similarity relations between cell lines and takes the importance of different omics data into account. To accomplish knowledge transfer from the two independent attribute and association branches to their fusion, a dual relation distillation mechanism consisting of representation distillation and prediction distillation is presented. Experiments conducted on the GDSC and CCLE datasets show that RedCDR outperforms previous state-of-the-art approaches in CDR prediction.
|
26
|
Huang Z, Fan Z, Shen S, Wu M, Deng L. MolMVC: Enhancing molecular representations for drug-related tasks through multi-view contrastive learning. Bioinformatics 2024; 40:ii190-ii197. [PMID: 39230706 PMCID: PMC11373324 DOI: 10.1093/bioinformatics/btae386] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/05/2024] Open
Abstract
MOTIVATION: Effective molecular representation is critical in drug development. The complex nature of molecules demands comprehensive multi-view representations, considering 1D, 2D, and 3D aspects, to capture diverse perspectives. Obtaining representations that encompass these varied structures is crucial for a holistic understanding of molecules in drug-related contexts.
RESULTS: In this study, we introduce an innovative multi-view contrastive learning framework for molecular representation, denoted as MolMVC. Initially, we use a Transformer encoder to capture 1D sequence information and a Graph Transformer to encode the intricate 2D and 3D structural details of molecules. Our approach incorporates a novel attention-guided augmentation scheme, leveraging prior knowledge to create positive samples tailored to different molecular data views. To align multi-view molecular positive samples effectively in latent space, we introduce an adaptive multi-view contrastive loss (AMCLoss). In particular, we calculate AMCLoss at various levels within the model to effectively capture the hierarchical nature of the molecular information. Eventually, we pre-train the encoders via minimizing AMCLoss to obtain the molecular representation, which can be used for various down-stream tasks. In our experiments, we evaluate the performance of our MolMVC on multiple tasks, including molecular property prediction (MPP), drug-target binding affinity (DTA) prediction and cancer drug response (CDR) prediction. The results demonstrate that the molecular representation learned by our MolMVC can enhance the predictive accuracy on these tasks and also reduce the computational costs. Furthermore, we showcase MolMVC's efficacy in drug repositioning across a spectrum of drug-related applications.
AVAILABILITY AND IMPLEMENTATION: The code and pre-trained model are publicly available at https://github.com/Hhhzj-7/MolMVC.
Affiliation(s)
- Zhijian Huang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Ziyu Fan
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Siyuan Shen
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Min Wu
  - Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
- Lei Deng
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
27
|
Hao M, Gong J, Zeng X, Liu C, Guo Y, Cheng X, Wang T, Ma J, Zhang X, Song L. Large-scale foundation model on single-cell transcriptomics. Nat Methods 2024; 21:1481-1491. [PMID: 38844628 DOI: 10.1038/s41592-024-02305-7] [Citation(s) in RCA: 66] [Impact Index Per Article: 66.0] [Received: 06/02/2023] [Accepted: 05/10/2024] [Indexed: 08/10/2024]
Abstract
Large pretrained models have become foundation models leading to breakthroughs in natural language processing and related fields. Developing foundation models for deciphering the 'languages' of cells and facilitating biomedical research is promising yet challenging. Here we developed a large pretrained model scFoundation, also named 'xTrimoscFoundationα', with 100 million parameters covering about 20,000 genes, pretrained on over 50 million human single-cell transcriptomic profiles. scFoundation is a large-scale model in terms of the size of trainable parameters, dimensionality of genes and volume of training data. Its asymmetric transformer-like architecture and pretraining task design empower effectively capturing complex context relations among genes in a variety of cell types and states. Experiments showed its merit as a foundation model that achieved state-of-the-art performances in a diverse array of single-cell analysis tasks such as gene expression enhancement, tissue drug response prediction, single-cell drug response classification, single-cell perturbation prediction, cell type annotation and gene module inference.
Affiliation(s)
- Minsheng Hao
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing, China
  - BioMap, Beijing, China
- Jianzhu Ma
  - Department of Electrical Engineering, Tsinghua University, Beijing, China.
  - Institute for AI Industry Research, Tsinghua University, Beijing, China.
- Xuegong Zhang
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing, China.
  - School of Life Sciences and School of Medicine, Center for Synthetic and Systems Biology, Tsinghua University, Beijing, China.
- Le Song
  - BioMap, Beijing, China.
  - Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE.
|
28
|
Mohammadzadeh-Vardin T, Ghareyazi A, Gharizadeh A, Abbasi K, Rabiee HR. DeepDRA: Drug repurposing using multi-omics data integration with autoencoders. PLoS One 2024; 19:e0307649. [PMID: 39058696 PMCID: PMC11280260 DOI: 10.1371/journal.pone.0307649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Accepted: 07/09/2024] [Indexed: 07/28/2024] Open
Abstract
Cancer treatment has become one of the biggest challenges in the world today. Different treatments are used against cancer; drug-based treatments have shown better results. On the other hand, designing new drugs for cancer is costly and time-consuming. Some computational methods, such as machine learning and deep learning, have been suggested to solve these challenges using drug repurposing. Despite the promise of classical machine-learning methods in repurposing cancer drugs and predicting responses, deep-learning methods performed better. This study aims to develop a deep-learning model that predicts cancer drug response based on multi-omics data, drug descriptors, and drug fingerprints and facilitates the repurposing of drugs based on those responses. To reduce multi-omics data's dimensionality, we use autoencoders. As a multi-task learning model, autoencoders are connected to MLPs. We extensively tested our model using three primary datasets: GDSC, CTRP, and CCLE to determine its efficacy. In multiple experiments, our model consistently outperforms existing state-of-the-art methods. Compared to state-of-the-art models, our model achieves an impressive AUPRC of 0.99. Furthermore, in a cross-dataset evaluation, where the model is trained on GDSC and tested on CCLE, it surpasses the performance of three previous works, achieving an AUPRC of 0.72. In conclusion, we presented a deep learning model that outperforms the current state-of-the-art regarding generalization. Using this model, we could assess drug responses and explore drug repurposing, leading to the discovery of novel cancer drugs. Our study highlights the potential for advanced deep learning to advance cancer therapeutic precision.
Affiliation(s)
- Taha Mohammadzadeh-Vardin
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
- Amin Ghareyazi
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
- Ali Gharizadeh
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
- Karim Abbasi
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
  - Faculty of Mathematics and Computer Science, Kharazmi University, Tehran, Iran
- Hamid R. Rabiee
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
|
29
|
Lenhof K, Eckhart L, Rolli LM, Lenhof HP. Trust me if you can: a survey on reliability and interpretability of machine learning approaches for drug sensitivity prediction in cancer. Brief Bioinform 2024; 25:bbae379. [PMID: 39101498 PMCID: PMC11299037 DOI: 10.1093/bib/bbae379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 07/08/2024] [Accepted: 07/19/2024] [Indexed: 08/06/2024] Open
Abstract
With the ever-increasing number of artificial intelligence (AI) systems, mitigating risks associated with their use has become one of the most urgent scientific and societal issues. To this end, the European Union passed the EU AI Act, proposing solution strategies that can be summarized under the umbrella term trustworthiness. In anti-cancer drug sensitivity prediction, machine learning (ML) methods are developed for application in medical decision support systems, which require an extraordinary level of trustworthiness. This review offers an overview of the ML landscape of methods for anti-cancer drug sensitivity prediction, including a brief introduction to the four major ML realms (supervised, unsupervised, semi-supervised, and reinforcement learning). In particular, we address the question to what extent trustworthiness-related properties, more specifically, interpretability and reliability, have been incorporated into anti-cancer drug sensitivity prediction methods over the previous decade. In total, we analyzed 36 papers with approaches for anti-cancer drug sensitivity prediction. Our results indicate that the need for reliability has hardly been addressed so far. Interpretability, on the other hand, has often been considered for model development. However, the concept is rather used intuitively, lacking clear definitions. Thus, we propose an easily extensible taxonomy for interpretability, unifying all prevalent connotations explicitly or implicitly used within the field.
Affiliation(s)
- Kerstin Lenhof
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
- Lea Eckhart
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
- Lisa-Marie Rolli
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
- Hans-Peter Lenhof
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
|
30
|
Shukla N, Shamim U, Agarwal P, Pandey R, Narayan J. From bench to bedside: potential of translational research in COVID-19 and beyond. Brief Funct Genomics 2024; 23:349-362. [PMID: 37986554 DOI: 10.1093/bfgp/elad051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 10/25/2023] [Accepted: 11/02/2023] [Indexed: 11/22/2023] Open
Abstract
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease 2019 (COVID-19) have been around for more than 3 years now. However, due to constant viral evolution, novel variants are emerging, leaving old treatment protocols redundant. As treatment options dwindle, infection rates continue to rise and seasonal infection surges become progressively common across the world, rapid solutions are required. With genomic and proteomic methods generating enormous amounts of data to expand our understanding of SARS-CoV-2 biology, there is an urgent requirement for the development of novel therapeutic methods that can allow translational research to flourish. In this review, we highlight the current state of COVID-19 in the world and the effects of post-infection sequelae. We present the contribution of translational research in COVID-19, with various current and novel therapeutic approaches, including antivirals, monoclonal antibodies and vaccines, as well as alternate treatment methods such as immunomodulators, currently being studied and reiterate the importance of translational research in the development of various strategies to contain COVID-19.
Affiliation(s)
- Nityendra Shukla
  - CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Near Jubilee Hall, New Delhi, 110007, India
- Uzma Shamim
  - CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Near Jubilee Hall, New Delhi, 110007, India
- Preeti Agarwal
  - CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Near Jubilee Hall, New Delhi, 110007, India
- Rajesh Pandey
  - CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Near Jubilee Hall, New Delhi, 110007, India
- Jitendra Narayan
  - CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Near Jubilee Hall, New Delhi, 110007, India
|
31
|
Wall P, Ideker T. Representing mutations for predicting cancer drug response. Bioinformatics 2024; 40:i160-i168. [PMID: 38940147 PMCID: PMC11256944 DOI: 10.1093/bioinformatics/btae209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION: Predicting cancer drug response requires a comprehensive assessment of many mutations present across a tumor genome. While current drug response models generally use a binary mutated/unmutated indicator for each gene, not all mutations in a gene are equivalent.
RESULTS: Here, we construct and evaluate a series of predictive models based on leading methods for quantitative mutation scoring. Such methods include VEST4 and CADD, which score the impact of a mutation on gene function, and CHASMplus, which scores the likelihood a mutation drives cancer. The resulting predictive models capture cellular responses to dabrafenib, which targets BRAF-V600 mutations, whereas models based on binary mutation status do not. Performance improvements generalize to other drugs, extending genetic indications for PIK3CA, ERBB2, EGFR, PARP1, and ABL1 inhibitors. Introducing quantitative mutation features in drug response models increases performance and mechanistic understanding.
AVAILABILITY AND IMPLEMENTATION: Code and example datasets are available at https://github.com/pgwall/qms.
Affiliation(s)
- Patrick Wall
  - Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, United States
- Trey Ideker
  - Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, United States
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, United States
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, United States
|
32
|
Abinas V, Abhinav U, Haneem EM, Vishnusankar A, Nazeer KAA. Integration of autoencoder and graph convolutional network for predicting breast cancer drug response. J Bioinform Comput Biol 2024; 22:2450013. [PMID: 39051144 DOI: 10.1142/s0219720024500136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2024]
Abstract
Background and objectives: Breast cancer is the most prevalent type of cancer among women. The effectiveness of anticancer pharmacological therapy may be adversely affected by tumor heterogeneity that includes genetic and transcriptomic features. This leads to clinical variability in patient response to therapeutic drugs. Anticancer drug design and cancer understanding require precise identification of cancer drug responses. The performance of drug response prediction models can be improved by integrating multi-omics data and drug structure data. Methods: In this paper, we propose an Autoencoder (AE) and Graph Convolutional Network (AGCN) for drug response prediction, which integrates multi-omics data and drug structure data. Specifically, we first converted the high dimensional representation of each omic data to a lower dimensional representation using an AE for each omic data set. Subsequently, these individual features are combined with drug structure data obtained using a Graph Convolutional Network and given to a Convolutional Neural Network to calculate IC50 values for every combination of cell lines and drugs. Then a threshold IC50 value is obtained for each drug by performing K-means clustering of its known IC50 values. Finally, with the help of this threshold value, cell lines are classified as either sensitive or resistant to each drug. Results: Experimental results indicate that AGCN has an accuracy of 0.82 and performs better than many existing methods. In addition, external validation of AGCN using data taken from The Cancer Genome Atlas (TCGA) clinical database yielded an accuracy of 0.91. Conclusion: According to the results obtained, concatenating multi-omics data with drug structure data using AGCN for drug response prediction tasks greatly improves the accuracy of the prediction task.
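The per-drug thresholding step described in this abstract can be illustrated with a minimal sketch. It assumes two clusters per drug, takes the midpoint between the two centroids as the threshold, and labels values below the threshold as sensitive (the usual reading of a low IC50). None of these choices is stated in the abstract, and the names `kmeans_threshold` and `classify` are hypothetical, not the authors' code.

```python
from statistics import mean


def kmeans_threshold(values, iters=100):
    """Run 1-D K-means with k=2 and return the midpoint between the centroids.

    Assumption: the midpoint is used as the threshold; the abstract does not
    say how the threshold is derived from the clusters.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("need at least two distinct values")
    for _ in range(iters):
        # Assign each value to the nearer centroid (ties go to the low cluster).
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo, new_hi = mean(low), mean(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def classify(values, threshold):
    """Label each value; assumption: lower IC50 means more sensitive."""
    return ["sensitive" if v < threshold else "resistant" for v in values]


# Toy IC50 values for one hypothetical drug (illustrative numbers only).
ic50 = [0.1, 0.2, 0.3, 5.0, 6.0, 7.0]
threshold = kmeans_threshold(ic50)
labels = classify(ic50, threshold)
```

In practice IC50 values are often log-transformed before clustering; whether AGCN does so is not stated here.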
Affiliation(s)
- V Abinas
  - Department of Computer Science and Engineering, National Institute of Technology Calicut, Calicut, Kerala, India
- U Abhinav
  - Department of Computer Science and Engineering, National Institute of Technology Calicut, Calicut, Kerala, India
- E M Haneem
  - Department of Computer Science and Engineering, National Institute of Technology Calicut, Calicut, Kerala, India
- A Vishnusankar
  - Department of Computer Science and Engineering, National Institute of Technology Calicut, Calicut, Kerala, India
- K A Abdul Nazeer
  - Department of Computer Science and Engineering, National Institute of Technology Calicut, Calicut, Kerala, India
|
33
|
Campana PA, Prasse P, Lienhard M, Thedinga K, Herwig R, Scheffer T. Cancer drug sensitivity estimation using modular deep Graph Neural Networks. NAR Genom Bioinform 2024; 6:lqae043. [PMID: 38680251 PMCID: PMC11055499 DOI: 10.1093/nargab/lqae043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 03/01/2024] [Accepted: 04/17/2024] [Indexed: 05/01/2024] Open
Abstract
Computational drug sensitivity models have the potential to improve therapeutic outcomes by identifying targeted drug components that are tailored to the transcriptomic profile of a given primary tumor. The SMILES representation of molecules that is used by state-of-the-art drug-sensitivity models is not conducive for neural networks to generalize to new drugs, in part because the distance between atoms does not generally correspond to the distance between their representation in the SMILES strings. Graph-attention networks, on the other hand, are high-capacity models that require large training-data volumes which are not available for drug-sensitivity estimation. We develop a modular drug-sensitivity graph-attentional neural network. The modular architecture allows us to separately pre-train the graph encoder and graph-attentional pooling layer on related tasks for which more data are available. We observe that this model outperforms reference models for the use cases of precision oncology and drug discovery; in particular, it is better able to predict the specific interaction between drug and cell line that is not explained by the general cytotoxicity of the drug and the overall survivability of the cell line. The complete source code is available at https://zenodo.org/doi/10.5281/zenodo.8020945. All experiments are based on the publicly available GDSC data.
Affiliation(s)
- Pedro A Campana
  - University of Potsdam, Department of Computer Science, Potsdam, Germany
- Paul Prasse
  - University of Potsdam, Department of Computer Science, Potsdam, Germany
- Matthias Lienhard
  - Max Planck Institute for Molecular Genetics, Department Computational Molecular Biology, Berlin, Germany
- Kristina Thedinga
  - Max Planck Institute for Molecular Genetics, Department Computational Molecular Biology, Berlin, Germany
- Ralf Herwig
  - Max Planck Institute for Molecular Genetics, Department Computational Molecular Biology, Berlin, Germany
- Tobias Scheffer
  - University of Potsdam, Department of Computer Science, Potsdam, Germany
|
34
|
Dey V, Ning X. Improving Anticancer Drug Selection and Prioritization via Neural Learning to Rank. J Chem Inf Model 2024; 64:4071-4088. [PMID: 38740382 PMCID: PMC11134508 DOI: 10.1021/acs.jcim.3c01060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 03/27/2024] [Accepted: 04/16/2024] [Indexed: 05/16/2024]
Abstract
Personalized cancer treatment requires a thorough understanding of complex interactions between drugs and cancer cell lines in varying genetic and molecular contexts. To address this, high-throughput screening has been used to generate large-scale drug response data, facilitating data-driven computational models. Such models can capture complex drug-cell line interactions across various contexts in a fully data-driven manner. However, accurately prioritizing the most effective drugs for each cell line still remains a significant challenge. To address this, we developed multiple neural ranking approaches that leverage large-scale drug response data across multiple cell lines from diverse cancer types. Unlike existing approaches that primarily utilize regression and classification techniques for drug response prediction, we formulated the objective of drug selection and prioritization as a drug ranking problem. In this work, we proposed multiple pairwise and listwise neural ranking methods that learn latent representations of drugs and cell lines and then use those representations to score drugs in each cell line via a learnable scoring function. Specifically, we developed neural pairwise and listwise ranking methods, Pair-PushC and List-One on top of the existing methods, pLETORg and ListNet, respectively. Additionally, we proposed a novel listwise ranking method, List-All, that focuses on all the effective drugs instead of the top effective drug, unlike List-One. We also provide an exhaustive empirical evaluation with state-of-the-art regression and ranking baselines on large-scale data sets across multiple experimental settings. Our results demonstrate that our proposed ranking methods mostly outperform the best baselines with significant improvements of as much as 25.6% in terms of selecting truly effective drugs within the top 20 predicted drugs (i.e., hit@20) across 50% test cell lines. 
Furthermore, our analyses suggest that the learned latent spaces from our proposed methods demonstrate informative clustering structures and capture relevant underlying biological features. Moreover, our comprehensive evaluation provides a thorough and objective comparison of the performance of different methods (including our proposed ones).
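The hit@20 metric mentioned in this abstract can be sketched as follows. This is one plausible reading, counting how many truly effective drugs appear among the k highest-scored drugs for a single cell line; the paper's exact definition may differ (for example, a binary hit or an average over cell lines). The function name `hit_at_k` and the toy scores are hypothetical.

```python
def hit_at_k(scores, effective, k=20):
    """Count truly effective drugs among the k highest-scored drugs.

    scores:    dict mapping drug name -> predicted score (higher = better)
    effective: set of drug names known to be effective for this cell line
    Assumption: hit@k is a count; the source abstract does not define it.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sum(1 for drug in ranked[:k] if drug in effective)


# Toy example for one hypothetical cell line.
predicted = {"drugA": 0.9, "drugB": 0.8, "drugC": 0.1, "drugD": 0.5}
truly_effective = {"drugA", "drugC"}
hits_top2 = hit_at_k(predicted, truly_effective, k=2)
```

With these toy scores the top two predictions are drugA and drugB, so only drugA counts as a hit.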
Affiliation(s)
- Vishal Dey
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
- Xia Ning
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
  - Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, United States
  - Translational Data Analytics Institute, The Ohio State University, Columbus, Ohio 43210, United States
|
35
|
Kim Y, Han Y, Hopper C, Lee J, Joo JI, Gong JR, Lee CK, Jang SH, Kang J, Kim T, Cho KH. A gray box framework that optimizes a white box logical model using a black box optimizer for simulating cellular responses to perturbations. Cell Rep Methods 2024; 4:100773. [PMID: 38744288 PMCID: PMC11133856 DOI: 10.1016/j.crmeth.2024.100773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Revised: 03/19/2024] [Accepted: 04/19/2024] [Indexed: 05/16/2024]
Abstract
Predicting cellular responses to perturbations requires interpretable insights into molecular regulatory dynamics to perform reliable cell fate control, despite the confounding non-linearity of the underlying interactions. There is a growing interest in developing machine learning-based perturbation response prediction models to handle the non-linearity of perturbation data, but their interpretation in terms of molecular regulatory dynamics remains a challenge. Alternatively, for meaningful biological interpretation, logical network models such as Boolean networks are widely used in systems biology to represent intracellular molecular regulation. However, determining the appropriate regulatory logic of large-scale networks remains an obstacle due to the high-dimensional and discontinuous search space. To tackle these challenges, we present a scalable derivative-free optimizer trained by meta-reinforcement learning for Boolean network models. The logical network model optimized by the trained optimizer successfully predicts anti-cancer drug responses of cancer cell lines, while simultaneously providing insight into their underlying molecular regulatory mechanisms.
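To make the notion of a Boolean network model concrete, the following is a minimal sketch of synchronous updating until a previously visited state recurs. The three-node network (`drug`, `kinase`, `growth`) and its logic rules are hypothetical and chosen only for illustration; the meta-reinforcement-learning optimizer described in the abstract is not reproduced here.

```python
def step(state, rules):
    """Apply every node's Boolean rule to the current state at once (synchronous update)."""
    return {node: rule(state) for node, rule in rules.items()}


def simulate(state, rules, max_steps=50):
    """Iterate until a state repeats (fixed point or cycle) or max_steps states are visited."""
    seen = []
    while state not in seen and len(seen) < max_steps:
        seen.append(state)
        state = step(state, rules)
    return state, seen


# Hypothetical regulatory logic: the drug inhibits a kinase that drives growth.
TOY_RULES = {
    "drug": lambda s: s["drug"],        # input node keeps its value
    "kinase": lambda s: not s["drug"],  # kinase is active only without the drug
    "growth": lambda s: s["kinase"],    # growth follows kinase activity
}

treated, _ = simulate({"drug": True, "kinase": True, "growth": True}, TOY_RULES)
untreated, _ = simulate({"drug": False, "kinase": True, "growth": True}, TOY_RULES)
print(treated["growth"], untreated["growth"])
```

Fitting such a model means searching over the rule assigned to each node, which is the discontinuous, high-dimensional search space the abstract refers to.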
Affiliation(s)
- Yunseong Kim
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Younghyun Han
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Corbin Hopper
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Jonghoon Lee
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Jae Il Joo
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Jeong-Ryeol Gong
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Chun-Kyung Lee
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Seong-Hoon Jang
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Junsoo Kang
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Taeyoung Kim
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Kwang-Hyun Cho
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea.
| |
|
36
|
Lawrence PJ, Burns B, Ning X. Enhancing drug and cell line representations via contrastive learning for improved anti-cancer drug prioritization. NPJ Precis Oncol 2024; 8:106. [PMID: 38762647 PMCID: PMC11102516 DOI: 10.1038/s41698-024-00589-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 03/22/2024] [Indexed: 05/20/2024] Open
Abstract
Due to cancer's complex nature and variable response to therapy, precision oncology informed by omics sequence analysis has become the current standard of care. However, the amount of data produced for each patient makes it difficult to quickly identify the best treatment regimen. Moreover, limited data availability has hindered computational methods' abilities to learn patterns associated with effective drug-cell line pairs. In this work, we propose the use of contrastive learning to improve learned drug and cell line representations by preserving relationship structures associated with drug mechanisms of action and cell line cancer types. In addition to achieving enhanced performance relative to a state-of-the-art method, we find that classifiers using our learned representations exhibit a more balanced reliance on drug- and cell line-derived features when making predictions. This facilitates more personalized drug prioritizations that are informed by signals related to drug resistance.
Affiliation(s)
- Patrick J Lawrence
- Biomedical Informatics Department, The Ohio State University, 1800 Cannon Drive, Lincoln Tower 250, Columbus, OH, 43210, USA
| | - Benjamin Burns
- Computer Science and Engineering Department, The Ohio State University, 2015 Neil Avenue, Columbus, OH, 43210, USA
| | - Xia Ning
- Biomedical Informatics Department, The Ohio State University, 1800 Cannon Drive, Lincoln Tower 250, Columbus, OH, 43210, USA.
- Computer Science and Engineering Department, The Ohio State University, 2015 Neil Avenue, Columbus, OH, 43210, USA.
- Translational Data Analytics Institute, The Ohio State University, 1760 Neil Avenue, Columbus, OH, 43210, USA.
| |
|
37
|
Yao R, Shen Z, Xu X, Ling G, Xiang R, Song T, Zhai F, Zhai Y. Knowledge mapping of graph neural networks for drug discovery: a bibliometric and visualized analysis. Front Pharmacol 2024; 15:1393415. [PMID: 38799167 PMCID: PMC11116974 DOI: 10.3389/fphar.2024.1393415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 04/12/2024] [Indexed: 05/29/2024] Open
Abstract
Introduction: In recent years, graph neural network has been extensively applied to drug discovery research. Although researchers have made significant progress in this field, there is less research on bibliometrics. The purpose of this study is to conduct a comprehensive bibliometric analysis of graph neural network applications in drug discovery in order to identify current research hotspots and trends, as well as serve as a reference for future research.
Methods: Publications from 2017 to 2023 about the application of graph neural network in drug discovery were collected from the Web of Science Core Collection. Bibliometrix, VOSviewer, and Citespace were mainly used for bibliometric studies.
Results and Discussion: In this paper, a total of 652 papers from 48 countries/regions were included. Research interest in this field is continuously increasing. China and the United States have a significant advantage in terms of funding, the number of publications, and collaborations with other institutions and countries. Although some cooperation networks have been formed in this field, extensive worldwide cooperation still needs to be strengthened. The results of the keyword analysis clarified that graph neural network has primarily been applied to drug-target interaction, drug repurposing, and drug-drug interaction, while graph convolutional neural network and its related optimization methods are currently the core algorithms in this field. Data availability and ethical supervision, balancing computing resources, and developing novel graph neural network models with better interpretability are the key technical issues currently faced. This paper analyzes the current state, hot spots, and trends of graph neural network applications in drug discovery through bibliometric approaches, as well as the current issues and challenges in this field. These findings provide researchers with valuable insights on the current status and future directions of this field.
Affiliation(s)
- Fei Zhai
- Faculty of Medical Device, Shenyang Pharmaceutical University, Shenyang, China
| | - Yuxuan Zhai
- Faculty of Medical Device, Shenyang Pharmaceutical University, Shenyang, China
| |
|
38
|
Ovchinnikova K, Born J, Chouvardas P, Rapsomaniki M, Kruithof-de Julio M. Overcoming limitations in current measures of drug response may enable AI-driven precision oncology. NPJ Precis Oncol 2024; 8:95. [PMID: 38658785 PMCID: PMC11043358 DOI: 10.1038/s41698-024-00583-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 03/22/2024] [Indexed: 04/26/2024] Open
Abstract
Machine learning (ML) models of drug sensitivity prediction are becoming increasingly popular in precision oncology. Here, we identify a fundamental limitation in standard measures of drug sensitivity that hinders the development of personalized prediction models - they focus on absolute effects but do not capture relative differences between cancer subtypes. Our work suggests that using z-scored drug response measures mitigates these limitations and leads to meaningful predictions, opening the door for sophisticated ML precision oncology models.
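As a minimal sketch of what z-scoring a drug response measure can look like, the snippet below standardizes each drug's responses across cell lines, so that a value expresses how a cell line responds relative to the other cell lines treated with the same drug. Standardizing per drug, the use of the population standard deviation, and all names and numbers are assumptions made for illustration; the paper's exact procedure is not given in the abstract.

```python
from statistics import fmean, pstdev


def z_score_per_drug(responses):
    """Standardize each drug's responses across cell lines (mean 0, unit variance per drug)."""
    z = {}
    for drug, by_cell_line in responses.items():
        values = list(by_cell_line.values())
        mu, sigma = fmean(values), pstdev(values)
        z[drug] = {
            # A drug with identical responses everywhere carries no relative signal.
            cell: ((value - mu) / sigma if sigma > 0 else 0.0)
            for cell, value in by_cell_line.items()
        }
    return z


# Hypothetical response values (for example, log IC50) for two drugs in three cell lines.
toy = {
    "potent_drug": {"c1": 1.0, "c2": 2.0, "c3": 3.0},
    "weak_drug": {"c1": 10.0, "c2": 10.0, "c3": 10.0},
}
print(z_score_per_drug(toy))
```

After this transformation, a model can no longer score well simply by learning which drugs are potent on average, which is the absolute-effect shortcut the abstract describes.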
Affiliation(s)
- Katja Ovchinnikova
- Urology Research Laboratory, Department for BioMedical Research, University of Bern, Bern, Switzerland
| | - Panagiotis Chouvardas
- Urology Research Laboratory, Department for BioMedical Research, University of Bern, Bern, Switzerland
- Department of Urology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
| | - Marianna Kruithof-de Julio
- Urology Research Laboratory, Department for BioMedical Research, University of Bern, Bern, Switzerland.
- Department of Urology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.
| |
|
39
|
Hsu YC, Chiu YC, Lu TP, Hsiao TH, Chen Y. Predicting drug response through tumor deconvolution by cancer cell lines. PATTERNS (NEW YORK, N.Y.) 2024; 5:100949. [PMID: 38645769 PMCID: PMC11026976 DOI: 10.1016/j.patter.2024.100949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 02/07/2024] [Accepted: 02/12/2024] [Indexed: 04/23/2024]
Abstract
Large-scale cancer drug sensitivity data have become available for a collection of cancer cell lines, but only limited drug response data from patients are available. Bridging the gap in pharmacogenomics knowledge between in vitro and in vivo datasets remains challenging. In this study, we trained a deep learning model, Scaden-CA, for deconvoluting tumor data into proportions of cancer-type-specific cell lines. Then, we developed a drug response prediction method using the deconvoluted proportions and the drug sensitivity data from cell lines. The Scaden-CA model showed excellent performance in terms of concordance correlation coefficients (>0.9 for model testing) and the correctly deconvoluted rate (>70% across most cancers) for model validation using Cancer Cell Line Encyclopedia (CCLE) bulk RNA data. We applied the model to tumors in The Cancer Genome Atlas (TCGA) dataset and examined associations between predicted cell viability and mutation status or gene expression levels to understand underlying mechanisms of potential value for drug repurposing.
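Two quantities mentioned in this abstract can be sketched briefly. The first function computes Lin's concordance correlation coefficient with population moments. The second combines deconvoluted cell line proportions with cell line viabilities as a proportion-weighted average; that combination rule is an assumption made here for illustration, since the abstract does not state the exact formula. All names and numbers are hypothetical.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    n = len(x)
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (var_x + var_y + (mx - my) ** 2)


def predicted_tumor_viability(proportions, cell_line_viability):
    """Proportion-weighted average of cell line viabilities (assumed combination rule)."""
    return sum(p * cell_line_viability[line] for line, p in proportions.items())


# Hypothetical deconvolution output for one tumor and viabilities for one drug.
toy_proportions = {"lineA": 0.25, "lineB": 0.75}
toy_viability = {"lineA": 0.4, "lineB": 0.8}
print(predicted_tumor_viability(toy_proportions, toy_viability))
```

Unlike the Pearson correlation, the concordance coefficient also penalizes a systematic offset between predicted and true proportions, which matters when the proportions themselves are reused downstream.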
Affiliation(s)
- Yu-Ching Hsu
- Bioinformatics Program, Taiwan International Graduate Program, National Taiwan University, Taipei 115, Taiwan
- Bioinformatics Program, Institute of Statistical Science, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan
- Institute of Health Data Analytics and Statistics, Department of Public Health, College of Public Health, National Taiwan University, Taipei 100, Taiwan
- Greehey Children’s Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
| | - Yu-Chiao Chiu
- Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15261, USA
- UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA 15232, USA
| | - Tzu-Pin Lu
- Institute of Health Data Analytics and Statistics, Department of Public Health, College of Public Health, National Taiwan University, Taipei 100, Taiwan
| | - Tzu-Hung Hsiao
- Department of Medical Research, Taichung Veterans General Hospital, Taichung 40705, Taiwan
| | - Yidong Chen
- Greehey Children’s Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX 78229, USA
| |
|
40
|
Abbasi M, Carvalho FG, Ribeiro B, Arrais JP. Predicting drug activity against cancer through genomic profiles and SMILES. Artif Intell Med 2024; 150:102820. [PMID: 38553160 DOI: 10.1016/j.artmed.2024.102820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 01/09/2024] [Accepted: 02/21/2024] [Indexed: 04/02/2024]
Abstract
Due to the constant increase in cancer rates, the disease has become a leading cause of death worldwide, enhancing the need for its detection and treatment. In the era of personalized medicine, the main goal is to incorporate individual variability in order to choose more precisely which therapy and prevention strategies suit each person. However, predicting the sensitivity of tumors to anticancer treatments remains a challenge. In this work, we propose two deep neural network models to predict the impact of anticancer drugs in tumors through the half-maximal inhibitory concentration (IC50). These models join biological and chemical data to apprehend relevant features of the genetic profile and the drug compounds, respectively. In order to predict the drug response in cancer cell lines, this study employed different DL methods, resorting to Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). In the first stage, two autoencoders were pre-trained with high-dimensional gene expression and mutation data of tumors. Afterward, this genetic background is transferred to the prediction models that return the IC50 value that portrays the potency of a substance in inhibiting a cancer cell line. When comparing RSEM Expected counts and TPM as methods for displaying gene expression data, RSEM has been shown to perform better in deep models and CNNs model can obtain better insight in these types of data. Moreover, the obtained results reflect the effectiveness of the extracted deep representations in the prediction of the IC50 value that portrays the potency of a substance in inhibiting a tumor, achieving a performance of a mean squared error of 1.06 and surpassing previous state-of-the-art models.
Affiliation(s)
- Maryam Abbasi
- Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal; Polytechnic Institute of Coimbra, Applied Research Institute, Coimbra, Portugal; Research Centre for Natural Resources Environment and Society (CERNAS), Polytechnic Institute of Coimbra, Coimbra, Portugal.
| | - Filipa G Carvalho
- Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
| | - Bernardete Ribeiro
- Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
| | - Joel P Arrais
- Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
| |
|
41
|
Chen Y, Zhang L. Hi-GeoMVP: a hierarchical geometry-enhanced deep learning model for drug response prediction. Bioinformatics 2024; 40:btae204. [PMID: 38614131 PMCID: PMC11060866 DOI: 10.1093/bioinformatics/btae204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 02/11/2024] [Accepted: 04/11/2024] [Indexed: 04/15/2024] Open
Abstract
MOTIVATION: Personalized cancer treatments require accurate drug response predictions. Existing deep learning methods show promise but higher accuracy is needed to serve the purpose of precision medicine. The prediction accuracy can be improved with not only topology but geometrical information of drugs.
RESULTS: A novel deep learning methodology for drug response prediction is presented, named Hi-GeoMVP. It synthesizes hierarchical drug representation with multi-omics data, leveraging graph neural networks and variational autoencoders for detailed drug and cell line representations. Multi-task learning is employed to make better prediction, while both 2D and 3D molecular representations capture comprehensive drug information. Testing on the GDSC dataset confirms Hi-GeoMVP's enhanced performance, surpassing prior state-of-the-art methods by improving the Pearson correlation coefficient from 0.934 to 0.941 and decreasing the root mean square error from 0.969 to 0.931. In the case of blind test, Hi-GeoMVP demonstrated robustness, outperforming the best previous models with a superior Pearson correlation coefficient in the drug-blind test. These results underscore Hi-GeoMVP's capabilities in drug response prediction, implying its potential for precision medicine.
AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/matcyr/Hi-GeoMVP.
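The two evaluation metrics reported in this abstract, the Pearson correlation coefficient and the root mean square error, can be computed as in the sketch below. The function names and the toy values are hypothetical and are not taken from the Hi-GeoMVP repository.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mx) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted responses."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(fmean(squared_errors))


# Hypothetical observed and predicted response values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(pearson(observed, predicted), rmse(observed, predicted))
```

The two metrics answer different questions: the correlation measures whether predictions follow the observed trend, while the error measures how far off they are in absolute terms.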
Affiliation(s)
- Yurui Chen
- Department of Mathematics and the Centre for Data Science and Machine Learning, National University of Singapore, Singapore 119076, Singapore
| | - Louxin Zhang
- Department of Mathematics and the Centre for Data Science and Machine Learning, National University of Singapore, Singapore 119076, Singapore
| |
|
42
|
Wang H, Lin K, Zhang Q, Shi J, Song X, Wu J, Zhao C, He K. HyperTMO: a trusted multi-omics integration framework based on hypergraph convolutional network for patient classification. Bioinformatics 2024; 40:btae159. [PMID: 38530977 PMCID: PMC11212491 DOI: 10.1093/bioinformatics/btae159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 02/02/2024] [Accepted: 03/24/2024] [Indexed: 03/28/2024] Open
Abstract
MOTIVATION: The rapid development of high-throughput biomedical technologies can provide researchers with detailed multi-omics data. The multi-omics integrated analysis approach based on machine learning contributes a more comprehensive perspective to human disease research. However, there are still significant challenges in representing single-omics data and integrating multi-omics information.
RESULTS: This article presents HyperTMO, a Trusted Multi-Omics integration framework based on Hypergraph convolutional network for patient classification. HyperTMO constructs hypergraph structures to represent the association between samples in single-omics data, then evidence extraction is performed by hypergraph convolutional network, and multi-omics information is integrated at an evidence level. Last, we experimentally demonstrate that HyperTMO outperforms other state-of-the-art methods in breast cancer subtype classification and Alzheimer's disease classification tasks using multi-omics data from TCGA (BRCA) and ROSMAP datasets. Importantly, HyperTMO is the first attempt to integrate hypergraph structure, evidence theory, and multi-omics integration for patient classification. Its accurate and robust properties bring great potential for applications in clinical diagnosis.
AVAILABILITY AND IMPLEMENTATION: HyperTMO and datasets are publicly available at https://github.com/ippousyuga/HyperTMO.
Affiliation(s)
- Haohua Wang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Kai Lin
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Qiang Zhang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Jinlong Shi
- Research Center for Medical Big Data, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing 100039, China
| | - Xinyu Song
- Research Center for Medical Big Data, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing 100039, China
| | - Jue Wu
- Research Center for Medical Big Data, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing 100039, China
| | - Chenghui Zhao
- Research Center for Medical Big Data, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing 100039, China
| | - Kunlun He
- Research Center for Medical Big Data, Medical Innovation Research Division of Chinese PLA General Hospital, Beijing 100039, China
| |
|
43
|
Nguyen T, Campbell A, Kumar A, Amponsah E, Fiterau M, Shahriyari L. Optimal fusion of genotype and drug embeddings in predicting cancer drug response. Brief Bioinform 2024; 25:bbae227. [PMID: 38754407 PMCID: PMC11097979 DOI: 10.1093/bib/bbae227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 04/14/2024] [Accepted: 04/25/2024] [Indexed: 05/18/2024] Open
Abstract
Predicting cancer drug response using both genomics and drug features has shown some success compared to using genomics features alone. However, there has been limited research done on how best to combine or fuse the two types of features. Using a visible neural network with two deep learning branches for genes and drug features as the base architecture, we experimented with different fusion functions and fusion points. Our experiments show that injecting multiplicative relationships between gene and drug latent features into the original concatenation-based architecture DrugCell significantly improved the overall predictive performance and outperformed other baseline models. We also show that different fusion methods respond differently to different fusion points, indicating that the relationship between drug features and different hierarchical biological level of gene features is optimally captured using different methods. Considering both predictive performance and runtime speed, tensor product partial is the best-performing fusion function to combine late-stage representations of drug and gene features to predict cancer drug response.
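The difference between concatenation-based and multiplicative fusion of two latent vectors can be shown with a minimal sketch. The three functions below are generic illustrations (concatenation, element-wise product, and a flattened outer product); they are not the paper's "tensor product partial" function, whose definition is not given in the abstract, and the vector values are hypothetical.

```python
def fuse_concat(gene_vec, drug_vec):
    """Concatenation: the downstream layers must learn any gene-drug interaction themselves."""
    return list(gene_vec) + list(drug_vec)


def fuse_elementwise(gene_vec, drug_vec):
    """Element-wise product: one multiplicative term per shared latent dimension."""
    if len(gene_vec) != len(drug_vec):
        raise ValueError("element-wise fusion needs vectors of equal length")
    return [g * d for g, d in zip(gene_vec, drug_vec)]


def fuse_outer(gene_vec, drug_vec):
    """Flattened outer product: one multiplicative term for every gene-drug feature pair."""
    return [g * d for g in gene_vec for d in drug_vec]


# Hypothetical two-dimensional latent features.
gene_latent = [1.0, 2.0]
drug_latent = [3.0, 4.0]
print(fuse_concat(gene_latent, drug_latent))
print(fuse_elementwise(gene_latent, drug_latent))
print(fuse_outer(gene_latent, drug_latent))
```

The output size grows from the sum of the input dimensions (concatenation) to their product (outer product), which is one reason runtime has to be weighed against predictive performance when choosing a fusion function.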
Affiliation(s)
- Trang Nguyen
- Department of Computer Science, University of Massachusetts Amherst, Amherst 01002, MA, United States
| | - Anthony Campbell
- Department of Computer Science, University of Massachusetts Amherst, Amherst 01002, MA, United States
| | - Ankit Kumar
- Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst 01002, MA, United States
| | - Edwin Amponsah
- Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst 01002, MA, United States
| | - Madalina Fiterau
- Department of Computer Science, University of Massachusetts Amherst, Amherst 01002, MA, United States
| | - Leili Shahriyari
- Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst 01002, MA, United States
| |
|
44
|
Hajim WI, Zainudin S, Mohd Daud K, Alheeti K. Optimized models and deep learning methods for drug response prediction in cancer treatments: a review. PeerJ Comput Sci 2024; 10:e1903. [PMID: 38660174 PMCID: PMC11042005 DOI: 10.7717/peerj-cs.1903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 01/31/2024] [Indexed: 04/26/2024]
Abstract
Recent advancements in deep learning (DL) have played a crucial role in aiding experts to develop personalized healthcare services, particularly in drug response prediction (DRP) for cancer patients. The contribution of DL techniques to this field is significant, and they have proven indispensable in the medical field. This review aims to analyze the diverse effectiveness of various DL models in making these predictions, drawing on research published from 2017 to 2023. We utilized the VOS-Viewer 1.6.18 software to create a word cloud from the titles and abstracts of the selected studies. This study offers insights into the focus areas within DL models used for drug response. The word cloud revealed a strong link between certain keywords and grouped themes, highlighting terms such as deep learning, machine learning, precision medicine, precision oncology, drug response prediction, and personalized medicine. In order to achieve an advance in DRP using DL, the researchers need to work on enhancing the models' generalizability and interoperability. It is also crucial to develop models that not only accurately represent various architectures but also simplify these architectures, balancing the complexity with the predictive capabilities. In the future, researchers should try to combine methods that make DL models easier to understand; this will make DRP reviews more open and help doctors trust the decisions made by DL models in cancer DRP.
Affiliation(s)
- Wesam Ibrahim Hajim
- Department of Applied Geology, College of Sciences, Tikrit University, Tikrit, Salah ad Din, Iraq
- Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia
| | - Suhaila Zainudin
- Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia
| | - Kauthar Mohd Daud
- Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia
| | - Khattab Alheeti
- Department of Computer Networking Systems, College of Computer Sciences and Information Technology, University of Anbar, Al Anbar, Ramadi, Iraq
| |
|
45
|
Lao C, Zheng P, Chen H, Liu Q, An F, Li Z. DeepAEG: a model for predicting cancer drug response based on data enhancement and edge-collaborative update strategies. BMC Bioinformatics 2024; 25:105. [PMID: 38461284 PMCID: PMC10925015 DOI: 10.1186/s12859-024-05723-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2023] [Accepted: 02/27/2024] [Indexed: 03/11/2024] Open
Abstract
MOTIVATION: The prediction of cancer drug response is a challenging subject in modern personalized cancer therapy due to the uncertainty of drug efficacy and the heterogeneity of patients. It has been shown that the characteristics of the drug itself and the genomic characteristics of the patient can greatly influence the results of cancer drug response. Therefore, accurate, efficient, and comprehensive methods for drug feature extraction and genomics integration are crucial to improve the prediction accuracy.
RESULTS: Accurate prediction of cancer drug response is vital for guiding the design of anticancer drugs. In this study, we propose an end-to-end deep learning model named DeepAEG which is based on a complete-graph update mode to predict IC50. Specifically, we integrate an edge update mechanism on the basis of a hybrid graph convolutional network to comprehensively learn the potential high-dimensional representation of topological structures in drugs, including atomic characteristics and chemical bond information. Additionally, we present a novel approach for enhancing simplified molecular input line entry specification data by employing sequence recombination to eliminate the defect of single sequence representation of drug molecules. Our extensive experiments show that DeepAEG outperforms other existing methods across multiple evaluation parameters in multiple test sets. Furthermore, we identify several potential anticancer agents, including bortezomib, which has proven to be an effective clinical treatment option. Our results highlight the potential value of DeepAEG in guiding the design of specific cancer treatment regimens.
Affiliation(s)
- Chuanqi Lao
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China
| | - Pengfei Zheng
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China
| | - Hongyang Chen
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China.
| | - Qiao Liu
- Department of Statistics, Stanford University, Stanford, Palo Alto, CA, 94305, USA
| | - Feng An
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China
| | - Zhao Li
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China
| |
|
46
|
Zhao X, Singhal A, Park S, Kong J, Bachelder R, Ideker T. Cancer Mutations Converge on a Collection of Protein Assemblies to Predict Resistance to Replication Stress. Cancer Discov 2024; 14:508-523. [PMID: 38236062 PMCID: PMC10905674 DOI: 10.1158/2159-8290.cd-23-0641] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 06/12/2023] [Revised: 10/25/2023] [Accepted: 12/21/2023] [Indexed: 01/19/2024]
Abstract
Rapid proliferation is a hallmark of cancer associated with sensitivity to therapeutics that cause DNA replication stress (RS). Many tumors exhibit drug resistance, however, via molecular pathways that are incompletely understood. Here, we develop an ensemble of predictive models that elucidate how cancer mutations impact the response to common RS-inducing (RSi) agents. The models implement recent advances in deep learning to facilitate multidrug prediction and mechanistic interpretation. Initial studies in tumor cells identify 41 molecular assemblies that integrate alterations in hundreds of genes for accurate drug response prediction. These cover roles in transcription, repair, cell-cycle checkpoints, and growth signaling, of which 30 are shown by loss-of-function genetic screens to regulate drug sensitivity or replication restart. The model translates to cisplatin-treated cervical cancer patients, highlighting an RTK-JAK-STAT assembly governing resistance. This study defines a compendium of mechanisms by which mutations affect therapeutic responses, with implications for precision medicine.
SIGNIFICANCE: Zhao and colleagues use recent advances in machine learning to study the effects of tumor mutations on the response to common therapeutics that cause RS. The resulting predictive models integrate numerous genetic alterations distributed across a constellation of molecular assemblies, facilitating a quantitative and interpretable assessment of drug response. This article is featured in Selected Articles from This Issue, p. 384.
Affiliation(s)
- Xiaoyu Zhao
- Division of Human Genomics and Precision Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Akshat Singhal
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California
- Sungjoon Park
- Division of Human Genomics and Precision Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- JungHo Kong
- Division of Human Genomics and Precision Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Moores Cancer Center, School of Medicine, University of California, San Diego, La Jolla, California
- Robin Bachelder
- Division of Human Genomics and Precision Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Trey Ideker
- Division of Human Genomics and Precision Medicine, Department of Medicine, University of California, San Diego, La Jolla, California
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California
- Moores Cancer Center, School of Medicine, University of California, San Diego, La Jolla, California
- Department of Bioengineering, University of California, San Diego, La Jolla, California
|
47
|
Lin CX, Guan Y, Li HD. Artificial intelligence approaches for molecular representation in drug response prediction. Curr Opin Struct Biol 2024; 84:102747. [PMID: 38091924 DOI: 10.1016/j.sbi.2023.102747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 11/26/2023] [Accepted: 11/26/2023] [Indexed: 02/09/2024]
Abstract
Drug response prediction is essential for drug development and disease treatment. One key question in predicting drug response is the representation of molecules, which has been greatly advanced by artificial intelligence (AI) techniques in recent years. In this review, we first describe different types of representation methods, pinpointing their key principles and discussing their limitations. Thereafter, we discuss ways in which these methods could be further developed. We expect that this review will provide useful guidance for researchers in the community.
Affiliation(s)
- Cui-Xiang Lin
- School of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, Hunan Province, PR China
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.
- Hong-Dong Li
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, PR China.
|
48
|
Liu H, Peng W, Dai W, Lin J, Fu X, Liu L, Liu L, Yu N. Improving anti-cancer drug response prediction using multi-task learning on graph convolutional networks. Methods 2024; 222:41-50. [PMID: 38157919 DOI: 10.1016/j.ymeth.2023.11.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 09/19/2023] [Accepted: 11/19/2023] [Indexed: 01/03/2024] Open
Abstract
Predicting the therapeutic effect of anti-cancer drugs on tumors based on the characteristics of tumors and patients is an important part of precision oncology. Existing computational methods regard the drug response prediction problem as a classification or regression task. However, few of them consider leveraging the relationship between the two tasks. In this work, we propose a Multi-task Interaction Graph Convolutional Network (MTIGCN) for anti-cancer drug response prediction. MTIGCN first utilizes a graph convolutional network-based model to produce embeddings for both cell lines and drugs. After that, the model employs multi-task learning to predict anti-cancer drug response, which involves training the model on three different tasks simultaneously: the main task of classifying drug response as sensitive or resistant, and two auxiliary tasks of regression prediction and similarity network reconstruction. By sharing parameters and optimizing the losses of different tasks simultaneously, MTIGCN enhances the feature representation and reduces overfitting. The results of the experiments on two in vitro datasets demonstrated that MTIGCN outperformed seven state-of-the-art baseline methods. Moreover, the model trained on the in vitro dataset GDSC exhibited good performance when applied to predict drug responses in the in vivo datasets PDX and TCGA. The case study confirmed the model's ability to discover unknown drug responses in cell lines.
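The joint objective described in this abstract can be illustrated with a minimal sketch. This is not the authors' MTIGCN implementation: the function names, the use of mean squared error for the two auxiliary tasks, and the weights `w_reg` and `w_rec` are all assumptions chosen for illustration. It only shows how one classification loss and two auxiliary losses might be combined into a single value to optimize.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy for the sensitive/resistant classification task."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def mean_squared_error(pred, target):
    """Mean squared error, used here for both auxiliary tasks (an assumption)."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


def multitask_loss(cls_probs, cls_labels, reg_pred, reg_true,
                   sim_pred, sim_true, w_reg=0.5, w_rec=0.5):
    """Weighted sum of the main loss and two auxiliary losses.

    w_reg and w_rec are hypothetical weights, not values from the paper.
    sim_pred and sim_true are flattened similarity-matrix entries.
    """
    main = binary_cross_entropy(cls_probs, cls_labels)
    aux_reg = mean_squared_error(reg_pred, reg_true)
    aux_rec = mean_squared_error(sim_pred, sim_true)
    return main + w_reg * aux_reg + w_rec * aux_rec


if __name__ == "__main__":
    loss = multitask_loss(
        cls_probs=[0.8, 0.3], cls_labels=[1, 0],
        reg_pred=[0.6, 0.2], reg_true=[0.7, 0.1],
        sim_pred=[1.0, 0.4, 0.4, 1.0], sim_true=[1.0, 0.5, 0.5, 1.0],
    )
    print(f"combined loss: {loss:.4f}")
```

In a shared-parameter model, minimizing this single value would push the shared embeddings to serve all three tasks at once, which is the mechanism the abstract credits for reduced overfitting.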
Affiliation(s)
- Hancheng Liu
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
- Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China.
- Wei Dai
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China.
- Jiangzhen Lin
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China
- Xiaodong Fu
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Li Liu
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China.
- Lijun Liu
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650050, China
- Ning Yu
- State University of New York, The College at Brockport, Department of Computing Sciences, 350 New Campus Drive, Brockport NY 14422.
|
49
|
Vasanthakumari P, Zhu Y, Brettin T, Partin A, Shukla M, Xia F, Narykov O, Weil MR, Stevens RL. A Comprehensive Investigation of Active Learning Strategies for Conducting Anti-Cancer Drug Screening. Cancers (Basel) 2024; 16:530. [PMID: 38339281 PMCID: PMC10854925 DOI: 10.3390/cancers16030530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 01/12/2024] [Accepted: 01/22/2024] [Indexed: 02/12/2024] Open
Abstract
It is well known that cancers of the same histology type can respond differently to a treatment. Thus, computational drug response prediction is of paramount importance for both preclinical drug screening studies and clinical treatment design. To build drug response prediction models, treatment response data need to be generated through screening experiments and used as input to train the prediction models. In this study, we investigate various active learning strategies for selecting experiments to generate response data for the purposes of (1) improving the performance of drug response prediction models built on the data and (2) identifying effective treatments. Here, we focus on constructing drug-specific response prediction models for cancer cell lines. Various approaches have been designed and applied to select cell lines for screening, including random, greedy, uncertainty, diversity, combined greedy and uncertainty, sampling-based hybrid, and iteration-based hybrid approaches. All of these approaches are evaluated and compared using two criteria: (1) the number of identified hits, that is, selected experiments validated to be responsive, and (2) the performance of the response prediction model trained on the data of selected experiments. The analysis was conducted for 57 drugs, and the results show a significant improvement in identifying hits using active learning approaches compared with the random and greedy sampling methods. Active learning approaches also show an improvement in response prediction performance for some of the drugs and analysis runs compared with the greedy sampling method.
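Three of the selection strategies named in this abstract (random, greedy, and uncertainty) can be sketched as follows. This is an illustrative sketch only, not the authors' code: the function name `select_cell_lines`, the convention that a higher predicted score means a more responsive cell line, and the use of a per-cell-line uncertainty estimate are assumptions introduced here.

```python
import random


def select_cell_lines(pred_response, pred_uncertainty, batch_size,
                      strategy, seed=0):
    """Pick indices of unscreened cell lines to test in the next round.

    pred_response: model-predicted responsiveness per candidate
        (assumed convention: higher means more responsive).
    pred_uncertainty: model uncertainty per candidate (e.g. the spread
        across an ensemble; how it is obtained is not specified here).
    """
    n = len(pred_response)
    if len(pred_uncertainty) != n:
        raise ValueError("inputs must have the same length")
    if not 0 < batch_size <= n:
        raise ValueError("batch_size must be between 1 and the pool size")

    if strategy == "random":
        # Baseline: sample candidates uniformly without replacement.
        return sorted(random.Random(seed).sample(range(n), batch_size))
    if strategy == "greedy":
        # Exploit: take the candidates predicted to be most responsive.
        scores = pred_response
    elif strategy == "uncertainty":
        # Explore: take the candidates the model is least sure about.
        scores = pred_uncertainty
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


if __name__ == "__main__":
    response = [0.1, 0.9, 0.5, 0.7]
    uncertainty = [0.4, 0.05, 0.3, 0.1]
    for name in ("random", "greedy", "uncertainty"):
        print(name, select_cell_lines(response, uncertainty, 2, name))
```

In an active learning loop, the selected cell lines would be screened, their measured responses added to the training set, and the model retrained before the next selection round. The hybrid and diversity strategies mentioned in the abstract are not covered by this sketch.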
Affiliation(s)
- Priyanka Vasanthakumari
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Yitan Zhu
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Thomas Brettin
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Alexander Partin
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Maulik Shukla
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Fangfang Xia
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Oleksandr Narykov
- Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA
- Michael Ryan Weil
- Cancer Research Technology Program, Cancer Data Science Initiatives, Frederick National Laboratory for Cancer Research, Frederick, MD 21701, USA
- Rick L. Stevens
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA
- Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
|
50
|
Kim J, Park SH, Lee H. PANCDR: precise medicine prediction using an adversarial network for cancer drug response. Brief Bioinform 2024; 25:bbae088. [PMID: 38487849 PMCID: PMC10940842 DOI: 10.1093/bib/bbae088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 01/09/2024] [Accepted: 02/16/2024] [Indexed: 03/18/2024] Open
Abstract
Pharmacogenomics aims to provide personalized therapy to patients based on their genetic variability. However, accurate prediction of cancer drug response (CDR) is challenging due to genetic heterogeneity. Since clinical data are limited, most studies predicting drug response use preclinical data to train models. However, such models might not be generalizable to external clinical data due to differences between the preclinical and clinical datasets. In this study, a Precision Medicine Prediction using an Adversarial Network for Cancer Drug Response (PANCDR) model is proposed. PANCDR consists of two sub-models, an adversarial model and a CDR prediction model. The adversarial model reduces the gap between the preclinical and clinical datasets, while the CDR prediction model extracts features and predicts responses. PANCDR was trained using both preclinical data and unlabeled clinical data. Subsequently, it was tested on external clinical data, including The Cancer Genome Atlas and brain tumor patients. PANCDR outperformed other machine learning models in predicting external test data. Our results demonstrate the robustness of PANCDR and its potential in precision medicine by recommending patient-specific drug candidates. The PANCDR codes and data are available at https://github.com/DMCB-GIST/PANCDR.
Affiliation(s)
- Juyeon Kim
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, 61005, Gwangju, South Korea
- Sung-Hye Park
- Department of Pathology, Seoul National University Hospital, Seoul National University College of Medicine, 03080, Seoul, South Korea
- Neuroscience Research Institute, Seoul National University College of Medicine, 03080, Seoul, South Korea
- Hyunju Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, 61005, Gwangju, South Korea
- Artificial Intelligence Graduate School, Gwangju Institute of Science and Technology, 61005, Gwangju, South Korea
|