1. Carli F, De Oliveira Rosa N, Blotas S, Di Chiaro P, Bisceglia L, Morelli M, Lessi F, Di Stefano AL, Mazzanti CM, Natoli G, Raimondi F. CellHit: a web server to predict and analyze cancer patients' drug responsiveness. Nucleic Acids Res 2025:gkaf414. [PMID: 40377071 DOI: 10.1093/nar/gkaf414] [Received: 03/24/2025] [Revised: 04/17/2025] [Accepted: 05/02/2025] [Indexed: 05/18/2025]
Abstract
We present the CellHit web server (https://cellhit.bioinfolab.sns.it/), a web-based platform designed to predict and analyze cancer patients' responsiveness to drugs using transcriptomic data. By leveraging extensive pharmacogenomics datasets from the Genomics of Drug Sensitivity in Cancer v1 and v2 (GDSC) and Profiling Relative Inhibition Simultaneously in Mixtures (PRISM), together with transcriptomic data from the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas Program (TCGA), CellHit integrates a computational pipeline for preprocessing, gene imputation, and robust alignment between patient and cell line transcriptomic data with pre-trained state-of-the-art models for drug sensitivity prediction. The pipeline employs batch correction, an enhanced Celligner methodology, and Parametric UMAP for stable and actionable alignment. The intuitive interface requires no programming expertise and offers interactive visualizations, including low-dimensional embeddings and drug sensitivity heatmaps for the input transcriptomic samples. Results feature contextual metadata, SHAP-based feature importance, and transcriptomic neighbors from reference datasets, simplifying interpretation and hypothesis generation. CellHit provides precomputed predictions across TCGA samples and offers the ability to run custom analyses online on input samples, democratizing precision oncology by enabling rapid, interpretable predictions accessible to the research community.
Affiliation(s)
- Francesco Carli
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
  - Department of Computer Science, University of Pisa, Largo B. Pontecorvo 3, 56127 Pisa, Italy
- Natalia De Oliveira Rosa
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
- Simon Blotas
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
- Pierluigi Di Chiaro
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Via Ripamonti 435, 20141 Milano, Italy
- Luisa Bisceglia
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
- Mariangela Morelli
  - Fondazione Pisana per la Scienza, Pisa, Via F. Giovannini 13, 56017 Pisa, Italy
- Francesca Lessi
  - Fondazione Pisana per la Scienza, Pisa, Via F. Giovannini 13, 56017 Pisa, Italy
- Anna Luisa Di Stefano
  - Neurosurgical Department of Spedali Riuniti di Livorno, Via V. Alfieri 36, 57124 Livorno, Italy
- Gioacchino Natoli
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Via Ripamonti 435, 20141 Milano, Italy
- Francesco Raimondi
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
2. Miao R, Zhong BJ, Mei XY, Dong X, Ou YD, Liang Y, Yu HY, Wang Y, Dong ZH. A semi-supervised weighted SPCA- and convolution KAN-based model for drug response prediction. Front Genet 2025; 16:1532651. [PMID: 40191608 PMCID: PMC11968432 DOI: 10.3389/fgene.2025.1532651] [Received: 12/02/2024] [Accepted: 02/24/2025] [Indexed: 04/09/2025]
Abstract
Motivation: Predicting the response of cell lines to specific drugs from multi-omics gene data has become a core problem of precision oncology. At present, drug response prediction using multi-omics gene data faces three main challenges: first, how to design a gene probe feature extraction model that is both biologically interpretable and high-performing; second, how to develop multi-omics weighting modules that reasonably fuse genetic data of different lengths and noise conditions; third, how to construct deep learning models that can handle small sample sizes while minimizing the risk of overfitting. Results: We propose an innovative drug response prediction model (NMDP). First, the NMDP model introduces an interpretable semi-supervised weighted SPCA module to solve the feature extraction problem in multi-omics gene data. Next, we construct a multi-omics data fusion framework based on sample similarity networks, bimodal tests, and variance information, which solves the data fusion problem and enables the NMDP model to focus on more relevant genomic data. Finally, we combine a one-dimensional convolution method and Kolmogorov-Arnold networks (KANs) to predict the drug response. We conduct five sets of real data experiments and compare NMDP against seven advanced drug response prediction methods. The results show that NMDP achieves the best performance, with sensitivity and specificity reaching 0.92 and 0.93, respectively, an improvement of 11%-57% compared to other models. Bio-enrichment experiments strongly support the biological interpretation of the NMDP model and its ability to identify potential targets for drug activity prediction.
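The sensitivity and specificity values quoted in this abstract are standard confusion-matrix ratios. The following is a minimal Python sketch of how such values are typically computed, assuming binary labels in which 1 marks a drug-sensitive sample and 0 a resistant one. The function name and the toy labels are hypothetical illustrations and are not taken from the NMDP paper.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


# Toy example (hypothetical labels, not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Reporting both values together matters for imbalanced response data, where accuracy alone can look high even if one class is rarely predicted correctly.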
Affiliation(s)
- Rui Miao
  - Basic Teaching Department, Zhuhai Campus of Zunyi Medical University, Zhu Hai, China
- Bing-Jie Zhong
  - Basic Teaching Department, Zhuhai Campus of Zunyi Medical University, Zhu Hai, China
- Xin-Yue Mei
  - Institute of Systems Engineering, Macau University of Science and Technology, Macau, China
- Xin Dong
  - Institute of Systems Engineering, Macau University of Science and Technology, Macau, China
- Yang-Dong Ou
  - School of Biomedical Engineering, Guangdong Medical University, Dongguan, China
- Hao-Yang Yu
  - Basic Teaching Department, Zhuhai Campus of Zunyi Medical University, Zhu Hai, China
- Ying Wang
  - Basic Teaching Department, Zhuhai Campus of Zunyi Medical University, Zhu Hai, China
- Zi-Han Dong
  - Basic Teaching Department, Zhuhai Campus of Zunyi Medical University, Zhu Hai, China
3. Jayagopal A, Walsh RJ, Hariprasannan KK, Mariappan R, Mahapatra D, Jaynes PW, Lim D, Peng Tan DS, Tan TZ, Pitt JJ, Jeyasekharan AD, Rajan V. A multi-task domain-adapted model to predict chemotherapy response from mutations in recurrently altered cancer genes. iScience 2025; 28:111992. [PMID: 40160429 PMCID: PMC11952854 DOI: 10.1016/j.isci.2025.111992] [Received: 01/03/2024] [Revised: 08/23/2024] [Accepted: 02/06/2025] [Indexed: 04/02/2025]
Abstract
Next-generation sequencing (NGS) is increasingly utilized in oncological practice; however, only a minority of patients benefit from targeted therapy. Developing drug response prediction (DRP) models is important for the "untargetable" majority. Prior DRP models typically use whole-transcriptome and whole-exome sequencing data, which are clinically unavailable. We aim to develop a DRP model toward the repurposing of chemotherapy, requiring only information from clinical-grade NGS (cNGS) panels of restricted gene sets. Data sparsity and limited patient drug response information make this challenging. We first show that existing DRP models perform equally well with whole-exome and cNGS (∼300 genes) data. We then describe Drug IDentifier (DruID), a DRP model for restricted gene sets that uses transfer learning, variant annotations, domain-invariant representation learning, and multi-task learning. DruID outperformed state-of-the-art DRP methods on pan-cancer data and showed robust response classification on two real-world clinical datasets, representing a step toward a clinically applicable DRP tool.
Affiliation(s)
- Aishwarya Jayagopal
  - Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 117417, Singapore
- Robert J. Walsh
  - Department of Haematology-Oncology, National University Cancer Institute, NUHS Tower Block, Level 7, 1E Kent Ridge Road, Singapore 119228, Singapore
- Krishna Kumar Hariprasannan
  - Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 117417, Singapore
- Ragunathan Mariappan
  - Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 117417, Singapore
- Debabrata Mahapatra
  - Department of Computer Science, School of Computing, National University of Singapore, Singapore 117417, Singapore
- Patrick William Jaynes
  - Cancer Science Institute of Singapore, National University of Singapore, Center for Translational Medicine, 14 Medical Drive, #12-01, Singapore 117599, Singapore
- Diana Lim
  - Department of Pathology, National University Health System, 1E Kent Ridge Road, Singapore 119228, Singapore
- David Shao Peng Tan
  - Department of Haematology-Oncology, National University Cancer Institute, NUHS Tower Block, Level 7, 1E Kent Ridge Road, Singapore 119228, Singapore
  - Cancer Science Institute of Singapore, National University of Singapore, Center for Translational Medicine, 14 Medical Drive, #12-01, Singapore 117599, Singapore
  - Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, 1E Kent Ridge Road, NUHS Tower Block, Level 10, Singapore 119228, Singapore
- Tuan Zea Tan
  - Cancer Science Institute of Singapore, National University of Singapore, Center for Translational Medicine, 14 Medical Drive, #12-01, Singapore 117599, Singapore
- Jason J. Pitt
  - Cancer Science Institute of Singapore, National University of Singapore, Center for Translational Medicine, 14 Medical Drive, #12-01, Singapore 117599, Singapore
- Anand D. Jeyasekharan
  - Department of Haematology-Oncology, National University Cancer Institute, NUHS Tower Block, Level 7, 1E Kent Ridge Road, Singapore 119228, Singapore
  - Cancer Science Institute of Singapore, National University of Singapore, Center for Translational Medicine, 14 Medical Drive, #12-01, Singapore 117599, Singapore
- Vaibhav Rajan
  - Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 117417, Singapore
4. Carli F, Di Chiaro P, Morelli M, Arora C, Bisceglia L, De Oliveira Rosa N, Cortesi A, Franceschi S, Lessi F, Di Stefano AL, Santonocito OS, Pasqualetti F, Aretini P, Miglionico P, Diaferia GR, Giannotti F, Liò P, Duran-Frigola M, Mazzanti CM, Natoli G, Raimondi F. Learning and actioning general principles of cancer cell drug sensitivity. Nat Commun 2025; 16:1654. [PMID: 39952993 PMCID: PMC11828915 DOI: 10.1038/s41467-025-56827-5] [Received: 04/22/2024] [Accepted: 02/03/2025] [Indexed: 02/17/2025]
Abstract
High-throughput screening of drug sensitivity of cancer cell lines (CCLs) holds the potential to unlock anti-tumor therapies. In this study, we leverage such datasets to predict drug response using cell line transcriptomics, focusing on models' interpretability and deployment on patients' data. We use large language models (LLMs) to match drugs to mechanism of action (MOA)-related pathways. Genes crucial for prediction are enriched in drug MOAs, suggesting that our models learn the molecular determinants of response. Furthermore, by using only LLM-curated MOA genes, we enhance the predictive accuracy of our models. To enhance translatability, we align RNAseq data from CCLs, used for training, to those from patient samples, used for inference. We validated our approach on TCGA samples, where patients' best scoring drugs match those prescribed for their cancer type. We further predict and experimentally validate effective drugs for patients with two highly lethal solid tumors, i.e., pancreatic cancer and glioblastoma.
Affiliation(s)
- Francesco Carli
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Pisa, Italy
  - Department of Computer Science, University of Pisa, Pisa, Italy
- Pierluigi Di Chiaro
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milano, Italy
- Chakit Arora
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Pisa, Italy
- Luisa Bisceglia
  - Laboratorio di Biologia Bio@SNS, Scuola Normale Superiore, Pisa, Italy
- Alice Cortesi
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milano, Italy
- Giuseppe R Diaferia
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milano, Italy
  - Botton-Champalimaud Pancreatic Cancer Center, Champalimaud Foundation, Lisbon, Portugal
- Pietro Liò
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
- Gioacchino Natoli
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milano, Italy
5. Dong X, Liu H, Tong T, Wu L, Wang J, You T, Wei Y, Yi X, Yang H, Hu J, Wang H, Wang X, Li MJ. Personalized prediction of anticancer potential of non-oncology drugs through learning from genome derived molecular pathways. NPJ Precis Oncol 2025; 9:36. [PMID: 39905223 PMCID: PMC11794852 DOI: 10.1038/s41698-025-00813-z] [Received: 11/08/2023] [Accepted: 01/19/2025] [Indexed: 02/06/2025]
Abstract
Advances in cancer genomics have significantly expanded our understanding of cancer biology. However, the high cost of drug development limits our ability to translate this knowledge into precise treatments. Approved non-oncology drugs, comprising a large repository of chemical entities, offer a promising avenue for repurposing in cancer therapy. Herein we present CHANCE, a supervised machine learning model designed to predict the anticancer activities of non-oncology drugs for specific patients by simultaneously considering personalized coding and non-coding mutations. Utilizing protein-protein interaction networks, CHANCE harmonizes multilevel mutation annotations and integrates pharmacological information across different drugs into a single model. We systematically benchmarked the performance of CHANCE and show that its predictions outperform previous models and are highly interpretable. Applying CHANCE to approximately 5000 cancer samples indicated that >30% might respond to at least one non-oncology drug, with 11% of non-oncology drugs predicted to have anticancer activities. Moreover, CHANCE predictions suggested an association between SMAD7 mutations and aspirin treatment response. Experimental validation using tumor cells derived from seven patients with pancreatic or esophageal cancer confirmed the potential anticancer activity of at least one non-oncology drug for five of these patients. To summarize, CHANCE offers a personalized and interpretable approach, serving as a valuable tool for mining non-oncology drugs in the precision oncology era.
Affiliation(s)
- Xiaobao Dong
  - Department of Genetics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Precision Medicine Research Center, The Second Hospital of Tianjin Medical University; Tianjin Medical University, Tianjin, China
- Huanhuan Liu
  - Department of Bioinformatics, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Ting Tong
  - Department of Gastroenterology, The Third Xiangya Hospital, Hunan Key Laboratory of Non-resolving Inflammation and Cancer, Central South University, Changsha, China
  - Endoscopic Center, The First Affiliated Hospital of Xiamen University, Xiamen University, Xiamen, China
- Liuxing Wu
  - Department of Bioinformatics, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Jianhua Wang
  - Department of Bioinformatics, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Tianyi You
  - Department of Bioinformatics, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Yongjian Wei
  - Department of Bioinformatics, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Xianfu Yi
  - Department of Bioinformatics, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Hongxi Yang
  - Department of Bioinformatics, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Jie Hu
  - Biobank of Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, China
- Haitao Wang
  - Department of Oncology, Tianjin Institute of Urology, The Second Hospital of Tianjin Medical University, Tianjin, China
- Xiaoyan Wang
  - Department of Gastroenterology, The Third Xiangya Hospital, Hunan Key Laboratory of Non-resolving Inflammation and Cancer, Central South University, Changsha, China
- Mulin Jun Li
  - Department of Genetics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Precision Medicine Research Center, The Second Hospital of Tianjin Medical University; Tianjin Medical University, Tianjin, China
  - Department of Bioinformatics, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
  - Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
6. Saeed D, Xing H, AlBadani B, Feng L, Al-Sabri R, Abdullah M, Rehman A. MGATAF: multi-channel graph attention network with adaptive fusion for cancer-drug response prediction. BMC Bioinformatics 2025; 26:19. [PMID: 39825219 PMCID: PMC11742231 DOI: 10.1186/s12859-024-05987-0] [Received: 07/05/2024] [Accepted: 11/12/2024] [Indexed: 01/20/2025]
Abstract
BACKGROUND: Drug response prediction is critical in precision medicine to determine the most effective and safe treatments for individual patients. Traditional prediction methods relying on demographic and genetic data often fall short in accuracy and robustness. Recent graph-based models, while promising, frequently neglect the critical role of atomic interactions and fail to integrate drug fingerprints with SMILES for comprehensive molecular graph construction. RESULTS: We introduce the multimodal multi-channel graph attention network with adaptive fusion (MGATAF), a framework designed to enhance drug response predictions by capturing both local and global interactions among graph nodes. MGATAF improves drug representation by integrating SMILES and fingerprints, resulting in more precise predictions of drug effects. The methodology involves constructing multimodal molecular graphs, employing multi-channel graph attention networks to capture diverse interactions, and using adaptive fusion to integrate these interactions at multiple abstraction levels. Empirical results demonstrate MGATAF's superior performance compared to traditional and other graph-based techniques. For example, on the GDSC dataset, MGATAF achieved a 5.12% improvement in the Pearson correlation coefficient (PCC), reaching 0.9312 with an RMSE of 0.0225. Similarly, in new cell-line tests, MGATAF outperformed baselines with a PCC of 0.8536 and an RMSE of 0.0321 on the GDSC dataset, and a PCC of 0.7364 with an RMSE of 0.0531 on the CCLE dataset. CONCLUSIONS: MGATAF significantly advances drug response prediction by effectively integrating multiple molecular data types and capturing complex interactions. This framework enhances prediction accuracy and offers a robust tool for personalized medicine, potentially leading to more effective and safer treatments for patients. Future research can expand on this work by exploring additional data modalities and refining the adaptive fusion mechanisms.
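The two evaluation metrics reported in this abstract, the Pearson correlation coefficient (PCC) and the root mean squared error (RMSE), follow standard definitions. Below is a minimal Python sketch of both, assuming paired lists of observed and predicted response values. The function names and the toy numbers are hypothetical and do not reproduce the MGATAF evaluation code or its results.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy example (hypothetical values, not data from the paper).
observed = [0.10, 0.35, 0.50, 0.80]
predicted = [0.12, 0.30, 0.55, 0.78]
print(f"PCC={pearson_r(observed, predicted):.4f}, "
      f"RMSE={rmse(observed, predicted):.4f}")
```

PCC measures how well predictions track the ranking and linear trend of responses, while RMSE measures absolute error on the response scale, so the two are usually reported together.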
Affiliation(s)
- Dhekra Saeed
  - School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756, Sichuan, China
- Huanlai Xing
  - School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756, Sichuan, China
- Barakat AlBadani
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, Hunan, China
- Li Feng
  - School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756, Sichuan, China
- Raeed Al-Sabri
  - Faculty of Computer Sciences and Information Systems, Thamar University, Dhamar, 87246, Yemen
- Monir Abdullah
  - College of Computing and Information Technology, University of Bisha, Bisha, 67714, Saudi Arabia
- Amir Rehman
  - School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756, Sichuan, China
7. Ha S, Park J, Jo K. Comparative analysis of regression algorithms for drug response prediction using GDSC dataset. BMC Res Notes 2025; 18:10. [PMID: 39806500 PMCID: PMC11726955 DOI: 10.1186/s13104-024-07026-w] [Received: 06/14/2024] [Accepted: 12/02/2024] [Indexed: 01/16/2025]
Abstract
BACKGROUND: Drug response prediction can infer the relationship between an individual's genetic profile and a drug, which can be used to determine the choice of treatment for an individual patient. Drug response is increasingly predicted using machine learning. However, high-throughput sequencing data produce thousands of features per patient. In addition, because many regression and feature selection algorithms exist, it is difficult for researchers to know which algorithm is appropriate for prediction. METHODS: We compared and evaluated the performance of 13 representative regression algorithms using the Genomics of Drug Sensitivity in Cancer (GDSC) dataset. Three analyses were conducted to show the effect of feature selection methods, multiomics information, and drug categories on drug response prediction. RESULTS: In the experiments, the Support Vector Regression algorithm and gene features selected with the LINCS L1000 dataset showed the best performance in terms of accuracy and execution time. However, integration of mutation and copy number variation information did not contribute to the prediction. Among the drug groups, responses of drugs related to hormone-related pathways were predicted with relatively high accuracy. CONCLUSION: This study can help bioinformatics researchers design data processing steps and select algorithms for drug response prediction, and develop new drug response prediction models based on the GDSC or other high-throughput sequencing datasets.
Affiliation(s)
- Soojung Ha
  - Department of Computer Engineering, Chungbuk National University, Chungdae-ro 1, Cheongju, 28644, Republic of Korea
- Juho Park
  - Department of Computer Engineering, Chungbuk National University, Chungdae-ro 1, Cheongju, 28644, Republic of Korea
- Kyuri Jo
  - Department of Computer Engineering, Chungbuk National University, Chungdae-ro 1, Cheongju, 28644, Republic of Korea
8. Firoozbakht F, Yousefi B, Tsoy O, Baumbach J, Schwikowski B. Comparative evaluation of feature reduction methods for drug response prediction. Sci Rep 2024; 14:30885. [PMID: 39730699 DOI: 10.1038/s41598-024-81866-1] [Received: 08/02/2024] [Accepted: 11/29/2024] [Indexed: 12/29/2024]
Abstract
Personalized medicine aims to tailor medical treatments to individual patients, and predicting drug responses from molecular profiles using machine learning is crucial for this goal. However, the high dimensionality of the molecular profiles compared to the limited number of samples presents significant challenges. Knowledge-based feature selection methods are particularly suitable for drug response prediction, as they leverage biological insights to reduce dimensionality and improve model interpretability. This study presents the first comparative evaluation of nine different knowledge-based and data-driven feature reduction methods on cell line and tumor data. Our analysis employs six distinct machine learning models, with a total of more than 6,000 runs to ensure a robust evaluation. Our findings indicate that transcription factor activities outperform other methods in predicting drug responses, effectively distinguishing between sensitive and resistant tumors for seven of the 20 drugs evaluated.
Affiliation(s)
- Farzaneh Firoozbakht
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Behnam Yousefi
  - Computational Systems Biomedicine Lab, Institut Pasteur, Université Paris Cité, Paris, France
  - École Doctorale Complexité du vivant, Sorbonne Université, Paris, France
  - Institute of Medical Systems Biology, Center for Biomedical AI (bAIome), Center for Molecular Neurobiology (ZMNH), University Medical Center Hamburg-Eppendorf, 20251, Hamburg, Germany
- Olga Tsoy
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Jan Baumbach
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
  - Computational BioMedicine Lab, University of Southern Denmark, Odense, Denmark
- Benno Schwikowski
  - Computational Systems Biomedicine Lab, Institut Pasteur, Université Paris Cité, Paris, France
9. Yang K, Cheng J, Cao S, Pan X, Shen HB, Yuan Y. Predicting transcriptional changes induced by molecules with MiTCP. Brief Bioinform 2024; 26:bbaf006. [PMID: 39847444 PMCID: PMC11756340 DOI: 10.1093/bib/bbaf006] [Received: 08/26/2024] [Revised: 12/05/2024] [Accepted: 01/21/2025] [Indexed: 01/24/2025]
Abstract
Studying the changes in cellular transcriptional profiles induced by small molecules can significantly advance our understanding of cellular state alterations and response mechanisms under chemical perturbations, which plays a crucial role in drug discovery and screening processes. Because experimental measurements require substantial time and cost, we developed a deep learning-based method called Molecule-induced Transcriptional Change Predictor (MiTCP) to predict changes in transcriptional profiles (CTPs) of 978 landmark genes induced by molecules. MiTCP utilizes graph neural network-based approaches to simultaneously model molecular structure representation and gene co-expression relationships, and integrates them for CTP prediction. After training on the L1000 dataset, MiTCP achieves an average Pearson correlation coefficient (PCC) of 0.482 on the test set and an average PCC of 0.801 for predicting the top 50 differentially expressed genes, outperforming other existing methods. Furthermore, we used MiTCP to predict CTPs of three cancer drugs (palbociclib, irinotecan, and goserelin) and performed gene enrichment analysis on the top differentially expressed genes. The enriched pathways and Gene Ontology terms are highly relevant to the corresponding diseases, which reveals the potential of MiTCP in drug development.
Affiliation(s)
- Kaiyuan Yang
  - Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Jiabei Cheng
  - Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Shenghao Cao
  - Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Xiaoyong Pan
  - Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Hong-Bin Shen
  - Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Ye Yuan
  - Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
  - State Key Laboratory of Biopharmaceutical Preparation and Delivery, Institute of Process Engineering, Chinese Academy of Sciences, 1 North 2nd Street, Zhongguancun, Haidian District, Beijing 100190, China
10. Qin G, Zhang Y, Tyner JW, Kemp CJ, Shmulevich I. Knowledge graphs facilitate prediction of drug response for acute myeloid leukemia. iScience 2024; 27:110755. [PMID: 39280607 PMCID: PMC11401200 DOI: 10.1016/j.isci.2024.110755] [Received: 11/20/2023] [Revised: 05/04/2024] [Accepted: 08/14/2024] [Indexed: 09/18/2024]
Abstract
Acute myeloid leukemia (AML) is a highly aggressive and heterogeneous disease, underscoring the need for improved therapeutic options and methods to optimally predict responses. With the wealth of available data resources, including clinical features, multiomics analysis, and ex vivo drug screening from AML patients, development of drug response prediction models has become feasible. Knowledge graphs (KGs) embed the relationships between different entities or features, allowing for explanation of a wide breadth of drug sensitivity and resistance mechanisms. We designed AML drug response prediction models guided by KGs. Our models included engineered features, relative gene expression between marker genes for each drug and regulators (e.g., transcription factors). We identified relative gene expression of FGD4-MIR4519, NPC2-GATA2, and BCL2-NFKB2 as predictive features for venetoclax ex vivo drug response. The KG-guided models provided high accuracy in independent test sets, overcame potential platform batch effects, and provided candidate drug sensitivity biomarkers for further validation.
Collapse
Affiliation(s)
- Guangrong Qin
  - Institute for Systems Biology, Seattle, WA 98109, USA
- Yue Zhang
  - Institute for Systems Biology, Seattle, WA 98109, USA
- Jeffrey W. Tyner
  - Knight Cancer Institute, Oregon Health & Science University, Portland, OR 97239, USA
11
Gutierrez JJG, Lau E, Dharmapalan S, Parker M, Chen Y, Álvarez MA, Wang D. Multi-output prediction of dose-response curves enables drug repositioning and biomarker discovery. NPJ Precis Oncol 2024; 8:209. [PMID: 39304771 DOI: 10.1038/s41698-024-00691-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 08/28/2024] [Indexed: 09/22/2024] Open
Abstract
Drug response prediction is hampered by uncertainty in the measures of response and selection of doses. In this study, we propose a probabilistic multi-output model to simultaneously predict all dose-responses and uncover their biomarkers. By describing the relationship between genomic features and chemical properties to every response at every dose, our multi-output Gaussian Process (MOGP) models enable assessment of drug efficacy using any dose-response metric. This approach was tested across two drug screening studies and ten cancer types. Kullback-Leibler divergence measured the importance of each feature and identified the EZH2 gene as a novel biomarker of BRAF inhibitor response. We demonstrate the effectiveness of our MOGP models in accurately predicting dose-responses in different cancer types and when there is a limited number of drug screening experiments for training. Our findings highlight the potential of MOGP models in enhancing drug development pipelines by reducing data requirements and improving precision in dose-response predictions.
Affiliation(s)
- Juan-José Giraldo Gutierrez
  - National Heart and Lung Institute, Imperial College London, London, UK.
  - Department of Computer Science, The University of Sheffield, Sheffield, UK.
- Evelyn Lau
  - Institute for Human Development and Potential, Agency for Science Technology and Research (A*STAR), Singapore, Republic of Singapore
- Subhashini Dharmapalan
  - Department of Computer Science, The University of Sheffield, Sheffield, UK
  - Institute for Human Development and Potential, Agency for Science Technology and Research (A*STAR), Singapore, Republic of Singapore
- Melody Parker
  - Nuffield Department of Clinical Medicine, University of Oxford, John Radcliffe Hospital, Oxford, UK
  - Big Data Institute at the Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Yurui Chen
  - Institute for Human Development and Potential, Agency for Science Technology and Research (A*STAR), Singapore, Republic of Singapore
  - Department of Mathematics, National University of Singapore, Singapore, Republic of Singapore
- Mauricio A Álvarez
  - Department of Computer Science, The University of Manchester, Manchester, UK
- Dennis Wang
  - National Heart and Lung Institute, Imperial College London, London, UK.
  - Department of Computer Science, The University of Sheffield, Sheffield, UK.
  - Institute for Human Development and Potential, Agency for Science Technology and Research (A*STAR), Singapore, Republic of Singapore.
  - Bioinformatics Institute (BII), Agency for Science Technology and Research (A*STAR), Singapore, Republic of Singapore.
12
Mohammadzadeh-Vardin T, Ghareyazi A, Gharizadeh A, Abbasi K, Rabiee HR. DeepDRA: Drug repurposing using multi-omics data integration with autoencoders. PLoS One 2024; 19:e0307649. [PMID: 39058696 PMCID: PMC11280260 DOI: 10.1371/journal.pone.0307649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Accepted: 07/09/2024] [Indexed: 07/28/2024] Open
Abstract
Cancer treatment has become one of the biggest challenges in the world today. Different treatments are used against cancer; drug-based treatments have shown better results. On the other hand, designing new drugs for cancer is costly and time-consuming. Some computational methods, such as machine learning and deep learning, have been suggested to solve these challenges using drug repurposing. Despite the promise of classical machine-learning methods in repurposing cancer drugs and predicting responses, deep-learning methods performed better. This study aims to develop a deep-learning model that predicts cancer drug response based on multi-omics data, drug descriptors, and drug fingerprints and facilitates the repurposing of drugs based on those responses. To reduce multi-omics data's dimensionality, we use autoencoders. As a multi-task learning model, autoencoders are connected to MLPs. We extensively tested our model using three primary datasets: GDSC, CTRP, and CCLE to determine its efficacy. In multiple experiments, our model consistently outperforms existing state-of-the-art methods. Compared to state-of-the-art models, our model achieves an impressive AUPRC of 0.99. Furthermore, in a cross-dataset evaluation, where the model is trained on GDSC and tested on CCLE, it surpasses the performance of three previous works, achieving an AUPRC of 0.72. In conclusion, we presented a deep learning model that outperforms the current state-of-the-art regarding generalization. Using this model, we could assess drug responses and explore drug repurposing, leading to the discovery of novel cancer drugs. Our study highlights the potential for advanced deep learning to advance cancer therapeutic precision.
Affiliation(s)
- Taha Mohammadzadeh-Vardin
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
- Amin Ghareyazi
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
- Ali Gharizadeh
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
- Karim Abbasi
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
  - Faculty of Mathematics and Computer Science, Kharazmi University, Tehran, Iran
- Hamid R. Rabiee
  - Department of Computer Engineering, Bioinformatics and Computational Biology Lab, Sharif University of Technology, Tehran, Iran
13
Lenhof K, Eckhart L, Rolli LM, Lenhof HP. Trust me if you can: a survey on reliability and interpretability of machine learning approaches for drug sensitivity prediction in cancer. Brief Bioinform 2024; 25:bbae379. [PMID: 39101498 PMCID: PMC11299037 DOI: 10.1093/bib/bbae379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 07/08/2024] [Accepted: 07/19/2024] [Indexed: 08/06/2024] Open
Abstract
With the ever-increasing number of artificial intelligence (AI) systems, mitigating risks associated with their use has become one of the most urgent scientific and societal issues. To this end, the European Union passed the EU AI Act, proposing solution strategies that can be summarized under the umbrella term trustworthiness. In anti-cancer drug sensitivity prediction, machine learning (ML) methods are developed for application in medical decision support systems, which require an extraordinary level of trustworthiness. This review offers an overview of the ML landscape of methods for anti-cancer drug sensitivity prediction, including a brief introduction to the four major ML realms (supervised, unsupervised, semi-supervised, and reinforcement learning). In particular, we address the question to what extent trustworthiness-related properties, more specifically, interpretability and reliability, have been incorporated into anti-cancer drug sensitivity prediction methods over the previous decade. In total, we analyzed 36 papers with approaches for anti-cancer drug sensitivity prediction. Our results indicate that the need for reliability has hardly been addressed so far. Interpretability, on the other hand, has often been considered for model development. However, the concept is rather used intuitively, lacking clear definitions. Thus, we propose an easily extensible taxonomy for interpretability, unifying all prevalent connotations explicitly or implicitly used within the field.
Affiliation(s)
- Kerstin Lenhof
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
- Lea Eckhart
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
- Lisa-Marie Rolli
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
- Hans-Peter Lenhof
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
14
Yeh SJ, Paithankar S, Chen R, Xing J, Sun M, Liu K, Zhou J, Chen B. TransCell: In Silico Characterization of Genomic Landscape and Cellular Responses by Deep Transfer Learning. Genomics, Proteomics & Bioinformatics 2024; 22:qzad008. [PMID: 39240541 PMCID: PMC11378636 DOI: 10.1093/gpbjnl/qzad008] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/27/2022] [Revised: 06/30/2023] [Accepted: 09/20/2023] [Indexed: 09/07/2024]
Abstract
Gene expression profiling of new or modified cell lines has become routine; however, obtaining comprehensive molecular characterization and cellular responses for a variety of cell lines, including those derived from underrepresented groups, is not trivial when resources are minimal. Using gene expression to predict other measurements has been actively explored; however, its predictive power across various measurements has not been systematically investigated. Here, we evaluated commonly used machine learning methods and presented TransCell, a two-step deep transfer learning framework that utilized the knowledge derived from pan-cancer tumor samples to predict molecular features and responses. Among these models, TransCell had the best performance in predicting metabolite, gene effect score (or genetic dependency), and drug sensitivity, and had comparable performance in predicting mutation, copy number variation, and protein expression. Notably, TransCell improved the performance by over 50% in drug sensitivity prediction and achieved a correlation of 0.7 in gene effect score prediction. Furthermore, predicted drug sensitivities revealed potential repurposing candidates for 100 new pediatric cancer cell lines, and predicted gene effect scores reflected BRAF resistance in melanoma cell lines. Together, we investigated the predictive power of gene expression in six molecular measurement types and developed a web portal (http://apps.octad.org/transcell/) that enables the prediction of 352,000 genomic and cellular response features solely from gene expression profiles.
Affiliation(s)
- Shan-Ju Yeh
  - Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Shreya Paithankar
  - Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Ruoqiao Chen
  - Department of Pharmacology and Toxicology, Michigan State University, Grand Rapids, MI 49503, USA
- Jing Xing
  - Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Mengying Sun
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Ke Liu
  - Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Jiayu Zhou
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Bin Chen
  - Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
  - Department of Pharmacology and Toxicology, Michigan State University, Grand Rapids, MI 49503, USA
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
15
Bang D, Koo B, Kim S. Transfer learning of condition-specific perturbation in gene interactions improves drug response prediction. Bioinformatics 2024; 40:i130-i139. [PMID: 38940127 PMCID: PMC11256952 DOI: 10.1093/bioinformatics/btae249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
SUMMARY Drug response is conventionally measured at the cell level, often quantified by metrics like IC50. However, to gain a deeper understanding of drug response, cellular outcomes need to be understood in terms of pathway perturbation. This perspective leads us to recognize a challenge posed by the gap between two widely used large-scale databases, LINCS L1000 and GDSC, measuring drug response at different levels: L1000 captures information at the gene expression level, while GDSC operates at the cell line level. Our study aims to bridge this gap by integrating the two databases through transfer learning, focusing on condition-specific perturbations in gene interactions from L1000 to interpret drug response integrating both gene and cell levels in GDSC. This transfer learning strategy involves pretraining on the transcriptomic-level L1000 dataset, with parameter-frozen fine-tuning to cell line-level drug response. Our novel condition-specific gene-gene attention (CSG2A) mechanism dynamically learns gene interactions specific to input conditions, guided by both data and biological network priors. The CSG2A network, equipped with this transfer learning strategy, achieves state-of-the-art performance in cell line-level drug response prediction. In two case studies, well-known mechanisms of drugs are well represented in both the learned gene-gene attention and the predicted transcriptomic profiles. This alignment supports the modeling power in terms of interpretability and biological relevance. Furthermore, our model's unique capacity to capture drug response in terms of both pathway perturbation and cell viability extends predictions to the patient level using TCGA data, demonstrating its expressive power obtained from both gene and cell levels. AVAILABILITY AND IMPLEMENTATION The source code for the CSG2A network is available at https://github.com/eugenebang/CSG2A.
Affiliation(s)
- Dongmin Bang
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
  - AIGENDRUG Co., Ltd., Seoul, 08758, Republic of Korea
- Bonil Koo
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
  - AIGENDRUG Co., Ltd., Seoul, 08758, Republic of Korea
- Sun Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
  - AIGENDRUG Co., Ltd., Seoul, 08758, Republic of Korea
  - Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Republic of Korea
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, 08826, Republic of Korea
16
Tang X, Tran A, Tan J, Gerstein MB. MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations. Bioinformatics 2024; 40:i357-i368. [PMID: 38940177 PMCID: PMC11256921 DOI: 10.1093/bioinformatics/btae260] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION The current paradigm of deep learning models for the joint representation of molecules and text primarily relies on 1D or 2D molecular formats, neglecting significant 3D structural information that offers valuable physical insight. This narrow focus inhibits the models' versatility and adaptability across a wide range of modalities. Conversely, the limited research focusing on explicit 3D representation tends to overlook textual data within the biomedical domain. RESULTS We present a unified pre-trained language model, MolLM, that concurrently captures 2D and 3D molecular information alongside biomedical text. MolLM consists of a text Transformer encoder and a molecular Transformer encoder, designed to encode both 2D and 3D molecular structures. To support MolLM's self-supervised pre-training, we constructed 160K molecule-text pairings. Employing contrastive learning as a supervisory signal for learning, MolLM demonstrates robust molecular representation capabilities across four downstream tasks, including cross-modal molecule and text matching, property prediction, captioning, and text-prompted molecular editing. Through ablation, we demonstrate that the inclusion of explicit 3D representations improves performance in these downstream tasks. AVAILABILITY AND IMPLEMENTATION Our code, data, pre-trained model weights, and examples of using our model are all available at https://github.com/gersteinlab/MolLM. In particular, we provide Jupyter Notebooks offering step-by-step guidance on how to use MolLM to extract embeddings for both molecules and text.
Affiliation(s)
- Xiangru Tang
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Andrew Tran
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Jeffrey Tan
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Mark B Gerstein
  - Department of Computer Science, Yale University, New Haven, CT 06520, United States
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
  - Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT 06520, United States
  - Department of Statistics & Data Science, Yale University, New Haven, CT 06520, United States
  - Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT 06520, United States
17
Yang X, Tang X, Li C, Han H. Singular value thresholding two-stage matrix completion for drug sensitivity discovery. Comput Biol Chem 2024; 110:108071. [PMID: 38718497 DOI: 10.1016/j.compbiolchem.2024.108071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 04/06/2024] [Accepted: 04/11/2024] [Indexed: 05/27/2024]
Abstract
Incomplete data presents significant challenges in drug sensitivity analysis, especially in critical areas like oncology, where precision is paramount. Our study introduces an innovative imputation method designed specifically for low-rank matrices, addressing the crucial challenge of data completion in anticancer drug sensitivity testing. Our method unfolds in two main stages: Initially, the singular value thresholding algorithm is employed for preliminary matrix completion, establishing a solid foundation for subsequent steps. Then, the matrix rows are segmented into distinct blocks based on hierarchical clustering of correlation coefficients, applying singular value thresholding to the largest block, which has been proved to possess the largest entropy. This is followed by a refined data restoration process, where the reconstructed largest block is integrated into the initial matrix completion to achieve the final matrix completion. Compared to other methods, our approach not only improves the accuracy of data restoration but also ensures the integrity and reliability of the imputed values, establishing it as a robust tool for future drug sensitivity analysis.
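The first stage named in this abstract, singular value thresholding (SVT), can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it covers only the basic SVT iteration and omits the second stage (hierarchical clustering of rows and re-completion of the largest block). The function name `svt_complete`, the default threshold `tau`, and the step size `delta` are assumptions chosen for illustration.

```python
import numpy as np

def svt_complete(M, mask, tau=None, delta=1.2, n_iter=500, tol=1e-4):
    """Fill missing entries of M with the basic SVT iteration.

    M    : 2-D array; values at unobserved positions are ignored.
    mask : boolean array of the same shape, True where M is observed.
    tau, delta, n_iter, tol are illustrative defaults, not values from the paper.
    """
    mask = np.asarray(mask, dtype=bool)
    M_obs = np.where(mask, np.asarray(M, dtype=float), 0.0)
    if tau is None:
        # Heuristic threshold scaled with the matrix size (an assumption).
        tau = 5.0 * np.sqrt(M_obs.shape[0] * M_obs.shape[1])
    Y = np.zeros_like(M_obs)   # auxiliary (dual) variable
    X = np.zeros_like(M_obs)   # current low-rank estimate
    norm_obs = np.linalg.norm(M_obs)
    for _ in range(n_iter):
        # Soft-threshold the singular values of Y to obtain a low-rank X.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        s = np.maximum(s - tau, 0.0)
        X = (U * s) @ Vt
        # Residual on observed entries only.
        residual = np.where(mask, M_obs - X, 0.0)
        if np.linalg.norm(residual) <= tol * norm_obs:
            break
        Y += delta * residual
    return X

# Toy usage: a rank-1 "response" matrix with about 30% of entries hidden.
rng = np.random.default_rng(0)
truth = np.outer(rng.normal(size=20), rng.normal(size=20))
observed = rng.random(truth.shape) < 0.7
completed = svt_complete(truth, observed)
```

In a drug sensitivity setting, rows and columns would correspond to cell lines and drugs, and `mask` would mark the drug-cell line pairs that were actually screened.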
Affiliation(s)
- Xuemei Yang
  - School of Mathematics and Statistics, Xianyang Normal University, Xianyang, 712000, China.
- Xiaoduan Tang
  - School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China.
- Chun Li
  - College of Elementary Education, Hainan Normal University, Haikou 571158, China; Key Laboratory of Data Science and Intelligence Education of Ministry of Education, Hainan Normal University, Haikou 571158, China.
- Henry Han
  - The Laboratory of Data Science and Artificial Intelligence Innovation, Department of Computer Science, School of Engineering and Computer Science, Baylor University, Waco, TX 76798 USA.
18
Sotudian S, Paschalidis IC. ITNR: Inversion Transformer-based Neural Ranking for cancer drug recommendations. Comput Biol Med 2024; 172:108312. [PMID: 38503090 PMCID: PMC10990436 DOI: 10.1016/j.compbiomed.2024.108312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Revised: 03/09/2024] [Accepted: 03/12/2024] [Indexed: 03/21/2024]
Abstract
Personalized drug response prediction is an approach for tailoring effective therapeutic strategies for patients based on their tumors' genomic characterization. While machine learning methods are widely employed in the literature, they often struggle to capture drug-cell line relations across various cell lines. In addressing this challenge, our study introduces a novel listwise Learning-to-Rank (LTR) model named Inversion Transformer-based Neural Ranking (ITNR). ITNR utilizes genomic features and a transformer architecture to decipher functional relationships and construct models that can predict patient-specific drug responses. Our experiments were conducted on three major drug response data sets, showing that ITNR reliably and consistently outperforms state-of-the-art LTR models.
Affiliation(s)
- Shahabeddin Sotudian
  - Department of Electrical and Computer Engineering, Division of Systems Engineering, Boston University, Boston, MA, USA.
- Ioannis Ch Paschalidis
  - Department of Electrical and Computer Engineering, Division of Systems Engineering, Boston University, Boston, MA, USA; Department of Biomedical Engineering, and Faculty of Computing and Data Sciences, Boston University, Boston, MA, USA.
19
Chen Y, Zhang L. Hi-GeoMVP: a hierarchical geometry-enhanced deep learning model for drug response prediction. Bioinformatics 2024; 40:btae204. [PMID: 38614131 PMCID: PMC11060866 DOI: 10.1093/bioinformatics/btae204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 02/11/2024] [Accepted: 04/11/2024] [Indexed: 04/15/2024] Open
Abstract
MOTIVATION Personalized cancer treatments require accurate drug response predictions. Existing deep learning methods show promise but higher accuracy is needed to serve the purpose of precision medicine. The prediction accuracy can be improved with not only topology but geometrical information of drugs. RESULTS A novel deep learning methodology for drug response prediction is presented, named Hi-GeoMVP. It synthesizes hierarchical drug representation with multi-omics data, leveraging graph neural networks and variational autoencoders for detailed drug and cell line representations. Multi-task learning is employed to make better prediction, while both 2D and 3D molecular representations capture comprehensive drug information. Testing on the GDSC dataset confirms Hi-GeoMVP's enhanced performance, surpassing prior state-of-the-art methods by improving the Pearson correlation coefficient from 0.934 to 0.941 and decreasing the root mean square error from 0.969 to 0.931. In the case of blind test, Hi-GeoMVP demonstrated robustness, outperforming the best previous models with a superior Pearson correlation coefficient in the drug-blind test. These results underscore Hi-GeoMVP's capabilities in drug response prediction, implying its potential for precision medicine. AVAILABILITY AND IMPLEMENTATION The source code is available at https://github.com/matcyr/Hi-GeoMVP.
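The two figures of merit quoted in this abstract, the Pearson correlation coefficient and the root mean square error, are both computed from paired observed and predicted response values. A minimal sketch follows; the function names and the toy arrays are illustrative assumptions, not code or data from Hi-GeoMVP.

```python
import numpy as np

def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted responses."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    yt = y_true - y_true.mean()
    yp = y_pred - y_pred.mean()
    return float((yt @ yp) / np.sqrt((yt @ yt) * (yp @ yp)))

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted responses."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy values standing in for measured and predicted drug responses.
observed = [1.2, 0.4, -0.7, 2.1, 0.0]
predicted = [1.0, 0.6, -0.5, 1.8, 0.3]
r, err = pearson_r(observed, predicted), rmse(observed, predicted)
```

A higher correlation and a lower error both indicate closer agreement, which is why the abstract reports an increase in the former and a decrease in the latter as improvements.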
Affiliation(s)
- Yurui Chen
  - Department of Mathematics and the Centre for Data Science and Machine Learning, National University of Singapore, Singapore 119076, Singapore
- Louxin Zhang
  - Department of Mathematics and the Centre for Data Science and Machine Learning, National University of Singapore, Singapore 119076, Singapore
20
Li P, Jiang Z, Liu T, Liu X, Qiao H, Yao X. Improving drug response prediction via integrating gene relationships with deep learning. Brief Bioinform 2024; 25:bbae153. [PMID: 38600666 PMCID: PMC11006795 DOI: 10.1093/bib/bbae153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 03/05/2024] [Accepted: 03/18/2024] [Indexed: 04/12/2024] Open
Abstract
Predicting the drug response of cancer cell lines is crucial for advancing personalized cancer treatment, yet remains challenging due to tumor heterogeneity and individual diversity. In this study, we present a deep learning-based framework named Deep neural network Integrating Prior Knowledge (DIPK), which adopts self-supervised techniques to integrate multiple sources of valuable information, including gene interaction relationships, gene expression profiles and molecular topologies, to enhance prediction accuracy and robustness. We demonstrated the superior performance of DIPK compared to existing methods on both known and novel cells and drugs, underscoring the importance of gene interaction relationships in drug response prediction. In addition, DIPK extends its applicability to single-cell RNA sequencing data, showcasing its capability for single-cell-level response prediction and cell identification. Further, we assess the applicability of DIPK on clinical data. DIPK accurately predicted a higher response to paclitaxel in the pathological complete response (pCR) group compared to the residual disease group, affirming the better response of the pCR group to the chemotherapy compound. We believe that the integration of DIPK into clinical decision-making processes has the potential to enhance individualized treatment strategies for cancer patients.
Affiliation(s)
- Pengyong Li
  - School of Computer Science and Technology, Xidian University, 710126 Xi’an, Shaanxi, China
  - State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, 519020 Macau, China
- Zhengxiang Jiang
  - School of Electronic Engineering, Xidian University, 710126 Xi’an, Shaanxi, China
- Tianxiao Liu
  - School of Computer Science and Technology, Xidian University, 710126 Xi’an, Shaanxi, China
- Xinyu Liu
  - Beijing Laboratory of Biomedical Materials, Department of Geriatric Dentistry, Peking University School and Hospital of Stomatology, 100081 Beijing, China
- Hui Qiao
  - Department of Oncology, Tai’an Municipal Hospital, 271021 Tai’an, Shandong, China
- Xiaojun Yao
  - Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, 999078 Macao, China
21
Hajim WI, Zainudin S, Mohd Daud K, Alheeti K. Optimized models and deep learning methods for drug response prediction in cancer treatments: a review. PeerJ Comput Sci 2024; 10:e1903. [PMID: 38660174 PMCID: PMC11042005 DOI: 10.7717/peerj-cs.1903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 01/31/2024] [Indexed: 04/26/2024]
Abstract
Recent advancements in deep learning (DL) have played a crucial role in aiding experts to develop personalized healthcare services, particularly in drug response prediction (DRP) for cancer patients. The contribution of DL techniques to this field is significant, and they have proven indispensable in the medical field. This review aims to analyze the diverse effectiveness of various DL models in making these predictions, drawing on research published from 2017 to 2023. We utilized the VOS-Viewer 1.6.18 software to create a word cloud from the titles and abstracts of the selected studies. This study offers insights into the focus areas within DL models used for drug response. The word cloud revealed a strong link between certain keywords and grouped themes, highlighting terms such as deep learning, machine learning, precision medicine, precision oncology, drug response prediction, and personalized medicine. To advance DRP using DL, researchers need to work on enhancing the models' generalizability and interoperability. It is also crucial to develop models that not only accurately represent various architectures but also simplify these architectures, balancing the complexity with the predictive capabilities. In the future, researchers should try to combine methods that make DL models easier to understand; this will make DRP reviews more open and help doctors trust the decisions made by DL models in cancer DRP.
Affiliation(s)
- Wesam Ibrahim Hajim
  - Department of Applied Geology, College of Sciences, Tikrit University, Tikrit, Salah ad Din, Iraq
  - Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia
- Suhaila Zainudin
  - Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia
- Kauthar Mohd Daud
  - Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia
- Khattab Alheeti
  - Department of Computer Networking Systems, College of Computer Sciences and Information Technology, University of Anbar, Al Anbar, Ramadi, Iraq
22
Qin Y, Huo M, Liu X, Li SC. Biomarkers and computational models for predicting efficacy to tumor ICI immunotherapy. Front Immunol 2024; 15:1368749. [PMID: 38524135 PMCID: PMC10957591 DOI: 10.3389/fimmu.2024.1368749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Accepted: 02/27/2024] [Indexed: 03/26/2024] Open
Abstract
Numerous studies have shown that immune checkpoint inhibitor (ICI) immunotherapy has great potential as a cancer treatment, leading to significant clinical improvements in numerous cases. However, it benefits only a minority of patients, underscoring the importance of discovering reliable biomarkers that can be used to screen for potential beneficiaries and ultimately reduce the risk of overtreatment. Our comprehensive review focuses on the latest advancements in predictive biomarkers for ICI therapy, particularly emphasizing those that enhance the efficacy of programmed cell death protein 1 (PD-1)/programmed cell death-ligand 1 (PD-L1) inhibitor and cytotoxic T-lymphocyte antigen-4 (CTLA-4) inhibitor immunotherapies. We explore biomarkers derived from various sources, including tumor cells, the tumor immune microenvironment (TIME), body fluids, gut microbes, and metabolites. Among them, tumor cell-derived biomarkers include tumor mutational burden (TMB), tumor neoantigen burden (TNB), microsatellite instability (MSI), PD-L1 expression, mutated genes in pathways, and epigenetic biomarkers. TIME-derived biomarkers include the immune landscape of the TIME, inhibitory checkpoints, and the immune repertoire. We also discuss various techniques used to detect and assess these biomarkers, detailing their respective datasets, strengths, weaknesses, and evaluative metrics. Furthermore, we present a comprehensive review of computer models for predicting the response to ICI therapy. The computer models include knowledge-based mechanistic models and data-based machine learning (ML) models. Among the knowledge-based mechanistic models are pharmacokinetic/pharmacodynamic (PK/PD) models, partial differential equation (PDE) models, signal network-based models, quantitative systems pharmacology (QSP) models, and agent-based models (ABMs). ML models include linear regression models, logistic regression models, support vector machine (SVM)/random forest/extra trees/k-nearest neighbors (KNN) models, and artificial neural network (ANN) and deep learning models. Additionally, there are hybrid models of systems biology and ML. We summarize the details of these models, outlining the datasets they utilize, their evaluation methods/metrics, and their respective strengths and limitations. By summarizing the major advances in the research on predictive biomarkers and computer models for the therapeutic effect and clinical utility of tumor ICI, we aim to assist researchers in choosing appropriate biomarkers or computer models for research exploration and help clinicians conduct precision medicine by selecting the best biomarkers.
Affiliation(s)
- Yurong Qin
  - Department of Computer Science, City University of Hong Kong, Kowloon, China
  - City University of Hong Kong Shenzhen Research Institute, Shenzhen, Guangdong, China
- Miaozhe Huo
  - Department of Computer Science, City University of Hong Kong, Kowloon, China
  - City University of Hong Kong Shenzhen Research Institute, Shenzhen, Guangdong, China
- Xingwu Liu
  - School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning, China
- Shuai Cheng Li
  - Department of Computer Science, City University of Hong Kong, Kowloon, China
  - City University of Hong Kong Shenzhen Research Institute, Shenzhen, Guangdong, China
|
23
|
Branson N, Cutillas PR, Bessant C. Comparison of multiple modalities for drug response prediction with learning curves using neural networks and XGBoost. BIOINFORMATICS ADVANCES 2023; 4:vbad190. [PMID: 38282976 PMCID: PMC10812874 DOI: 10.1093/bioadv/vbad190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 12/19/2023] [Accepted: 12/22/2023] [Indexed: 01/30/2024]
Abstract
Motivation Anti-cancer drug response prediction is a central problem within stratified medicine. Transcriptomic profiles of cancer cell lines are typically used for drug response prediction, but we hypothesize that proteomics or phosphoproteomics might be more suitable as they give a more direct insight into cellular processes. However, there has not yet been a systematic comparison between all three of these data types using consistent evaluation criteria. Results Due to the limited number of cell lines with phosphoproteomics profiles, we use learning curves (plots of predictive performance as a function of dataset size) to compare the current performance and predict the future performance of the three omics datasets with more data. We use neural networks and XGBoost and compare them against a simple rule-based benchmark. We show that phosphoproteomics slightly outperforms RNA-seq and proteomics using the 38 cell lines with profiles of all three omics data types. Furthermore, using the 877 cell lines with proteomics and RNA-seq profiles, we show that RNA-seq slightly outperforms proteomics. With the learning curves we predict that the mean squared error using the phosphoproteomics dataset would decrease by ∼15% if a dataset of the same size as the proteomics/transcriptomics was collected. For the cell lines with proteomics and RNA-seq profiles, the learning curves reveal that for smaller dataset sizes neural networks outperform XGBoost and vice versa for larger datasets. Furthermore, the trajectory of the XGBoost curve suggests that it will improve faster than the neural networks as more data are collected. Availability and implementation See https://github.com/Nik-BB/Learning-curves-for-DRP for the code used.
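The learning-curve idea in this abstract (test error as a function of training-set size) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the synthetic data, the one-variable least-squares model standing in for the neural networks and XGBoost, and the helper names `fit_line`, `mse`, `learning_curve`, and `make_data`. It is not the authors' code, which is linked in the abstract.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on one-dimensional inputs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return mean((y - p) ** 2 for y, p in zip(ys, preds))

def learning_curve(train, test, sizes, repeats=20, seed=0):
    """For each training size, average the test MSE over random subsamples."""
    rng = random.Random(seed)
    test_x, test_y = zip(*test)
    curve = []
    for n in sizes:
        scores = []
        for _ in range(repeats):
            xs, ys = zip(*rng.sample(train, n))
            a, b = fit_line(xs, ys)
            scores.append(mse(test_y, [a * x + b for x in test_x]))
        curve.append((n, mean(scores)))
    return curve

def make_data(n, rng):
    """Hypothetical synthetic data: y = 2x + 1 plus unit Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0, 1.0)))
    return data

rng = random.Random(42)
train, test = make_data(200, rng), make_data(100, rng)
for n, err in learning_curve(train, test, sizes=[5, 10, 50, 200]):
    print(f"n={n:4d}  test MSE={err:.3f}")
```

Plotting the resulting (size, error) pairs and extrapolating the trend is what allows a performance estimate for a larger dataset that has not yet been collected.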
Affiliation(s)
- Nikhil Branson
  - School of Biological and Behavioural Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
  - Digital Environment Research Institute, Queen Mary University of London, London E1 1HH, United Kingdom
- Pedro R Cutillas
  - Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, United Kingdom
- Conrad Bessant
  - School of Biological and Behavioural Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
  - Digital Environment Research Institute, Queen Mary University of London, London E1 1HH, United Kingdom
|
24
|
Piochi LF, Preto AJ, Moreira IS. DELFOS-drug efficacy leveraging forked and specialized networks-benchmarking scRNA-seq data in multi-omics-based prediction of cancer sensitivity. Bioinformatics 2023; 39:btad645. [PMID: 37862234 PMCID: PMC10627353 DOI: 10.1093/bioinformatics/btad645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 09/28/2023] [Accepted: 10/19/2023] [Indexed: 10/22/2023] Open
Abstract
MOTIVATION Cancer is currently one of the most notorious diseases, with over 1 million deaths in the European Union alone in 2022. As each tumor can be composed of diverse cell types with distinct genotypes, cancer cells can acquire resistance to different compounds. Moreover, anticancer drugs can display severe side effects, compromising patient well-being. Therefore, novel strategies for identifying the optimal set of compounds to treat each tumor have become an important research topic in recent decades. RESULTS To address this challenge, we developed a novel drug response prediction algorithm called Drug Efficacy Leveraging Forked and Specialized networks (DELFOS). Our model learns from multi-omics data from over 65 cancer cell lines, as well as structural data from over 200 compounds, for the prediction of drug sensitivity. We also evaluated the benefits of incorporating single-cell expression data to predict drug response. DELFOS was validated using datasets with unseen cell lines or drugs and compared with other state-of-the-art algorithms, achieving a high prediction performance on several correlation and error metrics. Overall, DELFOS can effectively leverage multi-omics data for the prediction of drug responses in thousands of drug-cell line pairs. AVAILABILITY AND IMPLEMENTATION The DELFOS pipeline and associated data are available at github.com/MoreiraLAB/delfos.
Affiliation(s)
- Luiz Felipe Piochi
  - Department of Life Sciences, University of Coimbra, Coimbra 3000-456, Portugal
  - CNC - Center for Neuroscience and Cell Biology, Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
  - CIBB - Center for Innovative Biomedicine and Biotechnology, Coimbra 3004-504, Portugal
- António J Preto
  - CNC - Center for Neuroscience and Cell Biology, Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
  - CIBB - Center for Innovative Biomedicine and Biotechnology, Coimbra 3004-504, Portugal
  - PhD Programme in Experimental Biology and Biomedicine, Institute for Interdisciplinary Research (IIIUC), University of Coimbra, Coimbra 3030-789, Portugal
- Irina S Moreira
  - Department of Life Sciences, University of Coimbra, Coimbra 3000-456, Portugal
  - CNC - Center for Neuroscience and Cell Biology, Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
  - CIBB - Center for Innovative Biomedicine and Biotechnology, Coimbra 3004-504, Portugal
|
25
|
Xu X, Qi Z, Han X, Xu A, Geng Z, He X, Ren Y, Duo Z. Predicting anticancer drug sensitivity on distributed data sources using federated deep learning. Heliyon 2023; 9:e18615. [PMID: 37593639 PMCID: PMC10427996 DOI: 10.1016/j.heliyon.2023.e18615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 07/12/2023] [Accepted: 07/24/2023] [Indexed: 08/19/2023] Open
Abstract
Drug sensitivity prediction plays a crucial role in precision cancer therapy. Collaboration among medical institutions can lead to better performance in drug sensitivity prediction. However, patient privacy and data protection regulations remain a severe impediment to centralized prediction studies. For the first time, we propose a federated drug sensitivity prediction model with high generalization, combining distributed data sources while protecting private data. Cell lines are first classified into three categories using the waterfall method. Focal loss, which addresses class imbalance, is then embedded into the horizontal federated deep learning framework, yielding HFDL-fl. Applying HFDL-fl to homogeneous and heterogeneous data, we obtained HFDL-Cross and HFDL-Within. Our comprehensive experiments demonstrated that (i) collaboration through HFDL-fl outperforms private models trained on local data, (ii) the focal loss function can effectively improve model performance in classifying cell lines into sensitive and resistant categories, and (iii) HFDL-fl is not significantly affected by data heterogeneity. To summarize, HFDL-fl provides a valuable solution to break down the barriers between medical institutions for privacy-preserving drug sensitivity prediction and therefore facilitates the development of cancer precision medicine and other privacy-related biomedical research.
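For readers unfamiliar with focal loss, a minimal sketch of the standard multi-class form, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), is given here. The abstract does not state the paper's exact formulation or hyperparameters, so `gamma=2.0`, `alpha=1.0`, the example probabilities, and the name of the middle class are all assumptions for illustration.

```python
import math

def cross_entropy(probs, target):
    """Standard cross-entropy for one sample: -log of the true-class probability."""
    return -math.log(probs[target])

def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample.

    probs  -- predicted class probabilities (sum to 1)
    target -- index of the true class
    gamma  -- focusing parameter; gamma = 0 recovers cross-entropy
    alpha  -- overall weighting factor (assumed constant across classes here)
    """
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical three-class setting: 0 = sensitive, 1 = intermediate, 2 = resistant.
easy = [0.90, 0.05, 0.05]   # confident, correct prediction for class 0
hard = [0.20, 0.50, 0.30]   # poorly classified sample whose true class is 0

for name, probs in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(probs, 0)
    fl = focal_loss(probs, 0)
    print(f"{name}: cross-entropy={ce:.4f}  focal={fl:.4f}  ratio={fl / ce:.4f}")
```

The modulating factor (1 - p_t)^gamma shrinks the contribution of well-classified samples far more than that of hard ones, which is why the loss is commonly used when some classes dominate the training data.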
Affiliation(s)
- Xiaolu Xu
  - School of Computer and Artificial Intelligence, Liaoning Normal University, Dalian 116029, China
- Zitong Qi
  - Department of Statistics, University of Washington, Seattle, WA 98195, USA
- Xiumei Han
  - College of Artificial Intelligence, Dalian Maritime University, Dalian 116026, China
- Aiguo Xu
  - Department of Oncology, The Second People's Hospital of Lianyungang, Lianyungang 222023, China
- Zhaohong Geng
  - Department of Cardiology, Second Affiliated Hospital of Dalian Medical University, Dalian 116023, China
- Xinyu He
  - School of Computer and Artificial Intelligence, Liaoning Normal University, Dalian 116029, China
- Yonggong Ren
  - School of Computer and Artificial Intelligence, Liaoning Normal University, Dalian 116029, China
- Zhaojun Duo
  - School of Computer and Artificial Intelligence, Liaoning Normal University, Dalian 116029, China
|
26
|
Oloulade BM, Gao J, Chen J, Al-Sabri R, Wu Z. Cancer drug response prediction with surrogate modeling-based graph neural architecture search. Bioinformatics 2023; 39:btad478. [PMID: 37555809 PMCID: PMC10432359 DOI: 10.1093/bioinformatics/btad478] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/01/2023] [Revised: 06/01/2023] [Accepted: 08/08/2023] [Indexed: 08/10/2023] Open
Abstract
MOTIVATION Understanding drug-response differences in cancer treatments is one of the most challenging aspects of personalized medicine. Recently, graph neural networks (GNNs) have become state-of-the-art methods in many graph representation learning scenarios in bioinformatics. However, building an optimal handcrafted GNN model for a particular drug sensitivity dataset requires manual design and fine-tuning of the hyperparameters for the GNN model, which is time-consuming and requires expert knowledge. RESULTS In this work, we propose AutoCDRP, a novel framework for automated cancer drug-response predictor using GNNs. Our approach leverages surrogate modeling to efficiently search for the most effective GNN architecture. AutoCDRP uses a surrogate model to predict the performance of GNN architectures sampled from a search space, allowing it to select the optimal architecture based on evaluation performance. Hence, AutoCDRP can efficiently identify the optimal GNN architecture by exploring the performance of all GNN architectures in the search space. Through comprehensive experiments on two benchmark datasets, we demonstrate that the GNN architecture generated by AutoCDRP surpasses state-of-the-art designs. Notably, the optimal GNN architecture identified by AutoCDRP consistently outperforms the best baseline architecture from the first epoch, providing further evidence of its effectiveness. AVAILABILITY AND IMPLEMENTATION https://github.com/BeObm/AutoCDRP.
Affiliation(s)
- Jianliang Gao
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jiamin Chen
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Raeed Al-Sabri
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Zhenpeng Wu
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
27
|
Shahzad M, Tahir MA, Alhussein M, Mobin A, Shams Malick RA, Anwar MS. NeuPD-A Neural Network-Based Approach to Predict Antineoplastic Drug Response. Diagnostics (Basel) 2023; 13:2043. [PMID: 37370938 DOI: 10.3390/diagnostics13122043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 06/01/2023] [Accepted: 06/05/2023] [Indexed: 06/29/2023] Open
Abstract
With the advent of high-throughput screening, in silico drug response analysis has opened many research avenues in the field of personalized medicine. Over the past decade, many different prediction techniques have been proposed for antineoplastic (anti-cancer) drug response, but there is still a need for improvements in drug sensitivity prediction. The intent of this research study is to propose a framework, namely NeuPD, to validate potential anti-cancer drugs against a panel of cancer cell lines in publicly available datasets. The datasets used in this work are Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE). As not all drugs are effective on cancer cell lines, we have worked on 10 essential drugs from the GDSC dataset that have achieved the best modeling results in previous studies. We also extracted 1610 essential oncogene expressions from 983 cell lines from the same dataset. From the CCLE dataset, 16,383 gene expressions from 1037 cell lines and 24 drugs have been used in our experiments. For dimensionality reduction, Pearson correlation is applied to best fit the model. We integrate the genomic features of cell lines and drugs' fingerprints to fit the neural network model. To evaluate the proposed NeuPD framework, we used repeated K-fold cross-validation with 5 repeats and K = 10 to demonstrate the performance in terms of root mean square error (RMSE) and coefficient of determination (R2). The results obtained on the GDSC dataset, measured using these metrics, show that our proposed NeuPD framework has outperformed existing approaches with an RMSE of 0.490 and R2 of 0.929.
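The evaluation protocol named in this abstract (repeated K-fold cross-validation with K = 10 and 5 repeats, scored by RMSE and R2) can be sketched in plain Python. The function names, the synthetic targets, and the mean-predictor baseline are assumptions for illustration only; the NeuPD model itself is not reproduced here.

```python
import math
import random

def repeated_kfold(n_samples, k=10, repeats=5, seed=0):
    """Yield (train_indices, test_indices) for `repeats` shuffled K-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all indices
        for i in range(k):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, folds[i]

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination; undefined when y_true is constant."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical targets; the "model" is a baseline that predicts the training mean.
y = [float(i % 7) for i in range(100)]
scores = []
for train_idx, test_idx in repeated_kfold(len(y), k=10, repeats=5):
    baseline = sum(y[i] for i in train_idx) / len(train_idx)
    y_test = [y[i] for i in test_idx]
    scores.append(rmse(y_test, [baseline] * len(y_test)))
print(f"{len(scores)} folds, mean RMSE = {sum(scores) / len(scores):.3f}")
```

Repeating the shuffle gives 5 x 10 = 50 fold scores, so the reported average is less sensitive to any single partition of the cell lines.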
Affiliation(s)
- Muhammad Shahzad
  - FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi 75030, Pakistan
- Muhammad Atif Tahir
  - FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi 75030, Pakistan
- Musaed Alhussein
  - Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 51178, Riyadh 11543, Saudi Arabia
- Ansharah Mobin
  - FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi 75030, Pakistan
- Rauf Ahmed Shams Malick
  - FAST School of Computing, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi 75030, Pakistan
- Muhammad Shahid Anwar
  - Department of AI and Software, Gachon University, Seongnam-si 13120, Republic of Korea
|
28
|
Zhang M, Gao H, Liao X, Ning B, Gu H, Yu B. DBGRU-SE: predicting drug-drug interactions based on double BiGRU and squeeze-and-excitation attention mechanism. Brief Bioinform 2023:7176312. [PMID: 37225428 DOI: 10.1093/bib/bbad184] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/27/2023] [Revised: 04/03/2023] [Accepted: 04/23/2023] [Indexed: 05/26/2023] Open
Abstract
The prediction of drug-drug interactions (DDIs) is essential for the development and repositioning of new drugs. DDIs also play a vital role in the fields of biopharmaceuticals, disease diagnosis and pharmacological treatment. This article proposes a new method called DBGRU-SE for predicting DDIs. Firstly, FP3 fingerprints, MACCS fingerprints, PubChem fingerprints and 1D and 2D molecular descriptors are used to extract the feature information of the drugs. Secondly, Group Lasso is used to remove redundant features. Then, SMOTE-ENN is applied to balance the data to obtain the best feature vectors. Finally, the best feature vectors are fed into the classifier combining BiGRU and squeeze-and-excitation (SE) attention mechanisms to predict DDIs. After applying five-fold cross-validation, the ACC values of the DBGRU-SE model on the two datasets are 97.51 and 94.98%, and the AUC values are 99.60 and 98.85%, respectively. The results showed that DBGRU-SE had good predictive performance for drug-drug interactions.
Affiliation(s)
- Hongli Gao
  - Qingdao University of Science and Technology, China
- Xin Liao
  - Qingdao University of Science and Technology, China
- Baoxing Ning
  - Qingdao University of Science and Technology, China
- Haiming Gu
  - Qingdao University of Science and Technology, China
- Bin Yu
  - Qingdao University of Science and Technology, China
|
29
|
Singh DP, Kaushik B. CTDN (Convolutional Temporal Based Deep- Neural Network): An Improvised Stacked Hybrid Computational Approach for Anticancer Drug Response Prediction. Comput Biol Chem 2023; 105:107868. [PMID: 37257399 DOI: 10.1016/j.compbiolchem.2023.107868] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/03/2023] [Revised: 03/31/2023] [Accepted: 04/04/2023] [Indexed: 06/02/2023]
Abstract
The characterization of drug-metabolizing enzymes is a significant problem for customized therapy. It is important to choose the right drugs for cancer patients, and the ability to forecast how those drugs will react is usually based on the available information, genetic sequence, and structural properties. To the best of our knowledge, this is the first study to evaluate optimization algorithms for feature selection and pharmacogenetic categorization using classification methods based on a successful evolutionary algorithm, using datasets from the Cancer Cell Line Encyclopaedia (CCLE) and Genomics of Drug Sensitivity in Cancer (GDSC). The study proposes the use of Firefly and Grey Wolf Optimization techniques for feature extraction, while comparing traditional Machine Learning (ML), ensemble ML and Stacking Algorithm with the proposed Convolutional Temporal Deep Neural Network or CTDN. With the potential to increase efficiency from the suggested intelligible classifier model for chemotherapeutic drug response prediction, our study is important in particular for selecting an acceptable feature selection method. The comparison analysis demonstrates that the proposed model not only surpasses the prior state-of-the-art methods, but also uses Grey Wolf and Firefly Optimization to lessen multicollinearity and overfitting.
Affiliation(s)
- Davinder Paul Singh
  - School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra 182320, Jammu and Kashmir, India
- Baijnath Kaushik
  - School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra 182320, Jammu and Kashmir, India
|
30
|
Peng W, Chen T, Liu H, Dai W, Yu N, Lan W. Improving drug response prediction based on two-space graph convolution. Comput Biol Med 2023; 158:106859. [PMID: 37023539 DOI: 10.1016/j.compbiomed.2023.106859] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/25/2022] [Revised: 02/22/2023] [Accepted: 03/30/2023] [Indexed: 04/03/2023]
Abstract
Patients with the same cancer types may present different genomic features and therefore have different drug sensitivities. Accordingly, correctly predicting patients' responses to drugs can guide treatment decisions and improve the outcome of cancer patients. Existing computational methods leverage the graph convolution network model to aggregate features of different types of nodes in the heterogeneous network. Most fail to consider the similarity between homogeneous nodes. To this end, we propose an algorithm based on two-space graph convolutional neural networks, TSGCNN, to predict the response of anticancer drugs. TSGCNN first constructs the cell line feature space and the drug feature space and separately performs the graph convolution operation on the feature spaces to diffuse similarity information among homogeneous nodes. After that, we generate a heterogeneous network based on the known cell line and drug relationship and perform graph convolution operations on the heterogeneous network to collect the features of different types of nodes. Subsequently, the algorithm produces the final feature representations for cell lines and drugs by adding their self features, the feature space representations, and the heterogeneous space representations. Finally, we leverage the linear correlation coefficient decoder to reconstruct the cell line-drug correlation matrix for drug response prediction based on the final representations. We tested our model on the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) databases. The results indicate that TSGCNN shows excellent performance in drug response prediction compared with eight other state-of-the-art methods.
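The final decoding step described in this abstract can be illustrated with a small sketch. The abstract only names a "linear correlation coefficient decoder"; reading it as the Pearson correlation between each cell-line representation and each drug representation is an assumption made here, and the function names and toy embeddings are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length vectors (undefined for constant vectors)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den

def correlation_decoder(cell_emb, drug_emb):
    """Return a (cell lines x drugs) matrix of correlation scores.

    cell_emb -- list of final cell-line representations
    drug_emb -- list of final drug representations (same dimensionality)
    """
    return [[pearson(c, d) for d in drug_emb] for c in cell_emb]

# Toy three-dimensional embeddings, assumed purely for illustration.
cells = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
drugs = [[2.0, 4.0, 6.0], [1.0, 0.0, 1.0]]
for row in correlation_decoder(cells, drugs):
    print(["%.2f" % score for score in row])
```

Each entry lies in [-1, 1] and can be thresholded or ranked to predict whether a cell line responds to a drug.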
Affiliation(s)
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050, China
- Tielin Chen
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050, China
- Hancheng Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050, China
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050, China
- Ning Yu
  - State University of New York, The College at Brockport, Department of Computing Sciences, 350 New Campus Drive, Brockport, NY 14422, United States of America
- Wei Lan
  - School of Computer Electronic and Information, Guangxi University, Nanning, Guangxi 530004, China
|
31
|
Badwan BA, Liaropoulos G, Kyrodimos E, Skaltsas D, Tsirigos A, Gorgoulis VG. Machine learning approaches to predict drug efficacy and toxicity in oncology. CELL REPORTS METHODS 2023; 3:100413. [PMID: 36936080 PMCID: PMC10014302 DOI: 10.1016/j.crmeth.2023.100413] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 02/25/2023]
Abstract
In recent years, there has been a surge of interest in using machine learning algorithms (MLAs) in oncology, particularly for biomedical applications such as drug discovery, drug repurposing, diagnostics, clinical trial design, and pharmaceutical production. MLAs have the potential to provide valuable insights and predictions in these areas by representing both the disease state and the therapeutic agents used to treat it. To fully utilize the capabilities of MLAs in oncology, it is important to understand the fundamental concepts underlying these algorithms and how they can be applied to assess the efficacy and toxicity of therapeutics. In this perspective, we lay out approaches to represent both the disease state and the therapeutic agents used by MLAs to derive novel insights and make relevant predictions.
Affiliation(s)
- Efthymios Kyrodimos
  - First ENT Department, Hippocration Hospital, National Kapodistrian University of Athens, Athens, GR 11527, Greece
- Aristotelis Tsirigos
  - Department of Medicine, New York University School of Medicine, New York, NY 10016, USA
  - Department of Pathology, New York University School of Medicine, New York, NY 10016, USA
- Vassilis G. Gorgoulis
  - Intelligencia Inc, New York, NY 10014, USA
  - Department of Histology and Embryology, Faculty of Medicine, School of Health Sciences, National Kapodistrian University of Athens, Athens 11527, Greece
  - Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
  - Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
  - Molecular and Clinical Cancer Sciences, Manchester Cancer Research Centre, Manchester Academic Health Sciences Centre, University of Manchester, Manchester M20 4GJ, UK
|
32
|
Partin A, Brettin TS, Zhu Y, Narykov O, Clyde A, Overbeek J, Stevens RL. Deep learning methods for drug response prediction in cancer: Predominant and emerging trends. Front Med (Lausanne) 2023; 10:1086097. [PMID: 36873878 PMCID: PMC9975164 DOI: 10.3389/fmed.2023.1086097] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 11/01/2022] [Accepted: 01/23/2023] [Indexed: 02/17/2023] Open
Abstract
Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients. A wave of recent papers demonstrates promising results in predicting cancer response to drug treatments while utilizing deep learning methods. These papers investigate diverse data representations, neural network architectures, learning methodologies, and evaluation schemes. However, deciphering promising predominant and emerging trends is difficult due to the variety of explored methods and the lack of a standardized framework for comparing drug response prediction models. To obtain a comprehensive landscape of deep learning methods, we conducted an extensive search and analysis of deep learning models that predict the response to single drug treatments. A total of 61 deep learning-based models have been curated, and summary plots were generated. Based on the analysis, observable patterns and prevalence of methods have been revealed. This review helps readers better understand the current state of the field and identify major challenges and promising solution paths.
Affiliation(s)
- Alexander Partin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Thomas S. Brettin
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Yitan Zhu
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Oleksandr Narykov
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Austin Clyde
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Jamie Overbeek
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
- Rick L. Stevens
  - Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL, United States
  - Department of Computer Science, The University of Chicago, Chicago, IL, United States
|
33
|
Wang C, Lye X, Kaalia R, Kumar P, Rajapakse JC. Deep learning and multi-omics approach to predict drug responses in cancer. BMC Bioinformatics 2022; 22:632. [PMID: 36443676 PMCID: PMC9703655 DOI: 10.1186/s12859-022-04964-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Accepted: 09/25/2022] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Cancers are genetically heterogeneous, so anticancer drugs show varying degrees of effectiveness on patients due to their differing genetic profiles. Knowledge of patients' responses to numerous cancer drugs is needed for personalized cancer treatment. By using molecular profiles of cancer cell lines available from the Cancer Cell Line Encyclopedia (CCLE) and anticancer drug responses available in the Genomics of Drug Sensitivity in Cancer (GDSC), we build computational models to predict anticancer drug responses from molecular features. RESULTS We propose a novel deep neural network model that integrates multi-omics data available as gene expressions, copy number variations, gene mutations, reverse phase protein array expressions, and metabolomics expressions, in order to predict cellular responses to known anti-cancer drugs. We employ a novel graph embedding layer that incorporates interactome data as prior information for prediction. Moreover, we propose a novel attention layer that effectively combines different omics features, taking their interactions into account. The network outperformed feedforward neural networks and reported 0.90 for [Formula: see text] values for prediction of drug responses from cancer cell lines data available in CCLE and GDSC. CONCLUSION The outstanding results of our experiments demonstrate that the proposed method is capable of capturing the interactions of genes and proteins, and integrating multi-omics features effectively. Furthermore, both the results of ablation studies and the investigations of the attention layer imply that gene mutation has a greater influence on the prediction of drug responses than other omics data types. Therefore, we conclude that our approach can not only predict the anti-cancer drug response precisely but also provide insights into the reaction mechanisms of cancer cell lines and drugs.
Collapse
Affiliation(s)
- Conghao Wang
- grid.59025.3b0000 0001 2224 0361School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798 Singapore
| | - Xintong Lye
- grid.59025.3b0000 0001 2224 0361School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798 Singapore
| | - Rama Kaalia
- grid.59025.3b0000 0001 2224 0361School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798 Singapore
| | - Parvin Kumar
- grid.59025.3b0000 0001 2224 0361School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798 Singapore
| | - Jagath C. Rajapakse
- grid.59025.3b0000 0001 2224 0361School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798 Singapore
| |
|
34
|
Multi-Omics Alleviates the Limitations of Panel Sequencing for Cancer Drug Response Prediction. Cancers (Basel) 2022; 14:cancers14225604. [PMID: 36428696 PMCID: PMC9688044 DOI: 10.3390/cancers14225604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 11/10/2022] [Accepted: 11/12/2022] [Indexed: 11/17/2022] Open
Abstract
Comprehensive genomic profiling using cancer gene panels has been shown to improve treatment options for a variety of cancer types. However, genomic aberrations detected via such gene panels do not necessarily serve as strong predictors of drug sensitivity. In this study, using pharmacogenomics datasets of cell lines, patient-derived xenografts, and ex vivo treated fresh tumor specimens, we demonstrate that utilizing the transcriptome on top of gene panel features substantially improves drug response prediction performance in cancer.
|
35
|
Lin X, Liu J, Zou Y, Tao C, Chen J. Xanthotoxol suppresses non-small cell lung cancer progression and might improve patients' prognosis. PHYTOMEDICINE : INTERNATIONAL JOURNAL OF PHYTOTHERAPY AND PHYTOPHARMACOLOGY 2022; 105:154364. [PMID: 35932608 DOI: 10.1016/j.phymed.2022.154364] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/16/2022] [Revised: 06/17/2022] [Accepted: 07/26/2022] [Indexed: 05/16/2023]
Abstract
BACKGROUND Novel and effective drugs with less toxicity are urgently needed for non-small cell lung cancer (NSCLC) therapy. Xanthotoxol (Xan) is the major natural component of the medicinal plant Angelica dahurica and has potential anti-cancer activities. PURPOSE In this study, we aimed to demonstrate the effect and underlying mechanism of Xan in NSCLC and evaluate the effectiveness of Xan in NSCLC patients. METHODS CCK8, colony formation, EdU, flow cytometry, and transwell assays were carried out to investigate the anti-NSCLC activity of Xan in vitro. In addition, a xenograft mouse model was established to evaluate the anti-NSCLC effect of Xan in vivo. Moreover, bioinformatics analysis was performed to establish a prediction model based on RNA sequencing data. Furthermore, Western blot was used to detect the expression of proteins regulated by Xan. RESULTS Xan inhibited the cell viability, colony formation capacity, DNA replication, cell cycle transition, migration, and invasion of NSCLC cells and induced their apoptosis. In addition, Xan suppressed NSCLC xenograft growth in vivo without obvious toxicity. Interestingly, bioinformatics analyses based on the RNA sequencing data indicated that Xan exerted inhibitory effects on NSCLC cells by down-regulating signals contributing to NSCLC progression and demonstrated, with a newly proposed prediction model, that Xan was effective in ameliorating the prognosis of NSCLC patients. Moreover, Xan was shown to regulate cell cycle arrest, apoptosis, and epithelial-mesenchymal transition (EMT)-associated genes through downregulating PI3K-AKT signaling, thus suppressing NSCLC proliferation and metastasis. CONCLUSIONS Taken together, our work proved that Xan induced cell cycle arrest, facilitated apoptosis, and inhibited EMT processes through downregulating the PI3K-AKT pathway to suppress NSCLC progression. Moreover, we also proposed a new model for evaluating Xan as a novel and effective drug in NSCLC treatment.
Affiliation(s)
- Xian Lin
- Department of Rheumatism and Immunology, Peking University Shenzhen Hospital, Shenzhen Peking University-The Hong Kong University of Science and Technology Medical Center, Shenzhen, Guangdong 518036, China; Shenzhen Key Laboratory of Inflammatory and Immunology Diseases, Shenzhen, Guangdong 518036, China
| | - Jingfeng Liu
- Department of Rheumatism and Immunology, Peking University Shenzhen Hospital, Shenzhen Peking University-The Hong Kong University of Science and Technology Medical Center, Shenzhen, Guangdong 518036, China; Shenzhen Key Laboratory of Inflammatory and Immunology Diseases, Shenzhen, Guangdong 518036, China
| | - Yujiao Zou
- Department of Radiation Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, Guangdong 510000, China
| | - Cheng Tao
- School of Pharmacy, Guangdong Medical University, Dongguan, Guangdong 523808, China
| | - Jian Chen
- Department of Rheumatism and Immunology, Peking University Shenzhen Hospital, Shenzhen Peking University-The Hong Kong University of Science and Technology Medical Center, Shenzhen, Guangdong 518036, China; Shenzhen Key Laboratory of Inflammatory and Immunology Diseases, Shenzhen, Guangdong 518036, China.
| |
|
36
|
Wang Z, Wang Z, Huang Y, Lu L, Fu Y. A multi-view multi-omics model for cancer drug response prediction. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03294-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/02/2022]
|
37
|
Bernstam EV, Shireman PK, Meric‐Bernstam F, Zozus MN, Jiang X, Brimhall BB, Windham AK, Schmidt S, Visweswaran S, Ye Y, Goodrum H, Ling Y, Barapatre S, Becich MJ. Artificial intelligence in clinical and translational science: Successes, challenges and opportunities. Clin Transl Sci 2022; 15:309-321. [PMID: 34706145 PMCID: PMC8841416 DOI: 10.1111/cts.13175] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/10/2021] [Accepted: 10/01/2021] [Indexed: 01/12/2023] Open
Abstract
Artificial intelligence (AI) is transforming many domains, including finance, agriculture, defense, and biomedicine. In this paper, we focus on the role of AI in clinical and translational research (CTR), including preclinical research (T1), clinical research (T2), clinical implementation (T3), and public (or population) health (T4). Given the rapid evolution of AI in CTR, we present three complementary perspectives: (1) scoping literature review, (2) survey, and (3) analysis of federally funded projects. For each CTR phase, we addressed challenges, successes, failures, and opportunities for AI. We surveyed Clinical and Translational Science Award (CTSA) hubs regarding AI projects at their institutions. Nineteen of 63 CTSA hubs (30%) responded to the survey. The most common funding source (48.5%) was the federal government. The most common translational phase was T2 (clinical research, 40.2%). Clinicians were the intended users in 44.6% of projects and researchers in 32.3% of projects. The most common computational approaches were supervised machine learning (38.6%) and deep learning (34.2%). The number of projects steadily increased from 2012 to 2020. Finally, we analyzed 2604 AI projects at CTSA hubs using the National Institutes of Health Research Portfolio Online Reporting Tools (RePORTER) database for 2011-2019. We mapped available abstracts to medical subject headings and found that nervous system (16.3%) and mental disorders (16.2%) were the most common topics addressed. From a computational perspective, big data (32.3%) and deep learning (30.0%) were most common. This work represents a snapshot in time of the role of AI in the CTSA program.
Affiliation(s)
- Elmer V. Bernstam
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Division of General Internal Medicine, Department of Internal Medicine, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Paula K. Shireman
- Departments of Surgery and Microbiology, Immunology & Molecular Genetics, University of Texas Health San Antonio, San Antonio, Texas, USA
- University Health, San Antonio, Texas, USA
- South Texas Veterans Health Care System, San Antonio, Texas, USA
| | - Funda Meric‐Bernstam
- Department of Investigational Cancer Therapeutics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
| | - Meredith N. Zozus
- Division of Clinical Research Informatics, Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, Texas, USA
| | - Xiaoqian Jiang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Bradley B. Brimhall
- University Health, San Antonio, Texas, USA
- Department of Pathology, University of Texas Health San Antonio, San Antonio, Texas, USA
| | - Ashley K. Windham
- University Health, San Antonio, Texas, USA
- Department of Pathology, University of Texas Health San Antonio, San Antonio, Texas, USA
| | - Susanne Schmidt
- Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, Texas, USA
| | - Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
| | - Ye Ye
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
| | - Heath Goodrum
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Yaobin Ling
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Seemran Barapatre
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
| | - Michael J. Becich
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
| |
|
38
|
Sienkiewicz K, Chen J, Chatrath A, Lawson JT, Sheffield NC, Zhang L, Ratan A. Detecting molecular subtypes from multi-omics datasets using SUMO. CELL REPORTS METHODS 2022; 2:100152. [PMID: 35211690 PMCID: PMC8865426 DOI: 10.1016/j.crmeth.2021.100152] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/11/2021] [Revised: 08/27/2021] [Accepted: 12/21/2021] [Indexed: 12/31/2022]
Abstract
We present a data integration framework that uses non-negative matrix factorization of patient-similarity networks to integrate continuous multi-omics datasets for molecular subtyping. It is demonstrated to have the capability to handle missing data without using imputation and to be consistently among the best in detecting subtypes with differential prognosis and enrichment of clinical associations in a large number of cancers. When applying the approach to data from individuals with lower-grade gliomas, we identify a subtype with a significantly worse prognosis. Tumors assigned to this subtype are hypomethylated genome wide with a gain of AP-1 occupancy in demethylated distal enhancers. The tumors are also enriched for somatic chromosome 7 (chr7) gain, chr10 loss, and other molecular events that have been suggested as diagnostic markers for "IDH wild type, with molecular features of glioblastoma" by the cIMPACT-NOW consortium but have yet to be included in the World Health Organization (WHO) guidelines.
Affiliation(s)
- Karolina Sienkiewicz
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
| | - Jinyu Chen
- Department of Mathematics and Computational Biology Program, National University of Singapore, Singapore 119076, Singapore
| | - Ajay Chatrath
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA
| | - John T. Lawson
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA
| | - Nathan C. Sheffield
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
- University of Virginia Cancer Center, Charlottesville, VA 22908, USA
| | - Louxin Zhang
- Department of Mathematics and Computational Biology Program, National University of Singapore, Singapore 119076, Singapore
| | - Aakrosh Ratan
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
- University of Virginia Cancer Center, Charlottesville, VA 22908, USA
| |
|
39
|
Xia F, Allen J, Balaprakash P, Brettin T, Garcia-Cardona C, Clyde A, Cohn J, Doroshow J, Duan X, Dubinkina V, Evrard Y, Fan YJ, Gans J, He S, Lu P, Maslov S, Partin A, Shukla M, Stahlberg E, Wozniak JM, Yoo H, Zaki G, Zhu Y, Stevens R. A cross-study analysis of drug response prediction in cancer cell lines. Brief Bioinform 2022; 23:bbab356. [PMID: 34524425 PMCID: PMC8769697 DOI: 10.1093/bib/bbab356] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 04/19/2021] [Revised: 07/26/2021] [Accepted: 08/11/2021] [Indexed: 11/28/2022] Open
Abstract
To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross-validation within a single study to assess model accuracy. While an essential first step, cross-validation within a biological data set typically provides an overly optimistic estimate of the prediction performance on independent test sets. To provide a more rigorous assessment of model generalizability between different studies, we use machine learning to analyze five publicly available cell line-based data sets: National Cancer Institute 60, Cancer Therapeutics Response Portal (CTRP), Genomics of Drug Sensitivity in Cancer, Cancer Cell Line Encyclopedia and Genentech Cell Line Screening Initiative (gCSI). Based on observed experimental variability across studies, we explore estimates of prediction upper bounds. We report performance results of a variety of machine learning models, with a multitasking deep neural network achieving the best cross-study generalizability. By multiple measures, models trained on CTRP yield the most accurate predictions on the remaining testing data, and gCSI is the most predictable among the cell line data sets included in this study. With these experiments and further simulations on partial data, two lessons emerge: (1) differences in viability assays can limit model generalizability across studies and (2) drug diversity, more than tumor diversity, is crucial for raising model generalizability in preclinical screening.
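The cross-study protocol described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the study names, the toy data, and the one-feature least-squares model are all assumptions made for illustration, and the diagonal entries are in-sample errors rather than the within-study cross-validation the abstract refers to.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def mse(model, xs, ys):
    """Mean squared error of a fitted (slope, intercept) pair."""
    a, b = model
    return mean((a * x + b - y) ** 2 for x, y in zip(xs, ys))

def cross_study_matrix(studies):
    """Train on each study, test on every study; returns {(train, test): mse}."""
    out = {}
    for train_name, (x_train, y_train) in studies.items():
        model = fit_line(x_train, y_train)
        for test_name, (x_test, y_test) in studies.items():
            out[(train_name, test_name)] = mse(model, x_test, y_test)
    return out

# Hypothetical toy data: both studies share the same slope, but study_B's
# viability assay reads 0.5 units higher (a systematic assay difference).
xs = [0.0, 1.0, 2.0, 3.0]
studies = {
    "study_A": (xs, [2.0 * x for x in xs]),
    "study_B": (xs, [2.0 * x + 0.5 for x in xs]),
}
errors = cross_study_matrix(studies)
```

In this toy setup the in-study error is zero while each off-diagonal error is 0.25, the square of the assay offset, which mirrors the abstract's point that assay differences can limit generalizability across studies.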
Affiliation(s)
- Austin Clyde
- Argonne National Laboratory
- University of Chicago
| | - Ya Ju Fan
- Lawrence Livermore National Laboratory
| | - Pinyi Lu
- Frederick National Laboratory for Cancer Research
| | - George Zaki
- Frederick National Laboratory for Cancer Research
| | - Rick Stevens
- Argonne National Laboratory
- University of Chicago
| |
|
40
|
Firoozbakht F, Yousefi B, Schwikowski B. An overview of machine learning methods for monotherapy drug response prediction. Brief Bioinform 2022; 23:bbab408. [PMID: 34619752 PMCID: PMC8769705 DOI: 10.1093/bib/bbab408] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 07/07/2021] [Revised: 08/25/2021] [Accepted: 09/06/2021] [Indexed: 12/11/2022] Open
Abstract
For an increasing number of preclinical samples, both detailed molecular profiles and their responses to various drugs are becoming available. Efforts to understand, and predict, drug responses in a data-driven manner have led to a proliferation of machine learning (ML) methods, with the longer term ambition of predicting clinical drug responses. Here, we provide a uniquely wide and deep systematic review of the rapidly evolving literature on monotherapy drug response prediction, with a systematic characterization and classification that comprises more than 70 ML methods in 13 subclasses, their input and output data types, modes of evaluation, and code and software availability. ML experts are provided with a fundamental understanding of the biological problem, and how ML methods are configured for it. Biologists and biomedical researchers are introduced to the basic principles of applicable ML methods, and their application to the problem of drug response prediction. We also provide systematic overviews of commonly used data sources used for training and evaluation methods.
Affiliation(s)
- Farzaneh Firoozbakht
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
| | - Behnam Yousefi
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
- Sorbonne Université, École Doctorale Complexite du Vivant, Paris, France
| | - Benno Schwikowski
- Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
| |
|
41
|
An X, Chen X, Yi D, Li H, Guan Y. Representation of molecules for drug response prediction. Brief Bioinform 2021; 23:6375515. [PMID: 34571534 DOI: 10.1093/bib/bbab393] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 04/08/2021] [Revised: 08/28/2021] [Accepted: 08/30/2021] [Indexed: 12/18/2022] Open
Abstract
The rapid development of machine learning and deep learning algorithms in the recent decade has spurred an outburst of their applications in many research fields. In the chemistry domain, machine learning has been widely used to aid in drug screening, drug toxicity prediction, quantitative structure-activity relationship prediction, anti-cancer synergy score prediction, etc. This review is dedicated to the application of machine learning in drug response prediction. Specifically, we focus on molecular representations, which are crucial to the success of drug response prediction and other chemistry-related prediction tasks. We introduce three types of commonly used molecular representation methods, together with their implementation and application examples. This review will serve as a brief introduction to the broad field of molecular representations.
Affiliation(s)
- Xin An
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Xi Chen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Daiyao Yi
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Hongyang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| |
|
42
|
Chen Y, Zhang L. How much can deep learning improve prediction of the responses to drugs in cancer cell lines? Brief Bioinform 2021; 23:6370847. [PMID: 34529029 DOI: 10.1093/bib/bbab378] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 05/11/2021] [Revised: 08/21/2021] [Accepted: 08/24/2021] [Indexed: 12/24/2022] Open
Abstract
The drug response prediction problem arises from personalized medicine and drug discovery. Deep neural networks have been applied to the multi-omics data available for over 1000 cancer cell lines and tissues for better drug response prediction. We summarize and examine state-of-the-art deep learning methods that have been published recently. Although significant progress has been made in deep learning approaches to drug response prediction, deep learning methods show their weakness in predicting the response of a drug that does not appear in the training dataset. In particular, all five evaluated deep learning methods performed worse than the similarity-regularized matrix factorization (SRMF) method in our drug blind test. We outline the challenges in applying deep learning approaches to drug response prediction and suggest unique opportunities for deep learning integrated with established bioinformatics analyses to overcome some of these challenges.
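A drug blind test, as mentioned in this abstract, holds out whole drugs so that no test drug is seen during training. The following is a minimal sketch of such a split under assumed conventions: the `(cell_line, drug, response)` record layout, the function name `drug_blind_split`, and the toy identifiers are hypothetical and are not taken from the paper.

```python
import random

def drug_blind_split(records, test_fraction=0.2, seed=0):
    """Split (cell_line, drug, response) records so that every drug in the
    test set is absent from the training set."""
    drugs = sorted({drug for _, drug, _ in records})
    rng = random.Random(seed)          # seeded for a reproducible split
    rng.shuffle(drugs)
    n_test = max(1, round(len(drugs) * test_fraction))
    test_drugs = set(drugs[:n_test])
    train = [r for r in records if r[1] not in test_drugs]
    test = [r for r in records if r[1] in test_drugs]
    return train, test

# Hypothetical toy screen: 3 cell lines x 5 drugs, placeholder responses.
records = [
    (cell, drug, 0.0)
    for cell in ("cell_1", "cell_2", "cell_3")
    for drug in ("drug_a", "drug_b", "drug_c", "drug_d", "drug_e")
]
train, test = drug_blind_split(records)
```

Splitting by drug rather than by record is what makes the evaluation "blind": a random record-level split would leak each test drug's behavior on other cell lines into training.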
Affiliation(s)
- Yurui Chen
- Department of Mathematics and Computational Biology Programme, National University of Singapore, 119076, Singapore
| | - Louxin Zhang
- Department of Mathematics and Computational Biology Programme, National University of Singapore, 119076, Singapore
| |
|
43
|
Zuo Z, Wang P, Chen X, Tian L, Ge H, Qian D. SWnet: a deep learning model for drug response prediction from cancer genomic signatures and compound chemical structures. BMC Bioinformatics 2021; 22:434. [PMID: 34507532 PMCID: PMC8434731 DOI: 10.1186/s12859-021-04352-9] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 03/30/2021] [Accepted: 08/31/2021] [Indexed: 12/13/2022] Open
Abstract
Background One of the major challenges in precision medicine is accurate prediction of an individual patient's response to drugs. A great number of computational methods have been developed to predict compound activity using genomic profiles or chemical structures, but more exploration is needed to combine genetic mutation, gene expression, and cheminformatics in one machine learning model. Results We present a novel deep-learning model that integrates gene expression, genetic mutation, and chemical structure of compounds in a multi-task convolutional architecture. We applied our model to the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets. We selected relevant cancer-related genes based on an oncology genetics database and the L1000 landmark genes, and used their expression and mutations as genomic features in model training. We obtained the cheminformatics features for compounds from PubChem or ChEMBL. We find that combining gene expression, genetic mutation, and cheminformatics features greatly enhances the predictive performance. Conclusion We implemented an extended Graph Neural Network for molecular graphs and a Convolutional Neural Network for gene features. With the employment of multi-tasking and self-attention functions to monitor the similarity between compounds, our model outperforms recently published methods using the same training and testing datasets. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04352-9.
Affiliation(s)
- Zhaorui Zuo
- Institute of Medical Robotics, Shanghai Jiao Tong University, 2F of the Translational Medicine Building, No. 800 Dongchuan Road, Shanghai, 200000, China
| | - Penglei Wang
- Institute of Medical Robotics, Shanghai Jiao Tong University, 2F of the Translational Medicine Building, No. 800 Dongchuan Road, Shanghai, 200000, China
| | - Xiaowei Chen
- Novartis Institutes for Biomedical Research, 4218 Jinke Road, Pudong, Shanghai, 201203, China
| | - Li Tian
- Novartis Institutes for Biomedical Research, 4218 Jinke Road, Pudong, Shanghai, 201203, China
| | - Hui Ge
- Novartis Institutes for Biomedical Research, 4218 Jinke Road, Pudong, Shanghai, 201203, China.
| | - Dahong Qian
- Institute of Medical Robotics, Shanghai Jiao Tong University, 2F of the Translational Medicine Building, No. 800 Dongchuan Road, Shanghai, 200000, China.
| |
|
44
|
Feng F, Shen B, Mou X, Li Y, Li H. Large-scale pharmacogenomic studies and drug response prediction for personalized cancer medicine. J Genet Genomics 2021; 48:540-551. [PMID: 34023295 DOI: 10.1016/j.jgg.2021.03.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 01/22/2021] [Revised: 03/26/2021] [Accepted: 03/28/2021] [Indexed: 12/26/2022]
Abstract
The response rate of most anti-cancer drugs is limited because of the high heterogeneity of cancer and the complex mechanism of drug action. Personalized treatment that stratifies patients into subgroups using molecular biomarkers is a promising way to improve clinical benefit. With the accumulation of preclinical models and advances in computational approaches to drug response prediction, pharmacogenomics has achieved great success over the last 20 years and is increasingly used in the clinical practice of personalized cancer medicine. In this article, we first summarize FDA-approved pharmacogenomic biomarkers and large-scale pharmacogenomic studies of preclinical cancer models such as patient-derived cell lines, organoids, and xenografts. Furthermore, we comprehensively review the recent developments of computational methods in drug response prediction, covering network, machine learning, and deep learning technologies and strategies to evaluate immunotherapy response. In the end, we discuss challenges and propose possible solutions for further improvement.
Affiliation(s)
- Fangyoumin Feng
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Bihan Shen
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Xiaoqin Mou
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Yixue Li
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China; Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 330106, China
| | - Hong Li
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
| |
|
45
|
Park Y, Heider D, Hauschild AC. Integrative Analysis of Next-Generation Sequencing for Next-Generation Cancer Research toward Artificial Intelligence. Cancers (Basel) 2021; 13:3148. [PMID: 34202427 PMCID: PMC8269018 DOI: 10.3390/cancers13133148] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/20/2021] [Revised: 06/16/2021] [Accepted: 06/21/2021] [Indexed: 12/18/2022] Open
Abstract
The rapid improvement of next-generation sequencing (NGS) technologies and their application in large-scale cohorts in cancer research led to common challenges of big data. It opened a new research area incorporating systems biology and machine learning. As large-scale NGS data accumulated, sophisticated data analysis methods became indispensable. In addition, NGS data have been integrated with systems biology to build better predictive models to determine the characteristics of tumors and tumor subtypes. Therefore, various machine learning algorithms were introduced to identify underlying biological mechanisms. In this work, we review novel technologies developed for NGS data analysis, and we describe how these computational methodologies integrate systems biology and omics data. Subsequently, we discuss how deep neural networks outperform other approaches, the potential of graph neural networks (GNN) in systems biology, and the limitations in NGS biomedical research. To reflect on the various challenges and corresponding computational solutions, we will discuss the following three topics: (i) molecular characteristics, (ii) tumor heterogeneity, and (iii) drug discovery. We conclude that machine learning and network-based approaches can add valuable insights and build highly accurate models. However, a well-informed choice of learning algorithm and biological network information is crucial for the success of each specific research question.
Affiliation(s)
- Youngjun Park
- Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
| | - Dominik Heider
- Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
| | - Anne-Christin Hauschild
- Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
- Department of Medical Informatics, University Medical Center Göttingen, 37075 Göttingen, Germany
| |
|
46
|
Majumdar A, Liu Y, Lu Y, Wu S, Cheng L. kESVR: An Ensemble Model for Drug Response Prediction in Precision Medicine Using Cancer Cell Lines Gene Expression. Genes (Basel) 2021; 12:genes12060844. [PMID: 34070793 PMCID: PMC8229729 DOI: 10.3390/genes12060844] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/07/2021] [Revised: 05/25/2021] [Accepted: 05/28/2021] [Indexed: 12/02/2022] Open
Abstract
Background: Cancer cell lines are frequently used in research as in-vitro tumor models. Genomic data and large-scale drug screening have accelerated the selection of the right drug for cancer patients. Accuracy in drug response prediction is crucial for success. Due to data-type diversity and large data volume, few methods can integratively and efficiently find the principal low-dimensional manifold of high-dimensional cancer multi-omics data to predict drug response in precision medicine. Method: A novel k-means Ensemble Support Vector Regression (kESVR) model is developed to predict each drug response value for a single patient based on cell-line gene expression data. kESVR is a blend of supervised and unsupervised learning methods and is entirely data driven. It utilizes embedded clustering (Principal Component Analysis and k-means clustering) and local regression (Support Vector Regression) to predict drug response and obtain the global pattern while overcoming missing data and noise from outliers. Results: We compared the efficiency and accuracy of kESVR to 4 standard machine learning regression models: (1) simple linear regression, (2) support vector regression, (3) random forest (quantile regression forest), and (4) back propagation neural network. In our results, which are based on drug response across 610 cancer cells from the Cancer Cell Line Encyclopedia (CCLE) and Cancer Therapeutics Response Portal (CTRP v2), kESVR had the highest accuracy (smallest mean squared error (MSE)). We next compared kESVR with 17 existing drug response prediction models based on a varied range of methods such as regression, Bayesian inference, matrix factorization, and deep learning. After ranking the 18 models by prediction accuracy, kESVR ranked first (best performing) in the majority (74%) of cases. In the remaining cases (26%), kESVR still ranked among the top five performing models.
Conclusion: In this paper we introduce a novel model (kESVR) for drug response prediction using high-dimensional cell-line gene expression data. This model outperforms existing prediction models in terms of prediction accuracy and speed, and it overcomes overfitting. It can be used in the future to develop a robust drug response prediction system for cancer patients, guided by cancer cell lines and multi-omics data.
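The combination of embedded clustering and local regression described in this abstract can be illustrated with a minimal sketch. This is not the authors' kESVR implementation: the class name `ClusterwiseSVR`, the number of components and clusters, the default RBF kernel, and the synthetic data are all assumptions made for illustration, using scikit-learn's `PCA`, `KMeans`, and `SVR`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.svm import SVR


class ClusterwiseSVR:
    """Hypothetical sketch: PCA -> k-means -> one SVR per cluster."""

    def __init__(self, n_components=5, n_clusters=3, random_state=0):
        # Assumed hyperparameters; the abstract does not state the real ones.
        self.pca = PCA(n_components=n_components, random_state=random_state)
        self.kmeans = KMeans(n_clusters=n_clusters, n_init=10,
                             random_state=random_state)
        self.models = {}

    def fit(self, X, y):
        # Project expression profiles onto a low-dimensional space.
        Z = self.pca.fit_transform(X)
        # Group samples into clusters in that space.
        labels = self.kmeans.fit_predict(Z)
        # Fit one local regressor on the samples of each cluster.
        for c in np.unique(labels):
            mask = labels == c
            self.models[c] = SVR(kernel="rbf").fit(Z[mask], y[mask])
        return self

    def predict(self, X):
        Z = self.pca.transform(X)
        # Route each new sample to the regressor of its nearest cluster.
        labels = self.kmeans.predict(Z)
        y_hat = np.empty(len(Z))
        for c in np.unique(labels):
            mask = labels == c
            y_hat[mask] = self.models[c].predict(Z[mask])
        return y_hat


# Synthetic stand-in for (cell lines x genes) expression and a drug response.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 50))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=120)

model = ClusterwiseSVR().fit(X[:100], y[:100])
pred = model.predict(X[100:])

# Mean squared error on the held-out samples, the accuracy measure named above.
mse = float(np.mean((pred - y[100:]) ** 2))
print(f"held-out MSE: {mse:.3f}")
```

Routing each sample to a cluster-specific regressor is one plausible reading of "local regression"; how the published model combines its local predictors into an ensemble is not specified in the abstract and is not reproduced here.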
Affiliation(s)
- Abhishek Majumdar: Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Yueze Liu: The Grainger College of Engineering, The University of Illinois Urbana-Champaign, Urbana and Champaign, Champaign, IL 61801, USA
- Yaoqin Lu: Department of Occupational and Environmental Health, School of Public Health, XinJiang Medical University, Urumqi 830011, China
- Shaofeng Wu: Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Lijun Cheng: Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
|
47
|
Tanoli Z, Vähä-Koskela M, Aittokallio T. Artificial intelligence, machine learning, and drug repurposing in cancer. Expert Opin Drug Discov 2021; 16:977-989. [PMID: 33543671 DOI: 10.1080/17460441.2021.1883585] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Indexed: 02/07/2023]
Abstract
Introduction: Drug repurposing provides a cost-effective strategy to re-use approved drugs for new medical indications. Several machine learning (ML) and artificial intelligence (AI) approaches have been developed for systematic identification of drug repurposing leads based on big data resources, hence further accelerating and de-risking the drug development process by computational means. Areas covered: The authors focus on supervised ML and AI methods that make use of publicly available databases and information resources. While most of the example applications are in the field of anticancer drug therapies, the methods and resources reviewed are widely applicable also to other indications including COVID-19 treatment. A particular emphasis is placed on the use of comprehensive target activity profiles that enable a systematic repurposing process by extending the target profile of drugs to include potent off-targets with therapeutic potential for a new indication. Expert opinion: The scarcity of clinical patient data and the current focus on genetic aberrations as primary drug targets may limit the performance of anticancer drug repurposing approaches that rely solely on genomics-based information. Functional testing of cancer patient cells exposed to a large number of targeted therapies and their combinations provides an additional source of repurposing information for tissue-aware AI approaches.
Affiliation(s)
- Ziaurrehman Tanoli: Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLife), University of Helsinki, Helsinki, Finland
- Markus Vähä-Koskela: Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLife), University of Helsinki, Helsinki, Finland
- Tero Aittokallio: Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLife), University of Helsinki, Helsinki, Finland; Institute for Cancer Research, Department of Cancer Genetics, Oslo University Hospital, Oslo, Norway; Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Oslo, Norway
|
48
|
Daoud S, Mdhaffar A, Jmaiel M, Freisleben B. Q-Rank: Reinforcement Learning for Recommending Algorithms to Predict Drug Sensitivity to Cancer Therapy. IEEE J Biomed Health Inform 2020; 24:3154-3161. [PMID: 32750950 DOI: 10.1109/jbhi.2020.3004663] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 01/25/2023]
Abstract
In personalized medicine, a challenging task is to identify the most effective treatment for a patient. In oncology, several computational models have been developed to predict the response of drugs to therapy. However, the performance of these models depends on multiple factors. This paper presents a new approach, called Q-Rank, to predict the sensitivity of cell lines to anti-cancer drugs. Q-Rank integrates different prediction algorithms and identifies a suitable algorithm for a given application. Q-Rank is based on reinforcement learning methods to rank prediction algorithms on the basis of relevant features (e.g., omics characterization). The best-ranked algorithm is recommended and used to predict the response of drugs to therapy. Our experimental results indicate that Q-Rank outperforms the integrated models in predicting the sensitivity of cell lines to different drugs.
|
49
|
Kim YA, Sarto Basso R, Wojtowicz D, Liu AS, Hochbaum DS, Vandin F, Przytycka TM. Identifying Drug Sensitivity Subnetworks with NETPHIX. iScience 2020; 23:101619. [PMID: 33089107 PMCID: PMC7566085 DOI: 10.1016/j.isci.2020.101619] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/17/2019] [Revised: 09/08/2020] [Accepted: 09/24/2020] [Indexed: 12/29/2022] Open
Abstract
Phenotypic heterogeneity in cancer is often caused by different patterns of genetic alterations. Understanding such phenotype-genotype relationships is fundamental for the advance of personalized medicine. We develop a computational method, named NETPHIX (NETwork-to-PHenotype association with eXclusivity) to identify subnetworks of genes whose genetic alterations are associated with drug response or other continuous cancer phenotypes. Leveraging interaction information among genes and properties of cancer mutations such as mutual exclusivity, we formulate the problem as an integer linear program and solve it optimally to obtain a subnetwork of associated genes. Applied to a large-scale drug screening dataset, NETPHIX uncovered gene modules significantly associated with drug responses. Utilizing interaction information, NETPHIX modules are functionally coherent and can thus provide important insights into drug action. In addition, we show that modules identified by NETPHIX together with their association patterns can be leveraged to suggest drug combinations.
Affiliation(s)
- Yoo-Ah Kim: National Center of Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894, USA
- Rebecca Sarto Basso: Department of Industrial Engineering and Operations Research, University of California at Berkeley, Berkeley, CA 94709, USA
- Damian Wojtowicz: National Center of Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894, USA
- Amanda S Liu: Montgomery Blair High School, Silver Spring, MD 20901, USA
- Dorit S Hochbaum: Department of Industrial Engineering and Operations Research, University of California at Berkeley, Berkeley, CA 94709, USA
- Fabio Vandin: Department of Information Engineering, University of Padova, Padova 35131, Italy
- Teresa M Przytycka: National Center of Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894, USA
|