1
Matboli M, Al-Amodi HS, Khaled A, Khaled R, Ali M, Kamel HFM, Hamid MSAEL, ELsawi HA, Habib EK, Youssef I. Integrating molecular, biochemical, and immunohistochemical features as predictors of hepatocellular carcinoma drug response using machine-learning algorithms. Front Mol Biosci 2024; 11:1430794. [PMID: 39479501 PMCID: PMC11521808 DOI: 10.3389/fmolb.2024.1430794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2024] [Accepted: 09/27/2024] [Indexed: 11/02/2024] Open
Abstract
Introduction: Liver cancer, particularly hepatocellular carcinoma (HCC), remains a significant global health concern due to its high prevalence and heterogeneous nature. Despite the existence of approved drugs for HCC treatment, the scarcity of predictive biomarkers limits their effective utilization. Integrating diverse data types could improve drug response prediction and ultimately enable personalized HCC management. Method: In this study, we developed multiple supervised machine learning models to predict treatment response. These models utilized classifiers such as logistic regression (LR), k-nearest neighbors (kNN), neural networks (NN), support vector machines (SVM), and random forests (RF) using a comprehensive set of molecular, biochemical, and immunohistochemical features as targets of three drugs: Pantoprazole, Cyanidin 3-glycoside (Cyan), and Hesperidin. Performance metrics for the complete and reduced models were reported, including accuracy, precision, recall (sensitivity), specificity, and the Matthews Correlation Coefficient (MCC). Results and Discussion: Notably, the NN achieved the best prediction accuracy: the combined model using molecular and biochemical features achieved an accuracy of 0.9693 ± 0.0105 and an average area under the ROC curve (AUC) of 0.94 ± 0.06 across three cross-validation iterations. We also identified seven molecular features, seven biochemical features, and one immunohistochemistry feature as promising biomarkers of treatment response. This comprehensive method has the potential to significantly advance personalized HCC therapy by allowing for more precise drug response estimation and assisting in the identification of effective treatment strategies.
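The metrics listed in this abstract (accuracy, precision, recall, specificity, MCC) can all be derived from the four confusion-matrix counts. The following is a minimal sketch, assuming a binary responder/non-responder setting; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common binary classification metrics from confusion-matrix counts."""

    def safe_div(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    # Denominator of the Matthews Correlation Coefficient.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),  # also called sensitivity
        "specificity": safe_div(tn, tn + fp),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, the MCC uses all four counts symmetrically, which is one reason it is often reported alongside accuracy when classes are imbalanced.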
Affiliation(s)
- Marwa Matboli
  - Medical Biochemistry and Molecular Biology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
  - Faculty of Oral and Dental Medicine, Misr International University (MIU), Cairo, Egypt
- Hiba S. Al-Amodi
  - Biochemistry Department, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia
- Abdelrahman Khaled
  - Bioinformatics Group, Center of Informatics Sciences (CIS), School of Information Technology and Computer Sciences, Nile University, Giza, Egypt
- Radwa Khaled
  - Biotechnology/Biomolecular Chemistry Department, Faculty of Science, Cairo University, Giza, Egypt
- Marwa Ali
  - Medical Biochemistry and Molecular Biology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
- Hala F. M. Kamel
  - Medical Biochemistry and Molecular Biology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
  - Biochemistry Department, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia
- Hind A. ELsawi
  - Department of Internal Medicine, Badr University in Cairo, Badr, Egypt
- Eman K. Habib
  - Department of Anatomy and Cell Biology, Faculty of Medicine, Ain Shams University, Cairo, Egypt
  - Department of Anatomy and Cell Biology, Faculty of Medicine, Galala University, Suez, Egypt
- Ibrahim Youssef
  - Systems and Biomedical Engineering Department, Faculty of Engineering, Cairo University, Giza, Egypt
2
Cao C, Zhao H, Wang J. BANDRP: a bilinear attention network for anti-cancer drug response prediction based on fingerprint and multi-omics. Brief Bioinform 2024; 25:bbae493. [PMID: 39406520 PMCID: PMC11479717 DOI: 10.1093/bib/bbae493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2024] [Revised: 09/04/2024] [Accepted: 09/23/2024] [Indexed: 10/20/2024] Open
Abstract
Predicting anti-cancer drug response can help with personalized cancer treatment and is an important topic in modern oncology research. Although some methods have been used for anti-cancer drug response prediction, how to effectively integrate various features related to cancer cell lines, drugs, and their known responses is still affected by the redundant information of input features and the complex interactions between features. In this study, we propose a bilinear attention model, named BANDRP, based on multiple omics data of cancer cell lines and multiple molecular fingerprints of drugs to predict potential anti-cancer drug responses. Compared with existing models, BANDRP uses gene expression data to calculate pathway enrichment scores to enrich the features of cancer cell lines and can automatically learn the interactive information of cancer cell lines and drugs through bilinear attention networks. Benchmarking and independent tests demonstrate that BANDRP surpasses baseline models and exhibits robust generalization performance. Ablation experiments affirm the optimality of the current model architecture and feature selection scheme for our prediction task. Furthermore, analytical experiments and case studies on unknown anti-cancer drug response predictions underscore BANDRP's potential as a potent and reliable framework for predicting anti-cancer drug response.
Affiliation(s)
- Cheng Cao
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Haochen Zhao
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jianxin Wang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
3
Lenhof K, Eckhart L, Rolli LM, Lenhof HP. Trust me if you can: a survey on reliability and interpretability of machine learning approaches for drug sensitivity prediction in cancer. Brief Bioinform 2024; 25:bbae379. [PMID: 39101498 PMCID: PMC11299037 DOI: 10.1093/bib/bbae379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 07/08/2024] [Accepted: 07/19/2024] [Indexed: 08/06/2024] Open
Abstract
With the ever-increasing number of artificial intelligence (AI) systems, mitigating risks associated with their use has become one of the most urgent scientific and societal issues. To this end, the European Union passed the EU AI Act, proposing solution strategies that can be summarized under the umbrella term trustworthiness. In anti-cancer drug sensitivity prediction, machine learning (ML) methods are developed for application in medical decision support systems, which require an extraordinary level of trustworthiness. This review offers an overview of the ML landscape of methods for anti-cancer drug sensitivity prediction, including a brief introduction to the four major ML realms (supervised, unsupervised, semi-supervised, and reinforcement learning). In particular, we address the question to what extent trustworthiness-related properties, more specifically, interpretability and reliability, have been incorporated into anti-cancer drug sensitivity prediction methods over the previous decade. In total, we analyzed 36 papers with approaches for anti-cancer drug sensitivity prediction. Our results indicate that the need for reliability has hardly been addressed so far. Interpretability, on the other hand, has often been considered for model development. However, the concept is rather used intuitively, lacking clear definitions. Thus, we propose an easily extensible taxonomy for interpretability, unifying all prevalent connotations explicitly or implicitly used within the field.
Affiliation(s)
- Kerstin Lenhof
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
- Lea Eckhart
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
- Lisa-Marie Rolli
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
- Hans-Peter Lenhof
  - Center for Bioinformatics, Chair for Bioinformatics, Saarland Informatics Campus (E2.1) Saarland University, Campus, D-66123 Saarbrücken, Saarland, Germany
4
Eckhart L, Lenhof K, Rolli LM, Lenhof HP. A comprehensive benchmarking of machine learning algorithms and dimensionality reduction methods for drug sensitivity prediction. Brief Bioinform 2024; 25:bbae242. [PMID: 38797968 PMCID: PMC11128483 DOI: 10.1093/bib/bbae242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 04/05/2024] [Accepted: 05/06/2024] [Indexed: 05/29/2024] Open
Abstract
A major challenge of precision oncology is the identification and prioritization of suitable treatment options based on molecular biomarkers of the considered tumor. In pursuit of this goal, large cancer cell line panels have successfully been studied to elucidate the relationship between cellular features and treatment response. Due to the high dimensionality of these datasets, machine learning (ML) is commonly used for their analysis. However, choosing a suitable algorithm and set of input features can be challenging. We performed a comprehensive benchmarking of ML methods and dimension reduction (DR) techniques for predicting drug response metrics. Using the Genomics of Drug Sensitivity in Cancer cell line panel, we trained random forests, neural networks, boosting trees and elastic nets for 179 anti-cancer compounds with feature sets derived from nine DR approaches. We compare the results regarding statistical performance, runtime and interpretability. Additionally, we provide strategies for assessing model performance compared with a simple baseline model and measuring the trade-off between models of different complexity. Lastly, we show that complex ML models benefit from using an optimized DR strategy, and that standard models, even when using considerably fewer features, can still be superior in performance.
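One way to read "assessing model performance compared with a simple baseline model" is to express a model's test error relative to a trivial predictor. The sketch below assumes the baseline predicts the mean training response for every test sample and uses mean squared error; both choices, the function names, and the numbers are illustrative assumptions and not necessarily the strategy used in the paper.

```python
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def improvement_over_baseline(y_train, y_test, y_pred):
    """Relative MSE reduction versus always predicting the training mean.

    1.0 means a perfect model, 0.0 means no better than the baseline,
    and negative values mean worse than the baseline.
    Assumes the test responses are not all equal to the training mean.
    """
    baseline_value = mean(y_train)
    baseline_mse = mse(y_test, [baseline_value] * len(y_test))
    model_mse = mse(y_test, y_pred)
    return 1.0 - model_mse / baseline_mse


# Hypothetical drug response values (for example log IC50), for illustration only.
y_train = [1.0, 2.0, 3.0, 4.0]
y_test = [2.0, 3.0, 5.0]
y_pred = [2.5, 3.0, 4.0]

score = improvement_over_baseline(y_train, y_test, y_pred)
print(f"relative improvement over mean baseline: {score:.3f}")
```

A score near zero signals that a complex model adds little over the trivial predictor, which makes results comparable across drugs with very different response variances.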
Affiliation(s)
- Lea Eckhart
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123, Saarland, Germany
- Kerstin Lenhof
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123, Saarland, Germany
- Lisa-Marie Rolli
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123, Saarland, Germany
- Hans-Peter Lenhof
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123, Saarland, Germany
5
Chen L, Wang X, Ban T, Usman M, Liu S, Lyu D, Chen H. Research Ideas Discovery via Hierarchical Negative Correlation. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:1639-1650. [PMID: 35767488 DOI: 10.1109/tnnls.2022.3184498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
A new research idea may be inspired by the connections of keywords. Link prediction discovers potential nonexisting links in an existing graph and has been applied in many applications. This article explores a method of discovering new research ideas based on link prediction, which predicts the possible connections of different keywords by analyzing the topological structure of the keyword graph. The patterns of links between keywords may be diversified due to different domains and different habits of authors. Therefore, it is often difficult for a single learner to extract diverse patterns of different research domains. To address this issue, groups of learners are organized with negative correlation to encourage the diversity of sublearners. Moreover, a hierarchical negative correlation mechanism is proposed to extract subgraph features in different order subgraphs, which improves the diversity by explicitly supervising the negative correlation on each layer of sublearners. Experiments are conducted to illustrate the effectiveness of the proposed model to discover new research ideas. Under the premise of ensuring the performance of the model, the proposed method consumes less time and computational cost compared with other ensemble methods.
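To make the idea of link prediction on a keyword graph concrete, the sketch below scores every pair of unlinked keywords by the Jaccard overlap of their neighbours. This is a classic topological heuristic used here purely for illustration; it is not the ensemble of negatively correlated learners described in the article, and the keyword graph and function name are hypothetical.

```python
from itertools import combinations


def jaccard_link_scores(edges):
    """Score each unlinked keyword pair by the Jaccard overlap of its neighbours."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    scores = {}
    for u, v in combinations(sorted(neighbours), 2):
        if v in neighbours[u]:
            continue  # the link already exists, nothing to predict
        shared = neighbours[u] & neighbours[v]
        union = neighbours[u] | neighbours[v]
        scores[(u, v)] = len(shared) / len(union)
    return scores


# Hypothetical keyword co-occurrence graph, for illustration only.
edges = [
    ("drug response", "multi-omics"),
    ("drug response", "graph neural network"),
    ("link prediction", "graph neural network"),
    ("link prediction", "multi-omics"),
    ("link prediction", "keyword graph"),
]

scores = jaccard_link_scores(edges)
best_pair = max(scores, key=scores.get)
print(best_pair, scores[best_pair])
```

A high score for a currently unlinked pair suggests a candidate connection between two keywords, which is the kind of output the article interprets as a potential research idea.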
6
Li Y, Guo Z, Gao X, Wang G. MMCL-CDR: enhancing cancer drug response prediction with multi-omics and morphology images contrastive representation learning. Bioinformatics 2023; 39:btad734. [PMID: 38070154 PMCID: PMC10756335 DOI: 10.1093/bioinformatics/btad734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 11/09/2023] [Indexed: 12/30/2023] Open
Abstract
MOTIVATION Cancer is a complex disease that results in a significant number of global fatalities. Treatment strategies can vary among patients, even if they have the same type of cancer. The application of precision medicine in cancer shows promise for treating different types of cancer, reducing healthcare expenses, and improving recovery rates. To achieve personalized cancer treatment, machine learning models have been developed to predict drug responses based on tumor and drug characteristics. However, current studies either focus on constructing homogeneous networks from a single data source or heterogeneous networks from multiomics data. While multiomics data have shown potential in predicting drug responses in cancer cell lines, there is still a lack of research that effectively utilizes insights from different modalities. Furthermore, effectively utilizing the multimodal knowledge of cancer cell lines poses a challenge due to the heterogeneity inherent in these modalities. RESULTS To address these challenges, we introduce MMCL-CDR (Multimodal Contrastive Learning for Cancer Drug Responses), a multimodal approach for cancer drug response prediction that integrates copy number variation, gene expression, morphology images of cell lines, and chemical structure of drugs. The objective of MMCL-CDR is to align cancer cell lines across different data modalities by learning cell line representations from omic and image data, and to combine them with structural drug representations to enhance the prediction of cancer drug responses (CDR). We have carried out comprehensive experiments and show that our model significantly outperforms other state-of-the-art methods in CDR prediction. The experimental results also show that the model can learn more accurate cell line representations by integrating multiomics and morphological data from cell lines, thereby improving the accuracy of CDR prediction. In addition, the ablation study and qualitative analysis also confirm the effectiveness of each part of our proposed model. Last but not least, MMCL-CDR opens up a new dimension for cancer drug response prediction through multimodal contrastive learning, pioneering a novel approach that integrates multiomics and multimodal drug and cell line modeling. AVAILABILITY AND IMPLEMENTATION MMCL-CDR is available at https://github.com/catly/MMCL-CDR.
Affiliation(s)
- Yang Li
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150006, China
- Zihou Guo
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150006, China
- Xin Gao
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Guohua Wang
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150006, China
7
Liu Y, Lyu X, Yang B, Fang Z, Hu D, Shi L, Wu B, Tian Y, Zhang E, Yang Y. Early Triage of Critically Ill Adult Patients With Mushroom Poisoning: Machine Learning Approach. JMIR Form Res 2023; 7:e44666. [PMID: 36943366 PMCID: PMC10131621 DOI: 10.2196/44666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 02/23/2023] [Accepted: 02/23/2023] [Indexed: 03/23/2023] Open
Abstract
BACKGROUND Early triage of patients with mushroom poisoning is essential for administering precise treatment and reducing mortality. To our knowledge, there has been no established method to triage patients with mushroom poisoning based on clinical data. OBJECTIVE The purpose of this work was to construct a triage system to identify patients with mushroom poisoning based on clinical indicators using several machine learning approaches and to assess the prediction accuracy of these strategies. METHODS In all, 567 patients were collected from 5 primary care hospitals and facilities in Enshi, Hubei Province, China, and divided into 2 groups; 322 patients from 2 hospitals were used as the training cohort, and 245 patients from 3 hospitals were used as the test cohort. Four machine learning algorithms were used to construct the triage model for patients with mushroom poisoning. Performance was assessed using the area under the receiver operating characteristic curve (AUC), decision curve, sensitivity, specificity, and other representative statistics. Feature contributions were evaluated using Shapley additive explanations. RESULTS Among several machine learning algorithms, extreme gradient boosting (XGBoost) showed the best discriminative ability in 5-fold cross-validation (AUC=0.83, 95% CI 0.77-0.90) and the test set (AUC=0.90, 95% CI 0.83-0.96). In the test set, the XGBoost model had a sensitivity of 0.93 (95% CI 0.81-0.99) and a specificity of 0.79 (95% CI 0.73-0.85), whereas the physicians' assessment had a sensitivity of 0.86 (95% CI 0.72-0.95) and a specificity of 0.66 (95% CI 0.59-0.73). CONCLUSIONS The 14-factor XGBoost model for the early triage of mushroom poisoning can rapidly and accurately identify critically ill patients and will possibly serve as an important basis for the selection of treatment options and referral of patients, potentially reducing patient mortality and improving clinical outcomes.
Affiliation(s)
- Yuxuan Liu
  - State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
- Xiaoguang Lyu
  - Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, China
- Bo Yang
  - Department of Internal Medicine, Renmin Hospital of Xianfeng, Enshi, China
- Zhixiang Fang
  - State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
- Dejun Hu
  - Department of Internal Medicine, Renmin Hospital of Xianfeng, Enshi, China
- Lei Shi
  - Department of Nephrology, Minda Hospital of Hubei Minzu University, Enshi, China
- Bisheng Wu
  - Department of General Surgery, Renmin Hospital of Xianfeng, Enshi, China
- Yong Tian
  - Department of Internal Medicine, Renmin Hospital of Laifeng, Enshi, China
- Enli Zhang
  - Department of General Surgery, Central Hospital of Hefeng, Enshi, China
- YuanChao Yang
  - Department of Gastroenterology, Renmin Hospital of Xuanen, Enshi, China
8
Lee K, Cho D, Jang J, Choi K, Jeong HO, Seo J, Jeong WK, Lee S. RAMP: response-aware multi-task learning with contrastive regularization for cancer drug response prediction. Brief Bioinform 2023; 24:6865135. [PMID: 36460623 DOI: 10.1093/bib/bbac504] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/24/2022] [Revised: 10/13/2022] [Accepted: 10/24/2022] [Indexed: 12/05/2022] Open
Abstract
The accurate prediction of cancer drug sensitivity according to the multiomics profiles of individual patients is crucial for precision cancer medicine. However, the development of prediction models has been challenged by the complex crosstalk of input features and the resistance-dominant drug response information contained in public databases. In this study, we propose a novel multidrug response prediction framework, response-aware multitask prediction (RAMP), via a Bayesian neural network and restrict it by soft-supervised contrastive regularization. To utilize network embedding vectors as representation learning features for heterogeneous networks, we harness response-aware negative sampling, which applies cell line-drug response information to the training of network embeddings. RAMP overcomes the prediction accuracy limitation induced by the imbalance of trained response data based on the comprehensive selection and utilization of drug response features. When trained on the Genomics of Drug Sensitivity in Cancer dataset, RAMP achieved an area under the receiver operating characteristic curve > 89%, an area under the precision-recall curve > 59% and an F1 score > 52% and outperformed previously developed methods on both balanced and imbalanced datasets. Furthermore, RAMP predicted many missing drug responses that were not included in the public databases. Our results showed that RAMP will be suitable for the high-throughput prediction of cancer drug sensitivity and will be useful for guiding cancer drug selection processes. The Python implementation for RAMP is available at https://github.com/hvcl/RAMP.
Affiliation(s)
- Kanggeun Lee
  - Department of Computer Science and Engineering at Korea University
- Dongbin Cho
  - Department of Computer Science at Hanyang University
- Jinho Jang
  - Department of Biomedical Engineering at UNIST
- Kang Choi
  - Department of Computer Science at Hanyang University
- Jiwon Seo
  - Department of Computer Science at Hanyang University
- Won-Ki Jeong
  - Department of Computer Science and Engineering at Korea University
- Semin Lee
  - Department of Biomedical Engineering at UNIST
9
Wang S, Wang S, Wang Z. A survey on multi-omics-based cancer diagnosis using machine learning with the potential application in gastrointestinal cancer. Front Med (Lausanne) 2023; 9:1109365. [PMID: 36703893 PMCID: PMC9871466 DOI: 10.3389/fmed.2022.1109365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2022] [Accepted: 12/28/2022] [Indexed: 01/12/2023] Open
Abstract
Gastrointestinal cancer is becoming increasingly common and leads to over 3 million deaths every year. No typical symptoms appear in the early stage of gastrointestinal cancer, posing a significant challenge in the diagnosis and treatment of patients. Many patients are already in the middle or late stages of the disease by the time they feel unwell, and unfortunately most of them will die of it. Recently, various artificial intelligence techniques, such as machine learning based on multi-omics, have been presented for cancer diagnosis and treatment in the era of precision medicine. This paper provides a survey on multi-omics-based cancer diagnosis using machine learning with potential application in gastrointestinal cancer. In particular, we provide a comprehensive summary and analysis from the perspective of multi-omics datasets, task types, and multi-omics-based integration methods. Furthermore, this paper points out the remaining challenges of multi-omics-based cancer diagnosis using machine learning and discusses future topics.
Affiliation(s)
- Suixue Wang
  - School of Information and Communication Engineering, Hainan University, Haikou, China
- Shuling Wang
  - Department of Neurology, Affiliated Haikou Hospital of Xiangya School of Medicine, Central South University, Haikou, China
- Zhengxia Wang
  - School of Computer Science and Technology, Hainan University, Haikou, China
10
Chen YH, Shih YT, Chien CS, Tsai CS. Predicting adverse drug effects: A heterogeneous graph convolution network with a multi-layer perceptron approach. PLoS One 2022; 17:e0266435. [PMID: 36516131 PMCID: PMC9750037 DOI: 10.1371/journal.pone.0266435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2022] [Accepted: 11/19/2022] [Indexed: 12/15/2022] Open
Abstract
We apply a heterogeneous graph convolution network (GCN) combined with a multi-layer perceptron (MLP) denoted by GCNMLP to explore the potential side effects of drugs. Here the SIDER, OFFSIDERS, and FAERS are used as the datasets. We integrate the drug information with similar characteristics from the datasets of known drugs and side effect networks. The heterogeneous graph networks explore the potential side effects of drugs by inferring the relationship between similar drugs and related side effects. This novel in silico method will shorten the time spent in uncovering the unseen side effects within routine drug prescriptions while highlighting the relevance of exploring drug mechanisms from well-documented drugs. In our experiments, we inquire about the drugs Vancomycin, Amlodipine, Cisplatin, and Glimepiride from a trained model, where the parameters are acquired from the dataset SIDER after training. Our results show that the performance of the GCNMLP on these three datasets is superior to the non-negative matrix factorization method (NMF) and some well-known machine learning methods with respect to various evaluation scales. Moreover, new side effects of drugs can be obtained using the GCNMLP.
Affiliation(s)
- Y.-H. Chen
  - Dept. of Nephrology, Taichung Tzu Chi Hospital, Taichung, Taiwan
  - School of Medicine, Tzu Chi University, Hualien, Taiwan
- Y.-T. Shih
  - Dept. of Applied Mathematics, National Chung Hsing University, Taichung, Taiwan
- C.-S. Chien
  - Dept. of Applied Mathematics, National Chung Hsing University, Taichung, Taiwan
- C.-S. Tsai
  - Dept. of Management of Information Systems, National Chung Hsing University, Taichung, Taiwan
11
Xie M, Lei X, Zhong J, Ouyang J, Li G. Drug response prediction using graph representation learning and Laplacian feature selection. BMC Bioinformatics 2022; 23:532. [PMID: 36494630 PMCID: PMC9733001 DOI: 10.1186/s12859-022-05080-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2022] [Accepted: 11/22/2022] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Knowing the responses of a patient to drugs is essential to make personalized medicine practical. Since current clinical drug response experiments are time-consuming and expensive, utilizing human genomic information and drug molecular characteristics to predict drug responses is of urgent importance. Although a variety of computational drug response prediction methods have been proposed, their effectiveness remains unsatisfactory. RESULTS In this study, we propose a method called LGRDRP (Learning Graph Representation for Drug Response Prediction) to predict cell line-drug responses. First, LGRDRP constructs a heterogeneous network integrating multiple kinds of information: cell line miRNA expression profiles, drug chemical structure similarity, gene-gene interaction, cell line-gene interaction and known cell line-drug responses. Then, for each cell line, learning graph representation and Laplacian feature selection are combined to obtain network topology features related to the cell line. The learning graph representation method learns network topology structure features, and the Laplacian feature selection method then selects the most important of them. Finally, LGRDRP trains an SVM model to predict drug responses based on the selected features of the known cell line-drug responses. Our five-fold cross-validation results show that LGRDRP is significantly superior to state-of-the-art methods in the average area under the receiver operating characteristic curve, the average area under the precision-recall curve and the recall rate of top-k predicted sensitive cell lines. CONCLUSIONS Our results demonstrate that using multiple types of information about cell lines and drugs, the learning graph representation method, and Laplacian feature selection improves performance in predicting drug responses. We believe that such an approach could be easily extended to similar problems such as miRNA-disease relationship inference.
Affiliation(s)
- Minzhu Xie
  - College of Information Science and Engineering, Hunan Normal University, Changsha, China
  - Key Laboratory of Computing and Stochastic Mathematics (LCSM) (Ministry of Education), School of Mathematics and Statistics, Hunan Normal University, Changsha, China
- Xiaowen Lei
  - College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Jianchen Zhong
  - College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Jianxing Ouyang
  - College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Guijing Li
  - College of Information Science and Engineering, Hunan Normal University, Changsha, China
12
Hiort P, Hugo J, Zeinert J, Müller N, Kashyap S, Rajapakse JC, Azuaje F, Renard BY, Baum K. DrDimont: explainable drug response prediction from differential analysis of multi-omics networks. Bioinformatics 2022; 38:ii113-ii119. [PMID: 36124784 PMCID: PMC9486584 DOI: 10.1093/bioinformatics/btac477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION While it has been well established that drugs affect and help patients differently, personalized drug response predictions remain challenging. Solutions based on single omics measurements have been proposed, and networks provide means to incorporate molecular interactions into reasoning. However, how to integrate the wealth of information contained in multiple omics layers still poses a complex problem. RESULTS We present DrDimont, Drug response prediction from Differential analysis of multi-omics networks. It allows for comparative conclusions between two conditions and translates them into differential drug response predictions. DrDimont focuses on molecular interactions. It establishes condition-specific networks from correlation within an omics layer that are then reduced and combined into heterogeneous, multi-omics molecular networks. A novel semi-local, path-based integration step ensures integrative conclusions. Differential predictions are derived from comparing the condition-specific integrated networks. DrDimont's predictions are explainable, i.e. molecular differences that are the source of high differential drug scores can be retrieved. We predict differential drug response in breast cancer using transcriptomics, proteomics, phosphosite and metabolomics measurements and contrast estrogen receptor positive and receptor negative patients. DrDimont performs better than drug prediction based on differential protein expression or PageRank when evaluating it on ground truth data from cancer cell lines. We find proteomic and phosphosite layers to carry most information for distinguishing drug response. AVAILABILITY AND IMPLEMENTATION DrDimont is available on CRAN: https://cran.r-project.org/package=DrDimont. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pauline Hiort
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Julian Hugo
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Justus Zeinert
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Nataniel Müller
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Spoorthi Kashyap
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Jagath C Rajapakse
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
- Bernhard Y Renard
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
|
13
|
Hu C, Xu Y, Li F, Mi W, Yu H, Wang X, Wen X, Chen S, Li X, Xu Y, Zhang Y. Identifying and characterizing drug sensitivity-related lncRNA-TF-gene regulatory triplets. Brief Bioinform 2022; 23:6675752. [PMID: 36007239 PMCID: PMC9487635 DOI: 10.1093/bib/bbac366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Revised: 06/19/2022] [Accepted: 08/06/2022] [Indexed: 11/15/2022] Open
Abstract
Recently, many studies have shown that lncRNAs can mediate TF-gene regulation in drug sensitivity. However, lncRNA-TF-gene regulatory triplets related to drug sensitivity have not yet been systematically identified. In this study, we propose a novel analytic approach to systematically identify the lncRNA-TF-gene regulatory triplets related to drug sensitivity by integrating transcriptome data and drug sensitivity data. In total, 1570 drug sensitivity-related lncRNA-TF-gene triplets were identified, and 16 307 relationships were formed between drugs and triplets. Then, a comprehensive characterization was performed. Drug sensitivity-related triplets affect a variety of biological functions, including drug response-related pathways. Phenotypic similarity analysis showed that drugs with many shared triplets had high similarity in their two-dimensional structures and indications. In addition, network analysis revealed the diverse regulatory mechanisms of lncRNAs for different drugs. Survival analysis also indicated that drug sensitivity-related lncRNA-TF-gene triplets could be candidate prognostic biomarkers for clinical applications. Next, we used the random walk algorithm on the drug-cell line heterogeneity network built from the identified triplets to screen therapeutic drugs for patients across three cancer types, and the results showed high accuracy. We also developed a user-friendly web interface, DrugSETs (http://bio-bigdata.hrbmu.edu.cn/DrugSETs/), for exploring the 1570 lncRNA-TF-gene triplets relevant to 282 drugs. Users can also submit a patient's expression profile to predict therapeutic drugs conveniently. In summary, our research may promote the study of lncRNAs in drug resistance mechanisms and improve the effectiveness of treatment.
Affiliation(s)
- Congxue Hu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yingqi Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Feng Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Wanqi Mi
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- He Yu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Xinran Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Xin Wen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Shuaijun Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Xia Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China; Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou 571199, China
- Yanjun Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yunpeng Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
|
14
|
Strybol PP, Larmuseau M, de Schaetzen van Brienen L, Van den Bulcke T, Marchal K. Extracting functional insights from loss-of-function screens using deep link prediction. Cell Reports Methods 2022; 2:100171. [PMID: 35474966 PMCID: PMC9017186 DOI: 10.1016/j.crmeth.2022.100171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2021] [Revised: 12/09/2021] [Accepted: 01/25/2022] [Indexed: 11/10/2022]
Abstract
We present deep link prediction (DLP), a method for the interpretation of loss-of-function screens. Our approach uses representation-based link prediction to reprioritize phenotypic readouts by integrating screening experiments with gene-gene interaction networks. We validate on 2 different loss-of-function technologies, RNAi and CRISPR, using datasets obtained from DepMap. Extensive benchmarking shows that DLP-DeepWalk outperforms other methods in recovering cell-specific dependencies, achieving an average precision well above 90% across 7 different cancer types and on both RNAi and CRISPR data. We show that the genes ranked highest by DLP-DeepWalk are appreciably more enriched in drug targets compared to the ranking based on original screening scores. Interestingly, this enrichment is more pronounced on RNAi data compared to CRISPR data, consistent with the greater inherent noise of RNAi screens. Finally, we demonstrate how DLP-DeepWalk can infer the molecular mechanism through which putative targets trigger cell line mortality.
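The benchmark in this abstract is reported as average precision over ranked genes. The sketch below shows how that metric can be computed for one ranking, assuming a simple score dictionary and a known set of true dependencies. The gene names, scores, and helper names are hypothetical and are not taken from the DLP code.

```python
def rank_by_score(scores):
    """Return item names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

def average_precision(ranked_labels):
    """Mean of precision@k taken at every rank k that holds a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_labels, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

# Hypothetical reprioritized scores for five genes in one cell line.
scores = {"G1": 0.9, "G2": 0.7, "G3": 0.6, "G4": 0.4, "G5": 0.1}
true_dependencies = {"G1", "G3", "G4"}

ranking = rank_by_score(scores)
labels = [gene in true_dependencies for gene in ranking]
ap = average_precision(labels)  # (1/1 + 2/3 + 3/4) / 3
```

Because every true dependency appears somewhere in the ranking, dividing by the number of hits is the same as dividing by the number of relevant items. A ranking that omits relevant items would need the latter.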
Affiliation(s)
- Pieter-Paul Strybol
  - Department of Plant Biotechnology and Bioinformatics, Department of Information Technology, IDLab, imec, iGent Toren, 9000 Gent, Belgium
- Maarten Larmuseau
  - Department of Plant Biotechnology and Bioinformatics, Department of Information Technology, IDLab, imec, iGent Toren, 9000 Gent, Belgium
- Louise de Schaetzen van Brienen
  - Department of Plant Biotechnology and Bioinformatics, Department of Information Technology, IDLab, imec, iGent Toren, 9000 Gent, Belgium
- Kathleen Marchal
  - Department of Plant Biotechnology and Bioinformatics, Department of Information Technology, IDLab, imec, iGent Toren, 9000 Gent, Belgium
|
15
|
Abstract
Multi-omics data analysis is an important aspect of cancer molecular biology studies and has led to ground-breaking discoveries. Many efforts have been made to develop machine learning methods that automatically integrate omics data. Here, we review machine learning tools categorized as either general-purpose or task-specific, covering both supervised and unsupervised learning for integrative analysis of multi-omics data. We benchmark the performance of five machine learning approaches using data from the Cancer Cell Line Encyclopedia, reporting accuracy on cancer type classification and mean absolute error on drug response prediction, and evaluating runtime efficiency. This review provides recommendations to researchers regarding suitable machine learning method selection for their specific applications. It should also promote the development of novel machine learning methodologies for data integration, which will be essential for drug discovery, clinical trial design, and personalized treatments.
Affiliation(s)
- Zhaoxiang Cai
  - ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 214 Hawkesbury Rd, Westmead, NSW 2145, Australia
- Rebecca C. Poulos
  - ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 214 Hawkesbury Rd, Westmead, NSW 2145, Australia
- Jia Liu
  - ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 214 Hawkesbury Rd, Westmead, NSW 2145, Australia
  - Faculty of Medicine, Western Sydney University, Campbelltown, NSW, Australia
- Qing Zhong
  - ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 214 Hawkesbury Rd, Westmead, NSW 2145, Australia
|
16
|
Pouryahya M, Oh JH, Mathews JC, Belkhatir Z, Moosmüller C, Deasy JO, Tannenbaum AR. Pan-Cancer Prediction of Cell-Line Drug Sensitivity Using Network-Based Methods. Int J Mol Sci 2022; 23:ijms23031074. [PMID: 35163005 PMCID: PMC8835038 DOI: 10.3390/ijms23031074] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/10/2021] [Revised: 01/15/2022] [Accepted: 01/17/2022] [Indexed: 01/02/2023] Open
Abstract
The development of reliable predictive models for individual cancer cell lines to identify an optimal cancer drug is a crucial step to accelerate personalized medicine, but vast differences in cancer cell lines and drug characteristics make it quite challenging to develop predictive models that result in high predictive power and explain the similarity of cell lines or drugs. Our study proposes a novel network-based methodology that breaks the problem into smaller, more interpretable problems to improve the predictive power of anti-cancer drug responses in cell lines. For the drug-sensitivity study, we used the GDSC database for 915 cell lines and 200 drugs. The theory of optimal mass transport was first used to separately cluster cell lines and drugs, using gene-expression profiles and extensive cheminformatic drug features, represented in a form of data networks. To predict cell-line specific drug responses, random forest regression modeling was separately performed for each cell-line drug cluster pair. Post-modeling biological analysis was further performed to identify potential biological correlates associated with drug responses. The network-based clustering method resulted in 30 distinct cell-line drug cluster pairs. Predictive modeling on each cell-line-drug cluster outperformed alternative computational methods in predicting drug responses. We found that among the four drugs top-ranked with respect to prediction performance, three targeted the PI3K/mTOR signaling pathway. Predictive modeling on clustered subsets of cell lines and drugs improved the prediction accuracy of cell-line specific drug responses. Post-modeling analysis identified plausible biological processes associated with drug responses.
Affiliation(s)
- Maryam Pouryahya
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Jung Hun Oh
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
  - Correspondence:
- James C. Mathews
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Zehor Belkhatir
  - School of Engineering and Sustainable Development, De Montfort University, Leicester LE1 9BH, UK
- Caroline Moosmüller
  - Department of Mathematics, University of California at San Diego, La Jolla, CA 92093, USA
- Joseph O. Deasy
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Allen R. Tannenbaum
  - Departments of Computer Science and Applied Mathematics & Statistics, Stony Brook University, Stony Brook, NY 11794, USA
|
17
|
Firoozbakht F, Yousefi B, Schwikowski B. An overview of machine learning methods for monotherapy drug response prediction. Brief Bioinform 2022; 23:bbab408. [PMID: 34619752 PMCID: PMC8769705 DOI: 10.1093/bib/bbab408] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 07/07/2021] [Revised: 08/25/2021] [Accepted: 09/06/2021] [Indexed: 12/11/2022] Open
Abstract
For an increasing number of preclinical samples, both detailed molecular profiles and their responses to various drugs are becoming available. Efforts to understand, and predict, drug responses in a data-driven manner have led to a proliferation of machine learning (ML) methods, with the longer term ambition of predicting clinical drug responses. Here, we provide a uniquely wide and deep systematic review of the rapidly evolving literature on monotherapy drug response prediction, with a systematic characterization and classification that comprises more than 70 ML methods in 13 subclasses, their input and output data types, modes of evaluation, and code and software availability. ML experts are provided with a fundamental understanding of the biological problem, and how ML methods are configured for it. Biologists and biomedical researchers are introduced to the basic principles of applicable ML methods, and their application to the problem of drug response prediction. We also provide systematic overviews of commonly used data sources used for training and evaluation methods.
Affiliation(s)
- Farzaneh Firoozbakht
  - Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
- Behnam Yousefi
  - Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
  - Sorbonne Université, École Doctorale Complexite du Vivant, Paris, France
- Benno Schwikowski
  - Systems Biology Group, Department of Computational Biology, Institut Pasteur, Paris, France
|
18
|
Emdadi A, Eslahchi C. Clinical drug response prediction from preclinical cancer cell lines by logistic matrix factorization approach. J Bioinform Comput Biol 2021; 20:2150035. [PMID: 34923927 DOI: 10.1142/s0219720021500359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Predicting tumor drug response using cancer cell line drug response values for a large number of anti-cancer drugs is a significant challenge in personalized medicine. Predicting patient response to drugs from data obtained from preclinical models is made easier by the availability of diverse information on cell lines and drugs. This paper proposes the TCLMF method, a predictive model for predicting drug response in tumor samples that was trained on preclinical samples and is based on the logistic matrix factorization approach. The TCLMF model is designed based on gene expression profiles, tissue type information, the chemical structure of drugs and drug sensitivity (IC50) data from cancer cell lines. We use preclinical data from the Genomics of Drug Sensitivity in Cancer dataset (GDSC) to train the proposed drug response model, which we then use to predict drug sensitivity of samples from the Cancer Genome Atlas (TCGA) dataset. The TCLMF approach focuses on identifying successful features of cell lines and drugs in order to calculate the probability of the tumor samples being sensitive to drugs. The closest cell line neighbours for each tumor sample are calculated using a description of similarity between tumor samples and cell lines in this study. The drug response for a new tumor is then calculated by averaging the low-rank features obtained from its neighboring cell lines. We compare the results of the TCLMF model with the results of the previously proposed methods using two databases and two approaches to test the model's performance. In the first approach, 12 drugs with enough known clinical drug response, considered in previous methods, are studied. For 7 drugs out of 12, the TCLMF can significantly distinguish between patients that are resistant to these drugs and the patients that are sensitive to them. In the second approach, the methods are converted to classification models using a threshold, and the results are compared. The results demonstrate that the TCLMF method provides accurate predictions relative to the other algorithms. Finally, we accurately classify tumor tissue type using the latent vectors obtained from TCLMF's logistic matrix factorization process. These findings demonstrate that the TCLMF approach produces effective latent vectors for tumor samples. The source code of the TCLMF method is available at https://github.com/emdadi/TCLMF.
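The neighbor-averaging step described in this abstract can be sketched as follows: pick the most similar cell lines for a tumor, average their latent vectors, and pass the inner product with a drug's latent vector through a logistic function. This is a simplified sketch under stated assumptions, not the TCLMF code. Bias terms and any similarity weighting are left out, and all names, vectors, and similarity values are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def nearest_cell_lines(similarities, k):
    """Names of the k cell lines most similar to the tumor sample."""
    return sorted(similarities, key=similarities.get, reverse=True)[:k]

def tumor_latent_vector(cell_line_vectors, neighbors):
    """Element-wise mean of the neighbors' latent vectors."""
    dim = len(cell_line_vectors[neighbors[0]])
    return [
        sum(cell_line_vectors[name][d] for name in neighbors) / len(neighbors)
        for d in range(dim)
    ]

def sensitivity_probability(sample_vector, drug_vector):
    """Logistic of the inner product of sample and drug latent vectors."""
    return sigmoid(sum(a * b for a, b in zip(sample_vector, drug_vector)))

# Hypothetical latent vectors learned from cell lines, and tumor-to-cell-line similarities.
cell_line_vectors = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [-1.0, -1.0]}
similarities = {"A": 0.9, "B": 0.8, "C": 0.1}

neighbors = nearest_cell_lines(similarities, k=2)
tumor_vector = tumor_latent_vector(cell_line_vectors, neighbors)
p_drug_x = sensitivity_probability(tumor_vector, [2.0, -2.0])
p_drug_y = sensitivity_probability(tumor_vector, [1.0, 1.0])
```

Thresholding such probabilities (for example at 0.5, an assumed value) would turn the model into a classifier, in the spirit of the second evaluation approach mentioned in the abstract.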
Affiliation(s)
- Akram Emdadi
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
- Changiz Eslahchi
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran; School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
|
19
|
An X, Chen X, Yi D, Li H, Guan Y. Representation of molecules for drug response prediction. Brief Bioinform 2021; 23:6375515. [PMID: 34571534 DOI: 10.1093/bib/bbab393] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 04/08/2021] [Revised: 08/28/2021] [Accepted: 08/30/2021] [Indexed: 12/18/2022] Open
Abstract
The rapid development of machine learning and deep learning algorithms in the recent decade has spurred an outburst of their applications in many research fields. In the chemistry domain, machine learning has been widely used to aid in drug screening, drug toxicity prediction, quantitative structure-activity relationship prediction, anti-cancer synergy score prediction, etc. This review is dedicated to the application of machine learning in drug response prediction. Specifically, we focus on molecular representations, which are crucial to the success of drug response prediction and other chemistry-related prediction tasks. We introduce three types of commonly used molecular representation methods, together with their implementation and application examples. This review will serve as a brief introduction to the broad field of molecular representations.
Affiliation(s)
- Xin An
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Xi Chen
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Daiyao Yi
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Hongyang Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Yuanfang Guan
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
|
20
|
Miranda SP, Baião FA, Fleck JL, Piccolo SR. Predicting drug sensitivity of cancer cells based on DNA methylation levels. PLoS One 2021; 16:e0238757. [PMID: 34506489 PMCID: PMC8432830 DOI: 10.1371/journal.pone.0238757] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/21/2020] [Accepted: 06/28/2021] [Indexed: 01/22/2023] Open
Abstract
Cancer cell lines, which are cell cultures derived from tumor samples, represent one of the least expensive and most studied preclinical models for drug development. Accurately predicting drug responses for a given cell line based on molecular features may help to optimize drug-development pipelines and explain mechanisms behind treatment responses. In this study, we focus on DNA methylation profiles as one type of molecular feature that is known to drive tumorigenesis and modulate treatment responses. Using genome-wide, DNA methylation profiles from 987 cell lines in the Genomics of Drug Sensitivity in Cancer database, we used machine-learning algorithms to evaluate the potential to predict cytotoxic responses for eight anti-cancer drugs. We compared the performance of five classification algorithms and four regression algorithms representing diverse methodologies, including tree-, probability-, kernel-, ensemble-, and distance-based approaches. We artificially subsampled the data to varying degrees, aiming to understand whether training based on relatively extreme outcomes would yield improved performance. When using classification or regression algorithms to predict discrete or continuous responses, respectively, we consistently observed excellent predictive performance when the training and test sets consisted of cell-line data. Classification algorithms performed best when we trained the models using cell lines with relatively extreme drug-response values, attaining area-under-the-receiver-operating-characteristic-curve values as high as 0.97. The regression algorithms performed best when we trained the models using the full range of drug-response values, although this depended on the performance metrics we used. Finally, we used patient data from The Cancer Genome Atlas to evaluate the feasibility of classifying clinical responses for human tumors based on models derived from cell lines. 
Generally, the algorithms were unable to identify patterns that predicted patient responses reliably; however, predictions by the Random Forests algorithm were significantly correlated with Temozolomide responses for low-grade gliomas.
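Two ideas from this abstract lend themselves to a short sketch: restricting training to cell lines with relatively extreme drug-response values, and scoring a classifier by the area under the ROC curve. The code below is a minimal illustration under stated assumptions, not the authors' pipeline. The label convention (lowest responses labeled 1, treated here as sensitive), the fraction, and all values are hypothetical.

```python
def extreme_subset(responses, fraction):
    """Return (index, label) pairs for the lowest and highest `fraction` of responses."""
    if not 0 < fraction <= 0.5 or len(responses) < 2:
        raise ValueError("need at least two responses and 0 < fraction <= 0.5")
    k = max(1, int(len(responses) * fraction))
    order = sorted(range(len(responses)), key=lambda i: responses[i])
    low = [(i, 1) for i in order[:k]]    # assumed convention: low response value = sensitive
    high = [(i, 0) for i in order[-k:]]  # high response value = resistant
    return low + high

def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical drug-response values for eight cell lines.
responses = [0.10, 0.90, 0.50, 0.20, 0.80, 0.40, 0.60, 0.30]
training_pairs = extreme_subset(responses, fraction=0.25)

# Hypothetical held-out labels and classifier scores.
auc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise form of the AUROC is quadratic in the number of samples. It is used here only because it makes the definition explicit, and a rank-based computation would scale better.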
Affiliation(s)
- Sofia P. Miranda
  - Department of Industrial Engineering, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
- Fernanda A. Baião
  - Department of Industrial Engineering, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
- Julia L. Fleck
  - Mines Saint-Etienne, Univ Clermont Auvergne, CNRS, UMR 6158 LIMOS, Centre CIS, Saint-Etienne, France
- Stephen R. Piccolo
  - Department of Biology, Brigham Young University, Provo, Utah, United States of America
|
21
|
Feng F, Shen B, Mou X, Li Y, Li H. Large-scale pharmacogenomic studies and drug response prediction for personalized cancer medicine. J Genet Genomics 2021; 48:540-551. [PMID: 34023295 DOI: 10.1016/j.jgg.2021.03.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 01/22/2021] [Revised: 03/26/2021] [Accepted: 03/28/2021] [Indexed: 12/26/2022]
Abstract
The response rate of most anti-cancer drugs is limited because of the high heterogeneity of cancer and the complex mechanism of drug action. Personalized treatment that stratifies patients into subgroups using molecular biomarkers is promising to improve clinical benefit. With the accumulation of preclinical models and advances in computational approaches of drug response prediction, pharmacogenomics has made great success over the last 20 years and is increasingly used in the clinical practice of personalized cancer medicine. In this article, we first summarize FDA-approved pharmacogenomic biomarkers and large-scale pharmacogenomic studies of preclinical cancer models such as patient-derived cell lines, organoids, and xenografts. Furthermore, we comprehensively review the recent developments of computational methods in drug response prediction, covering network, machine learning, and deep learning technologies and strategies to evaluate immunotherapy response. In the end, we discuss challenges and propose possible solutions for further improvement.
Affiliation(s)
- Fangyoumin Feng
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Bihan Shen
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Xiaoqin Mou
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yixue Li
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China; Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 330106, China
- Hong Li
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
|
22
|
Coşkun M, Koyutürk M. Node Similarity Based Graph Convolution for Link Prediction in Biological Networks. Bioinformatics 2021; 37:4501-4508. [PMID: 34152393 PMCID: PMC8652026 DOI: 10.1093/bioinformatics/btab464] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/09/2020] [Revised: 05/20/2021] [Accepted: 06/17/2021] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Link prediction is an important and well-studied problem in network biology. Recently, graph representation learning methods, including Graph Convolutional Network (GCN)-based node embedding have drawn increasing attention in link prediction. MOTIVATION An important component of GCN-based network embedding is the convolution matrix, which is used to propagate features across the network. Existing algorithms use the degree-normalized adjacency matrix for this purpose, as this matrix is closely related to the graph Laplacian, capturing the spectral properties of the network. In parallel, it has been shown that GCNs with a single layer can generate more robust embeddings by reducing the number of parameters. Laplacian-based convolution is not well suited to single layered GCNs, as it limits the propagation of information to immediate neighbors of a node. RESULTS Capitalizing on the rich literature on unsupervised link prediction, we propose using node similarity based convolution matrices in GCNs to compute node embeddings for link prediction. We consider eight representative node similarity measures (Common Neighbors, Jaccard Index, Adamic-Adar, Resource Allocation, Hub Depressed Index, Hub Promoted Index, Sorenson Index, Salton Index) for this purpose. We systematically compare the performance of the resulting algorithms against GCNs that use the degree-normalized adjacency matrix for convolution, as well as other link prediction algorithms. In our experiments, we use three link prediction tasks involving biomedical networks: drug-disease association (DDA) prediction, drug-drug interaction (DDI) prediction, protein-protein interaction (PPI) prediction. Our results show that node similarity-based convolution matrices significantly improve the link prediction performance of GCN-based embeddings. 
CONCLUSION As sophisticated machine learning frameworks are increasingly employed in biological applications, historically well-established methods can be useful in making a head-start. AVAILABILITY Our method, SiGraC, is implemented as a Python library and is freely available at https://github.com/mustafaCoskunAgu/SiGraC. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
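The eight node similarity measures named in this abstract have standard neighborhood-based definitions, which the sketch below computes for a single node pair on a toy undirected graph. It illustrates the measures only: how SiGraC assembles them into convolution matrices for a GCN is not shown, and the graph, node names, and function names are hypothetical.

```python
import math

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def similarity_scores(adj, u, v):
    """Neighborhood-based similarity of two distinct nodes with non-empty neighborhoods."""
    nu, nv = adj[u], adj[v]
    common = nu & nv
    cn, ku, kv = len(common), len(nu), len(nv)
    return {
        "common_neighbors": cn,
        "jaccard": cn / len(nu | nv),
        # A common neighbor of two distinct nodes has degree >= 2, so log(degree) > 0.
        "adamic_adar": sum(1.0 / math.log(len(adj[w])) for w in common),
        "resource_allocation": sum(1.0 / len(adj[w]) for w in common),
        "hub_depressed": cn / max(ku, kv),
        "hub_promoted": cn / min(ku, kv),
        "sorensen": 2.0 * cn / (ku + kv),
        "salton": cn / math.sqrt(ku * kv),
    }

# Toy drug-target style graph: a and b share the neighbors c and d.
edges = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("b", "e")]
adj = build_adjacency(edges)
scores = similarity_scores(adj, "a", "b")
```

Unlike the adjacency matrix, these scores are non-zero for node pairs that are two hops apart, which matches the abstract's point that a single-layer GCN with Laplacian-based convolution only reaches immediate neighbors.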
Affiliation(s)
- Mustafa Coşkun
  - Department of Computer Engineering, Abdullah Gül University; Hakkari University, Kayseri, 38080, Turkey
- Mehmet Koyutürk
  - Department of Computer and Data Sciences; Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH, 44106, USA
|
23
|
Tan X, Yu Y, Duan K, Zhang J, Sun P, Sun H. Current Advances and Limitations of Deep Learning in Anticancer Drug Sensitivity Prediction. Curr Top Med Chem 2021; 20:1858-1867. [PMID: 32648840 DOI: 10.2174/1568026620666200710101307] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/11/2020] [Revised: 04/02/2020] [Accepted: 04/14/2020] [Indexed: 02/06/2023]
Abstract
Anticancer drug screening can accelerate drug discovery to save the lives of cancer patients, but cancer heterogeneity makes this screening challenging. The prediction of anticancer drug sensitivity is useful for anticancer drug development and the identification of biomarkers of drug sensitivity. Deep learning, as a branch of machine learning, is an important aspect of in silico research. Its outstanding computational performance means that it has been used for many biomedical purposes, such as medical image interpretation, biological sequence analysis, and drug discovery. Several studies have predicted anticancer drug sensitivity based on deep learning algorithms. The field of deep learning has made progress regarding model performance and multi-omics data integration. However, deep learning is limited by the number of studies performed and data sources available, so it is not perfect as a pre-clinical approach for use in the anticancer drug screening process. Improving the performance of deep learning models is a pressing issue for researchers. In this review, we introduce the research of anticancer drug sensitivity prediction and the use of deep learning in this research area. To provide a reference for future research, we also review some common data sources and machine learning methods. Lastly, we discuss the advantages and disadvantages of deep learning, as well as the limitations and future perspectives regarding this approach.
Affiliation(s)
- Xian Tan
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Yang Yu
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Kaiwen Duan
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Jingbo Zhang
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Pingping Sun
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Hui Sun
  - College of Humanities and Sciences of Northeast Normal University, Changchun 130117, China
|
24
|
Huang X, Yu Z, Bu S, Lin Z, Hao X, He W, Yu P, Wang Z, Gao F, Zhang J, Chen J. An Ensemble Model for Prediction of Vancomycin Trough Concentrations in Pediatric Patients. Drug Design, Development and Therapy 2021; 15:1549-1559. [PMID: 33883878 PMCID: PMC8053786 DOI: 10.2147/dddt.s299037] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 12/31/2020] [Accepted: 03/18/2021] [Indexed: 01/22/2023]
Abstract
Purpose This study aimed to establish an optimal model to predict vancomycin trough concentrations by using machine learning. Patients and Methods We enrolled 407 pediatric patients (age < 18 years) who received vancomycin intravenously and underwent therapeutic drug monitoring from June 2013 to April 2020 at Xinhua Hospital affiliated to Shanghai Jiaotong University School of Medicine. The median (interquartile range) age and weight of the patients were 2 (0.63–5) years and 12 (7.8–19) kg. Vancomycin trough concentrations were considered as the target variable, and eight different algorithms were used for predictive performance comparison. The whole dataset (407 cases) was divided into a training group and a testing group at a ratio of 80%:20%, which were 325 and 82 cases, respectively. Results Ultimately, five algorithms (XGBoost, GBRT, Bagging, ExtraTree and decision tree) with high R² (0.657, 0.514, 0.468, 0.425 and 0.450, respectively) were selected and further ensembled to establish the final model and achieve an optimal result. For missing data, through filling the missing values and model ensemble, we obtained R²=0.614, MAE=3.32, MSE=24.39, RMSE=4.94 and a prediction accuracy of 51.22% (predicted trough concentration within ±30% of the actual trough concentration). In comparison with the pharmacokinetic models (R²=0.3), the machine learning model works better in model fitting and has better prediction accuracy. Conclusion Therefore, the ensemble model is useful for the vancomycin concentration prediction, especially in the population of children with great individual variation. As machine learning methods evolve, the clinical value of the ensemble model will be demonstrated in the clinical practice.
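The metrics reported in this abstract (R², MAE, MSE, RMSE, and the share of predictions within ±30% of the actual value) can be computed as in the sketch below. The function name and the example concentrations are hypothetical and are not data from the study. As a consistency check on the reported figures, the RMSE of 4.94 matches the square root of the MSE of 24.39, and 51.22% would correspond to 42 of 82 cases if the accuracy was computed on the testing group, which is an assumption.

```python
import math

def regression_metrics(actual, predicted, tolerance=0.30):
    """R^2, MAE, MSE, RMSE and the fraction of predictions within +/- tolerance of actual."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    ss_res = sum(e * e for e in errors)
    within = sum(
        1 for a, p in zip(actual, predicted) if abs(p - a) <= tolerance * abs(a)
    )
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": mae,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "accuracy": within / n,
    }

# Hypothetical trough concentrations in mg/L (actual vs. model prediction).
actual = [10.0, 20.0, 5.0, 15.0]
predicted = [12.0, 18.0, 9.0, 15.0]
metrics = regression_metrics(actual, predicted)
```

The ±30% criterion is relative to each actual value, so the same absolute error counts as a miss at a low concentration and as a hit at a high one. In the toy data, the error of 4 mg/L on an actual value of 5 mg/L is the only prediction outside the tolerance.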
Affiliation(s)
- Xiaohui Huang
- Department of Pharmacy, Xinhua Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China
| | - Ze Yu
- Beijing Medicinovo Technology Co. Ltd., Beijing, People's Republic of China
| | - Shuhong Bu
- Department of Pharmacy, Xinhua Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China
| | - Zhiyan Lin
- Department of Pharmacy, Xinhua Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China
| | - Xin Hao
- Dalian Medicinovo Technology Co. Ltd., Dalian, Liaoning Province, People's Republic of China
| | - Wenjun He
- Beijing Medicinovo Technology Co. Ltd., Beijing, People's Republic of China
| | - Peng Yu
- Beijing Medicinovo Technology Co. Ltd., Beijing, People's Republic of China
| | - Zeyuan Wang
- Beijing Medicinovo Technology Co. Ltd., Beijing, People's Republic of China
| | - Fei Gao
- Beijing Medicinovo Technology Co. Ltd., Beijing, People's Republic of China
| | - Jian Zhang
- Department of Pharmacy, Xinhua Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China
| | - Jihui Chen
- Department of Pharmacy, Xinhua Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China
|
25
|
Auslander N, Gussow AB, Koonin EV. Incorporating Machine Learning into Established Bioinformatics Frameworks. Int J Mol Sci 2021; 22:2903. [PMID: 33809353 PMCID: PMC8000113 DOI: 10.3390/ijms22062903] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 02/15/2021] [Revised: 03/08/2021] [Accepted: 03/10/2021] [Indexed: 12/23/2022] Open
Abstract
The exponential growth of biomedical data in recent years has urged the application of numerous machine learning techniques to address emerging problems in biology and clinical research. By enabling the automatic feature extraction, selection, and generation of predictive models, these methods can be used to efficiently study complex biological systems. Machine learning techniques are frequently integrated with bioinformatic methods, as well as curated databases and biological networks, to enhance training and validation, identify the best interpretable features, and enable feature and model investigation. Here, we review recently developed methods that incorporate machine learning within the same framework with techniques from molecular evolution, protein structure analysis, systems biology, and disease genomics. We outline the challenges posed for machine learning, and, in particular, deep learning in biomedicine, and suggest unique opportunities for machine learning techniques integrated with established bioinformatics approaches to overcome some of these challenges.
Affiliation(s)
| | | | - Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA;
|
26
|
Huang LC, Yeung W, Wang Y, Cheng H, Venkat A, Li S, Ma P, Rasheed K, Kannan N. Quantitative Structure-Mutation-Activity Relationship Tests (QSMART) model for protein kinase inhibitor response prediction. BMC Bioinformatics 2020; 21:520. [PMID: 33183223 PMCID: PMC7664030 DOI: 10.1186/s12859-020-03842-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/09/2020] [Accepted: 10/27/2020] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Protein kinases are a large family of druggable proteins that are genomically and proteomically altered in many human cancers. Kinase-targeted drugs are emerging as promising avenues for personalized medicine because of the differential response shown by altered kinases to drug treatment in patients and cell-based assays. However, an incomplete understanding of the relationships connecting genome, proteome and drug sensitivity profiles presents a major bottleneck in targeting kinases for personalized medicine. RESULTS In this study, we propose a multi-component Quantitative Structure-Mutation-Activity Relationship Tests (QSMART) model and neural networks framework for providing explainable models of protein kinase inhibition and drug response ([Formula: see text]) profiles in cell lines. Using non-small cell lung cancer as a case study, we show that interaction terms that capture associations between drugs, pathways, and mutant kinases quantitatively contribute to the response of two EGFR inhibitors (afatinib and lapatinib). In particular, protein-protein interactions associated with the JNK apoptotic pathway, associations between lung development and axon extension, and interaction terms connecting drug substructures and the volume/charge of mutant residues at specific structural locations contribute significantly to the observed [Formula: see text] values in cell-based assays. CONCLUSIONS By integrating multi-omics data in the QSMART model, we not only predict drug responses in cancer cell lines with high accuracy but also identify features and explainable interaction terms contributing to the accuracy. Although we have tested our multi-component explainable framework on protein kinase inhibitors, it can be extended across the proteome to investigate the complex relationships connecting genotypes and drug sensitivity profiles.
Affiliation(s)
- Liang-Chin Huang
- Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
| | - Wayland Yeung
- Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
| | - Ye Wang
- Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30602 USA
| | - Huimin Cheng
- Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30602 USA
| | - Aarya Venkat
- Department of Biochemistry and Molecular Biology, 120 Green St., Athens, GA 30602 USA
| | - Sheng Li
- Department of Computer Science, 415 Boyd Graduate Studies Research Center, Athens, GA 30602 USA
| | - Ping Ma
- Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30602 USA
| | - Khaled Rasheed
- Department of Computer Science, 415 Boyd Graduate Studies Research Center, Athens, GA 30602 USA
| | - Natarajan Kannan
- Institute of Bioinformatics, University of Georgia, 120 Green St., Athens, GA 30602 USA
- Department of Biochemistry and Molecular Biology, 120 Green St., Athens, GA 30602 USA
|
27
|
Yu L, Zhou D, Gao L, Zha Y. Prediction of drug response in multilayer networks based on fusion of multiomics data. Methods 2020; 192:85-92. [PMID: 32798653 DOI: 10.1016/j.ymeth.2020.08.006] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 04/07/2020] [Revised: 06/22/2020] [Accepted: 08/09/2020] [Indexed: 12/14/2022] Open
Abstract
Predicting the response of each individual patient to a drug is a key issue assailing personalized medicine. Our study predicted drug response based on the fusion of multiomics data with low-dimensional feature vector representation on a multilayer network model. We named this new method DREMO (Drug Response prEdiction based on MultiOmics data fusion). DREMO fuses similarities between cell lines and similarities between drugs, thereby improving the ability to predict the response of cancer cell lines to therapeutic agents. First, a multilayer similarity network related to cell lines and drugs was constructed based on gene expression profiles, somatic mutation, copy number variation (CNV), drug chemical structures, and drug targets. Next, low-dimensional feature vector representation was used to fuse the biological information in the multilayer network. Then, a machine learning model was applied to predict new drug-cell line associations. Finally, our results were validated using the well-established GDSC/CCLE databases, literature, and the functional pathway database. Furthermore, a comparison was made between DREMO and other methods. Results of the comparison showed that DREMO improves predictive capabilities significantly.
Affiliation(s)
- Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China.
| | - Dandan Zhou
- School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
| | - Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
| | - Yunhong Zha
- Department of Neurology, Institute of Neural Regeneration and Repair, Three Gorges University College of Medicine, The First Hospital of Yichang, Yichang, China.
|
28
|
Liu C, Wei D, Xiang J, Ren F, Huang L, Lang J, Tian G, Li Y, Yang J. An Improved Anticancer Drug-Response Prediction Based on an Ensemble Method Integrating Matrix Completion and Ridge Regression. MOLECULAR THERAPY. NUCLEIC ACIDS 2020; 21:676-686. [PMID: 32759058 PMCID: PMC7403773 DOI: 10.1016/j.omtn.2020.07.003] [Citation(s) in RCA: 62] [Impact Index Per Article: 12.4] [Received: 04/22/2020] [Revised: 06/10/2020] [Accepted: 07/06/2020] [Indexed: 12/16/2022]
Abstract
In this study, we proposed an ensemble learning method that simultaneously integrates a low-rank matrix completion model and a ridge regression model to predict anticancer drug response on cancer cell lines. The model was applied to two benchmark datasets, the Cancer Cell Line Encyclopedia (CCLE) and the Genomics of Drug Sensitivity in Cancer (GDSC). As previous studies suggest, the dual-layer integrated cell line-drug network model was one of the best models so far and outperformed most state-of-the-art models. Thus, we performed a head-to-head comparison between the dual-layer integrated cell line-drug network model and our model in a 10-fold cross-validation study. For the CCLE dataset, our model has a higher Pearson correlation coefficient between predicted and observed drug responses than the dual-layer integrated cell line-drug network model in 18 out of 23 drugs. For the GDSC dataset, our model is better in 26 out of 28 drugs in the phosphatidylinositol 3-kinase (PI3K) pathway and 26 out of 30 drugs in the extracellular signal-regulated kinase (ERK) signaling pathway. Based on the prediction results, we carried out two types of case studies, which further verified the effectiveness of the proposed model for drug-response prediction. In addition, our model is more biologically interpretable than the compared method, since it explicitly outputs the genes involved in the prediction, which are enriched in functions such as transcription, Src homology 2/3 (SH2/3) domain, cell cycle, ATP binding, and zinc finger.
Affiliation(s)
- Chuanying Liu
- School of Science, Yanshan University, Qinhuangdao, Hebei 066004, China
| | - Dong Wei
- School of Science, Yanshan University, Qinhuangdao, Hebei 066004, China
| | - Ju Xiang
- College of Information Engineering, Changsha Medical University, Changsha, Hunan 410219, China; School of Information Science and Engineering, Central South University, Changsha 410083, China
| | - Fuquan Ren
- School of Science, Yanshan University, Qinhuangdao, Hebei 066004, China
| | - Li Huang
- Tianhang Experiment School, Hangzhou, Zhejiang 310004, China
| | - Jidong Lang
- Geneis Beijing Co., Ltd., Beijing 100102, China
| | - Geng Tian
- Geneis Beijing Co., Ltd., Beijing 100102, China
| | - Yushuang Li
- School of Science, Yanshan University, Qinhuangdao, Hebei 066004, China.
| | - Jialiang Yang
- College of Information Engineering, Changsha Medical University, Changsha, Hunan 410219, China; Geneis Beijing Co., Ltd., Beijing 100102, China.
|
29
|
Wang W, Lv H, Zhao Y, Liu D, Wang Y, Zhang Y. DLS: A Link Prediction Method Based on Network Local Structure for Predicting Drug-Protein Interactions. Front Bioeng Biotechnol 2020; 8:330. [PMID: 32391341 PMCID: PMC7193019 DOI: 10.3389/fbioe.2020.00330] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/17/2019] [Accepted: 03/25/2020] [Indexed: 12/22/2022] Open
Abstract
Studies of drug-protein interactions (DPIs) are significant for drug repositioning, drug discovery, and clinical medicine. Confirming DPIs by in vitro biochemical experimentation takes a long time and is costly, and the interactions are difficult to estimate. Therefore, a feasible solution is to predict DPIs efficiently with computers. We propose a link prediction method based on drug-protein interaction (DPI) local structural similarity (DLS) for predicting DPIs. The DLS method combines link prediction and binary network structure to predict DPIs. Ten-fold cross-validation was applied in the experiment. Compared with the improved similarity-based network prediction method, DLS gives significantly better results on the test set. Moreover, several candidate proteins were predicted for three approved drugs, namely captopril, desferrioxamine and losartan, and these predictions are further validated by the literature. In addition, the combination of the Common Neighborhood (CN) method and the DLS method provides a new idea for the integrated application of link prediction methods.
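To make the idea of link prediction on a binary drug-protein network concrete, the sketch below scores unobserved drug-protein pairs with a generic common-neighbor-style count. It is not the DLS method, whose scoring the abstract does not specify. In a bipartite network a drug and a protein share no direct neighbors, so this illustration assumes a path-based variant: it counts three-step paths drug, shared protein, other drug, candidate protein. The function name `path3_scores` and the toy identifiers `d1`..`d3`, `p1`..`p3` are hypothetical.

```python
from collections import defaultdict


def path3_scores(interactions):
    """Score unobserved (drug, protein) pairs by the number of paths
    drug -> shared protein -> other drug -> candidate protein."""
    drug_to_proteins = defaultdict(set)
    protein_to_drugs = defaultdict(set)
    for drug, protein in interactions:
        drug_to_proteins[drug].add(protein)
        protein_to_drugs[protein].add(drug)

    scores = {}
    for drug, known in drug_to_proteins.items():
        for protein, binders in protein_to_drugs.items():
            if protein in known:
                continue  # already an observed interaction
            # For every other drug that binds the candidate protein,
            # count the proteins it shares with the query drug.
            count = sum(
                len(known & drug_to_proteins[other])
                for other in binders
                if other != drug
            )
            if count:
                scores[(drug, protein)] = count
    return scores


# Toy binary network, invented for illustration.
interactions = [
    ("d1", "p1"), ("d1", "p2"),
    ("d2", "p1"), ("d2", "p2"), ("d2", "p3"),
    ("d3", "p3"),
]
scores = path3_scores(interactions)
print(sorted(scores.items(), key=lambda item: -item[1]))
```

Here `d1` and `d2` share two targets, so the pair (`d1`, `p3`) receives the highest score. In a cross-validation setting, some known interactions would be hidden and the ranking of the hidden pairs evaluated.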
Affiliation(s)
- Wei Wang
- Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, China.,Big Data Engineering Laboratory for Teaching Resources and Assessment of Education Quality, Xinxiang, China
| | - Hehe Lv
- Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
| | - Yuan Zhao
- Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
| | - Dong Liu
- Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
| | - Yongqing Wang
- Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, China.,Big Data Engineering Laboratory for Teaching Resources and Assessment of Education Quality, Xinxiang, China
| | - Yu Zhang
- Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
|
30
|
Wang S, Li J. Modular within and between score for drug response prediction in cancer cell lines. Mol Omics 2020; 16:31-38. [PMID: 31802092 DOI: 10.1039/c9mo00162j] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022]
Abstract
Drug response prediction in cancer cell lines is vital for discovering new anticancer drugs. However, accurately predicting drug responses in cancer cell lines remains challenging. In this study, we present a novel computational approach, named MSDRP (modular within and between score for drug response prediction), to predict drug responses in cell lines. The method is based on a constructed heterogeneous drug-cell line network that integrates multiple types of information. Compared with other state-of-the-art methods, MSDRP achieved better predictive performance and identified potential associations between drugs and cell lines, which have been confirmed by the published literature. The source code of MSDRP is freely available at https://github.com/shimingwang1994/MSDRP.git.
Affiliation(s)
- Shiming Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
|
31
|
Cowman T, Coşkun M, Grama A, Koyutürk M. Integrated querying and version control of context-specific biological networks. Database (Oxford) 2020; 2020:baaa018. [PMID: 32294194 PMCID: PMC7158887 DOI: 10.1093/database/baaa018] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/19/2019] [Revised: 01/13/2020] [Accepted: 02/21/2020] [Indexed: 01/26/2023]
Abstract
MOTIVATION Biomolecular data stored in public databases is increasingly specialized to organisms, context/pathology and tissue type, potentially resulting in significant overhead for analyses. These networks are often specializations of generic interaction sets, presenting opportunities for reducing storage and computational cost. Therefore, it is desirable to develop effective compression and storage techniques, along with efficient algorithms and a flexible query interface capable of operating on compressed data structures. Current graph databases offer varying levels of support for network integration. However, these solutions do not provide efficient methods for the storage and querying of versioned networks. RESULTS We present VerTIoN, a framework consisting of novel data structures and associated query mechanisms for integrated querying of versioned context-specific biological networks. As a use case for our framework, we study network proximity queries in which the user can select and compose a combination of tissue-specific and generic networks. Using our compressed version tree data structure, in conjunction with state-of-the-art numerical techniques, we demonstrate real-time querying of large network databases. CONCLUSION Our results show that it is possible to support flexible queries defined on heterogeneous networks composed at query time while drastically reducing response time for multiple simultaneous queries. The flexibility offered by VerTIoN in composing integrated network versions opens significant new avenues for the utilization of ever increasing volume of context-specific network data in a broad range of biomedical applications. AVAILABILITY AND IMPLEMENTATION VerTIoN is implemented as a C++ library and is available at http://compbio.case.edu/omics/software/vertion and https://github.com/tjcowman/vertion. CONTACT tyler.cowman@case.edu.
Affiliation(s)
- Tyler Cowman
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, USA
| | - Mustafa Coşkun
- Department of Computer Engineering, Abdullah Gül University, Kayseri 38080, Turkey
| | - Ananth Grama
- Department of Computer Science, Purdue University, West Lafayette, IN 47906, USA
| | - Mehmet Koyutürk
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, USA
- Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA
|
32
|
Güvenç Paltun B, Mamitsuka H, Kaski S. Improving drug response prediction by integrating multiple data sources: matrix factorization, kernel and network-based approaches. Brief Bioinform 2019; 22:346-359. [PMID: 31838491 PMCID: PMC7820853 DOI: 10.1093/bib/bbz153] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 07/09/2019] [Revised: 11/01/2019] [Accepted: 11/04/2019] [Indexed: 12/17/2022] Open
Abstract
Predicting the response of cancer cell lines to specific drugs is one of the central problems in personalized medicine, where the cell lines show diverse characteristics. Researchers have developed a variety of computational methods to discover associations between drugs and cell lines, and improved drug sensitivity analyses by integrating heterogeneous biological data. However, choosing informative data sources and methods that can incorporate multiple sources efficiently is the challenging part of successful analysis in personalized medicine. The reason is that finding decisive factors of cancer and developing methods that can overcome the problems of integrating data, such as differences in data structures and data complexities, are difficult. In this review, we summarize recent advances in data integration-based machine learning for drug response prediction, by categorizing methods as matrix factorization-based, kernel-based and network-based methods. We also present a short description of relevant databases used as a benchmark in drug response prediction analyses, followed by providing a brief discussion of challenges faced in integrating and interpreting data from multiple sources. Finally, we address the advantages of combining multiple heterogeneous data sources on drug sensitivity analysis by showing an experimental comparison. Contact: betul.guvenc@aalto.fi
Affiliation(s)
- Betül Güvenç Paltun
- Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
| | - Hiroshi Mamitsuka
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
| | - Samuel Kaski
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
|
33
|
A Deep Learning Model for Cell Growth Inhibition IC50 Prediction and Its Application for Gastric Cancer Patients. Int J Mol Sci 2019; 20:ijms20246276. [PMID: 31842404 PMCID: PMC6941066 DOI: 10.3390/ijms20246276] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/11/2019] [Revised: 12/09/2019] [Accepted: 12/10/2019] [Indexed: 02/07/2023] Open
Abstract
Heterogeneity in intratumoral cancers leads to discrepancies in drug responsiveness, due to diverse genomics profiles. Thus, prediction of drug responsiveness is critical in precision medicine. So far, in drug responsiveness prediction, drugs’ molecular “fingerprints”, along with mutation statuses, have not been considered. Here, we constructed a 1-dimensional convolution neural network model, DeepIC50, to predict three drug responsiveness classes, based on 27,756 features including mutation statuses and various drug molecular fingerprints. As a result, DeepIC50 showed better cell viability IC50 prediction accuracy in pan-cancer cell lines over two independent cancer cell line datasets. Gastric cancer (GC) is not only one of the lethal cancer types in East Asia, but also a heterogeneous cancer type. The only targeted therapies currently approved in GC are trastuzumab and ramucirumab. GC patients who respond to these drugs are limited in number, and more drugs should be developed for GC. Due to the importance of GC, we applied DeepIC50 to a real GC patient dataset. Drug responsiveness prediction in the patient dataset by DeepIC50, when compared to the other models, was comparable to responsiveness observed in GC cell lines. DeepIC50 could possibly predict drug responsiveness to new compounds accurately, in diverse cancer cell lines, during the drug discovery process.
|
34
|
Manica M, Oskooei A, Born J, Subramanian V, Sáez-Rodríguez J, Rodríguez Martínez M. Toward Explainable Anticancer Compound Sensitivity Prediction via Multimodal Attention-Based Convolutional Encoders. Mol Pharm 2019; 16:4797-4806. [DOI: 10.1021/acs.molpharmaceut.9b00520] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Indexed: 12/11/2022]
Affiliation(s)
| | | | - Jannis Born
- IBM Research, 8803 Zürich, Switzerland
- ETH Zürich, 8092 Zürich, Switzerland
- University of Zürich, 8006 Zürich, Switzerland
|
35
|
Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. AN INTERNATIONAL JOURNAL ON INFORMATION FUSION 2019; 50:71-91. [PMID: 30467459 PMCID: PMC6242341 DOI: 10.1016/j.inffus.2018.09.012] [Citation(s) in RCA: 262] [Impact Index Per Article: 43.7] [Indexed: 05/10/2023]
Abstract
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Affiliation(s)
- Marinka Zitnik
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Francis Nguyen
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
| | - Bo Wang
- Hikvision Research Institute, Santa Clara, CA, USA
| | - Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
| | - Anna Goldenberg
- Genetics & Genome Biology, SickKids Research Institute, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
| | - Michael M. Hoffman
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
|
36
|
Guan NN, Zhao Y, Wang CC, Li JQ, Chen X, Piao X. Anticancer Drug Response Prediction in Cell Lines Using Weighted Graph Regularized Matrix Factorization. MOLECULAR THERAPY. NUCLEIC ACIDS 2019; 17:164-174. [PMID: 31265947 PMCID: PMC6610642 DOI: 10.1016/j.omtn.2019.05.017] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 02/25/2019] [Revised: 05/17/2019] [Accepted: 05/20/2019] [Indexed: 12/14/2022]
Abstract
Precision medicine has become a novel and rising concept, which depends much on the identification of individual genomic signatures for different patients. The cancer cell lines could reflect the “omic” diversity of primary tumors, based on which many works have been carried out to study the cancer biology and drug discovery both in experimental and computational aspects. In this work, we presented a novel method to utilize weighted graph regularized matrix factorization (WGRMF) for inferring anticancer drug response in cell lines. We constructed a p-nearest neighbor graph to sparsify drug similarity matrix and cell line similarity matrix, respectively. Using the sparsified matrices in the graph regularization terms, we performed matrix factorization to generate the latent matrices for drug and cell line. The graph regularization terms including neighbor information could help to exclude the noisy ingredient and improve the prediction accuracy. The 10-fold cross-validation was implemented, and the Pearson correlation coefficient (PCC), root-mean-square error (RMSE), PCCsr, and RMSEsr averaged over all drugs were calculated to evaluate the performance of WGRMF. The results on the Genomics of Drug Sensitivity in Cancer (GDSC) dataset are 0.64 ± 0.16, 1.37 ± 0.35, 0.73 ± 0.14, and 1.71 ± 0.44 for PCC, RMSE, PCCsr, and RMSEsr in turn. And for the Cancer Cell Line Encyclopedia (CCLE) dataset, WGRMF got results of 0.72 ± 0.09, 0.56 ± 0.19, 0.79 ± 0.07, and 0.69 ± 0.19, respectively. The results showed the superiority of WGRMF compared with previous methods. Besides, based on the prediction results using the GDSC dataset, three types of case studies were carried out. The results from both cross-validation and case studies have shown the effectiveness of WGRMF on the prediction of drug response in cell lines.
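The sparsification step mentioned in this abstract, building a p-nearest neighbor graph from a similarity matrix, can be illustrated with the minimal sketch below. The rule used here is an assumption for illustration: an entry is kept when either item is among the other's p most similar items, and all other entries are set to zero. The abstract does not state the exact weighting used in WGRMF, and the function name `sparsify_p_nearest` and the toy matrix are hypothetical.

```python
def sparsify_p_nearest(similarity, p):
    """Zero out entries of a square similarity matrix unless one item is
    among the other's p most similar items (self-similarity excluded)."""
    n = len(similarity)
    neighbors = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: similarity[i][j], reverse=True)
        neighbors.append(set(others[:p]))

    sparse = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and (j in neighbors[i] or i in neighbors[j]):
                sparse[i][j] = similarity[i][j]
    return sparse


# Toy symmetric similarity matrix for four items, invented for illustration.
similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.4],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.4, 0.8, 1.0],
]
sparse = sparsify_p_nearest(similarity, p=1)
for row in sparse:
    print(row)
```

With p=1 only the two strongest pairings survive, which is the sense in which neighbor information can exclude weak, possibly noisy similarities before the matrix enters a graph regularization term.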
Affiliation(s)
- Na-Na Guan
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
| | - Yan Zhao
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
| | - Chun-Chun Wang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
| | - Jian-Qiang Li
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China.
| | - Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China.
| | - Xue Piao
- School of Medical Informatics, Xuzhou Medical University, Xuzhou 221004, China.
|
37
|
Estimating genome-wide off-target effects for pyrrole-imidazole polyamide binding by a pathway-based expression profiling approach. PLoS One 2019; 14:e0215247. [PMID: 30964912 PMCID: PMC6456183 DOI: 10.1371/journal.pone.0215247] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 12/28/2018] [Accepted: 03/28/2019] [Indexed: 01/17/2023] Open
Abstract
In the search for new pharmaceutical leads, especially with DNA-binding molecules or genome editing methods, the issue of side and off-target effects has always been thorny. A particular case is the investigation into the off-target effects of N-methylpyrrole-N-methylimidazole polyamides, a naturally inspired class of DNA binders with strong affinity to the minor groove and sequence specificity; at < 20 bases, however, their relatively short motifs also suggest the possibility of non-unique genomic binding. Binding at unintended loci potentially leads to off-target effects, issues that very few approaches are able to address to date. We here report an analytical method to infer off-target binding, via expression profiling, based on probing the relative impact on various biochemical pathways; we also propose an accompanying side effect prediction engine for the systematic screening of candidate polyamides. This method marks the first attempt in PI polyamide research to identify elements in biochemical pathways that are sensitive to treatment with a candidate polyamide as an approach to infer possible off-target effects. Expression changes were then considered to assess possible outward phenotypic changes, manifested as side effects, should the same PI polyamide candidate be administered clinically. We validated some of these effects with a series of animal experiments, and found corroboration for certain side effects, such as changes in aspartate transaminase levels in ICR and nude mice post-administration.
|
38
|
Wei D, Liu C, Zheng X, Li Y. Comprehensive anticancer drug response prediction based on a simple cell line-drug complex network model. BMC Bioinformatics 2019; 20:44. [PMID: 30670007 PMCID: PMC6341656 DOI: 10.1186/s12859-019-2608-9] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 01/07/2018] [Accepted: 01/04/2019] [Indexed: 12/11/2022] Open
Abstract
Background: Accurate prediction of anticancer drug responses in cell lines is a crucial step toward precision medicine in oncology. Although many computational models have been proposed for this non-trivial problem, there is still room to improve prediction performance by combining multiple types of genome-wide molecular data. Results: We first demonstrated an observation on the CCLE and GDSC datasets, i.e., genetically similar cell lines always exhibit higher response correlations to structurally related drugs. Based on this observation, we built a cell line-drug complex network model, named the CDCN model. It captures the different contributions of all available cell line-drug responses through cell line similarities and drug similarities. We carried out anticancer drug response prediction on CCLE and GDSC independently. The result is significantly superior to that of some existing studies. More importantly, our model could predict the response of a new cell line to a new drug with considerable performance. We also divided all possible cell lines into “sensitive” and “resistant” groups by their response values to a given drug; the prediction accuracy, sensitivity, specificity and goodness of fit are also very promising. Conclusion: The CDCN model is a comprehensive tool for predicting anticancer drug responses. Compared with existing methods, it provides more satisfactory prediction results at lower computational cost. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2608-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Dong Wei
  - School of Science, Yanshan University, Qinhuangdao, 066004, China
- Chuanying Liu
  - School of Science, Yanshan University, Qinhuangdao, 066004, China
- Xiaoqi Zheng
  - Department of Mathematics, Shanghai Normal University, Shanghai, 200234, China
- Yushuang Li
  - School of Science, Yanshan University, Qinhuangdao, 066004, China
|
39
|
Kapadia P, Khare S, Priyadarshini P, Das B. Predicting Protein-Protein Interaction in Multi-layer Blood Cell PPI Networks. COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE 2019. [DOI: 10.1007/978-981-15-0111-1_22] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 02/07/2023]
|
40
|
Handelman GS, Kok HK, Chandra RV, Razavi AH, Lee MJ, Asadi H. eDoctor: machine learning and the future of medicine. J Intern Med 2018; 284:603-619. [PMID: 30102808 DOI: 10.1111/joim.12822] [Citation(s) in RCA: 471] [Impact Index Per Article: 67.3] [Indexed: 12/15/2022]
Abstract
Machine learning (ML) is a burgeoning field in medicine, with huge resources being applied to bring computer science and statistics to bear on medical problems. Proponents of ML extol its ability to deal with the large, complex and disparate data often found within medicine, and feel that ML is the future for biomedical research, personalized medicine and computer-aided diagnosis, with the potential to significantly advance global health care. However, the concepts of ML are unfamiliar to many medical professionals and there is untapped potential in the use of ML as a research tool. In this article, we provide an overview of the theory behind ML, explore the common ML algorithms used in medicine including their pitfalls and discuss the potential future of ML in medicine.
Affiliation(s)
- H K Kok
  - Interventional Radiology Service, Northern Hospital Radiology, Epping, Vic, Australia
- R V Chandra
  - Interventional Neuroradiology Service, Monash Imaging, Monash Health, Clayton, Vic, Australia
  - Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton, Vic, Australia
- A H Razavi
  - School of Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada
  - BCE Corporate Security, Ottawa, ON, Canada
- M J Lee
  - Department of Radiology, Beaumont Hospital and Royal College of Surgeons in Ireland, Dublin, Ireland
- H Asadi
  - Interventional Neuroradiology Service, Monash Imaging, Monash Health, Clayton, Vic, Australia
  - Department of Radiology, Interventional Neuroradiology Service, Austin Health, Heidelberg, Vic, Australia
  - School of Medicine, Faculty of Health, Deakin University, Waurn Ponds, Vic, Australia
|
41
|
Yang J, Li A, Li Y, Guo X, Wang M. A novel approach for drug response prediction in cancer cell lines via network representation learning. Bioinformatics 2018; 35:1527-1535. [DOI: 10.1093/bioinformatics/bty848] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 03/28/2018] [Revised: 09/09/2018] [Accepted: 10/09/2018] [Indexed: 11/13/2022] Open
Affiliation(s)
- Jianghong Yang
  - School of Information Science and Technology, University of Science and Technology of China, Hefei AH230037, China
- Ao Li
  - School of Information Science and Technology, University of Science and Technology of China, Hefei AH230037, China
  - Centers for Biomedical Engineering, University of Science and Technology of China, Hefei AH230037, China
- Yongqiang Li
  - Department of Preventive Medicine, Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, School of Basic Medical Sciences, Henan University, Kaifeng, China
- Xiangqian Guo
  - Department of Preventive Medicine, Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, School of Basic Medical Sciences, Henan University, Kaifeng, China
- Minghui Wang
  - School of Information Science and Technology, University of Science and Technology of China, Hefei AH230037, China
  - Centers for Biomedical Engineering, University of Science and Technology of China, Hefei AH230037, China
|
42
|
Liu H, Zhao Y, Zhang L, Chen X. Anti-cancer Drug Response Prediction Using Neighbor-Based Collaborative Filtering with Global Effect Removal. MOLECULAR THERAPY. NUCLEIC ACIDS 2018; 13:303-311. [PMID: 30321817 PMCID: PMC6197792 DOI: 10.1016/j.omtn.2018.09.011] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 06/22/2018] [Revised: 09/17/2018] [Accepted: 09/18/2018] [Indexed: 02/06/2023]
Abstract
Patients with the same cancer may differ in their responses to a specific medical therapy. Identification of predictive molecular features for drug sensitivity holds the key in the era of precision medicine. Human cell lines harbor most of the same genetic changes found in patients’ tumors and thus are widely used in the research of drug response. In this work, we formulated drug-response prediction as a recommender system problem and then adopted a neighbor-based collaborative filtering with global effect removal (NCFGER) method to estimate anti-cancer drug responses of cell lines by integrating cell-line similarity networks and drug similarity networks, based on the fact that similar cell lines and similar drugs exhibit similar responses. Specifically, we removed the global effect in the available responses and shrunk the similarity score for each cell line pair as well as each drug pair. We then used the K most similar neighbors (hybrid of cell-line-oriented and drug-oriented) in the available responses to predict the unknown ones. Through 10-fold cross-validation, this approach was shown to reach accurate and reproducible outcomes of drug sensitivity. We also discussed the biological outcomes based on the newly predicted response values.
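The neighbor-based scheme summarized in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' NCFGER implementation: the additive baseline (global mean plus per-cell-line and per-drug offsets) is one common way to model a "global effect", only cell-line-oriented neighbors are used for brevity, and the function names `remove_global_effect` and `predict_response`, the similarity dictionary and the toy data are all hypothetical.

```python
from statistics import mean


def remove_global_effect(responses):
    """Fit an assumed additive baseline and return it with the residuals.

    responses: dict mapping (cell_line, drug) -> observed response.
    """
    mu = mean(responses.values())
    cells = {c for c, _ in responses}
    drugs = {d for _, d in responses}
    # Per-cell-line offset from the global mean.
    b_cell = {
        c: mean(v - mu for (ci, _), v in responses.items() if ci == c)
        for c in cells
    }
    # Per-drug offset after the cell-line offset has been removed.
    b_drug = {
        d: mean(v - mu - b_cell[ci] for (ci, di), v in responses.items() if di == d)
        for d in drugs
    }
    residuals = {
        (c, d): v - mu - b_cell[c] - b_drug[d] for (c, d), v in responses.items()
    }
    return mu, b_cell, b_drug, residuals


def predict_response(cell, drug, responses, cell_sim, k=2):
    """Baseline plus a similarity-weighted average of the k nearest residuals.

    cell_sim: dict mapping frozenset({cell_a, cell_b}) -> similarity (hypothetical input).
    """
    mu, b_cell, b_drug, residuals = remove_global_effect(responses)
    baseline = mu + b_cell.get(cell, 0.0) + b_drug.get(drug, 0.0)
    # Other cell lines with a known response to the same drug.
    candidates = [
        (cell_sim.get(frozenset((cell, c)), 0.0), r)
        for (c, d), r in residuals.items()
        if d == drug and c != cell
    ]
    top = sorted(candidates, key=lambda t: abs(t[0]), reverse=True)[:k]
    weight = sum(abs(s) for s, _ in top)
    if weight == 0:
        return baseline  # no informative neighbors: fall back to the baseline
    return baseline + sum(s * r for s, r in top) / weight


# Toy usage with made-up numbers.
toy = {("A", "d1"): 1.0, ("B", "d1"): 3.0, ("A", "d2"): 2.0,
       ("B", "d2"): 4.0, ("C", "d2"): 3.0}
sim = {frozenset(("C", "A")): 0.9, frozenset(("C", "B")): 0.2}
estimate = predict_response("C", "d1", toy, sim)
```

Subtracting the baseline first means the neighbors only have to explain what is specific to a cell line-drug pair, rather than overall drug potency or overall cell-line sensitivity.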
Affiliation(s)
- Hui Liu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Yan Zhao
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Lin Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Xing Chen
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
|
43
|
Zhang L, Chen X, Guan NN, Liu H, Li JQ. A Hybrid Interpolation Weighted Collaborative Filtering Method for Anti-cancer Drug Response Prediction. Front Pharmacol 2018; 9:1017. [PMID: 30258362 PMCID: PMC6143790 DOI: 10.3389/fphar.2018.01017] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 04/09/2018] [Accepted: 08/22/2018] [Indexed: 12/16/2022] Open
Abstract
Individualized therapies call for the most effective regimen for each patient, yet patients' responses may differ from one another. However, it is impossible to clinically evaluate each patient's response due to the large population. Human cell lines harbor most of the same genetic changes found in patients' tumors and are thus widely used to help understand initial responses to drugs. Based on the more credible assumption that similar cell lines and similar drugs exhibit similar responses, we formulated drug response prediction as a recommender system problem, and then adopted a hybrid interpolation weighted collaborative filtering (HIWCF) method to predict anti-cancer drug responses of cell lines by incorporating cell line similarity and drug similarity derived from gene expression profiles, drug chemical structure and drug response similarity. Specifically, we estimated the baseline based on the available responses and shrunk the similarity score for each cell line pair as well as each drug pair. The similarity scores were then shrunk and weighted by the correlation coefficients drawn from the known responses between each pair. Before being used to find the K most similar neighbors for further prediction, they went through a case amplification strategy to emphasize high similarity and neglect low similarity. In the last step for prediction, cell line-oriented and drug-oriented collaborative filtering models were carried out, and the average of the predicted values from both models was used as the final predicted sensitivity. Through 10-fold cross validation, this approach was shown to reach accurate and reproducible outcomes for the missing drug sensitivities. We also found that the drug response similarity between cell lines or drugs may play an important role in the prediction. Finally, we discussed the biological outcomes based on the newly predicted response values in the GDSC dataset.
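The similarity post-processing steps mentioned in this abstract (shrinkage, case amplification, and averaging of the two oriented models) can be illustrated with a small sketch. The formulas below are the generic textbook forms from the collaborative filtering literature and are assumptions here, not the paper's exact definitions; the parameter names `beta` and `rho` and all three function names are hypothetical.

```python
def shrink_similarity(sim, n_common, beta=5.0):
    """Shrink a similarity toward zero when it rests on few shared observations.

    n_common: number of responses the two items have in common (assumed input).
    """
    return sim * n_common / (n_common + beta)


def amplify(sim, rho=2.5):
    """Case amplification: keep the sign, raise the magnitude to the power rho.

    For rho > 1, similarities near 1 stay large while small ones shrink further.
    """
    return sim * abs(sim) ** (rho - 1.0)


def combine(cell_oriented_pred, drug_oriented_pred):
    """Average the cell line-oriented and drug-oriented predictions."""
    return 0.5 * (cell_oriented_pred + drug_oriented_pred)
```

With `rho=2.0`, for example, a similarity of 0.9 becomes 0.81 while 0.2 becomes 0.04, so the neighbor ranking is unchanged but weak neighbors contribute far less to the weighted average.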
Affiliation(s)
- Lin Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Xing Chen
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Na-Na Guan
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Hui Liu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Jian-Qiang Li
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
|
44
|
Tan M, Özgül OF, Bardak B, Ekşioğlu I, Sabuncuoğlu S. Drug response prediction by ensemble learning and drug-induced gene expression signatures. Genomics 2018; 111:1078-1088. [PMID: 31533900 DOI: 10.1016/j.ygeno.2018.07.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 04/16/2018] [Revised: 06/12/2018] [Accepted: 07/03/2018] [Indexed: 12/14/2022]
Abstract
The chemotherapeutic response of cancer cells to a given compound is one of the most fundamental pieces of information required to design anti-cancer drugs. Recently, a considerable amount of drug-induced gene expression data has become publicly available, in addition to cytotoxicity databases. These large data sets provide an opportunity to apply machine learning methods to predict drug activity. However, due to the complexity of cancer drug mechanisms, none of the existing methods is perfect. In this paper, we propose a novel ensemble learning method to predict drug response. In addition, we attempt to use the drug screen data together with two novel signatures produced from the drug-induced gene expression profiles of cancer cell lines. Finally, we evaluate predictions by in vitro experiments in addition to the tests on data sets. The predictions of the methods, the signatures and the software are available from http://mtan.etu.edu.tr/drug-response-prediction/.
Affiliation(s)
- Mehmet Tan
  - Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey
- Ozan Fırat Özgül
  - Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey
- Batuhan Bardak
  - Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey
- Işıksu Ekşioğlu
  - Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey
- Suna Sabuncuoğlu
  - Department of Toxicology, Faculty of Pharmacy, Hacettepe University, Ankara, Turkey
|
45
|
Abstract
BACKGROUND: A significant problem in precision medicine is the prediction of drug sensitivity for individual cancer cell lines. Predictive models such as Random Forests have shown promising performance when predicting from individual genomic features such as gene expression. However, the availability of various other data types, including information on multiple tested drugs, calls for predictive models that incorporate these data types. RESULTS: We explore the predictive performance of model stacking and the effect of stacking on the predictive bias and squared error. In addition, we discuss the analytical underpinnings supporting the advantages of stacking in reducing the squared error and the inherent bias of random forests in the prediction of outliers. The framework is tested on a setup including gene expression, drug target, physical properties and drug response information for a set of drugs and cell lines. CONCLUSION: The performance of individual and stacked models is compared. We note that stacking models built on two heterogeneous datasets provides superior performance to stacking different models built on the same dataset. It is also noted that stacking provides a noticeable reduction in the bias of our predictors when the dominant eigenvalue of the principal axis of variation in the residuals is significantly higher than the remaining eigenvalues.
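As a rough illustration of the stacking idea discussed in this abstract, the sketch below fits a linear combination of two base models' held-out predictions by least squares. It is a simplified stand-in under stated assumptions, not the authors' framework (which is built around random forests): the two prediction lists are assumed to come from base models trained on different data types, and the function names `fit_stacking_weights` and `stack` are hypothetical.

```python
def fit_stacking_weights(p1, p2, y):
    """Least-squares weights (w1, w2) minimizing sum((w1*p1 + w2*p2 - y)**2).

    p1, p2: held-out predictions of two base models; y: observed responses.
    Solves the 2x2 normal equations directly, so no external library is needed.
    """
    a11 = sum(a * a for a in p1)
    a12 = sum(a * b for a, b in zip(p1, p2))
    a22 = sum(b * b for b in p2)
    b1 = sum(a * t for a, t in zip(p1, y))
    b2 = sum(b * t for b, t in zip(p2, y))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("base predictions are collinear; weights are not unique")
    w1 = (b1 * a22 - b2 * a12) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2


def stack(p1, p2, weights):
    """Combine two prediction lists with the fitted weights."""
    w1, w2 = weights
    return [w1 * a + w2 * b for a, b in zip(p1, p2)]


def squared_error(pred, y):
    """Sum of squared errors between predictions and observations."""
    return sum((p - t) ** 2 for p, t in zip(pred, y))
```

Because each base model alone corresponds to the weight choice (1, 0) or (0, 1), the fitted combination can never have a larger squared error than either base model on the data used to fit the weights; whether that advantage carries over to new samples depends on fitting the weights on held-out predictions.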
Affiliation(s)
- Kevin Matlock
  - Department of Electrical and Computer Engineering, Texas Tech University, 1012 Boston Ave, Lubbock, 79409 TX USA
- Carlos De Niz
  - Department of Electrical and Computer Engineering, Texas Tech University, 1012 Boston Ave, Lubbock, 79409 TX USA
- Raziur Rahman
  - Department of Electrical and Computer Engineering, Texas Tech University, 1012 Boston Ave, Lubbock, 79409 TX USA
- Souparno Ghosh
  - Department of Mathematics and Statistics, Texas Tech University, 1108 Memorial Circle, Lubbock, 79409 TX USA
- Ranadip Pal
  - Department of Electrical and Computer Engineering, Texas Tech University, 1012 Boston Ave, Lubbock, 79409 TX USA
|
46
|
A novel heterogeneous network-based method for drug response prediction in cancer cell lines. Sci Rep 2018; 8:3355. [PMID: 29463808 PMCID: PMC5820329 DOI: 10.1038/s41598-018-21622-4] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 10/04/2017] [Accepted: 02/06/2018] [Indexed: 02/01/2023] Open
Abstract
An enduring challenge in personalized medicine lies in selecting a suitable drug for each individual patient. Here we concentrate on predicting drug responses based on a cohort of genomic, chemical structure, and target information. Recent studies such as GDSC have provided an unprecedented opportunity to infer the potential relationships between cell lines and drugs. Existing approaches rely primarily on regression, classification or multiple kernel learning to predict drug responses, while synthetic approaches indicate that drug target and protein-protein interaction information could improve the prediction performance. In this study, we propose a novel heterogeneous network-based method, named HNMDRP, to accurately predict cell line-drug associations by incorporating heterogeneous relationships among cell lines, drugs and targets. Compared with previous studies, HNMDRP can make good use of the above heterogeneous information to predict drug responses. The validity of our method is verified not only by plotting the ROC curve, but also by predicting novel cell line-drug sensitive associations that have dependable literature evidence. This allows us to suggest potential sensitive associations among cell lines and drugs. Matlab and R codes of HNMDRP can be found at https://github.com/USTC-HIlab/HNMDRP.
|
47
|
Wang T, He XS, Zhou MY, Fu ZQ. Link Prediction in Evolving Networks Based on Popularity of Nodes. Sci Rep 2017; 7:7147. [PMID: 28769053 PMCID: PMC5540936 DOI: 10.1038/s41598-017-07315-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 03/09/2017] [Accepted: 06/26/2017] [Indexed: 01/26/2023] Open
Abstract
Link prediction aims to uncover the underlying relationships behind networks, which can be used to predict missing edges or identify spurious edges. The key issue in link prediction is to estimate the likelihood of potential links in networks. Most classical static-structure based methods ignore the temporal aspects of networks; because they cannot account for time-varying features, such approaches perform poorly in evolving networks. In this paper, we propose the hypothesis that the ability of each node to attract links depends not only on its structural importance, but also on its current popularity (activeness), since active nodes are much more likely to attract future links. A novel approach named the popularity based structural perturbation method (PBSPM) and its fast algorithm are then proposed to characterize the likelihood of an edge from both the existing connectivity structure and the current popularity of its two endpoints. Experiments on six evolving networks show that the proposed methods outperform state-of-the-art methods in accuracy and robustness. In addition, visual results and statistical analysis reveal that the proposed methods are inclined to predict future edges between active nodes, rather than edges between inactive nodes.
Affiliation(s)
- Tong Wang
  - Department of Electronic Science and Technology, University of Science and Technology of China, Hefei, 230027, P. R. China
- Xing-Sheng He
  - Department of Electronic Science and Technology, University of Science and Technology of China, Hefei, 230027, P. R. China
- Ming-Yang Zhou
  - Guangdong Province Key Laboratory of Popular High Performance Computers, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, P. R. China
  - Physics Department, University of Fribourg, Chemin du Musée 3, Fribourg, CH-1700, Switzerland
- Zhong-Qian Fu
  - Department of Electronic Science and Technology, University of Science and Technology of China, Hefei, 230027, P. R. China
|
48
|
Schork NJ, Nazor K. Integrated Genomic Medicine: A Paradigm for Rare Diseases and Beyond. ADVANCES IN GENETICS 2017; 97:81-113. [PMID: 28838357 PMCID: PMC6383766 DOI: 10.1016/bs.adgen.2017.06.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 01/05/2023]
Abstract
Individualized medicine, or the tailoring of therapeutic interventions to a patient's unique genetic, biochemical, physiological, exposure and behavioral profile, has been enhanced, if not enabled, by modern biomedical technologies such as high-throughput DNA sequencing platforms, induced pluripotent stem cell assays, biomarker discovery protocols, imaging modalities, and wireless monitoring devices. Despite successes in the isolated use of these technologies, however, it is arguable that their combined and integrated use in focused studies of individual patients is the best way to not only tailor interventions for those patients, but also shed light on treatment strategies for patients with similar conditions. This is particularly true for individuals with rare diseases since, by definition, they will require study without recourse to other individuals, or at least without recourse to many other individuals. Such integration and focus will require new biomedical scientific paradigms and infrastructure, including the creation of databases harboring study results, the formation of dedicated multidisciplinary research teams and new training programs. We consider the motivation and potential for such integration, point out areas in need of improvement, and argue for greater emphasis on improving patient health via technological innovations, not merely improving the technologies themselves. We also argue that the paradigm described can, in theory, be extended to the study of individuals with more common diseases.
Affiliation(s)
- Nicholas J. Schork
  - The Translational Genomics Research Institute, 445 North Fifth Street, Phoenix, AZ 85004, 858-794-4054
- Kristopher Nazor
  - MYi Diagnostics and Discovery, 5310 Eastgate Mall, San Diego, CA 92121, 858-458-9305
|
49
|
Abstract
Classification problems from different domains vary in complexity, size, and imbalance of the number of samples from different classes. Although several classification models have been proposed, selecting the right model and parameters for a given classification task to achieve good performance is not trivial. Therefore, there is a constant interest in developing novel robust and efficient models suitable for a great variety of data. Here, we propose OmniGA, a framework for the optimization of omnivariate decision trees based on a parallel genetic algorithm, coupled with deep learning structure and ensemble learning methods. The performance of the OmniGA framework is evaluated on 12 different datasets taken mainly from biomedical problems and compared with the results obtained by several robust and commonly used machine-learning models with optimized parameters. The results show that OmniGA systematically outperformed these models for all the considered datasets, reducing the F1 score error in the range from 100% to 2.25%, compared to the best performing model. This demonstrates that OmniGA produces robust models with improved performance. OmniGA code and datasets are available at www.cbrc.kaust.edu.sa/omniga/.
Affiliation(s)
- Arturo Magana-Mora
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center, Thuwal, 23955-6900, Saudi Arabia
- Vladimir B Bajic
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center, Thuwal, 23955-6900, Saudi Arabia
|