1. Ogunleye A, Piyawajanusorn C, Ghislat G, Ballester PJ. Large-Scale Machine Learning Analysis Reveals DNA Methylation and Gene Expression Response Signatures for Gemcitabine-Treated Pancreatic Cancer. Health Data Science 2024; 4:0108. [PMID: 38486621 PMCID: PMC10904073 DOI: 10.34133/hds.0108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Accepted: 12/08/2023] [Indexed: 03/17/2024]
Abstract
Background: Gemcitabine is a first-line chemotherapy for pancreatic adenocarcinoma (PAAD), but many PAAD patients do not respond to gemcitabine-containing treatments. Being able to predict such nonresponders would hence permit the prompt administration of more promising treatments while sparing those patients the life-threatening side effects of gemcitabine. Unfortunately, the few predictors of PAAD patient response to this drug are weak, and none of them yet exploits the power of machine learning (ML). Methods: Here, we applied ML to predict the response of PAAD patients to gemcitabine from the molecular profiles of their tumors. More concretely, we collected diverse molecular profiles of PAAD patient tumors along with the corresponding clinical data (gemcitabine responses and clinical features) from the Genomic Data Commons resource. By systematically combining 8 tumor profiles with 16 classification algorithms, we obtained 128 ML models, each of which was evaluated by multiple 10-fold cross-validations. Results: Only 7 of these 128 models were predictive, which underlines the importance of carrying out such a large-scale analysis to avoid missing the most predictive models. Here, these were random forest using 4 selected mRNAs [0.44 Matthews correlation coefficient (MCC), 0.785 receiver operating characteristic-area under the curve (ROC-AUC)] and XGBoost combining 12 DNA methylation probes (0.32 MCC, 0.697 ROC-AUC). By contrast, the hENT1 marker obtained much worse, random-level performance (practically 0 MCC, 0.5 ROC-AUC). Despite not being trained to predict prognosis (overall and progression-free survival), these ML models were also able to anticipate this patient outcome. Conclusions: We release these promising ML models so that they can be evaluated prospectively on other gemcitabine-treated PAAD patients.
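The abstract above reports model quality as a Matthews correlation coefficient (MCC), with about 0 described as random-level performance. As a minimal sketch of the standard MCC definition (not the authors' code), the function below computes it from confusion-matrix counts. The function name, the toy counts, and the convention of returning 0.0 when the denominator vanishes are assumptions made for illustration.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn/fp/fn are the numbers of true positives, true negatives,
    false positives and false negatives. Returns 0.0 when any marginal
    total is zero (an assumed convention for the undefined case).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts, not taken from the paper.
perfect = matthews_corrcoef(tp=5, tn=5, fp=0, fn=0)    # 1.0
chance = matthews_corrcoef(tp=5, tn=5, fp=5, fn=5)     # 0.0
inverted = matthews_corrcoef(tp=0, tn=0, fp=5, fn=5)   # -1.0
```

A classifier that is right exactly as often as it is wrong in each class scores 0, which is the sense in which an MCC of practically 0 corresponds to random-level prediction.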
Affiliation(s)
- Adeolu Ogunleye
- Department of Organismal Biology,
Uppsala University, Uppsala, Sweden
- Ghita Ghislat
- Department of Life Sciences,
Imperial College London, London, UK
2. Nguyen LC, Naulaerts S, Bruna A, Ghislat G, Ballester PJ. Predicting Cancer Drug Response In Vivo by Learning an Optimal Feature Selection of Tumour Molecular Profiles. Biomedicines 2021; 9:biomedicines9101319. [PMID: 34680436 PMCID: PMC8533095 DOI: 10.3390/biomedicines9101319] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 09/01/2021] [Revised: 09/22/2021] [Accepted: 09/23/2021] [Indexed: 12/17/2022] Open
Abstract
(1) Background: Inter-tumour heterogeneity is one of cancer’s most fundamental features. Patient stratification based on drug response prediction is hence needed for effective anti-cancer therapy. However, single-gene markers of response are rare and/or may fail to achieve a significant impact in the clinic. Machine Learning (ML) is emerging as a particularly promising complementary approach to precision oncology. (2) Methods: Here we leverage comprehensive Patient-Derived Xenograft (PDX) pharmacogenomic data sets with dimensionality-reducing ML algorithms for this purpose. (3) Results: Combining multiple gene alterations via ML leads to better discrimination between sensitive and resistant PDXs in 19 of the 26 analysed cases. Highly predictive ML models employing concise gene lists were found for three cases: paclitaxel (breast cancer), binimetinib (breast cancer) and cetuximab (colorectal cancer). Interestingly, each of these multi-gene ML models identifies some treatment-responsive PDXs not harbouring the best actionable mutation for that case. Thus, ML multi-gene predictors generally have far fewer false negatives than the corresponding single-gene marker. (4) Conclusions: As PDXs often recapitulate clinical outcomes, these results suggest that many more patients could benefit from precision oncology if ML algorithms were also applied to existing clinical pharmacogenomics data, especially those algorithms generating classifiers combining data-selected gene alterations.
Affiliation(s)
- Linh C. Nguyen
- Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France;
- Institut Paoli-Calmettes, F-13009 Marseille, France
- Aix-Marseille Université UM105, F-13009 Marseille, France
- CNRS UMR7258, F-13009 Marseille, France
- Department of Life Sciences, University of Science and Technology of Hanoi, Vietnam Academy of Science and Technology, Hanoi 100803, Vietnam
- Stefan Naulaerts
- Ludwig Institute for Cancer Research, 1200 Brussels, Belgium;
- Duve Institute, UCLouvain, 1200 Brussels, Belgium
- Ghita Ghislat
- Centre d’Immunologie de Marseille-Luminy, INSERM U1104, CNRS UMR7280, F-13009 Marseille, France;
- Pedro J. Ballester
- Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France;
- Institut Paoli-Calmettes, F-13009 Marseille, France
- Aix-Marseille Université UM105, F-13009 Marseille, France
- CNRS UMR7258, F-13009 Marseille, France
- Correspondence: Tel.: +33-(0)4-8697-7201
3. A Methodological Framework to Discover Pharmacogenomic Interactions Based on Random Forests. Genes (Basel) 2021; 12:genes12060933. [PMID: 34207374 PMCID: PMC8235396 DOI: 10.3390/genes12060933] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/17/2021] [Revised: 06/15/2021] [Accepted: 06/16/2021] [Indexed: 01/01/2023] Open
Abstract
The identification of genomic alterations in tumor tissues, including somatic mutations, deletions, and gene amplifications, produces large amounts of data, which can be correlated with a diversity of therapeutic responses. We aimed to provide a methodological framework to discover pharmacogenomic interactions based on Random Forests. We matched two databases from the Cancer Cell Line Encyclopaedia (CCLE) project, and the Genomics of Drug Sensitivity in Cancer (GDSC) project. For a total of 648 shared cell lines, we considered 48,270 gene alterations from CCLE as input features and the area under the dose-response curve (AUC) for 265 drugs from GDSC as the outcomes. A three-step reduction to 501 alterations was performed, selecting known driver genes and excluding very frequent/infrequent alterations and redundant ones. For each model, we used the concordance correlation coefficient (CCC) for assessing the predictive performance, and permutation importance for assessing the contribution of each alteration. In a reasonable computational time (56 min), we identified 12 compounds whose response was at least fairly sensitive (CCC > 20) to the alteration profiles. Some diversities were found in the sets of influential alterations, providing clues to discover significant drug-gene interactions. The proposed methodological framework can be helpful for mining pharmacogenomic interactions.
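The abstract above scores each model with the concordance correlation coefficient (CCC). As a minimal sketch, assuming the standard Lin definition with population (1/n) moments, the function below computes CCC on a -1 to 1 scale. The abstract's threshold of 20 is presumably expressed on a percentage scale, which is an assumption here. The function name and the toy vectors are hypothetical and are not taken from the paper.

```python
def concordance_cc(observed, predicted):
    """Lin's concordance correlation coefficient between two sequences.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y)) ** 2),
    using population (1/n) variance and covariance.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    var_x = sum((x - mean_x) ** 2 for x in observed) / n
    var_y = sum((y - mean_y) ** 2 for y in predicted) / n
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(observed, predicted)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        # Both inputs are the same constant: treated as perfect agreement
        # (an assumed convention for this degenerate case).
        return 1.0
    return 2 * cov_xy / denominator


# Hypothetical dose-response AUC values, for illustration only.
identical = concordance_cc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted = concordance_cc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Unlike Pearson correlation, CCC penalises systematic offsets: the shifted predictions above are perfectly correlated with the observations but have a CCC of 4/7, because the means differ by one unit.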
4. Li Y, Umbach DM, Krahn JM, Shats I, Li X, Li L. Predicting tumor response to drugs based on gene-expression biomarkers of sensitivity learned from cancer cell lines. BMC Genomics 2021; 22:272. [PMID: 33858332 PMCID: PMC8048084 DOI: 10.1186/s12864-021-07581-7] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 07/29/2020] [Accepted: 04/04/2021] [Indexed: 02/07/2023] Open
Abstract
Background: Human cancer cell line profiling and drug sensitivity studies provide valuable information about the therapeutic potential of drugs and their possible mechanisms of action. The goal of those studies is to translate the findings from in vitro studies of cancer cell lines into in vivo therapeutic relevance and, eventually, patients’ care. Tremendous progress has been made. Results: In this work, we built predictive models for 453 drugs using data on gene expression and drug sensitivity (IC50) from cancer cell lines. We identified many known drug-gene interactions and uncovered several potentially novel drug-gene associations. Importantly, we further applied these predictive models to ~17,000 bulk RNA-seq samples from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) database to predict drug sensitivity for both normal and tumor tissues. We created a web site for users to visualize and download our predicted data (https://manticore.niehs.nih.gov/cancerRxTissue). Using trametinib as an example, we showed that our approach can faithfully recapitulate the known tumor specificity of the drug. Conclusions: We demonstrated that our approach can predict drugs that 1) are tumor-type specific; 2) elicit higher sensitivity from tumor compared to corresponding normal tissue; 3) elicit differential sensitivity across breast cancer subtypes. If validated, our prediction could have relevance for preclinical drug testing and in phase I clinical design. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07581-7.
Affiliation(s)
- Yuanyuan Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, 111 T.W. Alexander Dr., Research Triangle Park, MD A3-03, Durham, NC, 27709, USA
- David M Umbach
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, 111 T.W. Alexander Dr., Research Triangle Park, MD A3-03, Durham, NC, 27709, USA
- Juno M Krahn
- Genome Integrity & Structural Biology Laboratory, Research Triangle Park, Durham, NC, 27709, USA
- Igor Shats
- Signal Transduction Laboratory, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA
- Xiaoling Li
- Signal Transduction Laboratory, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA
- Leping Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, 111 T.W. Alexander Dr., Research Triangle Park, MD A3-03, Durham, NC, 27709, USA.
5. Itzhacky N, Sharan R. Prediction of cancer dependencies from expression data using deep learning. Mol Omics 2020; 17:66-71. [PMID: 33135031 DOI: 10.1039/d0mo00042f] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/26/2022]
Abstract
Detecting cancer dependencies is key to disease treatment. Recent efforts have mapped gene dependencies and drug sensitivities in hundreds of cancer cell lines. These data allow us to learn for the first time models of tumor vulnerabilities and apply them to suggest novel drug targets. Here we devise novel deep learning methods for predicting gene dependencies and drug sensitivities from gene expression measurements. By combining dimensionality reduction strategies, we are able to learn accurate models that outperform simpler neural networks or linear models.
Affiliation(s)
- Nitay Itzhacky
- School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel.
6. Silencing KIF18B enhances radiosensitivity: identification of a promising therapeutic target in sarcoma. EBioMedicine 2020; 61:103056. [PMID: 33038765 PMCID: PMC7648128 DOI: 10.1016/j.ebiom.2020.103056] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/12/2020] [Revised: 09/16/2020] [Accepted: 09/16/2020] [Indexed: 12/20/2022] Open
Abstract
Background: Sarcomas are rare heterogeneous tumours, derived from primitive mesenchymal stem cells, with more than 100 distinct subtypes. Radioresistance remains a major clinical challenge for sarcomas, creating an urgent need for effective biomarkers of radiosensitivity. Methods: The radiosensitivity gene Kinesin family member 18B (KIF18B) was mined through bioinformatics by integrating 15 Gene Expression Omnibus (GEO) datasets and The Cancer Genome Atlas (TCGA) database. We combined radiotherapy with sh-KIF18B to observe the anti-tumour effect in sarcoma cells and in subcutaneous or orthotopic xenograft models. The KIF18B-sensitive drug T0901317 (T09) was further mined to act as a radiosensitizer using the Genomics of Drug Sensitivity in Cancer (GDSC) database. Findings: KIF18B mRNA was significantly up-regulated in most subtypes of bone and soft tissue sarcoma. Multivariate Cox regression analysis showed that high KIF18B expression was an independent risk factor for prognosis in sarcoma patients receiving radiotherapy. Silencing KIF18B or using T09 significantly improved the radiosensitivity of sarcoma cells, delayed tumour growth in subcutaneous and orthotopic xenograft models, and prolonged the survival time of mice. Furthermore, we predicted that T09 might bind to the structural region of KIF18B to exert radiosensitization. Interpretation: These results indicated that sarcomas with low expression of KIF18B may benefit from radiotherapy. Moreover, the radiosensitivity of sarcomas with overexpressed KIF18B could be effectively improved by silencing KIF18B or using T09, which may provide promising strategies for radiotherapy treatment of sarcoma. Funding: A full list of funding can be found in the Funding Sources section.
7. Mun J, Choi G, Lim B. A guide for bioinformaticians: 'omics-based drug discovery for precision oncology. Drug Discov Today 2020; 25:S1359-6446(20)30335-4. [PMID: 32828947 DOI: 10.1016/j.drudis.2020.08.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/04/2020] [Revised: 07/19/2020] [Accepted: 08/13/2020] [Indexed: 02/07/2023]
Abstract
Bioinformatics-centric drug development is inevitable in the era of precision medicine. Clinical 'omics information, including genomics, epigenomics, transcriptomics, and proteomics, provides the most comprehensive molecular landscape in which each patient's pathological history is delineated. Hence, the capability of bioinformaticians to manage integrative 'omics data is crucial to current drug development. Bioinformatics can accelerate drug development from initial time-consuming discoveries to the clinical stage by providing information-guided solutions. However, many bioinformaticians do not have opportunities to participate in drug discovery programs. As a starting point for bioinformaticians with no prior drug development experience, here we discuss bioinformatics applications during drug development with a focus on working-level omics-based methodologies.
Affiliation(s)
- Jihyeob Mun
- Center for Supercomputing Applications, Division of National Supercomputing R&D, Korea Institute of Science and Technology Information (KISTI), Daejeon, Republic of Korea
- Gildon Choi
- Research Center for Drug Discovery Technology, Division of Drug Discovery Research, Korea Research Institute of Chemical Technology, Daejeon, Republic of Korea.
- Byungho Lim
- Research Center for Drug Discovery Technology, Division of Drug Discovery Research, Korea Research Institute of Chemical Technology, Daejeon, Republic of Korea.
8. Naulaerts S, Menden MP, Ballester PJ. Concise Polygenic Models for Cancer-Specific Identification of Drug-Sensitive Tumors from Their Multi-Omics Profiles. Biomolecules 2020; 10:E963. [PMID: 32604779 PMCID: PMC7356608 DOI: 10.3390/biom10060963] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/31/2020] [Revised: 06/20/2020] [Accepted: 06/22/2020] [Indexed: 12/15/2022] Open
Abstract
In silico models to predict which tumors will respond to a given drug are necessary for Precision Oncology. However, predictive models are only available for a handful of cases (each case being a given drug acting on tumors of a specific cancer type). A way to generate predictive models for the remaining cases is with suitable machine learning algorithms that are yet to be applied to existing in vitro pharmacogenomics datasets. Here, we apply XGBoost integrated with a stringent feature selection approach, which is an algorithm that is advantageous for these high-dimensional problems. Thus, we identified and validated 118 predictive models for 62 drugs across five cancer types by exploiting four molecular profiles (sequence mutations, copy-number alterations, gene expression, and DNA methylation). Predictive models were found in each cancer type and with every molecular profile. On average, no omics profile or cancer type obtained models with higher predictive accuracy than the rest. However, within a given cancer type, some molecular profiles were overrepresented among predictive models. For instance, CNA profiles were predictive in breast invasive carcinoma (BRCA) cell lines, but not in small cell lung cancer (SCLC) cell lines where gene expression (GEX) and DNA methylation profiles were the most predictive. Lastly, we identified the best XGBoost model per cancer type and analyzed their selected features. For each model, some of the genes in the selected list had already been found to be individually linked to the response to that drug, providing additional evidence of the usefulness of these models and the merits of the feature selection scheme.
Affiliation(s)
- Stefan Naulaerts
- Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France;
- Institut Paoli-Calmettes, F-13009 Marseille, France
- Aix-Marseille Université, F-13284 Marseille, France
- CNRS UMR7258, F-13009 Marseille, France
- Ludwig Institute for Cancer Research, de Duve Institute, Université catholique de Louvain, 1200 Brussels, Belgium
- Michael P. Menden
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, 85764 Neuherberg, Germany;
- Department of Biology, Ludwig-Maximilians University Munich, 82152 Planegg-Martinsried, Germany
- German Centre for Diabetes Research (DZD e.V.), 85764 Neuherberg, Germany
- Pedro J. Ballester
- Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France;
- Institut Paoli-Calmettes, F-13009 Marseille, France
- Aix-Marseille Université, F-13284 Marseille, France
- CNRS UMR7258, F-13009 Marseille, France
9. Cramer D, Mazur J, Espinosa O, Schlesner M, Hübschmann D, Eils R, Staub E. Genetic Interactions and Tissue Specificity Modulate the Association of Mutations with Drug Response. Mol Cancer Ther 2019; 19:927-936. [PMID: 31826931 DOI: 10.1158/1535-7163.mct-19-0045] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/22/2019] [Revised: 06/21/2019] [Accepted: 12/04/2019] [Indexed: 11/16/2022]
Abstract
In oncology, biomarkers are widely used to predict subgroups of patients that respond to a given drug. Although clinical decisions often rely on single gene biomarkers, machine learning approaches tend to generate complex multi-gene biomarkers that are hard to interpret. Models predicting drug response based on multiple altered genes often assume that the effects of single alterations are independent. We asked whether the association of cancer driver mutations with drug response is modulated by other driver mutations or the tissue of origin. We developed an analytic framework based on linear regression to study interactions in pharmacogenomic data from two large cancer cell line panels. Starting from a model with only covariates, we included additional variables only if they significantly improved simpler models. This allows interactions to be assessed systematically in small, easily interpretable models. Our results show that including mutation-mutation interactions in drug response prediction models tends to improve model performance and robustness. For example, we found that TP53 mutations decrease sensitivity to BRAF inhibitors in BRAF-mutated cell lines and patient tumors, suggesting a therapeutic benefit of combining inhibition of oncogenic BRAF with reactivation of the tumor suppressor TP53. Moreover, we identified tissue-specific mutation-drug associations and synthetic lethal triplets where the simultaneous mutation of two genes sensitizes cells to a drug. In summary, our interaction-based approach contributes to a holistic view of the determining factors of drug response.
Affiliation(s)
- Dina Cramer
- Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Oncology Bioinformatics, Merck KGaA, Darmstadt, Germany
- Johanna Mazur
- Oncology Bioinformatics, Merck KGaA, Darmstadt, Germany
- Matthias Schlesner
- Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Bioinformatics and Omics Data Analytics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Daniel Hübschmann
- Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Pediatric Immunology, Hematology and Oncology, University Hospital Heidelberg, Heidelberg, Germany
- Division of Stem Cells and Cancer, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Heidelberg Institute for Stem Cell Technology and Experimental Medicine (HI-STEM gGmbH), Heidelberg, Germany
- Roland Eils
- Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Health Data Science Unit, Bioquant, Medical Faculty, Heidelberg University, Heidelberg, Germany
- Center for Digital Health, Berlin Institute of Health and Charité Universitätsmedizin Berlin, Berlin, Germany
- Eike Staub
- Oncology Bioinformatics, Merck KGaA, Darmstadt, Germany
10. Parca L, Pepe G, Pietrosanto M, Galvan G, Galli L, Palmeri A, Sciandrone M, Ferrè F, Ausiello G, Helmer-Citterich M. Modeling cancer drug response through drug-specific informative genes. Sci Rep 2019; 9:15222. [PMID: 31645597 PMCID: PMC6811538 DOI: 10.1038/s41598-019-50720-0] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 02/08/2019] [Accepted: 09/06/2019] [Indexed: 12/18/2022] Open
Abstract
Recent advances in pharmacogenomics have generated a wealth of data of different types whose analysis has helped in the identification of signatures of different cellular sensitivity/resistance responses to hundreds of chemical compounds. Among the different data types, gene expression has proven to be the most successful for the inference of drug response in cancer cell lines. Although effective, the whole transcriptome can introduce noise in the predictive models, since specific mechanisms are required for different drugs and these realistically involve only part of the proteins encoded in the genome. We analyzed the pharmacogenomics data of 961 cell lines tested with 265 anti-cancer drugs and developed different machine learning approaches for dissecting the genome systematically and predicting drug responses using both drug-unspecific and drug-specific genes. These methodologies reach better response predictions for the vast majority of the screened drugs using tens to a few hundred genes specific to each drug instead of the whole genome, thus allowing a better understanding and interpretation of drug-specific response mechanisms, which are not necessarily restricted to the drug's known targets.
Affiliation(s)
- Luca Parca
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Gerardo Pepe
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Marco Pietrosanto
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Giulio Galvan
- Department of Information Engineering, University of Florence, Florence, Italy
- Leonardo Galli
- Department of Information Engineering, University of Florence, Florence, Italy
- Antonio Palmeri
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Celgene Institute for Translational Research Europe, Sevilla, Spain
- Marco Sciandrone
- Department of Information Engineering, University of Florence, Florence, Italy
- Fabrizio Ferrè
- Department of Pharmacy and Biotechnology, University of Bologna Alma Mater, Bologna, Italy
- Gabriele Ausiello
- Department of Biology, University of Rome "Tor Vergata", Rome, Italy
11. Ali M, Aittokallio T. Machine learning and feature selection for drug response prediction in precision oncology applications. Biophys Rev 2018; 11:31-39. [PMID: 30097794 PMCID: PMC6381361 DOI: 10.1007/s12551-018-0446-z] [Citation(s) in RCA: 113] [Impact Index Per Article: 16.1] [Received: 04/17/2018] [Accepted: 07/22/2018] [Indexed: 02/07/2023] Open
Abstract
In-depth modeling of the complex interplay among multiple omics data measured from cancer cell lines or patient tumors is providing new opportunities toward identification of tailored therapies for individual cancer patients. Supervised machine learning algorithms are increasingly being applied to the omics profiles as they enable integrative analyses among the high-dimensional data sets, as well as personalized predictions of therapy responses using multi-omics panels of response-predictive biomarkers identified through feature selection and cross-validation. However, technical variability and frequent missingness in input "big data" require the application of dedicated data preprocessing pipelines that often lead to some loss of information and compressed view of the biological signal. We describe here the state-of-the-art machine learning methods for anti-cancer drug response modeling and prediction and give our perspective on further opportunities to make better use of high-dimensional multi-omics profiles along with knowledge about cancer pathways targeted by anti-cancer compounds when predicting their phenotypic responses.
Affiliation(s)
- Mehreen Ali
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, FI-00290, Helsinki, Finland
- Helsinki Institute for Information Technology (HIIT), Aalto University, FI-02150, Espoo, Finland
- Tero Aittokallio
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, FI-00290, Helsinki, Finland
- Helsinki Institute for Information Technology (HIIT), Aalto University, FI-02150, Espoo, Finland
- Department of Mathematics and Statistics, University of Turku, FI-20014, Turku, Finland
12. Naulaerts S, Dang CC, Ballester PJ. Precision and recall oncology: combining multiple gene mutations for improved identification of drug-sensitive tumours. Oncotarget 2017; 8:97025-97040. [PMID: 29228590 PMCID: PMC5722542 DOI: 10.18632/oncotarget.20923] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 03/16/2017] [Accepted: 08/14/2017] [Indexed: 02/07/2023] Open
Abstract
Cancer drug therapies are only effective in a small proportion of patients. To make things worse, our ability to identify these responsive patients before administering a treatment is generally very limited. The recent arrival of large-scale pharmacogenomic data sets, which measure the sensitivity of molecularly profiled cancer cell lines to a panel of drugs, has boosted research on the discovery of drug sensitivity markers. However, no systematic comparison of widely-used single-gene markers with multi-gene machine-learning markers exploiting genomic data has so far been conducted. We therefore assessed the performance offered by these two types of models in discriminating between cell lines that are sensitive and resistant to a given drug. This was carried out for each of 127 considered drugs using genomic data characterising the cell lines. We found that the proportion of cell lines predicted to be sensitive that are actually sensitive (precision) varies strongly with the drug and type of model used. Furthermore, the proportion of sensitive cell lines that are correctly predicted as sensitive (recall) of the best single-gene marker was lower than that of the multi-gene marker in 118 of the 127 tested drugs. We conclude that single-gene markers are only able to identify those drug-sensitive cell lines with the considered actionable mutation, unlike multi-gene markers that can in principle combine multiple gene mutations to identify additional sensitive cell lines. We also found that cell line sensitivities to some drugs (e.g. Temsirolimus, 17-AAG or Methotrexate) are better predicted by these machine-learning models.
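The abstract above defines precision as the proportion of cell lines predicted sensitive that are actually sensitive, and recall as the proportion of sensitive cell lines that are correctly predicted as sensitive. A minimal sketch of those two definitions follows. The function name and the toy labels are hypothetical and do not come from the paper's data, and returning 0.0 for an empty denominator is an assumed convention.

```python
def precision_recall(predicted_sensitive, actually_sensitive):
    """Return (precision, recall) for paired boolean labels.

    precision = TP / (TP + FP): predicted-sensitive lines that are sensitive.
    recall    = TP / (TP + FN): sensitive lines that were predicted sensitive.
    """
    pairs = list(zip(predicted_sensitive, actually_sensitive))
    tp = sum(1 for pred, actual in pairs if pred and actual)
    fp = sum(1 for pred, actual in pairs if pred and not actual)
    fn = sum(1 for pred, actual in pairs if actual and not pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical panel of six cell lines; the first four are truly sensitive.
actual = [True, True, True, True, False, False]
# A single-gene marker flags only the line carrying the actionable mutation.
single_gene = [True, False, False, False, False, False]
# A multi-gene model flags more sensitive lines, at the cost of one false positive.
multi_gene = [True, True, True, False, True, False]

single_scores = precision_recall(single_gene, actual)  # (1.0, 0.25)
multi_scores = precision_recall(multi_gene, actual)    # (0.75, 0.75)
```

The toy numbers mirror the pattern the abstract describes: a single-gene marker can be precise yet miss most sensitive lines, whereas a marker combining several genes can recover more of them.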
Affiliation(s)
- Stefan Naulaerts
- Computational Biology and Drug Design, Cancer Research Center of Marseille, INSERM U1068, Marseille, France
- Institut Paoli-Calmettes, Marseille, France
- Aix-Marseille Université, Marseille, France
- CNRS UMR7258, Marseille, France
- Cuong C Dang
- Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam
- Pedro J Ballester
- Computational Biology and Drug Design, Cancer Research Center of Marseille, INSERM U1068, Marseille, France
- Institut Paoli-Calmettes, Marseille, France
- Aix-Marseille Université, Marseille, France
- CNRS UMR7258, Marseille, France